# rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.
rga is a line-oriented search tool that allows you to look for a regex in a multitude of file types. rga wraps the awesome [ripgrep] and enables it to search in pdf, docx, sqlite, jpg, movie subtitles (mkv, mp4), etc.
For more detail, see this introductory blogpost:
rga will recursively descend into archives and match text in every file type it knows.
Here is an example directory with different file types:

    ├── greeting.mkv
    ├── hello.odt
    ├── hello.sqlite3
    ├── dir
    │   ├── greeting.docx
    │   └── inner.tar.gz
    │       └── greeting.pdf
    └── greeting.epub

![rga output](doc/demodir.png)
## Available Adapters
`rga --rga-list-adapters`

- ffmpeg

  Uses ffmpeg to extract video metadata/chapters and subtitles

  Extensions: .mkv, .mp4, .avi

- pandoc

  Uses pandoc to convert binary/unreadable text documents to plain markdown-like text

  Extensions: .epub, .odt, .docx, .fb2, .ipynb

- poppler

  Uses pdftotext (from poppler-utils) to extract plain text from PDF files

  Extensions: .pdf

- zip

  Reads a zip file as a stream and recurses down into its contents

  Extensions: .zip

- tar

  Reads a tar file as a stream and recurses down into its contents

  Extensions: .tar, .tar.gz, .tar.bz2, .tar.xz, .tar.zst

- sqlite

  Uses sqlite bindings to convert sqlite databases into a simple plain text format

  Extensions: .db, .db3, .sqlite, .sqlite3

  Mime Types: application/x-sqlite3

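As an illustration of how extension-based matching could work, here is a minimal Python sketch. It is not rga's actual implementation: the `ADAPTER_EXTENSIONS` table and the `pick_adapter` helper are hypothetical names, and the longest-suffix rule is an assumption made so that multi-part extensions such as `.tar.gz` resolve sensibly. Only the extension lists themselves come from the adapter list above.

```python
from typing import Optional

# Extensions copied from the adapter list above; the dict layout is hypothetical.
ADAPTER_EXTENSIONS = {
    "ffmpeg": [".mkv", ".mp4", ".avi"],
    "pandoc": [".epub", ".odt", ".docx", ".fb2", ".ipynb"],
    "poppler": [".pdf"],
    "zip": [".zip"],
    "tar": [".tar", ".tar.gz", ".tar.bz2", ".tar.xz", ".tar.zst"],
    "sqlite": [".db", ".db3", ".sqlite", ".sqlite3"],
}


def pick_adapter(filename: str) -> Optional[str]:
    """Return the adapter whose extension is the longest suffix of filename, or None."""
    best = None  # (extension length, adapter name)
    for adapter, extensions in ADAPTER_EXTENSIONS.items():
        for ext in extensions:
            if filename.endswith(ext) and (best is None or len(ext) > best[0]):
                best = (len(ext), adapter)
    return best[1] if best else None
```

With this table, `pick_adapter("inner.tar.gz")` selects the tar adapter and a file with an unlisted extension yields `None`, which is the case where the search would fall back to plain text.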
The following adapters are disabled by default, and can be enabled using `--rga-adapters=+pdfpages,tesseract`:

- pdfpages

  Converts a PDF to its individual pages as PNG files. Only useful in combination with tesseract.

  Extensions: .pdf

- tesseract

  Uses tesseract to run OCR on images to make them searchable. May need `-j1` to prevent overloading the system. Make sure you have tesseract installed.

  Extensions: .jpg, .png

> rga \[FLAGS\] \[OPTIONS\] PATTERN \[PATH ...\]
> Use more accurate but slower matching by mime type
> By default, rga will match files using file extensions. Some programs,
> such as sqlite3, don\'t care about the file extension at all, so users
> sometimes use any or no extension at all. With this flag, rga will try
> to detect the mime type of input files using the magic bytes (similar
> to the \`file\` utility), and use that to choose the adapter.
> Detection is only done on the first 8KiB of the file, since we can\'t
> always seek on the input (in archives).
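The magic-byte detection described above can be sketched in Python as follows. This is a simplified illustration, not rga's code: `sniff_mime`, `SNIFF_LEN`, and `MAGIC_PREFIXES` are hypothetical names, and the signature table holds just three well-known file signatures. The sketch reads only the first 8 KiB in a single forward read, so it also works on streams that cannot seek.

```python
import io
from typing import BinaryIO, Optional

SNIFF_LEN = 8 * 1024  # only the first 8 KiB is inspected

# A few well-known file signatures; a real detector knows many more.
MAGIC_PREFIXES = [
    (b"SQLite format 3\x00", "application/x-sqlite3"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff_mime(stream: BinaryIO) -> Optional[str]:
    """Guess a mime type from the leading bytes of a stream, or return None."""
    head = stream.read(SNIFF_LEN)  # one forward read, no seeking required
    for prefix, mime in MAGIC_PREFIXES:
        if head.startswith(prefix):
            return mime
    return None
```

A SQLite database saved without any extension would still be recognised this way, because the decision depends only on the file header.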
**-h**, **\--help**
> Prints help information
**\--rga-list-adapters**
> List all known adapters
**\--rga-no-cache**
> Disable caching of results
> By default, rga caches the extracted text to a database in
> \~/.cache/rga if it is small enough. This way, repeated searches on
> the same set of files will be much faster. If you pass this flag, all
> caching will be disabled.
> Show help for ripgrep itself
> Show version of ripgrep itself
**-V**, **\--version**
> Prints version information
**\--rga-adapters** \<adapters\>
> Change which adapters to use and in which priority order (descending).
> \"foo,bar\" means use only adapters foo and bar. \"-bar,baz\" means
> use all default adapters except for bar and baz. \"+bar,baz\" means
> use all default adapters and also bar and baz.
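The three forms of the adapter list can be modelled with a short Python sketch. `resolve_adapters` is a hypothetical helper written for illustration, not part of rga. Placing the extra adapters after the defaults in the `+` case is an assumption, since the text above does not say where they land in the priority order.

```python
from typing import List

# Default adapter names as listed under "Available Adapters".
DEFAULT_ADAPTERS = ["ffmpeg", "pandoc", "poppler", "zip", "tar", "sqlite"]


def resolve_adapters(spec: str, defaults: List[str]) -> List[str]:
    """Turn a spec such as 'foo,bar', '-bar,baz' or '+bar,baz' into an adapter list."""
    sign = spec[0] if spec.startswith(("+", "-")) else ""
    names = [n.strip() for n in spec[len(sign):].split(",") if n.strip()]
    if sign == "-":
        # All defaults except the named adapters.
        return [a for a in defaults if a not in names]
    if sign == "+":
        # All defaults plus the named adapters (appended last: an assumption).
        return list(defaults) + [n for n in names if n not in defaults]
    # No sign: use exactly the named adapters, in the given order.
    return names
```

For example, `resolve_adapters("+pdfpages,tesseract", DEFAULT_ADAPTERS)` keeps the six defaults and adds the two optional adapters.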
> \[default: 12\]
**\--rga-cache-max-blob-len** \<cache-max-blob-len\>
> Max compressed size to cache
> Longest byte length (after compression) to store in cache. Longer
> adapter outputs will not be cached and will be recomputed every time.
> \[default: 2000000\]
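The size check on cached output might look roughly like the following Python sketch. It makes several assumptions: an in-memory dict stands in for the cache database, `zlib` from the standard library stands in for whatever compressor rga uses, and `maybe_cache` and `cached_text` are hypothetical names. Only the 2000000-byte default comes from the option description above.

```python
import zlib
from typing import Dict, Optional

DEFAULT_MAX_BLOB_LEN = 2_000_000  # default from the option description


def maybe_cache(cache: Dict[str, bytes], key: str, text: str,
                max_blob_len: int = DEFAULT_MAX_BLOB_LEN) -> bool:
    """Store the compressed text under key if it fits; return whether it was stored."""
    blob = zlib.compress(text.encode("utf-8"))  # zlib is a stand-in compressor
    if len(blob) > max_blob_len:
        return False  # too large: the caller re-extracts on the next search
    cache[key] = blob
    return True


def cached_text(cache: Dict[str, bytes], key: str) -> Optional[str]:
    """Return the cached text for key, or None on a cache miss."""
    blob = cache.get(key)
    return None if blob is None else zlib.decompress(blob).decode("utf-8")
```

The limit applies to the compressed size, so a long but repetitive document can still be cached while a shorter, less compressible one may not be.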
> Maximum nesting depth of archives to recurse into \[default: 4\]
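A depth limit on archive recursion can be sketched in Python as below. This is an illustration only, not rga's implementation: it handles zip files alone, `list_nested` and `make_zip` are hypothetical helpers, and counting the outermost archive as the first level is an assumption about how the limit is measured.

```python
import io
import zipfile
from typing import Dict, Iterator


def make_zip(files: Dict[str, bytes]) -> bytes:
    """Build an in-memory zip archive from a name -> content mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buf.getvalue()


def list_nested(data: bytes, prefix: str = "", depth: int = 0,
                max_depth: int = 4) -> Iterator[str]:
    """Yield inner file paths, opening nested zips until max_depth levels are open."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = prefix + info.filename
            # Open a nested archive only while under the limit; otherwise list it as a file.
            if info.filename.endswith(".zip") and depth + 1 < max_depth:
                yield from list_nested(archive.read(info), path + "/", depth + 1, max_depth)
            else:
                yield path
```

With `max_depth=1` only the outermost archive is opened, so an inner zip is reported as an ordinary entry instead of being searched.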
**-h** shows a concise overview, **\--help** shows more detail and
advanced options.
All other options not shown here are passed directly to rg, especially
\[PATTERN\] and \[PATH \...\]
## Development
To enable debug logging, run `export RUST_LOG=debug`.
Also remember to disable caching with `--rga-no-cache` or clear the cache in `~/.cache/rga` to debug the adapters.
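When debugging from a script, both settings can be combined in one invocation. The following Python sketch is a hypothetical convenience wrapper and not part of rga: `debug_invocation` is an invented name, and only the `RUST_LOG=debug` variable and the `--rga-no-cache` flag come from this section.

```python
import os
import shutil
import subprocess
from typing import Dict, List, Tuple


def debug_invocation(pattern: str, path: str) -> Tuple[List[str], Dict[str, str]]:
    """Build an rga command line and environment with debug logging and no cache."""
    env = dict(os.environ, RUST_LOG="debug")  # copy the environment, enable debug logs
    cmd = ["rga", "--rga-no-cache", pattern, path]
    return cmd, env


if __name__ == "__main__":
    cmd, env = debug_invocation("hello", ".")
    # Run only if an rga binary is actually on PATH.
    if shutil.which("rga"):
        subprocess.run(cmd, env=env, check=False)
```

Passing the environment explicitly keeps the debug setting scoped to this one run instead of the whole shell session.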