readme

5 years ago · 47a59ea751
parent b3de05cc59
commit 47a59ea751
1 changed files with 24 additions and 10 deletions
--- a/README.md
+++ b/README.md
@@ -36,37 +36,49 @@ rga --rga-list-adapters

 Adapters:

-   ffmpeg
+Adapters:
+
+-   **ffmpeg**

    Uses ffmpeg to extract video metadata/chapters and subtitles

    Extensions: .mkv, .mp4, .avi

-   pandoc
+*   **pandoc**

    Uses pandoc to convert binary/unreadable text documents to plain markdown-like text

    Extensions: .epub, .odt, .docx, .fb2, .ipynb

-   poppler
+-   **poppler**

    Uses pdftotext (from poppler-utils) to extract plain text from PDF files

    Extensions: .pdf

-   zip
+*   **zip**

    Reads a zip file as a stream and recurses down into its contents

    Extensions: .zip

-   tar
+    Mime Types: application/zip
+
+*   **decompress**
+
+    Reads compressed file as a stream and runs a different extractor on the contents.
+
+    Extensions: .tgz, .tbz, .tbz2, .gz, .bz2, .xz, .zst
+
+    Mime Types: application/gzip, application/x-bzip, application/x-xz, application/zstd
+
+*   **tar**

    Reads a tar file as a stream and recurses down into its contents

-    Extensions: .tar, .tar.gz, .tar.bz2, .tar.xz, .tar.zst
+    Extensions: .tar

-   sqlite
+-   **sqlite**

    Uses sqlite bindings to convert sqlite databases into a simple plain text format

@@ -74,14 +86,16 @@ Adapters:

    Mime Types: application/x-sqlite3

-The following adapters are disabled by default, and can be enabled using `--rga-adapters=+pdfpages,tesseract`:
+The following adapters are disabled by default, and can be enabled using '--rga-adapters=+pdfpages,tesseract':
+
+-   **pdfpages**

-   pdfpages
    Converts a pdf to it's individual pages as png files. Only useful in combination with tesseract

    Extensions: .pdf

-   tesseract
+*   **tesseract**
+
    Uses tesseract to run OCR on images to make them searchable. May need -j1 to prevent overloading the system. Make sure you have tesseract installed.

    Extensions: .jpg, .png
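
The main behavioural change in this diff is that `tar` no longer claims `.tar.gz`, `.tar.bz2`, `.tar.xz`, and `.tar.zst`; the new `decompress` adapter claims the compression suffixes and then "runs a different extractor on the contents". The Python sketch below only models that layering from the extension lists shown in the diff. It is not rga's implementation: `ADAPTER_EXTENSIONS`, `adapter_for`, and `adapter_chain` are hypothetical names, the rule that `.tgz`/`.tbz`/`.tbz2` unwrap to a `.tar` is an assumption, mime-type matching is ignored, and the sqlite, pdfpages, and tesseract adapters are left out.

```python
# Hypothetical model of the adapter list after this commit; not rga's actual code.
# Extension lists are copied from the "+" side of the diff (without the leading dot).
ADAPTER_EXTENSIONS = {
    "ffmpeg": ["mkv", "mp4", "avi"],
    "pandoc": ["epub", "odt", "docx", "fb2", "ipynb"],
    "poppler": ["pdf"],
    "zip": ["zip"],
    "decompress": ["tgz", "tbz", "tbz2", "gz", "bz2", "xz", "zst"],
    "tar": ["tar"],
}

# Assumption: these shorthand suffixes denote a compressed tar archive.
TAR_SHORTHANDS = {"tgz", "tbz", "tbz2"}


def adapter_for(filename):
    """Return the adapter whose extension list matches the last suffix, or None."""
    if "." not in filename:
        return None
    ext = filename.rsplit(".", 1)[1].lower()
    for adapter, extensions in ADAPTER_EXTENSIONS.items():
        if ext in extensions:
            return adapter
    return None


def adapter_chain(filename):
    """List the adapters applied in order as each compression layer is removed."""
    chain = []
    name = filename
    while True:
        adapter = adapter_for(name)
        if adapter is None:
            break
        chain.append(adapter)
        if adapter != "decompress":
            break  # only decompress hands its output on by file name in this model
        stem, ext = name.rsplit(".", 1)
        # Strip the compression suffix; shorthands are assumed to contain a tar.
        name = stem + ".tar" if ext.lower() in TAR_SHORTHANDS else stem
    return chain


if __name__ == "__main__":
    for example in ["backup.tar.gz", "backup.tgz", "notes.pdf.xz", "archive.tar"]:
        print(example, adapter_chain(example))
```

Under this model, `backup.tar.gz` is matched by `decompress` first and by `tar` second, which is why the `tar` adapter's extension list can shrink to `.tar` alone.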