docs: Update README

namuan 2023-01-08 14:57:19 +00:00
parent e3b0303376
commit cec3c36288
4 changed files with 67 additions and 3 deletions

@ -4,6 +4,7 @@ repos:
hooks:
- id: check-ast
- id: check-added-large-files
exclude: "^docs/dr-doc-search-github-demo.gif"
- id: check-merge-conflict
- id: check-case-conflict
- id: check-docstring-first

@ -4,6 +4,11 @@
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/dr-doc-search?style=flat-square)](https://pypi.python.org/pypi/dr-doc-search/)
[![PyPI - License](https://img.shields.io/pypi/l/dr-doc-search?style=flat-square)](https://pypi.python.org/pypi/dr-doc-search/)
Converse with a book (PDF)
![](docs/dr-doc-search-github-demo.gif)
See [tweet](https://twitter.com/deskriders_twt/status/1612088387984588802) for full demo.
---
@ -15,8 +20,6 @@
---
Converse with an ebook (PDF)
## Pre-requisites
- [Tesseract OCR](https://github.com/tesseract-ocr/tesseract)
@ -30,6 +33,65 @@ pip install dr-doc-search
## Example Usage
There are two steps to use this application:
**1.** First, you need to create the index and generate embeddings for the PDF file.
Here I'm using a PDF file generated from this page: [Parable of a Monetary Economy](http://heteconomist.com/parable-of-a-monetary-economy/)
Before running this, you need to set up your OpenAI API key. You can get it from [OpenAI](https://beta.openai.com/account/api-keys).
```shell
export OPENAI_API_KEY=<your-openai-api-key>
```
Then run the following command to start the training process:
```shell
dr-doc-search --train -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf
```
The training process generates some temporary files in the `OutputDir/dr-doc-search/<pdf-name>` folder under your home directory.
Here is what it looks like:
```text
~/OutputDir/dr-doc-search/parable-of-a-monetary-economy-heteconomist
$ tree
.
├── images
│ ├── output-1.png
│ ├── output-10.png
│ ├── output-11.png
...
│ └── output-9.png
├── index
│ ├── docsearch.index
│ └── index.pkl
├── parable-of-a-monetary-economy-heteconomist.pdf
└── scanned
├── output-1.txt
...
└── output-9.txt
```
> **Note:**
> It is possible to change the base of the output directory by providing the `--app-dir` argument.
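
As a rough illustration of the layout described above, the sketch below computes where the files for a given PDF would be expected to land. The helper name `expected_output_dir` is hypothetical and not part of dr-doc-search, and it assumes that `--app-dir` replaces the home directory as the base while the `OutputDir/dr-doc-search/<pdf-name>` part stays the same.

```python
from pathlib import Path
from typing import Optional


def expected_output_dir(pdf_path: Path, app_dir: Optional[Path] = None) -> Path:
    """Return the folder where training output for pdf_path is expected.

    Hypothetical helper: assumes --app-dir swaps out the home directory
    and that <pdf-name> is the file name without its .pdf extension.
    """
    base = app_dir if app_dir is not None else Path.home()
    return base / "OutputDir" / "dr-doc-search" / pdf_path.stem


if __name__ == "__main__":
    pdf = Path("~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf")
    # Default base: the current user's home directory
    print(expected_output_dir(pdf))
    # Custom base, mirroring what --app-dir is assumed to do
    print(expected_output_dir(pdf, Path("/data")))
```

This can be handy for checking whether an index already exists before re-running the training step.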
**2.** Now that we have the index, we can use it to start asking questions.
```shell
dr-doc-search -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf --input-question "How did the attempt to reduce the debt result in a decrease in employment?"
```
Or you can open a web interface (on port 5006) to ask questions:
```shell
dr-doc-search --web-app -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf
```
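
The train and ask steps can also be scripted. The following is a minimal sketch, not part of dr-doc-search itself: the function names `train_command`, `ask_command`, and `run_steps` are hypothetical, and it only uses the `--train`, `-i`, and `--input-question` flags shown in this README. It assumes the `dr-doc-search` executable is on your `PATH`.

```python
import os
import subprocess
from pathlib import Path
from typing import List


def train_command(pdf: Path) -> List[str]:
    """Build the command for step 1 (index and embeddings)."""
    return ["dr-doc-search", "--train", "-i", str(pdf)]


def ask_command(pdf: Path, question: str) -> List[str]:
    """Build the command for step 2 (ask a single question)."""
    return ["dr-doc-search", "-i", str(pdf), "--input-question", question]


def run_steps(pdf: Path, question: str) -> None:
    """Run both steps in order, stopping at the first failure."""
    # Fail early with a clear message if the API key was not exported
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set")
    subprocess.run(train_command(pdf), check=True)
    subprocess.run(ask_command(pdf, question), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the question contains spaces or quotes.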
There are more options, such as choosing the start and end pages of the PDF file.
See the help for more details:
```shell
dr-doc-search --help
```

Binary file not shown. Size: 30 MiB

@ -1,5 +1,6 @@
"""
Indexes a PDF file and generate OpenAI Embeddings
Indexes a PDF file and generates OpenAI Embeddings.
Also allows the user to ask questions using the command line interface or the web app.
"""
from argparse import ArgumentParser, Namespace, RawDescriptionHelpFormatter
from pathlib import Path