Harrison Chase 8b5e879171
add a template for the package readme (#12499)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-30 16:39:39 -07:00


Semi-structured RAG

This template performs retrieval-augmented generation (RAG) on semi-structured data, such as a PDF that contains both text and tables.

See this blog post for useful background context.

Data loading

We use partition_pdf from Unstructured to extract both table and text elements.

This requires some system-level packages. For example, on macOS:

brew install tesseract poppler
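Once the PDF is partitioned, tables and text are typically handled separately. The sketch below shows that split step. It assumes each element exposes a `category` string (with "Table" for tables) and that `str(element)` returns its text; `FakeElement` and `split_tables_and_text` are hypothetical names used only for illustration, and the sample elements are made up.

```python
from dataclasses import dataclass

# Real input would come from Unstructured, for example (not executed here):
#   from unstructured.partition.pdf import partition_pdf
#   elements = partition_pdf(filename="example.pdf", strategy="hi_res",
#                            infer_table_structure=True)


@dataclass
class FakeElement:
    """Hypothetical stand-in for an Unstructured element (category + text)."""
    category: str
    text: str

    def __str__(self) -> str:
        return self.text


def split_tables_and_text(elements):
    """Return (tables, texts): table elements vs. everything else, as strings."""
    tables, texts = [], []
    for el in elements:
        if el.category == "Table":
            tables.append(str(el))
        else:
            texts.append(str(el))
    return tables, texts


elements = [
    FakeElement("NarrativeText", "Llama 2 is a family of models."),
    FakeElement("Table", "Model | Params\n7B | 7e9"),
    FakeElement("Title", "Introduction"),
]
tables, texts = split_tables_and_text(elements)
print(len(tables), len(texts))  # 1 2
```

Keeping the two lists apart lets each kind of content be processed differently before indexing, which is the point of treating the data as semi-structured.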

Chroma

Chroma is an open-source vector database.

The template creates the vector database and adds documents to it in chain.py.

These documents can be loaded from many sources.
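To make the role of the vector database concrete, here is a toy in-memory store: it keeps (embedding, document) pairs and returns the documents closest to a query by cosine similarity. This is not Chroma and not its API. The word-count "embedding" is a deliberate simplification standing in for a real embedding model, and the method names only mirror LangChain's `add_texts` / `similarity_search` naming.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Stores (embedding, document) pairs and returns the k nearest documents."""

    def __init__(self):
        self._items = []

    def add_texts(self, texts):
        for t in texts:
            self._items.append((embed(t), t))

    def similarity_search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


store = ToyVectorStore()
store.add_texts([
    "Table: model sizes 7B 13B 70B",
    "The paper describes pretraining data",
])
print(store.similarity_search("what model sizes are available"))
```

A real vector database adds persistence, approximate nearest-neighbor indexing, and learned embeddings, but the add-then-search contract is the same.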

LLM

Be sure that the OPENAI_API_KEY environment variable is set in order to use the OpenAI models.
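A missing key usually surfaces only when the first model call fails. A small guard like the one below (a hypothetical helper, not part of the template) fails early with a clear message instead.

```python
import os


def require_openai_key() -> str:
    """Return OPENAI_API_KEY, or raise a clear error if it is unset or empty."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before starting the server."
        )
    return key
```

In a shell, the variable is typically set with `export OPENAI_API_KEY=...` before running `langchain serve`.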

Adding the template

Create your LangServe app:

langchain app new my-app
cd my-app

Add the template:

langchain app add rag-semi-structured

Start the server:

langchain serve

See the Jupyter notebook rag_semi_structured for various ways to connect to the template.
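As one illustration, a running LangServe app can be called over plain HTTP through its `/invoke` endpoint, which takes a JSON body with an `input` field and returns an `output` field. The sketch only builds the request. It assumes the server listens on localhost:8000, that the template is mounted at `/rag-semi-structured`, and that the chain accepts a plain question string; check these against your own app.

```python
import json
import urllib.request

# Assumed host/port and mount path; adjust to match your server.py.
BASE_URL = "http://localhost:8000/rag-semi-structured"


def build_invoke_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST to the LangServe /invoke endpoint with the question as input."""
    body = json.dumps({"input": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/invoke",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_invoke_request("How many parameters does the largest model have?")
# With the server running, send it and read the answer:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["output"])
```

The notebook may use LangServe's Python client instead; the raw HTTP form just makes the request shape visible.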