Mirror of https://github.com/hwchase17/langchain
Synced 2024-10-29 17:07:25 +00:00
ebf998acb6
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>, Lance Martin <lance@langchain.dev>, Jacob Lee <jacoblee93@gmail.com>
Semi-structured RAG
This template performs RAG on semi-structured data (e.g., a PDF with text and tables).
See this blog post for useful background context.
Data loading
We use partition_pdf from Unstructured to extract both table and text elements.
This will require some system-level package installations, e.g., on Mac:
brew install tesseract poppler
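As a minimal sketch of this loading step, the snippet below partitions a PDF and then separates table elements from text elements. It assumes the unstructured package is installed with PDF support. The helper names load_pdf_elements and split_elements are hypothetical and not part of the template, and whether the template passes infer_table_structure is an assumption.

```python
def load_pdf_elements(path):
    """Partition a PDF into Unstructured elements.

    The import is done lazily so the rest of this sketch can be used
    without the `unstructured` package installed.
    """
    from unstructured.partition.pdf import partition_pdf

    # infer_table_structure asks Unstructured to recover table layout;
    # using it here is an assumption, not confirmed template behavior.
    return partition_pdf(filename=path, infer_table_structure=True)


def split_elements(elements):
    """Separate table elements from all other elements by class name.

    Hypothetical helper: returns (tables, texts) as lists of strings.
    """
    tables, texts = [], []
    for element in elements:
        if type(element).__name__ == "Table":
            tables.append(str(element))
        else:
            texts.append(str(element))
    return tables, texts
```

With a real file this would be called as split_elements(load_pdf_elements("example.pdf")), where example.pdf is a placeholder path.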
Chroma
Chroma is an open-source vector database.
This template creates the vector database and adds documents to it in chain.py.
These documents can be loaded from many sources.
LLM
Be sure that OPENAI_API_KEY is set in order to use the OpenAI models.
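One way to catch a missing key early is to check the environment before starting the server. This is a small sketch; the require_env helper is hypothetical and not part of the template.

```python
import os


def require_env(name="OPENAI_API_KEY"):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the server.")
    return value
```

Calling require_env() at startup fails immediately when the variable is missing or empty, rather than at the first model call.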
Adding the template
Create your LangServe app:
langchain serve new my-app
cd my-app
Add template:
langchain serve add rag-semi-structured
Start server:
langchain start
See the Jupyter notebook rag_semi_structured for various ways to connect to the template.
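One such way, sketched here under assumptions, is to call the running server from Python with LangServe's RemoteRunnable. The base URL http://localhost:8000, the route rag-semi-structured, and the plain-string question input are all assumptions about how the app was served. The template_url and ask helpers are hypothetical.

```python
def template_url(base_url="http://localhost:8000", path="rag-semi-structured"):
    """Build the endpoint URL; both defaults are assumptions about the local setup."""
    return f"{base_url.rstrip('/')}/{path.strip('/')}"


def ask(question, url=None):
    """Send a question to the served template and return its answer.

    Requires the `langserve` package and a running server; the import is
    lazy so the URL helper above works without either.
    """
    from langserve.client import RemoteRunnable

    runnable = RemoteRunnable(url or template_url())
    # Assumes the chain accepts a plain question string as input.
    return runnable.invoke(question)
```

If the server listens elsewhere or the route differs, pass the full URL to ask explicitly.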