mirror of
https://github.com/hwchase17/langchain
synced 2024-10-31 15:20:26 +00:00
8b5e879171
Co-authored-by: Erick Friis <erick@langchain.dev>
Semi-structured RAG
This template performs RAG on semi-structured data (e.g., a PDF with text and tables).
See this blog post for useful background context.
Data loading
We use partition_pdf from Unstructured to extract both table and text elements.
This will require some system-level package installations, e.g., on Mac:
brew install tesseract poppler
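As a rough illustration of the loading step, the sketch below partitions a PDF with partition_pdf and separates table elements from text elements. The helper names (load_pdf_elements, split_tables_and_text), the file name example.pdf, and the chosen partition_pdf options are assumptions made for this example; the template's own chain.py may use different settings.

```python
from typing import Any, Iterable, List, Tuple


def load_pdf_elements(pdf_path: str) -> List[Any]:
    """Partition a PDF into elements with Unstructured (needs tesseract and poppler)."""
    # Imported lazily so the rest of this sketch runs without Unstructured installed.
    from unstructured.partition.pdf import partition_pdf

    return partition_pdf(
        filename=pdf_path,
        infer_table_structure=True,    # ask Unstructured to recover table structure
        chunking_strategy="by_title",  # group text elements under their section titles
    )


def split_tables_and_text(elements: Iterable[Any]) -> Tuple[List[str], List[str]]:
    """Separate table elements from everything else, based on each element's category."""
    tables: List[str] = []
    texts: List[str] = []
    for element in elements:
        if getattr(element, "category", None) == "Table":
            tables.append(str(element))
        else:
            texts.append(str(element))
    return tables, texts


if __name__ == "__main__":
    # "example.pdf" is a placeholder path, not a file shipped with the template.
    tables, texts = split_tables_and_text(load_pdf_elements("example.pdf"))
    print(f"{len(tables)} table elements, {len(texts)} text elements")
```

Keeping tables and text in separate lists makes it possible to handle them differently later, for example by summarizing tables before indexing.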
Chroma
Chroma is an open-source vector database.
This template creates the vector database and adds documents to it in chain.py.
These documents can be loaded from many sources.
LLM
Be sure that OPENAI_API_KEY is set in order to use the OpenAI models.
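One way to catch a missing key before starting the server is a small startup check like the sketch below. The require_env helper is hypothetical and not part of the template; it only reads the environment and raises a clear error when the variable is missing or empty.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable, or raise if it is missing or blank."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the template.")
    return value


if __name__ == "__main__":
    require_env("OPENAI_API_KEY")
    print("OPENAI_API_KEY is set.")
```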
Adding the template
Create your LangServe app:
langchain app new my-app
cd my-app
Add template:
langchain app add rag-semi-structured
Start server:
langchain serve
See the Jupyter notebook rag_semi_structured for various ways to connect to the template.
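As one possible way to connect without the notebook, the sketch below calls the server's invoke endpoint using only the Python standard library. It relies on several assumptions not stated here: that the server listens on http://localhost:8000, that the chain is mounted at /rag-semi-structured, and that it accepts a plain string question as input. Adjust these to match your app.

```python
import json
import urllib.request


def build_invoke_request(base_url: str, question: str) -> urllib.request.Request:
    """Build a POST request for a LangServe route's /invoke endpoint."""
    payload = json.dumps({"input": question}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/invoke",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, question: str, timeout: float = 60.0) -> object:
    """Send the question to the running server and return the chain's output."""
    request = build_invoke_request(base_url, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["output"]


if __name__ == "__main__":
    # Assumed host, port, and route; the question is only a placeholder.
    print(ask("http://localhost:8000/rag-semi-structured", "What does the table report?"))
```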