templates: add RAG template for Intel Xeon Scalable Processors (#18424)
**Description:** This template uses Chroma and TGI (Text Generation Inference) to run RAG on Intel Xeon Scalable Processors. It serves as a demonstration for users, illustrating how to deploy the RAG service on Intel Xeon Scalable Processors and showcasing the resulting performance enhancements.

**Issue:** None

**Dependencies:** The template contains the Poetry project requirements needed to run it. CPU TGI batching is a work in progress.

**Twitter handle:** None

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
This commit is contained in: parent d4673a3507, commit 0175906437
templates/intel-rag-xeon/README.md (new file, 97 lines)
@@ -0,0 +1,97 @@
# RAG example on Intel Xeon

This template performs RAG using Chroma and Text Generation Inference on Intel® Xeon® Scalable Processors.

Intel® Xeon® Scalable processors feature built-in accelerators for more performance per core and unmatched AI performance, with advanced security technologies for the most in-demand workload requirements, all while offering the greatest cloud choice and application portability. For more information, see [Intel® Xeon® Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable.html).

## Environment Setup

To use [🤗 text-generation-inference](https://github.com/huggingface/text-generation-inference) on Intel® Xeon® Scalable Processors, follow these steps:

### Launch a local server instance on an Intel Xeon server:

```bash
model=Intel/neural-chat-7b-v3-3
volume=$PWD/data  # share a volume with the Docker container to avoid downloading weights every run

docker run --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.4 --model-id $model
```

For gated models such as `LLAMA-2`, you will have to pass `-e HUGGING_FACE_HUB_TOKEN=<token>` to the `docker run` command above with a valid Hugging Face Hub read token.
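For illustration only, a gated model launch might look like the sketch below; the model ID shown is a placeholder and is not part of this template:

```bash
# Sketch: launching a gated model requires a valid Hugging Face Hub read token
model=meta-llama/Llama-2-7b-chat-hf  # placeholder gated model id, not used by this template
volume=$PWD/data

docker run --shm-size 1g -p 8080:80 -v $volume:/data \
  -e HUGGING_FACE_HUB_TOKEN=<token> \
  ghcr.io/huggingface/text-generation-inference:1.4 --model-id $model
```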
Please follow this guide to obtain a [Hugging Face access token](https://huggingface.co/docs/hub/security-tokens) and export the `HUGGINGFACEHUB_API_TOKEN` environment variable with the token.

```bash
export HUGGINGFACEHUB_API_TOKEN=<token>
```

Send a request to check if the endpoint is working:

```bash
curl localhost:8080/generate -X POST -d '{"inputs":"Which NFL team won the Super Bowl in the 2010 season?","parameters":{"max_new_tokens":128, "do_sample": true}}' -H 'Content-Type: application/json'
```

For more details, please refer to [text-generation-inference](https://github.com/huggingface/text-generation-inference).
## Populating with data

If you want to populate the DB with some example data, you can run the commands below:

```shell
poetry install
poetry run python ingest.py
```

The script processes and stores sections from the Edgar 10k filing data for Nike (`nke-10k-2023.pdf`) in a Chroma database.
## Usage

To use this package, you should first have the LangChain CLI installed:

```shell
pip install -U langchain-cli
```

To create a new LangChain project and install this as the only package, you can do:

```shell
langchain app new my-app --package intel-rag-xeon
```

If you want to add this to an existing project, you can just run:

```shell
langchain app add intel-rag-xeon
```

And add the following code to your `server.py` file:

```python
from intel_rag_xeon import chain as xeon_rag_chain

add_routes(app, xeon_rag_chain, path="/intel-rag-xeon")
```
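For reference, a minimal `server.py` might look like the sketch below; the FastAPI app creation and the `add_routes` import from `langserve` are assumed from the standard LangServe scaffold rather than taken from this template:

```python
# Minimal server.py sketch, assuming the standard LangServe scaffold
from fastapi import FastAPI
from langserve import add_routes

from intel_rag_xeon import chain as xeon_rag_chain

app = FastAPI()

# Expose the RAG chain under /intel-rag-xeon
add_routes(app, xeon_rag_chain, path="/intel-rag-xeon")

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
```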
(Optional) Let's now configure LangSmith. LangSmith will help us trace, monitor and debug LangChain applications. LangSmith is currently in private beta; you can sign up [here](https://smith.langchain.com/). If you don't have access, you can skip this section.

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project>  # if not specified, defaults to "default"
```

If you are inside this directory, then you can spin up a LangServe instance directly by:

```shell
langchain serve
```

This will start the FastAPI app with a server running locally at
[http://localhost:8000](http://localhost:8000)
We can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
We can access the playground at [http://127.0.0.1:8000/intel-rag-xeon/playground](http://127.0.0.1:8000/intel-rag-xeon/playground)

We can access the template from code with:

```python
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/intel-rag-xeon")
```
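For example, a usage sketch, assuming the server above is running and the example data has been ingested:

```python
# Sketch: call the remote RAG chain like any other runnable
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/intel-rag-xeon")
print(runnable.invoke("What was Nike's revenue in 2023?"))
```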
templates/intel-rag-xeon/data/nke-10k-2023.pdf (new binary file; not shown)

templates/intel-rag-xeon/ingest.py (new file, 49 lines)
@@ -0,0 +1,49 @@
import os

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import UnstructuredFileLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document


def ingest_documents():
    """
    Ingest the PDF in the data/ directory that contains
    Edgar 10k filings data for Nike into Chroma.
    """
    # Load the first PDF found in the data/ directory
    data_path = "data/"
    doc = [os.path.join(data_path, file) for file in os.listdir(data_path)][0]

    print("Parsing 10k filing doc for NIKE", doc)

    # Split the filing into overlapping chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1500, chunk_overlap=100, add_start_index=True
    )
    loader = UnstructuredFileLoader(doc, mode="single", strategy="fast")
    chunks = loader.load_and_split(text_splitter)

    print("Done preprocessing. Created", len(chunks), "chunks of the original pdf")

    # Create vectorstore
    embedder = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )

    documents = []
    for chunk in chunks:
        doc = Document(page_content=chunk.page_content, metadata=chunk.metadata)
        documents.append(doc)

    # Add to vectorDB
    _ = Chroma.from_documents(
        documents=documents,
        collection_name="xeon-rag",
        embedding=embedder,
        persist_directory="/tmp/xeon_rag_db",
    )


if __name__ == "__main__":
    ingest_documents()
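As a quick sanity check (a sketch, not part of this template), the persisted collection can be reopened with the same embedding model, collection name, and persist directory used above:

```python
# Sketch: reopen the persisted Chroma DB created by ingest.py and query it
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

embedder = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = Chroma(
    persist_directory="/tmp/xeon_rag_db",
    embedding_function=embedder,
    collection_name="xeon-rag",
)
print(db.similarity_search("What was Nike's revenue in 2023?")[0].page_content)
```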
templates/intel-rag-xeon/intel_rag_xeon.ipynb (new file, 62 lines)
@@ -0,0 +1,62 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "681a5d1e",
   "metadata": {},
   "source": [
    "## Connect to RAG App\n",
    "\n",
    "Assuming you are already running this server:\n",
    "```bash\n",
    "langchain serve\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d774be2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langserve.client import RemoteRunnable\n",
    "\n",
    "gaudi_rag = RemoteRunnable(\"http://localhost:8000/intel-rag-xeon\")\n",
    "\n",
    "print(gaudi_rag.invoke(\"What was Nike's revenue in 2023?\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07ae0005",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(gaudi_rag.invoke(\"How many employees work at Nike?\"))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
templates/intel-rag-xeon/intel_rag_xeon/__init__.py (new file, 3 lines)
@@ -0,0 +1,3 @@

from intel_rag_xeon.chain import chain

__all__ = ["chain"]
templates/intel-rag-xeon/intel_rag_xeon/chain.py (new file, 72 lines)
@@ -0,0 +1,72 @@
from langchain.callbacks import streaming_stdout
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFaceEndpoint
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_core.vectorstores import VectorStoreRetriever


# Make this look better in the docs.
class Question(BaseModel):
    __root__: str


# Init Embeddings
embedder = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

knowledge_base = Chroma(
    persist_directory="/tmp/xeon_rag_db",
    embedding_function=embedder,
    collection_name="xeon-rag",
)
# Sanity check that ingest.py has populated the vectorstore
query = "What was Nike's revenue in 2023?"
docs = knowledge_base.similarity_search(query)
print(docs[0].page_content)
retriever = VectorStoreRetriever(
    vectorstore=knowledge_base, search_type="mmr", search_kwargs={"k": 1, "fetch_k": 5}
)

# Define our prompt
template = """
Use the following pieces of context from the retrieved
dataset to answer the question. Do not make up an answer if there is no
context provided to help answer it.

Context:
---------
{context}

---------
Question: {question}
---------

Answer:
"""


prompt = ChatPromptTemplate.from_template(template)


# Text Generation Inference endpoint launched during environment setup
ENDPOINT_URL = "http://localhost:8080"
callbacks = [streaming_stdout.StreamingStdOutCallbackHandler()]
model = HuggingFaceEndpoint(
    endpoint_url=ENDPOINT_URL,
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
    streaming=True,
)

# RAG Chain
chain = (
    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
    | prompt
    | model
    | StrOutputParser()
).with_types(input_type=Question)
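As a rough local test (a sketch, not part of this template), the exported chain can be invoked directly once the TGI endpoint is running and `ingest.py` has populated the vectorstore:

```python
# Sketch: invoke the exported chain directly, outside of LangServe
from intel_rag_xeon import chain

print(chain.invoke("What was Nike's revenue in 2023?"))
```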
templates/intel-rag-xeon/poetry.lock (generated new file, 5693 lines; diff suppressed because it is too large)
templates/intel-rag-xeon/pyproject.toml (new file, 51 lines)
@@ -0,0 +1,51 @@
[tool.poetry]
name = "intel-rag-xeon"
version = "0.0.1"
description = "Run a RAG app on Intel Xeon Scalable Processors"
authors = [
    "Liang Lv <liang1.lv@intel.com>",
]
readme = "README.md"

[tool.poetry.dependencies]
python = ">=3.9,<3.13"
langchain = "^0.1"
fastapi = "^0.104.0"
sse-starlette = "^1.6.5"
sentence-transformers = "2.2.2"
tiktoken = ">=0.5.1"
chromadb = ">=0.4.14"
beautifulsoup4 = ">=4.12.2"

[tool.poetry.dependencies.unstructured]
version = "^0.10.27"
extras = [
    "pdf",
]

[tool.poetry.group.dev.dependencies]
poethepoet = "^0.24.1"
langchain-cli = ">=0.0.21"

[tool.langserve]
export_module = "intel_rag_xeon.chain"
export_attr = "chain"

[tool.templates-hub]
use-case = "rag"
author = "Intel"
integrations = ["Intel", "HuggingFace"]
tags = ["vectordbs"]

[tool.poe.tasks.start]
cmd = "uvicorn langchain_cli.dev_scripts:create_demo_server --reload --port $port --host $host"
args = [
    { name = "port", help = "port to run on", default = "8000" },
    { name = "host", help = "host to run on", default = "127.0.0.1" },
]

[build-system]
requires = [
    "poetry-core",
]
build-backend = "poetry.core.masonry.api"
templates/intel-rag-xeon/tests/__init__.py (new empty file)