templates: add RAG template for Intel Xeon Scalable Processors (#18424)
**Description:** This template utilizes Chroma and TGI (Text Generation Inference) to execute RAG on Intel Xeon Scalable Processors. It serves as a demonstration for users, illustrating the deployment of the RAG service on Intel Xeon Scalable Processors and showcasing the resulting performance enhancements.

**Issue:** None

**Dependencies:** The template contains the poetry project requirements to run this template. CPU TGI batching is WIP.

**Twitter handle:** None

---------

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
This commit is contained in:
parent
d4673a3507
commit
0175906437
97
templates/intel-rag-xeon/README.md
Normal file
@@ -0,0 +1,97 @@
# RAG example on Intel Xeon

This template performs RAG using Chroma and Text Generation Inference on Intel® Xeon® Scalable Processors.

Intel® Xeon® Scalable processors feature built-in accelerators for more performance-per-core and unmatched AI performance, with advanced security technologies for the most in-demand workload requirements—all while offering the greatest cloud choice and application portability. For more details, please check [Intel® Xeon® Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable.html).

## Environment Setup

To use [🤗 text-generation-inference](https://github.com/huggingface/text-generation-inference) on Intel® Xeon® Scalable Processors, please follow these steps:

### Launch a local server instance on an Intel Xeon server:

```bash
model=Intel/neural-chat-7b-v3-3
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.4 --model-id $model
```

For gated models such as `LLAMA-2`, you will have to pass `-e HUGGING_FACE_HUB_TOKEN=<token>` to the `docker run` command above with a valid Hugging Face Hub read token.
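
For example, a minimal sketch of the adjusted command (the gated model ID below is only an illustration; substitute a gated model your token actually has access to):

```bash
# NOTE: example model ID only; replace with the gated model you have been granted access to
model=meta-llama/Llama-2-7b-chat-hf
volume=$PWD/data

docker run --shm-size 1g -p 8080:80 -v $volume:/data \
    -e HUGGING_FACE_HUB_TOKEN=<your-hf-read-token> \
    ghcr.io/huggingface/text-generation-inference:1.4 --model-id $model
```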

Please follow [this guide on Hugging Face tokens](https://huggingface.co/docs/hub/security-tokens) to get an access token, and export the `HUGGINGFACEHUB_API_TOKEN` environment variable with the token.

```bash
export HUGGINGFACEHUB_API_TOKEN=<token>
```

Send a request to check if the endpoint is working:

```bash
curl localhost:8080/generate -X POST -d '{"inputs":"Which NFL team won the Super Bowl in the 2010 season?","parameters":{"max_new_tokens":128, "do_sample": true}}' -H 'Content-Type: application/json'
```

For more details, please refer to [text-generation-inference](https://github.com/huggingface/text-generation-inference).

## Populating with data

If you want to populate the DB with some example data, you can run the commands below:

```shell
poetry install
poetry run python ingest.py
```

The script processes and stores sections from the Edgar 10-K filing for Nike (`nke-10k-2023.pdf`) into a Chroma database.
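
As a quick sanity check (a minimal sketch; it assumes ingestion has finished and reuses the persist directory, collection name, and embedding model from `ingest.py`), you can query the store directly:

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# Reuse the same embedding model, collection name, and persist directory as ingest.py
embedder = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = Chroma(
    persist_directory="/tmp/xeon_rag_db",
    embedding_function=embedder,
    collection_name="xeon-rag",
)

# Retrieve the most relevant chunks for a sample question and peek at the top hit
docs = db.similarity_search("What was Nike's revenue in 2023?", k=2)
print(docs[0].page_content[:500])
```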

## Usage

To use this package, you should first have the LangChain CLI installed:

```shell
pip install -U langchain-cli
```

To create a new LangChain project and install this as the only package, you can do:

```shell
langchain app new my-app --package intel-rag-xeon
```

If you want to add this to an existing project, you can just run:

```shell
langchain app add intel-rag-xeon
```

And add the following code to your `server.py` file:

```python
from intel_rag_xeon import chain as xeon_rag_chain

add_routes(app, xeon_rag_chain, path="/intel-rag-xeon")
```

(Optional) Let's now configure LangSmith. LangSmith will help us trace, monitor and debug LangChain applications. LangSmith is currently in private beta; you can sign up [here](https://smith.langchain.com/). If you don't have access, you can skip this section.

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project>  # if not specified, defaults to "default"
```

If you are inside this directory, then you can spin up a LangServe instance directly by running:

```shell
langchain serve
```

This will start the FastAPI app with a server running locally at
[http://localhost:8000](http://localhost:8000)

We can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
We can access the playground at [http://127.0.0.1:8000/intel-rag-xeon/playground](http://127.0.0.1:8000/intel-rag-xeon/playground)

We can access the template from code with:

```python
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/intel-rag-xeon")
```
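
You can then invoke the runnable; for example (a minimal sketch reusing the sample question from the included notebook):

```python
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/intel-rag-xeon")

# Sample question taken from intel_rag_xeon.ipynb
print(runnable.invoke("What was Nike's revenue in 2023?"))
```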
BIN
templates/intel-rag-xeon/data/nke-10k-2023.pdf
Normal file
Binary file not shown.
49
templates/intel-rag-xeon/ingest.py
Normal file
@@ -0,0 +1,49 @@
import os

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import UnstructuredFileLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document


def ingest_documents():
    """
    Ingest PDF to Chroma from the data/ directory that
    contains Edgar 10k filings data for Nike.
    """
    # Load list of pdfs
    data_path = "data/"
    doc = [os.path.join(data_path, file) for file in os.listdir(data_path)][0]

    print("Parsing 10k filing doc for NIKE", doc)

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1500, chunk_overlap=100, add_start_index=True
    )
    loader = UnstructuredFileLoader(doc, mode="single", strategy="fast")
    chunks = loader.load_and_split(text_splitter)

    print("Done preprocessing. Created", len(chunks), "chunks of the original pdf")

    # Create vectorstore
    embedder = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )

    documents = []
    for chunk in chunks:
        doc = Document(page_content=chunk.page_content, metadata=chunk.metadata)
        documents.append(doc)

    # Add to vectorDB
    _ = Chroma.from_documents(
        documents=documents,
        collection_name="xeon-rag",
        embedding=embedder,
        persist_directory="/tmp/xeon_rag_db",
    )


if __name__ == "__main__":
    ingest_documents()
62
templates/intel-rag-xeon/intel_rag_xeon.ipynb
Normal file
@@ -0,0 +1,62 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "681a5d1e",
   "metadata": {},
   "source": [
    "## Connect to RAG App\n",
    "\n",
    "Assuming you are already running this server:\n",
    "```bash\n",
    "langchain serve\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d774be2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langserve.client import RemoteRunnable\n",
    "\n",
    "gaudi_rag = RemoteRunnable(\"http://localhost:8000/intel-rag-xeon\")\n",
    "\n",
    "print(gaudi_rag.invoke(\"What was Nike's revenue in 2023?\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07ae0005",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(gaudi_rag.invoke(\"How many employees work at Nike?\"))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
3
templates/intel-rag-xeon/intel_rag_xeon/__init__.py
Normal file
@@ -0,0 +1,3 @@
from intel_rag_xeon.chain import chain

__all__ = ["chain"]
72
templates/intel-rag-xeon/intel_rag_xeon/chain.py
Normal file
@@ -0,0 +1,72 @@
from langchain.callbacks import streaming_stdout
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFaceEndpoint
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_core.vectorstores import VectorStoreRetriever


# Make this look better in the docs.
class Question(BaseModel):
    __root__: str


# Init Embeddings
embedder = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

knowledge_base = Chroma(
    persist_directory="/tmp/xeon_rag_db",
    embedding_function=embedder,
    collection_name="xeon-rag",
)
query = "What was Nike's revenue in 2023?"
docs = knowledge_base.similarity_search(query)
print(docs[0].page_content)
retriever = VectorStoreRetriever(
    vectorstore=knowledge_base, search_type="mmr", search_kwargs={"k": 1, "fetch_k": 5}
)

# Define our prompt
template = """
Use the following pieces of context from retrieved
dataset to answer the question. Do not make up an answer if there is no
context provided to help answer it.

Context:
---------
{context}

---------
Question: {question}
---------

Answer:
"""


prompt = ChatPromptTemplate.from_template(template)


ENDPOINT_URL = "http://localhost:8080"
callbacks = [streaming_stdout.StreamingStdOutCallbackHandler()]
model = HuggingFaceEndpoint(
    endpoint_url=ENDPOINT_URL,
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
    streaming=True,
)

# RAG Chain
chain = (
    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
    | prompt
    | model
    | StrOutputParser()
).with_types(input_type=Question)
5693
templates/intel-rag-xeon/poetry.lock
generated
Normal file
File diff suppressed because it is too large
51
templates/intel-rag-xeon/pyproject.toml
Normal file
@@ -0,0 +1,51 @@
[tool.poetry]
name = "intel-rag-xeon"
version = "0.0.1"
description = "Run a RAG app on Intel Xeon Scalable Processors"
authors = [
    "Liang Lv <liang1.lv@intel.com>",
]
readme = "README.md"

[tool.poetry.dependencies]
python = ">=3.9,<3.13"
langchain = "^0.1"
fastapi = "^0.104.0"
sse-starlette = "^1.6.5"
sentence-transformers = "2.2.2"
tiktoken = ">=0.5.1"
chromadb = ">=0.4.14"
beautifulsoup4 = ">=4.12.2"

[tool.poetry.dependencies.unstructured]
version = "^0.10.27"
extras = [
    "pdf",
]

[tool.poetry.group.dev.dependencies]
poethepoet = "^0.24.1"
langchain-cli = ">=0.0.21"

[tool.langserve]
export_module = "intel_rag_xeon.chain"
export_attr = "chain"

[tool.templates-hub]
use-case = "rag"
author = "Intel"
integrations = ["Intel", "HuggingFace"]
tags = ["vectordbs"]

[tool.poe.tasks.start]
cmd = "uvicorn langchain_cli.dev_scripts:create_demo_server --reload --port $port --host $host"
args = [
    { name = "port", help = "port to run on", default = "8000" },
    { name = "host", help = "host to run on", default = "127.0.0.1" },
]

[build-system]
requires = [
    "poetry-core",
]
build-backend = "poetry.core.masonry.api"
0
templates/intel-rag-xeon/tests/__init__.py
Normal file