RAG example on Intel Xeon

This template performs RAG using Chroma and Text Generation Inference on Intel® Xeon® Scalable Processors. Intel® Xeon® Scalable Processors feature built-in accelerators for more performance per core and unmatched AI performance, with advanced security technologies for the most in-demand workload requirements, all while offering the greatest cloud choice and application portability. For more information, please check Intel® Xeon® Scalable Processors.

Environment Setup

To use 🤗 text-generation-inference on Intel® Xeon® Scalable Processors, please follow these steps:

Launch a local server instance on an Intel Xeon server:

model=Intel/neural-chat-7b-v3-3
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.4 --model-id $model

For gated models such as LLAMA-2, you will have to pass -e HUGGING_FACE_HUB_TOKEN=<token> to the docker run command above with a valid Hugging Face Hub read token.

To get an access token, please follow this link: huggingface token. Then export the token as the HUGGINGFACEHUB_API_TOKEN environment variable.

export HUGGINGFACEHUB_API_TOKEN=<token> 

Send a request to check if the endpoint is working:

curl localhost:8080/generate -X POST -d '{"inputs":"Which NFL team won the Super Bowl in the 2010 season?","parameters":{"max_new_tokens":128, "do_sample": true}}'   -H 'Content-Type: application/json'
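
The same check can be scripted in Python. The sketch below uses only the standard library and mirrors the curl command above. The helper names `build_generate_request` and `generate` are illustrative, and it assumes the server is reachable on localhost:8080 and that the response JSON carries the result in a `generated_text` field.

```python
import json
import urllib.request

# Assumption: port 8080 is the host port mapped by the docker run command above.
TGI_URL = "http://localhost:8080/generate"


def build_generate_request(prompt, max_new_tokens=128, do_sample=True, url=TGI_URL):
    """Build the same POST request that the curl command sends."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "do_sample": do_sample},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=60, **kwargs):
    """Send the request and return the generated text (requires a running server)."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["generated_text"]


# Example (only works while the server is up):
# print(generate("Which NFL team won the Super Bowl in the 2010 season?"))
```

Building the request separately from sending it makes it possible to inspect the payload without a running server.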

For more details, please refer to text-generation-inference.

Populating with data

If you want to populate the DB with some example data, run the commands below:

poetry install
poetry run python ingest.py

The script processes sections of the EDGAR 10-K filing for Nike (nke-10k-2023.pdf) and stores them in a Chroma database.
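
For intuition about what "sections" can mean here: ingestion pipelines of this kind commonly split the extracted document text into overlapping chunks before embedding and storing them. The sketch below shows only that splitting step in plain Python. It is not the code in ingest.py, and the function name `split_into_chunks` and the default sizes are assumptions chosen for illustration.

```python
def split_into_chunks(text, chunk_size=1500, overlap=100):
    """Split text into fixed-size chunks where neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.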

Usage

To use this package, you should first have the LangChain CLI installed:

pip install -U langchain-cli

To create a new LangChain project and install this as the only package, you can do:

langchain app new my-app --package intel-rag-xeon

If you want to add this to an existing project, you can just run:

langchain app add intel-rag-xeon

And add the following code to your server.py file:

from intel_rag_xeon import chain as xeon_rag_chain

add_routes(app, xeon_rag_chain, path="/intel-rag-xeon")

(Optional) Let's now configure LangSmith. LangSmith will help us trace, monitor, and debug LangChain applications. You can sign up for LangSmith here. If you don't have access, you can skip this section.

export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project>  # if not specified, defaults to "default"
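
If the app is launched from a Python entry point rather than a shell, the same variables can be set in-process. This is a minimal sketch that assumes it runs before any chain is invoked. The helper name `configure_langsmith` is hypothetical, and the variable names are the ones exported above.

```python
import os


def configure_langsmith(api_key, project=None):
    """Set the LangSmith tracing variables shown above for the current process."""
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_API_KEY"] = api_key
    if project:
        # When no project is given, the variable is left untouched; an unset
        # project defaults to "default", as noted above.
        os.environ["LANGCHAIN_PROJECT"] = project


# Example: configure_langsmith("<your-api-key>", project="<your-project>")
```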

If you are inside this directory, then you can spin up a LangServe instance directly by:

langchain serve

This will start the FastAPI app with a server running locally at http://localhost:8000

We can see all templates at http://127.0.0.1:8000/docs

We can access the playground at http://127.0.0.1:8000/intel-rag-xeon/playground

We can access the template from code with:

from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/intel-rag-xeon")
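
With the runnable in hand, the usual next step is a call such as `runnable.invoke(...)`. For clients that cannot install langserve, the same request can be made over plain HTTP, since LangServe exposes an `/invoke` endpoint under each route. The sketch below uses only the standard library. The helper names `build_invoke_request` and `ask` are illustrative, and it assumes the chain accepts a plain question string as its input.

```python
import json
import urllib.request

# Route registered with add_routes in server.py.
BASE_URL = "http://localhost:8000/intel-rag-xeon"


def build_invoke_request(question, base_url=BASE_URL):
    """Build a POST to the route's /invoke endpoint with the question as input."""
    # Assumption: the chain's input is a bare string rather than a dict.
    body = json.dumps({"input": question}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/invoke",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, timeout=120):
    """Call the running server and return the chain's output."""
    request = build_invoke_request(question)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["output"]


# Example (only works while `langchain serve` is running):
# print(ask("What was Nike's revenue in fiscal 2023?"))
```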