TEMPLATES Add rag-opensearch template (#13501)

Adding rag-opensearch template.

Signed-off-by: kalyanr <kalyan.ben10@live.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-31 15:20:26 +00:00 · 2023-11-28 02:51:39 +05:30 · 2023-11-28 02:51:39 +05:30 · ec53d983a1
commit ec53d983a1
parent e47b9c5285
11 changed files with 2055 additions and 0 deletions
--- a/templates/rag-opensearch/.gitignore
+++ b/templates/rag-opensearch/.gitignore
@@ -0,0 +1 @@
+__pycache__
--- a/templates/rag-opensearch/LICENSE
+++ b/templates/rag-opensearch/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 LangChain, Inc.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/templates/rag-opensearch/README.md
+++ b/templates/rag-opensearch/README.md
@@ -0,0 +1,74 @@
+# rag-opensearch
+
+This template performs RAG using [OpenSearch](https://python.langchain.com/docs/integrations/vectorstores/opensearch).
+
+## Environment Setup
+
+Set the following environment variables:
+
+- `OPENAI_API_KEY` - API key for accessing OpenAI embeddings and models
+- `OPENSEARCH_URL` - URL of the hosted OpenSearch instance
+- `OPENSEARCH_USERNAME` - User name for the OpenSearch instance
+- `OPENSEARCH_PASSWORD` - Password for the OpenSearch instance
+- `OPENSEARCH_INDEX_NAME` - Name of the index
+
+Note: To create a dummy index named `langchain-test` populated with dummy documents, run the `dummy_index_setup.py` script in this folder.
+
+## Usage
+
+To use this package, you should first have the LangChain CLI installed:
+
+```shell
+pip install -U langchain-cli
+```
+
+To create a new LangChain project and install this as the only package, you can do:
+
+```shell
+langchain app new my-app --package rag-opensearch
+```
+
+If you want to add this to an existing project, you can just run:
+
+```shell
+langchain app add rag-opensearch
+```
+
+And add the following code to your `server.py` file:
+```python
+from rag_opensearch import chain as rag_opensearch_chain
+
+add_routes(app, rag_opensearch_chain, path="/rag-opensearch")
+```
+
+(Optional) Let's now configure LangSmith.
+LangSmith will help us trace, monitor, and debug LangChain applications.
+LangSmith is currently in private beta; you can sign up [here](https://smith.langchain.com/).
+If you don't have access, you can skip this section.
+
+
+```shell
+export LANGCHAIN_TRACING_V2=true
+export LANGCHAIN_API_KEY=<your-api-key>
+export LANGCHAIN_PROJECT=<your-project>  # if not specified, defaults to "default"
+```
+
+If you are inside this directory, you can spin up a LangServe instance directly by running:
+
+```shell
+langchain serve
+```
+
+This will start the FastAPI app, with a server running locally at
+[http://localhost:8000](http://localhost:8000)
+
+We can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
+We can access the playground at [http://127.0.0.1:8000/rag-opensearch/playground](http://127.0.0.1:8000/rag-opensearch/playground)  
+
+We can access the template from code with:
+
+```python
+from langserve.client import RemoteRunnable
+
+runnable = RemoteRunnable("http://localhost:8000/rag-opensearch")
+```
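A minimal sketch, not part of this commit, of a pre-flight check for the environment variables the README lists. The helper name `missing_env_vars` is hypothetical. Note that the template code itself falls back to defaults for the OpenSearch settings, so a check like this is optional.

```python
import os

# Variables named in the README's "Environment Setup" section.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "OPENSEARCH_URL",
    "OPENSEARCH_USERNAME",
    "OPENSEARCH_PASSWORD",
    "OPENSEARCH_INDEX_NAME",
]


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All variables from the README are set.")
```

Running something like this before `langchain serve` would surface a missing `OPENAI_API_KEY` early instead of at the first request.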
--- a/templates/rag-opensearch/dummy_data.txt
+++ b/templates/rag-opensearch/dummy_data.txt
@@ -0,0 +1,19 @@
+[INFO] Initializing machine learning training job. Model: Convolutional Neural Network Dataset: MNIST Hyperparameters: ;   - Learning Rate: 0.001;   - Batch Size: 64
+[INFO] Loading training data. Training data loaded successfully. Number of training samples: 60,000
+[INFO] Loading validation data. Validation data loaded successfully. Number of validation samples: 10,000
+[INFO] Training started. Epoch 1/10;   - Loss: 0.532;   - Accuracy: 0.812 Epoch 2/10;   - Loss: 0.398;   - Accuracy: 0.874 Epoch 3/10;   - Loss: 0.325;   - Accuracy: 0.901 ... (training progress) Training completed.
+[INFO] Validation started. Validation loss: 0.287 Validation accuracy: 0.915 Model performance meets validation criteria. Saving the model.
+[INFO] Testing the trained model. Test loss: 0.298 Test accuracy: 0.910
+[INFO] Deploying the trained model to production. Model deployment successful. API endpoint: http://your-api-endpoint/predict
+[INFO] Monitoring system initialized. Monitoring metrics:;   - CPU Usage: 25%;   - Memory Usage: 40%;   - GPU Usage: 80%
+[ALERT] High GPU Usage Detected! Scaling resources to handle increased load.
+[INFO] Machine learning training job completed successfully. Total training time: 3 hours and 45 minutes.
+[INFO] Cleaning up resources. Job artifacts removed. Training environment closed.
+[INFO] Image processing web server started. Listening on port 8080.
+[INFO] Received image processing request from client at IP address 192.168.1.100. Preprocessing image: resizing to 800x600 pixels. Image preprocessing completed successfully.
+[INFO] Applying filters to enhance image details. Filters applied: sharpening, contrast adjustment. Image enhancement completed.
+[INFO] Generating thumbnail for the processed image. Thumbnail generated successfully.
+[INFO] Uploading processed image to the user's gallery. Image successfully added to the gallery. Image ID: 123456.
+[INFO] Sending notification to the user: Image processing complete. Notification sent successfully.
+[ERROR] Failed to process image due to corrupted file format. Informing the client about the issue. Client notified about the image processing failure.
+[INFO] Image processing web server shutting down. Cleaning up resources. Server shutdown complete.
--- a/templates/rag-opensearch/dummy_index_setup.py
+++ b/templates/rag-opensearch/dummy_index_setup.py
@@ -0,0 +1,60 @@
+import os
+
+from openai import OpenAI
+from opensearchpy import OpenSearch
+
+OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
+OPENSEARCH_URL = os.getenv("OPENSEARCH_URL", "https://localhost:9200")
+OPENSEARCH_USERNAME = os.getenv("OPENSEARCH_USERNAME", "admin")
+OPENSEARCH_PASSWORD = os.getenv("OPENSEARCH_PASSWORD", "admin")
+OPENSEARCH_INDEX_NAME = os.getenv("OPENSEARCH_INDEX_NAME", "langchain-test")
+
+with open("dummy_data.txt") as f:
+    docs = [line.strip() for line in f if line.strip()]
+
+
+client_oai = OpenAI(api_key=OPENAI_API_KEY)
+
+
+client = OpenSearch(
+    hosts=[OPENSEARCH_URL],
+    http_auth=(OPENSEARCH_USERNAME, OPENSEARCH_PASSWORD),
+    use_ssl=True,
+    verify_certs=False,
+)
+
+# Define the index settings and mappings
+index_settings = {
+    "settings": {
+        "index": {"knn": True, "number_of_shards": 1, "number_of_replicas": 0}
+    },
+    "mappings": {
+        "properties": {
+            "vector_field": {
+                "type": "knn_vector",
+                "dimension": 1536,
+                "method": {"name": "hnsw", "space_type": "l2", "engine": "faiss"},
+            }
+        }
+    },
+}
+
+response = client.indices.create(index=OPENSEARCH_INDEX_NAME, body=index_settings)
+
+print(response)
+
+
+# Insert docs
+
+
+for each in docs:
+    res = client_oai.embeddings.create(input=each, model="text-embedding-ada-002")
+
+    document = {
+        "vector_field": res.data[0].embedding,
+        "text": each,
+    }
+
+    response = client.index(index=OPENSEARCH_INDEX_NAME, body=document, refresh=True)
+
+    print(response)
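Once the index above is populated, it can be queried by vector. The following is a sketch, not part of this commit, of the request body for an approximate k-NN search against the `vector_field` mapping defined in the script. The helper name `build_knn_query` and the default `k=4` are assumptions made for illustration.

```python
def build_knn_query(query_vector, k=4, field="vector_field"):
    """Build an approximate k-NN search body for a `knn_vector` field.

    Hypothetical helper: `field` should match the mapping created by the
    setup script, and `query_vector` must have the index's dimension
    (1536 for the mapping above).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return {
        "size": k,  # number of hits to return
        "query": {"knn": {field: {"vector": list(query_vector), "k": k}}},
    }
```

With a live cluster, the body could be passed to `client.search(index=OPENSEARCH_INDEX_NAME, body=build_knn_query(embedding))`, where `embedding` comes from the same embedding model used at indexing time.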
--- a/templates/rag-opensearch/poetry.lock
+++ b/templates/rag-opensearch/poetry.lock
--- a/templates/rag-opensearch/pyproject.toml
+++ b/templates/rag-opensearch/pyproject.toml
@@ -0,0 +1,33 @@
+[tool.poetry]
+name = "rag-opensearch"
+version = "0.0.1"
+description = "RAG template for OpenSearch"
+authors = ["Kalyan Reddy <kalyan.ben10@live.com>"]
+readme = "README.md"
+
+[tool.poetry.dependencies]
+python = ">=3.8.1,<4.0"
+langchain = ">=0.0.313, <0.1"
+openai = "^0.28.1"
+opensearch-py = "^2.0.0"
+tiktoken = "^0.5.1"
+
+
+[tool.poetry.group.dev.dependencies]
+langchain-cli = ">=0.0.15"
+fastapi = "^0.104.0"
+sse-starlette = "^1.6.5"
+
+[tool.langserve]
+export_module = "rag_opensearch"
+export_attr = "chain"
+
+[tool.templates-hub]
+use-case = "rag"
+author = "OpenSearch"
+integrations = ["OpenAI", "OpenSearch"]
+tags = ["vectordbs"]
+
+[build-system]
+requires = ["poetry-core"]
+build-backend = "poetry.core.masonry.api"
--- a/templates/rag-opensearch/rag_opensearch.ipynb
+++ b/templates/rag-opensearch/rag_opensearch.ipynb
@@ -0,0 +1,35 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Connect to template\n",
+    "\n",
+    "In `server.py`, set -\n",
+    "```\n",
+    "add_routes(app, chain_ext, path=\"/rag-opensearch\")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langserve.client import RemoteRunnable\n",
+    "\n",
+    "rag_app = RemoteRunnable(\"http://localhost:8001/rag-opensearch\")\n",
+    "rag_app.invoke(\"What is the ip address used in the image processing logs\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/templates/rag-opensearch/rag_opensearch/__init__.py
+++ b/templates/rag-opensearch/rag_opensearch/__init__.py
@@ -0,0 +1,3 @@
+from rag_opensearch.chain import chain
+
+__all__ = ["chain"]
--- a/templates/rag-opensearch/rag_opensearch/chain.py
+++ b/templates/rag-opensearch/rag_opensearch/chain.py
@@ -0,0 +1,60 @@
+import os
+
+from langchain.chat_models import ChatOpenAI
+from langchain.embeddings import OpenAIEmbeddings
+from langchain.prompts import ChatPromptTemplate
+from langchain.pydantic_v1 import BaseModel
+from langchain.schema.output_parser import StrOutputParser
+from langchain.schema.runnable import RunnableParallel, RunnablePassthrough
+from langchain.vectorstores.opensearch_vector_search import OpenSearchVectorSearch
+
+OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
+OPENSEARCH_URL = os.getenv("OPENSEARCH_URL", "https://localhost:9200")
+OPENSEARCH_USERNAME = os.getenv("OPENSEARCH_USERNAME", "admin")
+OPENSEARCH_PASSWORD = os.getenv("OPENSEARCH_PASSWORD", "admin")
+OPENSEARCH_INDEX_NAME = os.getenv("OPENSEARCH_INDEX_NAME", "langchain-test")
+
+
+embedding_function = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
+
+vector_store = OpenSearchVectorSearch(
+    opensearch_url=OPENSEARCH_URL,
+    http_auth=(OPENSEARCH_USERNAME, OPENSEARCH_PASSWORD),
+    index_name=OPENSEARCH_INDEX_NAME,
+    embedding_function=embedding_function,
+    verify_certs=False,
+)
+
+
+retriever = vector_store.as_retriever()
+
+
+def format_docs(docs):
+    return "\n\n".join([d.page_content for d in docs])
+
+
+# RAG prompt
+template = """Answer the question based only on the following context:
+{context}
+Question: {question}
+"""
+prompt = ChatPromptTemplate.from_template(template)
+
+# RAG
+model = ChatOpenAI(openai_api_key=OPENAI_API_KEY)
+chain = (
+    RunnableParallel(
+        {"context": retriever | format_docs, "question": RunnablePassthrough()}
+    )
+    | prompt
+    | model
+    | StrOutputParser()
+)
+
+
+# Add typing for input
+class Question(BaseModel):
+    __root__: str
+
+
+chain = chain.with_types(input_type=Question)
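To make the data flow of the chain above easier to follow, here is a plain-Python sketch of its first two stages (retrieve and format the context, then fill the prompt). It is not part of this commit and uses no LangChain or OpenSearch calls: `Doc`, `build_prompt`, and `fake_retrieve` are hypothetical stand-ins, and only `format_docs` and the template text mirror `chain.py`.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a retrieved document; only `page_content` is used."""

    page_content: str


# Same prompt text as in chain.py.
TEMPLATE = """Answer the question based only on the following context:
{context}
Question: {question}
"""


def format_docs(docs):
    # Join retrieved passages with blank lines, as chain.py does.
    return "\n\n".join(d.page_content for d in docs)


def build_prompt(question, retrieve):
    """Retrieve context for `question`, then fill the prompt template."""
    context = format_docs(retrieve(question))
    return TEMPLATE.format(context=context, question=question)


def fake_retrieve(question):
    # Hypothetical retriever returning canned lines from dummy_data.txt.
    return [
        Doc("[INFO] Image processing web server started. Listening on port 8080."),
        Doc("[INFO] Received image processing request from client at IP address 192.168.1.100."),
    ]


if __name__ == "__main__":
    print(build_prompt("Which port does the server listen on?", fake_retrieve))
```

In the real chain, the filled prompt is then sent to `ChatOpenAI` and the reply is reduced to a string by `StrOutputParser`.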
--- a/templates/rag-opensearch/tests/__init__.py
+++ b/templates/rag-opensearch/tests/__init__.py