langchain/cookbook/elasticsearch_db_qa.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Elasticsearch\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/qa_structured/integrations/elasticsearch.ipynb)\n",
    "\n",
    "We can use LLMs to interact with Elasticsearch analytics databases in natural language.\n",
    "\n",
    "This chain builds search queries via the Elasticsearch DSL API (filters and aggregations).\n",
    "\n",
    "The Elasticsearch client must have permissions for index listing, mapping description and search queries.\n",
    "\n",
    "See the [Elasticsearch Docker guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html) for instructions on running Elasticsearch locally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install langchain langchain-experimental langchain-openai openai elasticsearch\n",
    "\n",
    "# Set env var OPENAI_API_KEY or load from a .env file\n",
    "# import dotenv\n",
    "\n",
    "# dotenv.load_dotenv()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "from elasticsearch import Elasticsearch\n",
    "from langchain.chains.elasticsearch_database import ElasticsearchDatabaseChain\n",
    "from langchain_openai import ChatOpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the Elasticsearch Python client.\n",
    "# See https://elasticsearch-py.readthedocs.io/en/v8.8.2/api.html#elasticsearch.Elasticsearch\n",
    "ELASTIC_SEARCH_SERVER = \"https://elastic:pass@localhost:9200\"\n",
    "db = Elasticsearch(ELASTIC_SEARCH_SERVER)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Uncomment the next cell to populate your database with sample data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# customers = [\n",
    "#     {\"firstname\": \"Jennifer\", \"lastname\": \"Walters\"},\n",
    "#     {\"firstname\": \"Monica\",\"lastname\":\"Rambeau\"},\n",
    "#     {\"firstname\": \"Carol\",\"lastname\":\"Danvers\"},\n",
    "#     {\"firstname\": \"Wanda\",\"lastname\":\"Maximoff\"},\n",
    "#     {\"firstname\": \"Jennifer\",\"lastname\":\"Takeda\"},\n",
    "# ]\n",
    "# for i, customer in enumerate(customers):\n",
    "#     db.create(index=\"customers\", document=customer, id=i)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = ChatOpenAI(model_name=\"gpt-4\", temperature=0)\n",
    "chain = ElasticsearchDatabaseChain.from_llm(llm=llm, database=db, verbose=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "question = \"What are the first names of all the customers?\"\n",
    "chain.run(question)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can customize the prompt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.prompts.prompt import PromptTemplate\n",
    "\n",
    "PROMPT_TEMPLATE = \"\"\"Given an input question, create a syntactically correct Elasticsearch query to run. Unless the user specifies in their question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.\n",
    "\n",
    "Unless told to, do not query for all the columns from a specific index; only ask for the few relevant columns given the question.\n",
    "\n",
    "Pay attention to use only the column names that you can see in the mapping description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which index. Return the query as valid json.\n",
    "\n",
    "Use the following format:\n",
    "\n",
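    "Question: Question here\n",
    "ESQuery: Elasticsearch Query formatted as json\n",
    "\n",
    "Only use the following indices:\n",
    "{indices_info}\n",
    "\n",
    "Question: {input}\n",
    "ESQuery:\"\"\"\n",
    "\n",
    "# The chain fills in {top_k}, {indices_info} and {input} when it runs.\n",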
    "PROMPT = PromptTemplate.from_template(\n",
    "    PROMPT_TEMPLATE,\n",
    ")\n",
    "chain = ElasticsearchDatabaseChain.from_llm(llm=llm, database=db, query_prompt=PROMPT)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
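
To make the description of the chain more concrete, here is a minimal sketch of the kind of Elasticsearch DSL request body it might produce for the question "What are the first names of all the customers?". The helper names `build_firstname_query` and `extract_field`, the `top_k` default of 10, and the hand-written `sample_response` are hypothetical illustrations and are not part of LangChain or the Elasticsearch client. The query the LLM actually generates may look different.

```python
from typing import Any, Dict, List


def build_firstname_query(top_k: int = 10) -> Dict[str, Any]:
    """Return a hypothetical search body: match every document,
    return only the `firstname` field, and cap the result count at top_k."""
    return {
        "query": {"match_all": {}},
        "_source": ["firstname"],
        "size": top_k,
    }


def extract_field(response: Dict[str, Any], field: str) -> List[Any]:
    """Collect `field` from each hit of a search-style response,
    skipping hits that do not contain it."""
    hits = response["hits"]["hits"]
    return [hit["_source"][field] for hit in hits if field in hit["_source"]]


# Hand-written stand-in for a search response (assumption, not real output).
sample_response = {
    "hits": {
        "hits": [
            {"_index": "customers", "_id": "0", "_source": {"firstname": "Jennifer"}},
            {"_index": "customers", "_id": "1", "_source": {"firstname": "Monica"}},
        ]
    }
}

query = build_firstname_query(top_k=5)
first_names = extract_field(sample_response, "firstname")
print(query)
print(first_names)
```

Restricting `_source` to the relevant field and setting `size` mirrors the prompt's instructions to request only the needed columns and to limit results to `top_k`. With a live cluster, a body like this would be sent to the `customers` index through the client's search call instead of being checked against the sample response.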