{ "cells": [ { "cell_type": "markdown", "id": "dd7ec7af", "metadata": {}, "source": [ "# Elasticsearch database\n", "\n", "Interact with Elasticsearch analytics database via Langchain. This chain builds search queries via the Elasticsearch DSL API (filters and aggregations).\n", "\n", "The Elasticsearch client must have permissions for index listing, mapping description and search queries.\n", "\n", "See [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html) for instructions on how to run Elasticsearch locally.\n", "\n", "Make sure to install the Elasticsearch Python client before:\n", "\n", "```sh\n", "pip install elasticsearch\n", "```" ] }, { "cell_type": "code", "execution_count": 11, "id": "dd8eae75", "metadata": {}, "outputs": [], "source": [ "from elasticsearch import Elasticsearch\n", "\n", "from langchain.chains.elasticsearch_database import ElasticsearchDatabaseChain\n", "from langchain.chat_models import ChatOpenAI" ] }, { "cell_type": "code", "execution_count": 3, "id": "5cde03bc", "metadata": {}, "outputs": [], "source": [ "# Initialize Elasticsearch python client.\n", "# See https://elasticsearch-py.readthedocs.io/en/v8.8.2/api.html#elasticsearch.Elasticsearch\n", "ELASTIC_SEARCH_SERVER = \"https://elastic:pass@localhost:9200\"\n", "db = Elasticsearch(ELASTIC_SEARCH_SERVER)" ] }, { "cell_type": "markdown", "id": "74a41374", "metadata": {}, "source": [ "Uncomment the next cell to initially populate your db." 
] }, { "cell_type": "code", "execution_count": 10, "id": "430ada0f", "metadata": {}, "outputs": [], "source": [ "# customers = [\n", "# {\"firstname\": \"Jennifer\", \"lastname\": \"Walters\"},\n", "# {\"firstname\": \"Monica\",\"lastname\":\"Rambeau\"},\n", "# {\"firstname\": \"Carol\",\"lastname\":\"Danvers\"},\n", "# {\"firstname\": \"Wanda\",\"lastname\":\"Maximoff\"},\n", "# {\"firstname\": \"Jennifer\",\"lastname\":\"Takeda\"},\n", "# ]\n", "# for i, customer in enumerate(customers):\n", "# db.create(index=\"customers\", document=customer, id=i)" ] }, { "cell_type": "code", "execution_count": 12, "id": "f36ae0d8", "metadata": {}, "outputs": [], "source": [ "llm = ChatOpenAI(model_name=\"gpt-4\", temperature=0)\n", "chain = ElasticsearchDatabaseChain.from_llm(llm=llm, database=db, verbose=True)" ] }, { "cell_type": "code", "execution_count": 13, "id": "b5d22d9d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001b[1m> Entering new ElasticsearchDatabaseChain chain...\u001b[0m\n", "What are the first names of all the customers?\n", "ESQuery:\u001b[32;1m\u001b[1;3m{'size': 10, 'query': {'match_all': {}}, '_source': ['firstname']}\u001b[0m\n", "ESResult: \u001b[33;1m\u001b[1;3m{'took': 5, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 6, 'relation': 'eq'}, 'max_score': 1.0, 'hits': [{'_index': 'customers', '_id': '0', '_score': 1.0, '_source': {'firstname': 'Jennifer'}}, {'_index': 'customers', '_id': '1', '_score': 1.0, '_source': {'firstname': 'Monica'}}, {'_index': 'customers', '_id': '2', '_score': 1.0, '_source': {'firstname': 'Carol'}}, {'_index': 'customers', '_id': '3', '_score': 1.0, '_source': {'firstname': 'Wanda'}}, {'_index': 'customers', '_id': '4', '_score': 1.0, '_source': {'firstname': 'Jennifer'}}, {'_index': 'customers', '_id': 'firstname', '_score': 1.0, '_source': {'firstname': 'Jennifer'}}]}}\u001b[0m\n", 
"Answer:\u001b[32;1m\u001b[1;3mThe first names of all the customers are Jennifer, Monica, Carol, Wanda, and Jennifer.\u001b[0m\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "'The first names of all the customers are Jennifer, Monica, Carol, Wanda, and Jennifer.'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "question = \"What are the first names of all the customers?\"\n", "chain.run(question)" ] }, { "cell_type": "markdown", "id": "9b4bfada", "metadata": {}, "source": [ "## Custom prompt\n", "\n", "For best results you'll likely need to customize the prompt." ] }, { "cell_type": "code", "execution_count": 7, "id": "0a494f5b", "metadata": {}, "outputs": [], "source": [ "from langchain.chains.elasticsearch_database.prompts import DEFAULT_DSL_TEMPLATE\n", "from langchain.prompts.prompt import PromptTemplate\n", "\n", "PROMPT_TEMPLATE = \"\"\"Given an input question, create a syntactically correct Elasticsearch query to run. Unless the user specifies in their question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.\n", "\n", "Unless told to do not query for all the columns from a specific index, only ask for a the few relevant columns given the question.\n", "\n", "Pay attention to use only the column names that you can see in the mapping description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which index. 
Return the query as valid JSON.\n", "\n", "Use the following format:\n", "\n", "Question: Question here\n", "ESQuery: Elasticsearch query formatted as JSON\n", "\"\"\"\n", "\n", "PROMPT = PromptTemplate.from_template(\n", " PROMPT_TEMPLATE,\n", ")\n", "chain = ElasticsearchDatabaseChain.from_llm(llm=llm, database=db, query_prompt=PROMPT)" ] }, { "cell_type": "markdown", "id": "372b8f93", "metadata": {}, "source": [ "## Adding example documents from each index\n", "\n", "Sometimes the format of the data is not obvious, and it helps to include sample documents from the indices in the prompt so the LLM can understand the data before writing the final query. Here we use this feature to show the LLM how the documents are structured by including two sample documents from each index." ] }, { "cell_type": "code", "execution_count": null, "id": "eef818de", "metadata": {}, "outputs": [], "source": [ "chain = ElasticsearchDatabaseChain.from_llm(\n", " llm=ChatOpenAI(temperature=0),\n", " database=db,\n", " sample_documents_in_index_info=2, # Two documents from each index will be included in the prompt as sample data\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "venv", "language": "python", "name": "venv" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.3" } }, "nbformat": 4, "nbformat_minor": 5 }