langchain/docs/modules/chat/examples/chat_vector_db.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "134a0785",
   "metadata": {},
   "source": [
    "# Chat Vector DB\n",
    "\n",
    "This notebook goes over how to set up a chat model to chat with a vector database.\n",
    "\n",
    "This notebook is very similar to the example of using an LLM in the ChatVectorDBChain. The only differences here are (1) using a ChatModel, and (2) passing in a ChatPromptTemplate (optimized for chat models)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "70c4e529",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.vectorstores import Chroma\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.chains import ChatVectorDBChain"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cdff94be",
   "metadata": {},
   "source": [
    "Load in documents. You can replace this with a loader for whatever type of data you want"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "01c46e92",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "loader = TextLoader('../../state_of_the_union.txt')\n",
    "documents = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9be4779",
   "metadata": {},
   "source": [
    "If you had multiple loaders that you wanted to combine, you do something like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "433363a5",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# loaders = [....]\n",
    "# docs = []\n",
    "# for loader in loaders:\n",
    "#     docs.extend(loader.load())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "239475d2",
   "metadata": {},
   "source": [
    "We now split the documents, create embeddings for them, and put them in a vectorstore. This allows us to do semantic search over them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "a8930cf7",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running Chroma using direct local API.\n",
      "Using DuckDB in-memory for database. Data will be transient.\n"
     ]
    }
   ],
   "source": [
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "documents = text_splitter.split_documents(documents)\n",
    "\n",
    "embeddings = OpenAIEmbeddings()\n",
    "vectorstore = Chroma.from_documents(documents, embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18415aca",
   "metadata": {},
   "source": [
    "We are now going to construct a prompt specifically designed for chat models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c8805230",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.prompts.chat import (\n",
    "    ChatPromptTemplate,\n",
    "    SystemMessagePromptTemplate,\n",
    "    AIMessagePromptTemplate,\n",
    "    HumanMessagePromptTemplate,\n",
    ")\n",
    "from langchain.schema import (\n",
    "    AIMessage,\n",
    "    HumanMessage,\n",
    "    SystemMessage\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "cc86c30e",
   "metadata": {},
   "outputs": [],
   "source": [
    "system_template=\"\"\"Use the following pieces of context to answer the users question. \n",
    "If you don't know the answer, just say that you don't know, don't try to make up an answer.\n",
    "----------------\n",
    "{context}\"\"\"\n",
    "messages = [\n",
    "    SystemMessagePromptTemplate.from_template(system_template),\n",
    "    HumanMessagePromptTemplate.from_template(\"{question}\")\n",
    "]\n",
    "prompt = ChatPromptTemplate.from_messages(messages)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c96b118",
   "metadata": {},
   "source": [
    "We now initialize the ChatVectorDBChain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "7b4110f3",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "qa = ChatVectorDBChain.from_llm(ChatOpenAI(temperature=0), vectorstore,qa_prompt=prompt)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3872432d",
   "metadata": {},
   "source": [
    "Here's an example of asking a question with no chat history"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "7fe3e730",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "chat_history = []\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "result = qa({\"question\": query, \"chat_history\": chat_history})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "bfff9cc8",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"The President nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court. He described her as one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and a consensus builder. He also mentioned that she has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"answer\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e46edf7",
   "metadata": {},
   "source": [
    "Here's an example of asking a question with some chat history"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "00b4cf00",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "chat_history = [(query, result[\"answer\"])]\n",
    "query = \"Did he mention who came before her\"\n",
    "result = qa({\"question\": query, \"chat_history\": chat_history})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "f01828d1",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The context does not provide information about the predecessor of Ketanji Brown Jackson.'"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result['answer']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2324cdc6-98bf-4708-b8cd-02a98b1e5b67",
   "metadata": {},
   "source": [
    "## Chat Vector DB with streaming to `stdout`\n",
    "\n",
    "Output from the chain will be streamed to `stdout` token by token in this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "2efacec3-2690-4b05-8de3-a32fd2ac3911",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.chains.llm import LLMChain\n",
    "from langchain.llms import OpenAI\n",
    "from langchain.callbacks.base import CallbackManager\n",
    "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
    "from langchain.chains.chat_vector_db.prompts import CONDENSE_QUESTION_PROMPT\n",
    "from langchain.chains.question_answering import load_qa_chain\n",
    "\n",
    "# Construct a ChatVectorDBChain with a streaming llm for combine docs\n",
    "# and a separate, non-streaming llm for question generation\n",
    "llm = OpenAI(temperature=0)\n",
    "streaming_llm = ChatOpenAI(streaming=True, callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]), verbose=True, temperature=0)\n",
    "\n",
    "question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
    "doc_chain = load_qa_chain(streaming_llm, chain_type=\"stuff\", prompt=prompt)\n",
    "\n",
    "qa = ChatVectorDBChain(vectorstore=vectorstore, combine_docs_chain=doc_chain, question_generator=question_generator)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "fd6d43f4-7428-44a4-81bc-26fe88a98762",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The President nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court. He described her as one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and a consensus builder. He also mentioned that she has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."
     ]
    }
   ],
   "source": [
    "chat_history = []\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "result = qa({\"question\": query, \"chat_history\": chat_history})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "5ab38978-f3e8-4fa7-808c-c79dec48379a",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The context does not provide information on who Ketanji Brown Jackson succeeded on the United States Supreme Court."
     ]
    }
   ],
   "source": [
    "chat_history = [(query, result[\"answer\"])]\n",
    "query = \"Did he mention who she suceeded\"\n",
    "result = qa({\"question\": query, \"chat_history\": chat_history})\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e8d0055",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}