{ "cells": [ { "cell_type": "markdown", "id": "134a0785", "metadata": {}, "source": [ "# Chat Over Documents with Vectara\n", "\n", "This notebook is based on the [chat_vector_db](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/chat_vector_db.ipynb) notebook, but uses Vectara as the vector database." ] }, { "cell_type": "code", "execution_count": 1, "id": "70c4e529", "metadata": { "tags": [] }, "outputs": [], "source": [ "import os\n", "from langchain.vectorstores import Vectara\n", "from langchain.vectorstores.vectara import VectaraRetriever\n", "from langchain.llms import OpenAI\n", "from langchain.chains import ConversationalRetrievalChain" ] }, { "cell_type": "markdown", "id": "cdff94be", "metadata": {}, "source": [ "Load in documents. You can replace this with a loader for whatever type of data you want." ] }, { "cell_type": "code", "execution_count": 2, "id": "01c46e92", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.document_loaders import TextLoader\n", "loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n", "documents = loader.load()" ] }, { "cell_type": "markdown", "id": "239475d2", "metadata": {}, "source": [ "We now put the documents in a vectorstore. Note that `embedding=None` is passed because Vectara computes embeddings on its side rather than taking a local embedding model. This allows us to do semantic search over them." ] }, { "cell_type": "code", "execution_count": 3, "id": "a8930cf7", "metadata": { "tags": [] }, "outputs": [], "source": [ "vectorstore = Vectara.from_documents(documents, embedding=None)" ] }, { "cell_type": "markdown", "id": "898b574b", "metadata": {}, "source": [ "We can now create a memory object, which is necessary to track the inputs/outputs and hold a conversation."
] }, { "cell_type": "code", "execution_count": 4, "id": "af803fee", "metadata": {}, "outputs": [], "source": [ "from langchain.memory import ConversationBufferMemory\n", "memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)" ] }, { "cell_type": "markdown", "id": "3c96b118", "metadata": {}, "source": [ "We now initialize the `ConversationalRetrievalChain`." ] }, { "cell_type": "code", "execution_count": 5, "id": "7b4110f3", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "openai_api_key = os.environ['OPENAI_API_KEY']\n", "llm = OpenAI(openai_api_key=openai_api_key, temperature=0)\n", "retriever = VectaraRetriever(vectorstore, alpha=0.025, k=5, filter=None)\n", "\n", "print(type(vectorstore))\n", "d = retriever.get_relevant_documents('What did the president say about Ketanji Brown Jackson')\n", "\n", "qa = ConversationalRetrievalChain.from_llm(llm, retriever, memory=memory)" ] }, { "cell_type": "code", "execution_count": 6, "id": "e8ce4fe9", "metadata": {}, "outputs": [], "source": [ "query = \"What did the president say about Ketanji Brown Jackson\"\n", "result = qa({\"question\": query})" ] }, { "cell_type": "code", "execution_count": 7, "id": "4c79862b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender.\"" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result[\"answer\"]" ] }, { "cell_type": "code", "execution_count": 8, "id": "c697d9d1", "metadata": {}, "outputs": [], "source": [ "query = \"Did he mention who she succeeded\"\n", "result = qa({\"question\": query})" ] }, { "cell_type": "code", "execution_count": 9, "id": "ba0678f3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "' Justice Stephen Breyer.'" ] }, "execution_count": 9, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "result['answer']" ] }, { "cell_type": "markdown", "id": "b3308b01-5300-4999-8cd3-22f16dae757e", "metadata": {}, "source": [ "## Pass in chat history\n", "\n", "In the above example, we used a Memory object to track chat history. We can also just pass it in explicitly. In order to do this, we need to initialize a chain without any memory object." ] }, { "cell_type": "code", "execution_count": 10, "id": "1b41a10b-bf68-4689-8f00-9aed7675e2ab", "metadata": { "tags": [] }, "outputs": [], "source": [ "qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever())" ] }, { "cell_type": "markdown", "id": "83f38c18-ac82-45f4-a79e-8b37ce1ae115", "metadata": {}, "source": [ "Here's an example of asking a question with no chat history" ] }, { "cell_type": "code", "execution_count": 11, "id": "bc672290-8a8b-4828-a90c-f1bbdd6b3920", "metadata": { "tags": [] }, "outputs": [], "source": [ "chat_history = []\n", "query = \"What did the president say about Ketanji Brown Jackson\"\n", "result = qa({\"question\": query, \"chat_history\": chat_history})" ] }, { "cell_type": "code", "execution_count": 12, "id": "6b62d758-c069-4062-88f0-21e7ea4710bf", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender.\"" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result[\"answer\"]" ] }, { "cell_type": "markdown", "id": "8c26a83d-c945-4458-b54a-c6bd7f391303", "metadata": {}, "source": [ "Here's an example of asking a question with some chat history" ] }, { "cell_type": "code", "execution_count": 13, "id": "9c95460b-7116-4155-a9d2-c0fb027ee592", "metadata": { "tags": [] }, "outputs": [], "source": [ "chat_history = [(query, result[\"answer\"])]\n", "query = \"Did he mention 
who she succeeded\"\n", "result = qa({\"question\": query, \"chat_history\": chat_history})" ] }, { "cell_type": "code", "execution_count": 14, "id": "698ac00c-cadc-407f-9423-226b2d9258d0", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "' Justice Stephen Breyer.'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result['answer']" ] }, { "cell_type": "markdown", "id": "0eaadf0f", "metadata": {}, "source": [ "## Return Source Documents\n", "You can also easily return source documents from the `ConversationalRetrievalChain`. This is useful when you want to inspect which documents were retrieved." ] }, { "cell_type": "code", "execution_count": 15, "id": "562769c6", "metadata": { "tags": [] }, "outputs": [], "source": [ "qa = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), return_source_documents=True)" ] }, { "cell_type": "code", "execution_count": 16, "id": "ea478300", "metadata": { "tags": [] }, "outputs": [], "source": [ "chat_history = []\n", "query = \"What did the president say about Ketanji Brown Jackson\"\n", "result = qa({\"question\": query, \"chat_history\": chat_history})" ] }, { "cell_type": "code", "execution_count": 17, "id": "4cb75b4e", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "Document(page_content='Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. A former top litigator in private practice. 
A former federal public defender.', metadata={'source': '../../modules/state_of_the_union.txt'})" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result['source_documents'][0]" ] }, { "cell_type": "markdown", "id": "669ede2f-d69f-4960-8468-8a768ce1a55f", "metadata": {}, "source": [ "## ConversationalRetrievalChain with `search_distance`\n", "If you are using a vector store that supports filtering by search distance, you can pass a threshold value through the `vectordbkwargs` input." ] }, { "cell_type": "code", "execution_count": 18, "id": "f4f32c6f-8e49-44af-9116-8830b1fcc5f2", "metadata": { "tags": [] }, "outputs": [], "source": [ "vectordbkwargs = {\"search_distance\": 0.9}" ] }, { "cell_type": "code", "execution_count": 19, "id": "1e251775-31e7-4679-b744-d4a57937f93a", "metadata": { "tags": [] }, "outputs": [], "source": [ "qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), return_source_documents=True)\n", "chat_history = []\n", "query = \"What did the president say about Ketanji Brown Jackson\"\n", "result = qa({\"question\": query, \"chat_history\": chat_history, \"vectordbkwargs\": vectordbkwargs})" ] }, { "cell_type": "markdown", "id": "99b96dae", "metadata": {}, "source": [ "## ConversationalRetrievalChain with `map_reduce`\n", "We can also use different types of combine-documents chains with the `ConversationalRetrievalChain`."
] }, { "cell_type": "code", "execution_count": 20, "id": "e53a9d66", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.chains import LLMChain\n", "from langchain.chains.question_answering import load_qa_chain\n", "from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT" ] }, { "cell_type": "code", "execution_count": 21, "id": "bf205e35", "metadata": { "tags": [] }, "outputs": [], "source": [ "question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n", "doc_chain = load_qa_chain(llm, chain_type=\"map_reduce\")\n", "\n", "chain = ConversationalRetrievalChain(\n", " retriever=vectorstore.as_retriever(),\n", " question_generator=question_generator,\n", " combine_docs_chain=doc_chain,\n", ")" ] }, { "cell_type": "code", "execution_count": 22, "id": "78155887", "metadata": { "tags": [] }, "outputs": [], "source": [ "chat_history = []\n", "query = \"What did the president say about Ketanji Brown Jackson\"\n", "result = chain({\"question\": query, \"chat_history\": chat_history})" ] }, { "cell_type": "code", "execution_count": 23, "id": "e54b5fa2", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "' The president did not mention Ketanji Brown Jackson.'" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result['answer']" ] }, { "cell_type": "markdown", "id": "a2fe6b14", "metadata": {}, "source": [ "## ConversationalRetrievalChain with Question Answering with sources\n", "\n", "You can also use this chain with the question answering with sources chain." 
] }, { "cell_type": "code", "execution_count": 24, "id": "d1058fd2", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.chains.qa_with_sources import load_qa_with_sources_chain" ] }, { "cell_type": "code", "execution_count": 25, "id": "a6594482", "metadata": { "tags": [] }, "outputs": [], "source": [ "\n", "question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n", "doc_chain = load_qa_with_sources_chain(llm, chain_type=\"map_reduce\")\n", "\n", "chain = ConversationalRetrievalChain(\n", " retriever=vectorstore.as_retriever(),\n", " question_generator=question_generator,\n", " combine_docs_chain=doc_chain,\n", ")" ] }, { "cell_type": "code", "execution_count": 26, "id": "e2badd21", "metadata": { "tags": [] }, "outputs": [], "source": [ "chat_history = []\n", "query = \"What did the president say about Ketanji Brown Jackson\"\n", "result = chain({\"question\": query, \"chat_history\": chat_history})" ] }, { "cell_type": "code", "execution_count": 27, "id": "edb31fe5", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "' The president did not mention Ketanji Brown Jackson.\\nSOURCES: ../../modules/state_of_the_union.txt'" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result['answer']" ] }, { "cell_type": "markdown", "id": "2324cdc6-98bf-4708-b8cd-02a98b1e5b67", "metadata": {}, "source": [ "## ConversationalRetrievalChain with streaming to `stdout`\n", "\n", "Output from the chain will be streamed to `stdout` token by token in this example." 
] }, { "cell_type": "code", "execution_count": 28, "id": "2efacec3-2690-4b05-8de3-a32fd2ac3911", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.chains.llm import LLMChain\n", "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n", "from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT\n", "from langchain.chains.question_answering import load_qa_chain\n", "\n", "# Construct a ConversationalRetrievalChain with a streaming llm for combine docs\n", "# and a separate, non-streaming llm for question generation\n", "llm = OpenAI(temperature=0, openai_api_key=openai_api_key)\n", "streaming_llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0, openai_api_key=openai_api_key)\n", "\n", "question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n", "doc_chain = load_qa_chain(streaming_llm, chain_type=\"stuff\", prompt=QA_PROMPT)\n", "\n", "qa = ConversationalRetrievalChain(\n", " retriever=vectorstore.as_retriever(), combine_docs_chain=doc_chain, question_generator=question_generator)" ] }, { "cell_type": "code", "execution_count": 29, "id": "fd6d43f4-7428-44a4-81bc-26fe88a98762", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender." ] } ], "source": [ "chat_history = []\n", "query = \"What did the president say about Ketanji Brown Jackson\"\n", "result = qa({\"question\": query, \"chat_history\": chat_history})" ] }, { "cell_type": "code", "execution_count": 30, "id": "5ab38978-f3e8-4fa7-808c-c79dec48379a", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Justice Stephen Breyer." 
] } ], "source": [ "chat_history = [(query, result[\"answer\"])]\n", "query = \"Did he mention who she succeeded\"\n", "result = qa({\"question\": query, \"chat_history\": chat_history})\n" ] }, { "cell_type": "markdown", "id": "f793d56b", "metadata": {}, "source": [ "## get_chat_history Function\n", "You can also specify a `get_chat_history` function, which can be used to format the `chat_history` string." ] }, { "cell_type": "code", "execution_count": 31, "id": "a7ba9d8c", "metadata": { "tags": [] }, "outputs": [], "source": [ "def get_chat_history(inputs) -> str:\n", "    # Render each (human, ai) turn as two labelled lines\n", "    res = []\n", "    for human, ai in inputs:\n", "        res.append(f\"Human:{human}\\nAI:{ai}\")\n", "    return \"\\n\".join(res)\n", "qa = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), get_chat_history=get_chat_history)" ] }, { "cell_type": "code", "execution_count": 32, "id": "a3e33c0d", "metadata": { "tags": [] }, "outputs": [], "source": [ "chat_history = []\n", "query = \"What did the president say about Ketanji Brown Jackson\"\n", "result = qa({\"question\": query, \"chat_history\": chat_history})" ] }, { "cell_type": "code", "execution_count": 33, "id": "936dc62f", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender.\"" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result['answer']" ] }, { "cell_type": "code", "execution_count": null, "id": "b8c26901", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" } }, 
"nbformat": 4, "nbformat_minor": 5 }