{
"cells": [
{
"cell_type": "markdown",
"id": "134a0785",
"metadata": {},
"source": [
"# Chat Vector DB\n",
"\n",
"This notebook goes over how to set up a chat model to chat with a vector database.\n",
"\n",
"This notebook is very similar to the example of using an LLM in the ConversationalRetrievalChain. The only differences here are (1) using a chat model, and (2) passing in a ChatPromptTemplate (optimized for chat models)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "70c4e529",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.chains import ConversationalRetrievalChain"
]
},
{
"cell_type": "markdown",
"id": "cdff94be",
"metadata": {},
"source": [
"Load in documents. You can replace this with a loader for whatever type of data you want."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "01c46e92",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "e9be4779",
"metadata": {},
"source": [
"If you had multiple loaders that you wanted to combine, you would do something like:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "433363a5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# loaders = [....]\n",
"# docs = []\n",
"# for loader in loaders:\n",
"#     docs.extend(loader.load())"
]
},
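{
"cell_type": "markdown",
"id": "a1b2c3d4",
"metadata": {},
"source": [
"Concretely, a minimal sketch of that pattern might look like the following. It is kept commented out, like the cell above, so the notebook runs without extra files; the second path is hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2c3d4e5",
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch of combining two loaders; '../../another_document.txt' is a hypothetical path.\n",
"# loaders = [\n",
"#     TextLoader('../../state_of_the_union.txt'),\n",
"#     TextLoader('../../another_document.txt'),\n",
"# ]\n",
"# docs = []\n",
"# for loader in loaders:\n",
"#     docs.extend(loader.load())"
]
},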
{
"cell_type": "markdown",
"id": "239475d2",
"metadata": {},
"source": [
"We now split the documents, create embeddings for them, and put them in a vectorstore. This allows us to do semantic search over them."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a8930cf7",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"documents = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"vectorstore = Chroma.from_documents(documents, embeddings)"
]
},
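{
"cell_type": "markdown",
"id": "c3d4e5f6",
"metadata": {},
"source": [
"As a quick illustration of the semantic search this enables, you can query the vectorstore directly; the query string below is just an example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4e5f6a7",
"metadata": {},
"outputs": [],
"source": [
"# Retrieve the document chunks most similar to an example query.\n",
"docs = vectorstore.similarity_search(\"What did the president say about Ketanji Brown Jackson\")\n",
"print(docs[0].page_content[:200])"
]
},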
{
"cell_type": "markdown",
"id": "18415aca",
"metadata": {},
"source": [
"We are now going to construct a prompt specifically designed for chat models."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "c8805230",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts.chat import (\n",
"    ChatPromptTemplate,\n",
"    SystemMessagePromptTemplate,\n",
"    AIMessagePromptTemplate,\n",
"    HumanMessagePromptTemplate,\n",
")\n",
"from langchain.schema import (\n",
"    AIMessage,\n",
"    HumanMessage,\n",
"    SystemMessage\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "cc86c30e",
"metadata": {},
"outputs": [],
"source": [
"system_template = \"\"\"Use the following pieces of context to answer the user's question.\n",
"If you don't know the answer, just say that you don't know, don't try to make up an answer.\n",
"----------------\n",
"{context}\"\"\"\n",
"messages = [\n",
"    SystemMessagePromptTemplate.from_template(system_template),\n",
"    HumanMessagePromptTemplate.from_template(\"{question}\")\n",
"]\n",
"prompt = ChatPromptTemplate.from_messages(messages)"
]
},
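{
"cell_type": "markdown",
"id": "e5f6a7b8",
"metadata": {},
"source": [
"As a quick sanity check, you can render the prompt with placeholder values to see the system/human message pair it produces; the context and question strings here are made up for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f6a7b8c9",
"metadata": {},
"outputs": [],
"source": [
"# Render the chat prompt with dummy values to inspect the resulting messages.\n",
"prompt.format_prompt(\n",
"    context=\"Some retrieved context goes here.\",\n",
"    question=\"What does the context say?\"\n",
").to_messages()"
]
},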
{
"cell_type": "markdown",
"id": "3c96b118",
"metadata": {},
"source": [
"We now initialize the ConversationalRetrievalChain."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "7b4110f3",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"qa = ConversationalRetrievalChain.from_llm(ChatOpenAI(temperature=0), vectorstore.as_retriever(), qa_prompt=prompt)"
]
},
{
"cell_type": "markdown",
"id": "3872432d",
"metadata": {},
"source": [
"Here's an example of asking a question with no chat history:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "7fe3e730",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "bfff9cc8",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"\"The President nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court. He described her as one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and a consensus builder. She has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result[\"answer\"]"
]
},
{
"cell_type": "markdown",
"id": "9e46edf7",
"metadata": {},
"source": [
"Here's an example of asking a question with some chat history, passed as a list of (human message, AI message) tuples:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "00b4cf00",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat_history = [(query, result[\"answer\"])]\n",
"query = \"Did he mention who came before her\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "f01828d1",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"\"The President mentioned Circuit Court of Appeals Judge Ketanji Brown Jackson as the nominee for the United States Supreme Court. He described her as one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence. The President did not mention any specific sources of support for Judge Jackson, but he did note that advancing immigration reform is supported by everyone from labor unions to religious leaders to the U.S. Chamber of Commerce.\""
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result[\"answer\"]"
]
},
{
"cell_type": "markdown",
"id": "2324cdc6-98bf-4708-b8cd-02a98b1e5b67",
"metadata": {},
"source": [
"## ConversationalRetrievalChain with streaming to `stdout`\n",
"\n",
"Output from the chain will be streamed to `stdout` token by token in this example."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "2efacec3-2690-4b05-8de3-a32fd2ac3911",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chains.llm import LLMChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT\n",
"from langchain.chains.question_answering import load_qa_chain\n",
"\n",
"# Construct a ConversationalRetrievalChain with a streaming llm for combining docs\n",
"# and a separate, non-streaming llm for question generation\n",
"llm = OpenAI(temperature=0)\n",
"streaming_llm = ChatOpenAI(streaming=True, callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]), verbose=True, temperature=0)\n",
"\n",
"question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
"doc_chain = load_qa_chain(streaming_llm, chain_type=\"stuff\", prompt=prompt)\n",
"\n",
"qa = ConversationalRetrievalChain(retriever=vectorstore.as_retriever(), combine_docs_chain=doc_chain, question_generator=question_generator)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "fd6d43f4-7428-44a4-81bc-26fe88a98762",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The President nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court. He described her as one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and a consensus builder. He also mentioned that she has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."
]
}
],
"source": [
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "5ab38978-f3e8-4fa7-808c-c79dec48379a",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The context does not provide information on who Ketanji Brown Jackson succeeded on the United States Supreme Court."
]
}
],
"source": [
"chat_history = [(query, result[\"answer\"])]\n",
"query = \"Did he mention who she succeeded\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
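{
"cell_type": "markdown",
"id": "0a1b2c3d",
"metadata": {},
"source": [
"Even with streaming to `stdout`, the chain still returns its usual output dict, so the full answer should also be available afterwards under the `answer` key:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b2c3d4e",
"metadata": {},
"outputs": [],
"source": [
"# The streamed text is also returned in the result dict (assuming the cell above ran).\n",
"result[\"answer\"]"
]
},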
{
"cell_type": "code",
"execution_count": null,
"id": "8e8d0055",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}