langchain/docs/extras/integrations/toolkits/vectorstore.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "18ada398-dce6-4049-9b56-fc0ede63da9c",
   "metadata": {},
   "source": [
    "# Vectorstore\n",
    "\n",
    "This notebook showcases an agent designed to retrieve information from one or more vectorstores, either with or without sources."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eecb683b-3a46-4b9d-81a3-7caefbfec1a1",
   "metadata": {},
   "source": [
    "## Create Vectorstores"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9bfd0ed8-a5eb-443e-8e92-90be8cabb0a7",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.vectorstores import Chroma\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain import OpenAI, VectorDBQA\n",
    "\n",
    "llm = OpenAI(temperature=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "345bb078-4ec1-4e3a-827b-cd238c49054d",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running Chroma using direct local API.\n",
      "Using DuckDB in-memory for database. Data will be transient.\n"
     ]
    }
   ],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "\n",
    "loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
    "documents = loader.load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "texts = text_splitter.split_documents(documents)\n",
    "\n",
    "embeddings = OpenAIEmbeddings()\n",
    "state_of_union_store = Chroma.from_documents(\n",
    "    texts, embeddings, collection_name=\"state-of-union\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "5f50eb82-e1a5-4252-8306-8ec1b478d9b4",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running Chroma using direct local API.\n",
      "Using DuckDB in-memory for database. Data will be transient.\n"
     ]
    }
   ],
   "source": [
    "from langchain.document_loaders import WebBaseLoader\n",
    "\n",
    "loader = WebBaseLoader(\"https://beta.ruff.rs/docs/faq/\")\n",
    "docs = loader.load()\n",
    "ruff_texts = text_splitter.split_documents(docs)\n",
    "ruff_store = Chroma.from_documents(ruff_texts, embeddings, collection_name=\"ruff\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4814175-964d-42f1-aa9d-22801ce1e912",
   "metadata": {},
   "source": [
    "## Initialize Toolkit and Agent\n",
    "\n",
    "First, we'll create an agent with a single vectorstore."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "5b3b3206",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.agents.agent_toolkits import (\n",
    "    create_vectorstore_agent,\n",
    "    VectorStoreToolkit,\n",
    "    VectorStoreInfo,\n",
    ")\n",
    "\n",
    "vectorstore_info = VectorStoreInfo(\n",
    "    name=\"state_of_union_address\",\n",
    "    description=\"the most recent state of the Union adress\",\n",
    "    vectorstore=state_of_union_store,\n",
    ")\n",
    "toolkit = VectorStoreToolkit(vectorstore_info=vectorstore_info)\n",
    "agent_executor = create_vectorstore_agent(llm=llm, toolkit=toolkit, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a38ad10",
   "metadata": {},
   "source": [
    "## Examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "3f2f455c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m I need to find the answer in the state of the union address\n",
      "Action: state_of_union_address\n",
      "Action Input: What did biden say about ketanji brown jackson\u001b[0m\n",
      "Observation: \u001b[36;1m\u001b[1;3m Biden said that Ketanji Brown Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
      "Final Answer: Biden said that Ketanji Brown Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "\"Biden said that Ketanji Brown Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\""
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent_executor.run(\n",
    "    \"What did biden say about ketanji brown jackson in the state of the union address?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "d61e1e63",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m I need to use the state_of_union_address_with_sources tool to answer this question.\n",
      "Action: state_of_union_address_with_sources\n",
      "Action Input: What did biden say about ketanji brown jackson\u001b[0m\n",
      "Observation: \u001b[33;1m\u001b[1;3m{\"answer\": \" Biden said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to the United States Supreme Court, and that she is one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence.\\n\", \"sources\": \"../../state_of_the_union.txt\"}\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
      "Final Answer: Biden said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to the United States Supreme Court, and that she is one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence. Sources: ../../state_of_the_union.txt\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "\"Biden said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to the United States Supreme Court, and that she is one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence. Sources: ../../state_of_the_union.txt\""
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent_executor.run(\n",
    "    \"What did biden say about ketanji brown jackson in the state of the union address? List the source.\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ca07707",
   "metadata": {},
   "source": [
    "## Multiple Vectorstores\n",
    "We can also easily use this initialize an agent with multiple vectorstores and use the agent to route between them. To do this. This agent is optimized for routing, so it is a different toolkit and initializer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "c3209fd3",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.agents.agent_toolkits import (\n",
    "    create_vectorstore_router_agent,\n",
    "    VectorStoreRouterToolkit,\n",
    "    VectorStoreInfo,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "815c4f39-308d-4949-b992-1361036e6e09",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "ruff_vectorstore_info = VectorStoreInfo(\n",
    "    name=\"ruff\",\n",
    "    description=\"Information about the Ruff python linting library\",\n",
    "    vectorstore=ruff_store,\n",
    ")\n",
    "router_toolkit = VectorStoreRouterToolkit(\n",
    "    vectorstores=[vectorstore_info, ruff_vectorstore_info], llm=llm\n",
    ")\n",
    "agent_executor = create_vectorstore_router_agent(\n",
    "    llm=llm, toolkit=router_toolkit, verbose=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71680984-edaf-4a63-90f5-94edbd263550",
   "metadata": {},
   "source": [
    "## Examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "3cd1bf3e-e3df-4e69-bbe1-71c64b1af947",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m I need to use the state_of_union_address tool to answer this question.\n",
      "Action: state_of_union_address\n",
      "Action Input: What did biden say about ketanji brown jackson\u001b[0m\n",
      "Observation: \u001b[36;1m\u001b[1;3m Biden said that Ketanji Brown Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
      "Final Answer: Biden said that Ketanji Brown Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "\"Biden said that Ketanji Brown Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\""
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent_executor.run(\n",
    "    \"What did biden say about ketanji brown jackson in the state of the union address?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "c5998b8d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m I need to find out what tool ruff uses to run over Jupyter Notebooks\n",
      "Action: ruff\n",
      "Action Input: What tool does ruff use to run over Jupyter Notebooks?\u001b[0m\n",
      "Observation: \u001b[33;1m\u001b[1;3m Ruff is integrated into nbQA, a tool for running linters and code formatters over Jupyter Notebooks. After installing ruff and nbqa, you can run Ruff over a notebook like so: > nbqa ruff Untitled.html\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
      "Final Answer: Ruff is integrated into nbQA, a tool for running linters and code formatters over Jupyter Notebooks. After installing ruff and nbqa, you can run Ruff over a notebook like so: > nbqa ruff Untitled.html\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'Ruff is integrated into nbQA, a tool for running linters and code formatters over Jupyter Notebooks. After installing ruff and nbqa, you can run Ruff over a notebook like so: > nbqa ruff Untitled.html'"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent_executor.run(\"What tool does ruff use to run over Jupyter Notebooks?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "744e9b51-fbd9-4778-b594-ea957d0f3467",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m I need to find out what tool ruff uses and if the president mentioned it in the state of the union.\n",
      "Action: ruff\n",
      "Action Input: What tool does ruff use to run over Jupyter Notebooks?\u001b[0m\n",
      "Observation: \u001b[33;1m\u001b[1;3m Ruff is integrated into nbQA, a tool for running linters and code formatters over Jupyter Notebooks. After installing ruff and nbqa, you can run Ruff over a notebook like so: > nbqa ruff Untitled.html\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I need to find out if the president mentioned nbQA in the state of the union.\n",
      "Action: state_of_union_address\n",
      "Action Input: Did the president mention nbQA in the state of the union?\u001b[0m\n",
      "Observation: \u001b[36;1m\u001b[1;3m No, the president did not mention nbQA in the state of the union.\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
      "Final Answer: No, the president did not mention nbQA in the state of the union.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'No, the president did not mention nbQA in the state of the union.'"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent_executor.run(\n",
    "    \"What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the state of the union?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "92203aa9-f63a-4ce1-b562-fadf4474ad9d",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}