docs: anthropic qa quickstart (#18459)

Bagatur 2024-03-03 13:33:24 -08:00 committed by GitHub
parent bc768a12ed
commit 74f3908182
2 changed files with 644 additions and 874 deletions


@@ -1,874 +0,0 @@
{
"cells": [
{
"cell_type": "raw",
"id": "34814bdb-d05b-4dd3-adf1-ca5779882d7e",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 0\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "86fc5bb2-017f-434e-8cd6-53ab214a5604",
"metadata": {},
"source": [
"# Quickstart"
]
},
{
"cell_type": "markdown",
"id": "de913d6d-c57f-4927-82fe-18902a636861",
"metadata": {},
"source": [
"[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/question_answering/quickstart.ipynb)"
]
},
{
"cell_type": "markdown",
"id": "5151afed",
"metadata": {},
"source": [
"LangChain has a number of components designed to help build question-answering applications, and RAG applications more generally. To familiarize ourselves with these, we'll build a simple Q&A application over a text data source. Along the way we'll go over a typical Q&A architecture, discuss the relevant LangChain components, and highlight additional resources for more advanced Q&A techniques. We'll also see how LangSmith can help us trace and understand our application. LangSmith will become increasingly helpful as our application grows in complexity."
]
},
{
"cell_type": "markdown",
"id": "2f25cbbd-0938-4e3d-87e4-17a204a03ffb",
"metadata": {},
"source": [
"## Architecture\n",
"We'll create a typical RAG application as outlined in the [Q&A introduction](/docs/use_cases/question_answering/), which has two main components:\n",
"\n",
"**Indexing**: a pipeline for ingesting data from a source and indexing it. *This usually happens offline.*\n",
"\n",
"**Retrieval and generation**: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.\n",
"\n",
"The full sequence from raw data to answer will look like:\n",
"\n",
"#### Indexing\n",
"1. **Load**: First we need to load our data. We'll use [DocumentLoaders](/docs/modules/data_connection/document_loaders/) for this.\n",
"2. **Split**: [Text splitters](/docs/modules/data_connection/document_transformers/) break large `Documents` into smaller chunks. This is useful both for indexing data and for passing it in to a model, since large chunks are harder to search over and won't fit in a model's finite context window.\n",
"3. **Store**: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a [VectorStore](/docs/modules/data_connection/vectorstores/) and [Embeddings](/docs/modules/data_connection/text_embedding/) model.\n",
"\n",
"#### Retrieval and generation\n",
"4. **Retrieve**: Given a user input, relevant splits are retrieved from storage using a [Retriever](/docs/modules/data_connection/retrievers/).\n",
"5. **Generate**: A [ChatModel](/docs/modules/model_io/chat/) / [LLM](/docs/modules/model_io/llms/) produces an answer using a prompt that includes the question and the retrieved data"
]
},
{
"cell_type": "markdown",
"id": "487d8d79-5ee9-4aa4-9fdf-cd5f4303e099",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"### Dependencies\n",
"\n",
"We'll use an OpenAI chat model and embeddings and a Chroma vector store in this walkthrough, but everything shown here works with any [ChatModel](/docs/modules/model_io/chat/) or [LLM](/docs/modules/model_io/llms/), [Embeddings](/docs/modules/data_connection/text_embedding/), and [VectorStore](/docs/modules/data_connection/vectorstores/) or [Retriever](/docs/modules/data_connection/retrievers/). \n",
"\n",
"We'll use the following packages:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "28d272cd-4e31-40aa-bbb4-0be0a1f49a14",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain langchain-community langchainhub langchain-openai chromadb bs4"
]
},
{
"cell_type": "markdown",
"id": "51ef48de-70b6-4f43-8e0b-ab9b84c9c02a",
"metadata": {},
"source": [
"We need to set environment variable `OPENAI_API_KEY`, which can be done directly or loaded from a `.env` file like so:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "143787ca-d8e6-4dc9-8281-4374f4d71720",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
"\n",
"# import dotenv\n",
"\n",
"# dotenv.load_dotenv()"
]
},
{
"cell_type": "markdown",
"id": "1665e740-ce01-4f09-b9ed-516db0bd326f",
"metadata": {},
"source": [
"### LangSmith\n",
"\n",
"Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with [LangSmith](https://smith.langchain.com).\n",
"\n",
"Note that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "07411adb-3722-4f65-ab7f-8f6f57663d11",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()"
]
},
{
"cell_type": "markdown",
"id": "fa6ba684-26cf-4860-904e-a4d51380c134",
"metadata": {},
"source": [
"## Preview\n",
"\n",
"In this guide we'll build a QA app over the [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post by Lilian Weng, which allows us to ask questions about the contents of the post. \n",
"\n",
"We can create a simple indexing pipeline and RAG chain to do this in ~20 lines of code:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d8a913b1-0eea-442a-8a64-ec73333f104b",
"metadata": {},
"outputs": [],
"source": [
"import bs4\n",
"from langchain import hub\n",
"from langchain_community.document_loaders import WebBaseLoader\n",
"from langchain_community.vectorstores import Chroma\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "820244ae-74b4-4593-b392-822979dd91b8",
"metadata": {},
"outputs": [],
"source": [
"# Load, chunk and index the contents of the blog.\n",
"loader = WebBaseLoader(\n",
" web_paths=(\"https://lilianweng.github.io/posts/2023-06-23-agent/\",),\n",
" bs_kwargs=dict(\n",
" parse_only=bs4.SoupStrainer(\n",
" class_=(\"post-content\", \"post-title\", \"post-header\")\n",
" )\n",
" ),\n",
")\n",
"docs = loader.load()\n",
"\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n",
"splits = text_splitter.split_documents(docs)\n",
"vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())\n",
"\n",
"# Retrieve and generate using the relevant snippets of the blog.\n",
"retriever = vectorstore.as_retriever()\n",
"prompt = hub.pull(\"rlm/rag-prompt\")\n",
"llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)\n",
"\n",
"\n",
"def format_docs(docs):\n",
" return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"\n",
"rag_chain = (\n",
" {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0d3b0f36-7b56-49c0-8e40-a1aa9ebcbf24",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done through prompting techniques like Chain of Thought or Tree of Thoughts, or by using task-specific instructions or human inputs. Task decomposition helps agents plan ahead and manage complicated tasks more effectively.'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rag_chain.invoke(\"What is Task Decomposition?\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "7cb344e0-c423-400c-a079-964c08e07e32",
"metadata": {},
"outputs": [],
"source": [
"# cleanup\n",
"vectorstore.delete_collection()"
]
},
{
"cell_type": "markdown",
"id": "639dc31a-7f16-40f6-ba2a-20e7c2ecfe60",
"metadata": {},
"source": [
":::tip\n",
"\n",
"Check out the [LangSmith trace](https://smith.langchain.com/public/1c6ca97e-445b-4d00-84b4-c7befcbc59fe/r) \n",
"\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "842cf72d-abbc-468e-a2eb-022470347727",
"metadata": {},
"source": [
"## Detailed walkthrough\n",
"\n",
"Let's go through the above code step-by-step to really understand what's going on."
]
},
{
"cell_type": "markdown",
"id": "ba5daed6",
"metadata": {},
"source": [
"## 1. Indexing: Load\n",
"\n",
"We need to first load the blog post contents. We can use [DocumentLoaders](/docs/modules/data_connection/document_loaders/) for this, which are objects that load in data from a source and return a list of [Documents](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html). A `Document` is an object with some `page_content` (str) and `metadata` (dict).\n",
"\n",
"In this case we'll use the [WebBaseLoader](/docs/integrations/document_loaders/web_base), which uses `urllib` to load HTML from web URLs and `BeautifulSoup` to parse it to text. We can customize the HTML -> text parsing by passing in parameters to the `BeautifulSoup` parser via `bs_kwargs` (see [BeautifulSoup docs](https://beautiful-soup-4.readthedocs.io/en/latest/#beautifulsoup)). In this case only HTML tags with class \"post-content\", \"post-title\", or \"post-header\" are relevant, so we'll remove all others."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cf4d5c72",
"metadata": {},
"outputs": [],
"source": [
"import bs4\n",
"from langchain_community.document_loaders import WebBaseLoader\n",
"\n",
"# Only keep post title, headers, and content from the full HTML.\n",
"bs4_strainer = bs4.SoupStrainer(class_=(\"post-title\", \"post-header\", \"post-content\"))\n",
"loader = WebBaseLoader(\n",
" web_paths=(\"https://lilianweng.github.io/posts/2023-06-23-agent/\",),\n",
" bs_kwargs={\"parse_only\": bs4_strainer},\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "207f87a3-effa-4457-b013-6d233bc7a088",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"42824"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(docs[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "52469796-5ce4-4c12-bd2a-a903872dac33",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
" LLM Powered Autonomous Agents\n",
" \n",
"Date: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng\n",
"\n",
"\n",
"Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\n",
"Agent System Overview#\n",
"In\n"
]
}
],
"source": [
"print(docs[0].page_content[:500])"
]
},
{
"cell_type": "markdown",
"id": "ee5c6556-56be-4067-adbc-98b5aa19ef6e",
"metadata": {},
"source": [
"### Go deeper\n",
"`DocumentLoader`: Object that loads data from a source as list of `Documents`.\n",
"- [Docs](/docs/modules/data_connection/document_loaders/): Detailed documentation on how to use `DocumentLoaders`.\n",
"- [Integrations](/docs/integrations/document_loaders/): 160+ integrations to choose from.\n",
"- [Interface](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.base.BaseLoader.html): API reference  for the base interface."
]
},
{
"cell_type": "markdown",
"id": "fd2cc9a7",
"metadata": {},
"source": [
"## 2. Indexing: Split\n",
"\n",
"Our loaded document is over 42k characters long. This is too long to fit in the context window of many models. Even for those models that could fit the full post in their context window, models can struggle to find information in very long inputs. \n",
"\n",
"To handle this we'll split the `Document` into chunks for embedding and vector storage. This should help us retrieve only the most relevant bits of the blog post at run time.\n",
"\n",
"In this case we'll split our documents into chunks of 1000 characters with 200 characters of overlap between chunks. The overlap helps mitigate the possibility of separating a statement from important context related to it. We use the [RecursiveCharacterTextSplitter](/docs/modules/data_connection/document_transformers/recursive_text_splitter), which will recursively split the document using common separators like new lines until each chunk is the appropriate size. This is the recommended text splitter for generic text use cases.\n",
"\n",
"We set `add_start_index=True` so that the character index at which each split Document starts within the initial Document is preserved as metadata attribute \"start_index\"."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "4b11c01d",
"metadata": {},
"outputs": [],
"source": [
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"\n",
"text_splitter = RecursiveCharacterTextSplitter(\n",
" chunk_size=1000, chunk_overlap=200, add_start_index=True\n",
")\n",
"all_splits = text_splitter.split_documents(docs)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "3741eb67-9caf-40f2-a001-62f49349bff5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"66"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(all_splits)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "f868d0e5-5670-4d54-b562-f50265e907f4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"969"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(all_splits[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "5c9e5f27-c8e3-4ca7-8a8e-45c5de2901cc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',\n",
" 'start_index': 7056}"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_splits[10].metadata"
]
},
{
"cell_type": "markdown",
"id": "0a33bd4d",
"metadata": {},
"source": [
"### Go deeper\n",
"\n",
"`TextSplitter`: Object that splits a list of `Document`s into smaller chunks. Subclass of `DocumentTransformer`s.\n",
"- Explore `Context-aware splitters`, which keep the location (\"context\") of each split in the original `Document`:\n",
" - [Markdown files](/docs/modules/data_connection/document_transformers/markdown_header_metadata)\n",
" - [Code (py or js)](/docs/integrations/document_loaders/source_code)\n",
" - [Scientific papers](/docs/integrations/document_loaders/grobid)\n",
"- [Interface](https://api.python.langchain.com/en/latest/text_splitter/langchain_text_splitters.TextSplitter.html): API reference for the base interface.\n",
"\n",
"`DocumentTransformer`: Object that performs a transformation on a list of `Document`s.\n",
"- [Docs](/docs/modules/data_connection/document_transformers/): Detailed documentation on how to use `DocumentTransformers`\n",
"- [Integrations](/docs/integrations/document_transformers/)\n",
"- [Interface](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.transformers.BaseDocumentTransformer.html): API reference for the base interface.\n"
]
},
{
"cell_type": "markdown",
"id": "46547031-2352-4321-9970-d6ea27285c2e",
"metadata": {},
"source": [
"## 3. Indexing: Store\n",
"\n",
"Now we need to index our 66 text chunks so that we can search over them at runtime. The most common way to do this is to embed the contents of each document split and insert these embeddings into a vector database (or vector store). When we want to search over our splits, we take a text search query, embed it, and perform some sort of \"similarity\" search to identify the stored splits with the most similar embeddings to our query embedding. The simplest similarity measure is cosine similarity — we measure the cosine of the angle between each pair of embeddings (which are high dimensional vectors).\n",
"\n",
"We can embed and store all of our document splits in a single command using the [Chroma](/docs/integrations/vectorstores/chroma) vector store and [OpenAIEmbeddings](/docs/integrations/text_embedding/openai) model."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "e9c302c8",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.vectorstores import Chroma\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())"
]
},
{
"cell_type": "markdown",
"id": "dc6f22b0",
"metadata": {},
"source": [
"### Go deeper\n",
"`Embeddings`: Wrapper around a text embedding model, used for converting text to embeddings.\n",
"- [Docs](/docs/modules/data_connection/text_embedding): Detailed documentation on how to use embeddings.\n",
"- [Integrations](/docs/integrations/text_embedding/): 30+ integrations to choose from.\n",
"- [Interface](https://api.python.langchain.com/en/latest/embeddings/langchain_core.embeddings.Embeddings.html): API reference for the base interface.\n",
"\n",
"`VectorStore`: Wrapper around a vector database, used for storing and querying embeddings.\n",
"- [Docs](/docs/modules/data_connection/vectorstores/): Detailed documentation on how to use vector stores.\n",
"- [Integrations](/docs/integrations/vectorstores/): 40+ integrations to choose from.\n",
"- [Interface](https://api.python.langchain.com/en/latest/vectorstores/langchain_core.vectorstores.VectorStore.html): API reference for the base interface.\n",
"\n",
"This completes the **Indexing** portion of the pipeline. At this point we have a query-able vector store containing the chunked contents of our blog post. Given a user question, we should ideally be able to return the snippets of the blog post that answer the question."
]
},
{
"cell_type": "markdown",
"id": "70d64d40-e475-43d9-b64c-925922bb5ef7",
"metadata": {},
"source": [
"## 4. Retrieval and Generation: Retrieve\n",
"\n",
"Now let's write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and returns an answer.\n",
"\n",
"First we need to define our logic for searching over documents. LangChain defines a [Retriever](/docs/modules/data_connection/retrievers/) interface which wraps an index that can return relevant `Documents` given a string query.\n",
"\n",
"The most common type of `Retriever` is the [VectorStoreRetriever](/docs/modules/data_connection/retrievers/vectorstore), which uses the similarity search capabilities of a vector store to facilitate retrieval. Any `VectorStore` can easily be turned into a `Retriever` with `VectorStore.as_retriever()`:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "4414df0d-5d43-46d0-85a9-5f47be0dd099",
"metadata": {},
"outputs": [],
"source": [
"retriever = vectorstore.as_retriever(search_type=\"similarity\", search_kwargs={\"k\": 6})"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "e2c26b7d",
"metadata": {},
"outputs": [],
"source": [
"retrieved_docs = retriever.invoke(\"What are the approaches to Task Decomposition?\")"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "8684291d-0f5e-453a-8d3e-ff9feea765d0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(retrieved_docs)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "9a5dc074-816d-409a-b005-ab4eddfd76af",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\n",
"Task decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\n"
]
}
],
"source": [
"print(retrieved_docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "5d5a113b",
"metadata": {},
"source": [
"### Go deeper\n",
"Vector stores are commonly used for retrieval, but there are other ways to do retrieval, too.\n",
"\n",
"`Retriever`: An object that returns `Document`s given a text query\n",
"\n",
"- [Docs](/docs/modules/data_connection/retrievers/): Further documentation on the interface and built-in retrieval techniques. Some of which include:\n",
" - `MultiQueryRetriever` [generates variants of the input question](/docs/modules/data_connection/retrievers/MultiQueryRetriever) to improve retrieval hit rate.\n",
" - `MultiVectorRetriever` (diagram below) instead generates [variants of the embeddings](/docs/modules/data_connection/retrievers/multi_vector), also in order to improve retrieval hit rate.\n",
" - `Max marginal relevance` selects for [relevance and diversity](https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf) among the retrieved documents to avoid passing in duplicate context.\n",
" - Documents can be filtered during vector store retrieval using metadata filters, such as with a [Self Query Retriever](/docs/modules/data_connection/retrievers/self_query).\n",
"- [Integrations](/docs/integrations/retrievers/): Integrations with retrieval services.\n",
"- [Interface](https://api.python.langchain.com/en/latest/retrievers/langchain_core.retrievers.BaseRetriever.html): API reference for the base interface."
]
},
{
"cell_type": "markdown",
"id": "415d6824",
"metadata": {},
"source": [
"## 5. Retrieval and Generation: Generate\n",
"\n",
"Let's put it all together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes that to a model, and parses the output.\n",
"\n",
"We'll use the gpt-3.5-turbo OpenAI chat model, but any LangChain `LLM` or `ChatModel` could be substituted in."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "d34d998c-9abf-4e01-a4ad-06dadfcf131c",
"metadata": {},
"outputs": [],
"source": [
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)"
]
},
{
"cell_type": "markdown",
"id": "bc826723-36fc-45d1-a3ef-df8c2c8471a8",
"metadata": {},
"source": [
"We'll use a prompt for RAG that is checked into the LangChain prompt hub ([here](https://smith.langchain.com/hub/rlm/rag-prompt))."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "bede955b-9aeb-4fd3-964d-8e43f214ce70",
"metadata": {},
"outputs": [],
"source": [
"from langchain import hub\n",
"\n",
"prompt = hub.pull(\"rlm/rag-prompt\")"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "11c35354-f275-47ec-9f72-ebd5c23731eb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[HumanMessage(content=\"You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\\nQuestion: filler question \\nContext: filler context \\nAnswer:\")]"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"example_messages = prompt.invoke(\n",
" {\"context\": \"filler context\", \"question\": \"filler question\"}\n",
").to_messages()\n",
"example_messages"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "2ccc50fa-5fa2-4f80-8685-58ec2255523a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\n",
"Question: filler question \n",
"Context: filler context \n",
"Answer:\n"
]
}
],
"source": [
"print(example_messages[0].content)"
]
},
{
"cell_type": "markdown",
"id": "51f9a210-1eee-4054-99d7-9d9ddf7e3593",
"metadata": {},
"source": [
"We'll use the [LCEL Runnable](/docs/expression_language/) protocol to define the chain, allowing us to \n",
"- pipe together components and functions in a transparent way\n",
"- automatically trace our chain in LangSmith\n",
"- get streaming, async, and batched calling out of the box"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "99fa1aec",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"\n",
"def format_docs(docs):\n",
" return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"\n",
"rag_chain = (\n",
" {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "8655a152-d7cf-466f-b1bc-fbff9ae2b889",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It involves transforming big tasks into multiple manageable tasks, allowing for easier interpretation and execution by autonomous agents or models. Task decomposition can be done through various methods, such as using prompting techniques, task-specific instructions, or human inputs."
]
}
],
"source": [
"for chunk in rag_chain.stream(\"What is Task Decomposition?\"):\n",
" print(chunk, end=\"\", flush=True)"
]
},
{
"cell_type": "markdown",
"id": "2c000e5f-2b7f-4eb9-8876-9f4b186b4a08",
"metadata": {},
"source": [
":::tip\n",
"\n",
"Check out the [LangSmith trace](https://smith.langchain.com/public/1799e8db-8a6d-4eb2-84d5-46e8d7d5a99b/r) \n",
"\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "f7d52c84",
"metadata": {},
"source": [
"### Go deeper\n",
"\n",
"#### Choosing a model\n",
"`ChatModel`: An LLM-backed chat model. Takes in a sequence of messages and returns a message.\n",
"- [Docs](/docs/modules/model_io/chat/): Detailed documentation on \n",
"- [Integrations](/docs/integrations/chat/): 25+ integrations to choose from.\n",
"- [Interface](https://api.python.langchain.com/en/latest/language_models/langchain_core.language_models.chat_models.BaseChatModel.html): API reference for the base interface.\n",
"\n",
"`LLM`: A text-in-text-out LLM. Takes in a string and returns a string.\n",
"- [Docs](/docs/modules/model_io/llms)\n",
"- [Integrations](/docs/integrations/llms): 75+ integrations to choose from.\n",
"- [Interface](https://api.python.langchain.com/en/latest/language_models/langchain_core.language_models.llms.BaseLLM.html): API reference for the base interface.\n",
"\n",
"See a guide on RAG with locally-running models [here](/docs/use_cases/question_answering/local_retrieval_qa)."
]
},
{
"cell_type": "markdown",
"id": "fa82f437",
"metadata": {},
"source": [
"#### Customizing the prompt\n",
"\n",
"As shown above, we can load prompts (e.g., [this RAG prompt](https://smith.langchain.com/hub/rlm/rag-prompt)) from the prompt hub. The prompt can also be easily customized:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "e4fee704",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It involves transforming big tasks into multiple manageable tasks, allowing for a more systematic and organized approach to problem-solving. Thanks for asking!'"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.prompts import PromptTemplate\n",
"\n",
"template = \"\"\"Use the following pieces of context to answer the question at the end.\n",
"If you don't know the answer, just say that you don't know, don't try to make up an answer.\n",
"Use three sentences maximum and keep the answer as concise as possible.\n",
"Always say \"thanks for asking!\" at the end of the answer.\n",
"\n",
"{context}\n",
"\n",
"Question: {question}\n",
"\n",
"Helpful Answer:\"\"\"\n",
"custom_rag_prompt = PromptTemplate.from_template(template)\n",
"\n",
"rag_chain = (\n",
" {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
" | custom_rag_prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")\n",
"\n",
"rag_chain.invoke(\"What is Task Decomposition?\")"
]
},
{
"cell_type": "markdown",
"id": "94b952e6-dc4b-415b-9cf3-1ad333e48366",
"metadata": {},
"source": [
":::tip\n",
"\n",
"Check out the [LangSmith trace](https://smith.langchain.com/public/da23c4d8-3b33-47fd-84df-a3a582eedf84/r) \n",
"\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "580e18de-132d-4009-ba67-4aaf2c7717a2",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
"That's a lot of content we've covered in a short amount of time. There's plenty of features, integrations, and extensions to explore in each of the above sections. Along from the **Go deeper** sources mentioned above, good next steps include:\n",
"\n",
"- [Return sources](/docs/use_cases/question_answering/sources): Learn how to return source documents\n",
"- [Streaming](/docs/use_cases/question_answering/streaming): Learn how to stream outputs and intermediate steps\n",
"- [Add chat history](/docs/use_cases/question_answering/chat_history): Learn how to add chat history to your app"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,644 @@
---
sidebar_position: 0
title: Quickstart
---
# Quickstart
[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/question_answering/quickstart.ipynb)
LangChain has a number of components designed to help build
question-answering applications, and RAG applications more generally. To
familiarize ourselves with these, we'll build a simple Q&A application
over a text data source. Along the way we'll go over a typical Q&A
architecture, discuss the relevant LangChain components, and highlight
additional resources for more advanced Q&A techniques. We'll also see
how LangSmith can help us trace and understand our application.
LangSmith will become increasingly helpful as our application grows in
complexity.
## Architecture
We'll create a typical RAG application as outlined in the [Q&A
introduction](../../../docs/use_cases/question_answering/), which has
two main components:
**Indexing**: a pipeline for ingesting data from a source and indexing
it. *This usually happens offline.*
**Retrieval and generation**: the actual RAG chain, which takes the user
query at run time and retrieves the relevant data from the index, then
passes that to the model.
The full sequence from raw data to answer will look like:
#### Indexing
1. **Load**: First we need to load our data. We'll use
[DocumentLoaders](../../../docs/modules/data_connection/document_loaders/)
for this.
2. **Split**: [Text
splitters](../../../docs/modules/data_connection/document_transformers/)
break large `Documents` into smaller chunks. This is useful both for
indexing data and for passing it in to a model, since large chunks
are harder to search over and won't fit in a model's finite context
window.
3. **Store**: We need somewhere to store and index our splits, so that
they can later be searched over. This is often done using a
[VectorStore](../../../docs/modules/data_connection/vectorstores/)
and
[Embeddings](../../../docs/modules/data_connection/text_embedding/)
model.
#### Retrieval and generation
4. **Retrieve**: Given a user input, relevant splits are retrieved from
storage using a
[Retriever](../../../docs/modules/data_connection/retrievers/).
5. **Generate**: A [ChatModel](../../../docs/modules/model_io/chat/) /
[LLM](../../../docs/modules/model_io/llms/) produces an answer using
a prompt that includes the question and the retrieved data
## Setup
### Dependencies
We'll use an OpenAI chat model and embeddings and a Chroma vector store
in this walkthrough, but everything shown here works with any
[ChatModel](../../../docs/modules/model_io/chat/) or
[LLM](../../../docs/modules/model_io/llms/),
[Embeddings](../../../docs/modules/data_connection/text_embedding/), and
[VectorStore](../../../docs/modules/data_connection/vectorstores/) or
[Retriever](../../../docs/modules/data_connection/retrievers/).
We'll use the following packages:
```python
%pip install --upgrade --quiet langchain langchain-community langchainhub langchain-openai chromadb bs4
```
We need to set the environment variable `OPENAI_API_KEY`, which can be done
directly or loaded from a `.env` file like so:
```python
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass()
# import dotenv
# dotenv.load_dotenv()
```
### LangSmith
Many of the applications you build with LangChain will contain multiple
steps with multiple invocations of LLM calls. As these applications get
more and more complex, it becomes crucial to be able to inspect what
exactly is going on inside your chain or agent. The best way to do this
is with [LangSmith](https://smith.langchain.com).
Note that LangSmith is not needed, but it is helpful. If you do want to
use LangSmith, after you sign up at the link above, make sure to set
your environment variables to start logging traces:
```python
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()
```
## Preview
In this guide we'll build a QA app over the [LLM Powered Autonomous
Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post
by Lilian Weng, which allows us to ask questions about the contents of
the post.
We can create a simple indexing pipeline and RAG chain to do this in ~20
lines of code:
```python
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
```
```python
# Load, chunk and index the contents of the blog.
loader = WebBaseLoader(
web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
bs_kwargs=dict(
parse_only=bs4.SoupStrainer(
class_=("post-content", "post-title", "post-header")
)
),
)
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
```
```python
rag_chain.invoke("What is Task Decomposition?")
```
``` text
'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done through prompting techniques like Chain of Thought or Tree of Thoughts, or by using task-specific instructions or human inputs. Task decomposition helps agents plan ahead and manage complicated tasks more effectively.'
```
```python
# cleanup
vectorstore.delete_collection()
```
:::tip
Check out the [LangSmith trace](https://smith.langchain.com/public/1c6ca97e-445b-4d00-84b4-c7befcbc59fe/r)
:::
## Detailed walkthrough
Let's go through the above code step-by-step to really understand what's
going on.
## 1. Indexing: Load {#indexing-load}
We need to first load the blog post contents. We can use
[DocumentLoaders](../../../docs/modules/data_connection/document_loaders/)
for this, which are objects that load in data from a source and return a
list of
[Documents](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html).
A `Document` is an object with some `page_content` (str) and `metadata`
(dict).
In this case we'll use the
[WebBaseLoader](../../../docs/integrations/document_loaders/web_base),
which uses `urllib` to load HTML from web URLs and `BeautifulSoup` to
parse it to text. We can customize the HTML -> text parsing by passing
in parameters to the `BeautifulSoup` parser via `bs_kwargs` (see
[BeautifulSoup
docs](https://beautiful-soup-4.readthedocs.io/en/latest/#beautifulsoup)).
In this case only HTML tags with class “post-content”, “post-title”, or
“post-header” are relevant, so we'll remove all others.
```python
import bs4
from langchain_community.document_loaders import WebBaseLoader
# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()
```
```python
len(docs[0].page_content)
```
``` text
42824
```
```python
print(docs[0].page_content[:500])
```
``` text
LLM Powered Autonomous Agents
Date: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng
Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In
```
### Go deeper

`DocumentLoader`: Object that loads data from a source as a list of `Documents`.

- [Docs](../../../docs/modules/data_connection/document_loaders/): Detailed documentation on how to use `DocumentLoaders`.
- [Integrations](../../../docs/integrations/document_loaders/): 160+ integrations to choose from.
- [Interface](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.base.BaseLoader.html): API reference for the base interface.
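If none of the built-in loaders fits a source, the same interface can be implemented directly. A minimal sketch, assuming a local plain-text file (the class name and file path below are hypothetical):
```python
from typing import List

from langchain_community.document_loaders.base import BaseLoader
from langchain_core.documents import Document


class PlainTextFileLoader(BaseLoader):
    """Hypothetical loader that reads one local text file into a single Document."""

    def __init__(self, file_path: str):
        self.file_path = file_path

    def load(self) -> List[Document]:
        # Read the raw text and record the source path as metadata.
        with open(self.file_path, encoding="utf-8") as f:
            text = f.read()
        return [Document(page_content=text, metadata={"source": self.file_path})]


# docs = PlainTextFileLoader("notes.txt").load()
```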
## 2. Indexing: Split {#indexing-split}
Our loaded document is over 42k characters long. This is too long to fit
in the context window of many models. Even for those models that could
fit the full post in their context window, models can struggle to find
information in very long inputs.
To handle this we'll split the `Document` into chunks for embedding and
vector storage. This should help us retrieve only the most relevant bits
of the blog post at run time.
In this case we'll split our documents into chunks of 1000 characters
with 200 characters of overlap between chunks. The overlap helps
mitigate the possibility of separating a statement from important
context related to it. We use the
[RecursiveCharacterTextSplitter](../../../docs/modules/data_connection/document_transformers/recursive_text_splitter),
which will recursively split the document using common separators like
new lines until each chunk is the appropriate size. This is the
recommended text splitter for generic text use cases.
We set `add_start_index=True` so that the character index at which each
split Document starts within the initial Document is preserved as
metadata attribute “start_index”.
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)
```
```python
len(all_splits)
```
``` text
66
```
```python
len(all_splits[0].page_content)
```
``` text
969
```
```python
all_splits[10].metadata
```
``` text
{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
'start_index': 7056}
```
### Go deeper

`TextSplitter`: Object that splits a list of `Document`s into smaller chunks. Subclass of `DocumentTransformer`s.

- Explore `Context-aware splitters`, which keep the location (“context”) of each split in the original `Document`:
  - [Markdown files](../../../docs/modules/data_connection/document_transformers/markdown_header_metadata)
  - [Code (py or js)](../../../docs/integrations/document_loaders/source_code)
  - [Scientific papers](../../../docs/integrations/document_loaders/grobid)
- [Interface](https://api.python.langchain.com/en/latest/text_splitter/langchain_text_splitters.TextSplitter.html): API reference for the base interface.

`DocumentTransformer`: Object that performs a transformation on a list of `Document`s.

- [Docs](../../../docs/modules/data_connection/document_transformers/): Detailed documentation on how to use `DocumentTransformers`
- [Integrations](../../../docs/integrations/document_transformers/)
- [Interface](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.transformers.BaseDocumentTransformer.html): API reference for the base interface.
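For example, a context-aware Markdown splitter records the header hierarchy of each chunk as metadata. A small sketch (the sample document is made up):
```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

markdown_document = "# Title\n\nIntro paragraph.\n\n## Section A\n\nDetails about A."

# Split on level-1 and level-2 headers and record them on each chunk.
md_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "Header 1"), ("##", "Header 2")]
)
md_splits = md_splitter.split_text(markdown_document)
# Each split's metadata looks like {'Header 1': 'Title', 'Header 2': 'Section A'}.
```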
## 3. Indexing: Store {#indexing-store}
Now we need to index our 66 text chunks so that we can search over them
at runtime. The most common way to do this is to embed the contents of
each document split and insert these embeddings into a vector database
(or vector store). When we want to search over our splits, we take a
text search query, embed it, and perform some sort of “similarity”
search to identify the stored splits with the most similar embeddings to
our query embedding. The simplest similarity measure is cosine
similarity — we measure the cosine of the angle between each pair of
embeddings (which are high dimensional vectors).
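For intuition, cosine similarity is just the dot product of two embedding vectors divided by the product of their norms. A rough sketch using the same embeddings model (the two example sentences are made up, and exact values will vary):
```python
import numpy as np
from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()
vec_a = np.array(embeddings_model.embed_query("How do agents break down tasks?"))
vec_b = np.array(embeddings_model.embed_query("What is task decomposition?"))

# Cosine similarity: normalized dot product of the two vectors.
similarity = np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
print(similarity)
```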
We can embed and store all of our document splits in a single command
using the [Chroma](../../../docs/integrations/vectorstores/chroma)
vector store and
[OpenAIEmbeddings](../../../docs/integrations/text_embedding/openai)
model.
```python
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())
```
### Go deeper

`Embeddings`: Wrapper around a text embedding model, used for converting text to embeddings.

- [Docs](../../../docs/modules/data_connection/text_embedding): Detailed documentation on how to use embeddings.
- [Integrations](../../../docs/integrations/text_embedding/): 30+ integrations to choose from.
- [Interface](https://api.python.langchain.com/en/latest/embeddings/langchain_core.embeddings.Embeddings.html): API reference for the base interface.

`VectorStore`: Wrapper around a vector database, used for storing and querying embeddings.

- [Docs](../../../docs/modules/data_connection/vectorstores/): Detailed documentation on how to use vector stores.
- [Integrations](../../../docs/integrations/vectorstores/): 40+ integrations to choose from.
- [Interface](https://api.python.langchain.com/en/latest/vectorstores/langchain_core.vectorstores.VectorStore.html): API reference for the base interface.
This completes the **Indexing** portion of the pipeline. At this point
we have a query-able vector store containing the chunked contents of our
blog post. Given a user question, we should ideally be able to return
the snippets of the blog post that answer the question.
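As a quick, optional sanity check, we can query the vector store directly with `similarity_search` before wiring up the full chain (the question below is just an example):
```python
# Spot-check the index: return the 2 most similar splits to an example question.
results = vectorstore.similarity_search("What is Task Decomposition?", k=2)
print(results[0].page_content[:200])
```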
## 4. Retrieval and Generation: Retrieve {#retrieval-and-generation-retrieve}
Now let's write the actual application logic. We want to create a simple
application that takes a user question, searches for documents relevant
to that question, passes the retrieved documents and initial question to
a model, and returns an answer.
First we need to define our logic for searching over documents.
LangChain defines a
[Retriever](../../../docs/modules/data_connection/retrievers/) interface
which wraps an index that can return relevant `Documents` given a string
query.
The most common type of `Retriever` is the
[VectorStoreRetriever](../../../docs/modules/data_connection/retrievers/vectorstore),
which uses the similarity search capabilities of a vector store to
facilitate retrieval. Any `VectorStore` can easily be turned into a
`Retriever` with `VectorStore.as_retriever()`:
```python
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
```
```python
retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")
```
```python
len(retrieved_docs)
```
``` text
6
```
```python
print(retrieved_docs[0].page_content)
```
``` text
Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
```
### Go deeper
Vector stores are commonly used for retrieval, but there are other ways
to do retrieval, too.
`Retriever`: An object that returns `Document`s given a text query
- [Docs](../../../docs/modules/data_connection/retrievers/): Further documentation on the interface and built-in retrieval techniques, some of which include:
- `MultiQueryRetriever` [generates variants of the input
question](../../../docs/modules/data_connection/retrievers/MultiQueryRetriever)
to improve retrieval hit rate.
- `MultiVectorRetriever` (diagram below) instead generates
[variants of the
embeddings](../../../docs/modules/data_connection/retrievers/multi_vector),
also in order to improve retrieval hit rate.
- `Max marginal relevance` selects for [relevance and
diversity](https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf)
among the retrieved documents to avoid passing in duplicate
context.
- Documents can be filtered during vector store retrieval using
metadata filters, such as with a [Self Query
Retriever](../../../docs/modules/data_connection/retrievers/self_query).
- [Integrations](../../../docs/integrations/retrievers/): Integrations
with retrieval services.
- [Interface](https://api.python.langchain.com/en/latest/retrievers/langchain_core.retrievers.BaseRetriever.html):
API reference for the base interface.
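For example, switching to maximal-marginal-relevance (MMR) retrieval only requires a different `search_type` on the same vector store; a minimal sketch:
```python
# MMR trades off similarity to the query against diversity among the returned splits.
mmr_retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 6})
mmr_docs = mmr_retriever.invoke("What are the approaches to Task Decomposition?")
```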
## 5. Retrieval and Generation: Generate {#retrieval-and-generation-generate}
Let's put it all together into a chain that takes a question, retrieves
relevant documents, constructs a prompt, passes that to a model, and
parses the output.
We'll use the gpt-3.5-turbo OpenAI chat model, but any LangChain `LLM`
or `ChatModel` could be substituted in.
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<Tabs>
<TabItem value="openai" label="OpenAI" default>
```python
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
```
</TabItem>
<TabItem value="local" label="Anthropic">
```python
%pip install -qU langchain-anthropic
```
```python
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-2.1", temperature=0, max_tokens=1024)
```
</TabItem>
</Tabs>
We'll use a prompt for RAG that is checked into the LangChain prompt hub
([here](https://smith.langchain.com/hub/rlm/rag-prompt)).
```python
from langchain import hub
prompt = hub.pull("rlm/rag-prompt")
```
```python
example_messages = prompt.invoke(
{"context": "filler context", "question": "filler question"}
).to_messages()
example_messages
```
``` text
[HumanMessage(content="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: filler question \nContext: filler context \nAnswer:")]
```
```python
print(example_messages[0].content)
```
``` text
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question
Context: filler context
Answer:
```
We'll use the [LCEL Runnable](../../../docs/expression_language/) protocol to define the chain, allowing us to

- pipe together components and functions in a transparent way
- automatically trace our chain in LangSmith
- get streaming, async, and batched calling out of the box
```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
```
```python
for chunk in rag_chain.stream("What is Task Decomposition?"):
print(chunk, end="", flush=True)
```
``` text
Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It involves transforming big tasks into multiple manageable tasks, allowing for easier interpretation and execution by autonomous agents or models. Task decomposition can be done through various methods, such as using prompting techniques, task-specific instructions, or human inputs.
```
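Because the chain is an LCEL Runnable, batched and async calls also come for free. A minimal sketch (the questions are just examples):
```python
# Run the chain over several questions in a single call.
answers = rag_chain.batch(["What is Task Decomposition?", "What is Chain of Thought?"])

# Or await a single asynchronous call from async code:
# answer = await rag_chain.ainvoke("What is Task Decomposition?")
```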
:::tip
Check out the [LangSmith trace](https://smith.langchain.com/public/1799e8db-8a6d-4eb2-84d5-46e8d7d5a99b/r)
:::
### Go deeper
#### Choosing a model
`ChatModel`: An LLM-backed chat model. Takes in a sequence of messages and returns a message.

- [Docs](../../../docs/modules/model_io/chat/): Detailed documentation on how to use chat models.
- [Integrations](../../../docs/integrations/chat/): 25+ integrations to choose from.
- [Interface](https://api.python.langchain.com/en/latest/language_models/langchain_core.language_models.chat_models.BaseChatModel.html): API reference for the base interface.

`LLM`: A text-in-text-out LLM. Takes in a string and returns a string.

- [Docs](../../../docs/modules/model_io/llms)
- [Integrations](../../../docs/integrations/llms): 75+ integrations to choose from.
- [Interface](https://api.python.langchain.com/en/latest/language_models/langchain_core.language_models.llms.BaseLLM.html): API reference for the base interface.
See a guide on RAG with locally-running models
[here](../../../docs/use_cases/question_answering/local_retrieval_qa).
#### Customizing the prompt
As shown above, we can load prompts (e.g., [this RAG
prompt](https://smith.langchain.com/hub/rlm/rag-prompt)) from the prompt
hub. The prompt can also be easily customized:
```python
from langchain_core.prompts import PromptTemplate
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| custom_rag_prompt
| llm
| StrOutputParser()
)
rag_chain.invoke("What is Task Decomposition?")
```
``` text
'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It involves transforming big tasks into multiple manageable tasks, allowing for a more systematic and organized approach to problem-solving. Thanks for asking!'
```
:::tip
Check out the [LangSmith trace](https://smith.langchain.com/public/da23c4d8-3b33-47fd-84df-a3a582eedf84/r)
:::
## Next steps
That's a lot of content we've covered in a short amount of time. There are
plenty of features, integrations, and extensions to explore in each of
the above sections. Apart from the **Go deeper** sources mentioned
above, good next steps include:
- [Return
sources](../../../docs/use_cases/question_answering/sources): Learn
how to return source documents
- [Streaming](../../../docs/use_cases/question_answering/streaming):
Learn how to stream outputs and intermediate steps
- [Add chat
history](../../../docs/use_cases/question_answering/chat_history):
Learn how to add chat history to your app