diff --git a/docs/docs/use_cases/question_answering/code_understanding.ipynb b/docs/docs/use_cases/code_understanding.ipynb similarity index 99% rename from docs/docs/use_cases/question_answering/code_understanding.ipynb rename to docs/docs/use_cases/code_understanding.ipynb index f655417d59..8ae7542ede 100644 --- a/docs/docs/use_cases/question_answering/code_understanding.ipynb +++ b/docs/docs/use_cases/code_understanding.ipynb @@ -5,8 +5,7 @@ "metadata": {}, "source": [ "---\n", - "sidebar_position: 1\n", - "title: RAG over code\n", + "title: Code understanding\n", "---" ] }, @@ -14,7 +13,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/question_answering/code_understanding.ipynb)\n", + "[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/code_understanding.ipynb)\n", "\n", "## Use case\n", "\n", @@ -24,7 +23,7 @@ "- Using LLMs for suggesting refactors or improvements\n", "- Using LLMs for documenting the code\n", "\n", - "![Image description](../../../static/img/code_understanding.png)\n", + "![Image description](../../static/img/code_understanding.png)\n", "\n", "## Overview\n", "\n", @@ -340,7 +339,7 @@ "* In particular, the code well structured and kept together in the retrieval output\n", "* The retrieved code and chat history are passed to the LLM for answer distillation\n", "\n", - "![Image description](../../../static/img/code_retrieval.png)" + "![Image description](../../static/img/code_retrieval.png)" ] }, { diff --git a/docs/docs/use_cases/qa_structured/_category_.yml b/docs/docs/use_cases/qa_structured/_category_.yml index 7597ebd377..a900f9ee01 100644 --- a/docs/docs/use_cases/qa_structured/_category_.yml +++ b/docs/docs/use_cases/qa_structured/_category_.yml @@ -1,3 +1,3 @@ -label: 'QA over structured data' +label: 'Q&A over structured data' collapsed: false -position: 0 +position: 0.1 diff --git a/docs/docs/use_cases/question_answering/_category_.yml b/docs/docs/use_cases/question_answering/_category_.yml deleted file mode 100644 index 31bf0b211d..0000000000 --- a/docs/docs/use_cases/question_answering/_category_.yml +++ /dev/null @@ -1,2 +0,0 @@ -position: 0.1 -collapsed: true diff --git a/docs/docs/use_cases/question_answering/chat_history.ipynb b/docs/docs/use_cases/question_answering/chat_history.ipynb new file mode 100644 index 0000000000..0394bdc90f --- /dev/null +++ b/docs/docs/use_cases/question_answering/chat_history.ipynb @@ -0,0 +1,391 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "023635f2-71cf-43f2-a2e2-a7b4ced30a74", + "metadata": {}, + "source": [ + "---\n", + "sidebar_position: 2\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "86fc5bb2-017f-434e-8cd6-53ab214a5604", + "metadata": {}, + "source": [ + "# Add chat history\n", + "\n", + "In many Q&A applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some sort of \"memory\" of past questions and answers, and some logic for incorporating those into its current thinking.\n", + "\n", + "In this guide we focus on **adding logic for incorporating historical messages, and NOT on chat history management.** Chat history management is [covered here](/docs/expression_language/how_to/message_history).\n", + "\n", + "We'll work off of the Q&A app 
we built over the [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post by Lilian Weng in the [Quickstart](/docs/use_cases/question_answering/quickstart). We'll need to update two things about our existing app:\n", + "\n", + "1. **Prompt**: Update our prompt to support historical messages as an input.\n", + "2. **Contextualizing questions**: Add a sub-chain that takes the latest user question and reformulates it in the context of the chat history. This is needed in case the latest question references some context from past messages. For example, if a user asks a follow-up question like \"Can you elaborate on the second point?\", this cannot be understood without the context of the previous message. Therefore we can't effectively perform retrieval with a question like this." + ] + }, + { + "cell_type": "markdown", + "id": "487d8d79-5ee9-4aa4-9fdf-cd5f4303e099", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "### Dependencies\n", + "\n", + "We'll use an OpenAI chat model and embeddings and a Chroma vector store in this walkthrough, but everything shown here works with any [ChatModel](/docs/modules/model_io/chat/) or [LLM](/docs/modules/model_io/llms/), [Embeddings](/docs/modules/data_connection/text_embedding/), and [VectorStore](/docs/modules/data_connection/vectorstores/) or [Retriever](/docs/modules/data_connection/retrievers/). \n", + "\n", + "We'll use the following packages:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "28d272cd-4e31-40aa-bbb4-0be0a1f49a14", + "metadata": {}, + "outputs": [], + "source": [ + "#!pip install -U langchain langchain-community langchainhub openai chromadb bs4" + ] + }, + { + "cell_type": "markdown", + "id": "51ef48de-70b6-4f43-8e0b-ab9b84c9c02a", + "metadata": {}, + "source": [ + "We need to set environment variable `OPENAI_API_KEY`, which can be done directly or loaded from a `.env` file like so:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "143787ca-d8e6-4dc9-8281-4374f4d71720", + "metadata": {}, + "outputs": [], + "source": [ + "import getpass\n", + "import os\n", + "\n", + "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n", + "\n", + "# import dotenv\n", + "\n", + "# dotenv.load_dotenv()" + ] + }, + { + "cell_type": "markdown", + "id": "1665e740-ce01-4f09-b9ed-516db0bd326f", + "metadata": {}, + "source": [ + "### LangSmith\n", + "\n", + "Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with [LangSmith](https://smith.langchain.com).\n", + "\n", + "Note that LangSmith is not needed, but it is helpful. 
If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "07411adb-3722-4f65-ab7f-8f6f57663d11", + "metadata": {}, + "outputs": [], + "source": [ + "os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n", + "os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()" + ] + }, + { + "cell_type": "markdown", + "id": "fa6ba684-26cf-4860-904e-a4d51380c134", + "metadata": {}, + "source": [ + "## Chain without chat history\n", + "\n", + "Here is the Q&A app we built over the [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post by Lilian Weng in the [Quickstart](/docs/use_cases/question_answering/quickstart):" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "d8a913b1-0eea-442a-8a64-ec73333f104b", + "metadata": {}, + "outputs": [], + "source": [ + "import bs4\n", + "from langchain import hub\n", + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "from langchain_community.chat_models import ChatOpenAI\n", + "from langchain_community.document_loaders import WebBaseLoader\n", + "from langchain_community.embeddings import OpenAIEmbeddings\n", + "from langchain_community.vectorstores import Chroma\n", + "from langchain_core.output_parsers import StrOutputParser\n", + "from langchain_core.runnables import RunnablePassthrough" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "820244ae-74b4-4593-b392-822979dd91b8", + "metadata": {}, + "outputs": [], + "source": [ + "# Load, chunk and index the contents of the blog.\n", + "loader = WebBaseLoader(\n", + " web_paths=(\"https://lilianweng.github.io/posts/2023-06-23-agent/\",),\n", + " bs_kwargs=dict(\n", + " parse_only=bs4.SoupStrainer(\n", + " class_=(\"post-content\", \"post-title\", \"post-header\")\n", + " )\n", + " ),\n", + ")\n", + "docs = loader.load()\n", + "\n", + "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n", + "splits = text_splitter.split_documents(docs)\n", + "vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())\n", + "\n", + "# Retrieve and generate using the relevant snippets of the blog.\n", + "retriever = vectorstore.as_retriever()\n", + "prompt = hub.pull(\"rlm/rag-prompt\")\n", + "llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)\n", + "\n", + "\n", + "def format_docs(docs):\n", + " return \"\\n\\n\".join(doc.page_content for doc in docs)\n", + "\n", + "\n", + "rag_chain = (\n", + " {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n", + " | prompt\n", + " | llm\n", + " | StrOutputParser()\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "0d3b0f36-7b56-49c0-8e40-a1aa9ebcbf24", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done through prompting techniques like Chain of Thought or Tree of Thoughts, or by using task-specific instructions or human inputs. 
Task decomposition helps agents plan ahead and manage complicated tasks more effectively.'" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rag_chain.invoke(\"What is Task Decomposition?\")" + ] + }, + { + "cell_type": "markdown", + "id": "776ae958-cbdc-4471-8669-c6087436f0b5", + "metadata": {}, + "source": [ + "## Contextualizing the question\n", + "\n", + "First we'll need to define a sub-chain that takes historical messages and the latest user question, and reformulates the question if it makes reference to any information in the historical information.\n", + "\n", + "We'll use a prompt that includes a `MessagesPlaceholder` variable under the name \"chat_history\". This allows us to pass in a list of Messages to the prompt using the \"chat_history\" input key, and these messages will be inserted after the system message and before the human message containing the latest question." + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "2b685428-8b82-4af1-be4f-7232c5d55b73", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n", + "\n", + "contextualize_q_system_prompt = \"\"\"Given a chat history and the latest user question \\\n", + "which might reference context in the chat history, formulate a standalone question \\\n", + "which can be understood without the chat history. Do NOT answer the question, \\\n", + "just reformulate it if needed and otherwise return it as is.\"\"\"\n", + "contextualize_q_prompt = ChatPromptTemplate.from_messages(\n", + " [\n", + " (\"system\", contextualize_q_system_prompt),\n", + " MessagesPlaceholder(variable_name=\"chat_history\"),\n", + " (\"human\", \"{question}\"),\n", + " ]\n", + ")\n", + "contextualize_q_chain = contextualize_q_prompt | llm | StrOutputParser()" + ] + }, + { + "cell_type": "markdown", + "id": "23cbd8d7-7162-4fb0-9e69-67ea4d4603a5", + "metadata": {}, + "source": [ + "Using this chain we can ask follow-up questions that reference past messages and have them reformulated into standalone questions:" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "46ee9aa1-16f1-4509-8dae-f8c71f4ad47d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'What is the definition of \"large\" in the context of a language model?'" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from langchain_core.messages import AIMessage, HumanMessage\n", + "\n", + "contextualize_q_chain.invoke(\n", + " {\n", + " \"chat_history\": [\n", + " HumanMessage(content=\"What does LLM stand for?\"),\n", + " AIMessage(content=\"Large language model\"),\n", + " ],\n", + " \"question\": \"What is meant by large\",\n", + " }\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "42a47168-4a1f-4e39-bd2d-d5b03609a243", + "metadata": {}, + "source": [ + "## Chain with chat history\n", + "\n", + "And now we can build our full QA chain. \n", + "\n", + "Notice we add some routing functionality to only run the \"condense question chain\" when our chat history isn't empty. Here we're taking advantage of the fact that if a function in an LCEL chain returns another chain, that chain will itself be invoked." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "66f275f3-ddef-4678-b90d-ee64576878f9", + "metadata": {}, + "outputs": [], + "source": [ + "qa_system_prompt = \"\"\"You are an assistant for question-answering tasks. 
\\\n", + "Use the following pieces of retrieved context to answer the question. \\\n", + "If you don't know the answer, just say that you don't know. \\\n", + "Use three sentences maximum and keep the answer concise.\\\n", + "\n", + "{context}\"\"\"\n", + "qa_prompt = ChatPromptTemplate.from_messages(\n", + " [\n", + " (\"system\", qa_system_prompt),\n", + " MessagesPlaceholder(variable_name=\"chat_history\"),\n", + " (\"human\", \"{question}\"),\n", + " ]\n", + ")\n", + "\n", + "\n", + "def contextualized_question(input: dict):\n", + " if input.get(\"chat_history\"):\n", + " return contextualize_q_chain\n", + " else:\n", + " return input[\"question\"]\n", + "\n", + "\n", + "rag_chain = (\n", + " RunnablePassthrough.assign(\n", + " context=contextualized_question | retriever | format_docs\n", + " )\n", + " | qa_prompt\n", + " | llm\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "51fd0e54-5bb4-4a9a-b012-87a18ebe2bef", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "AIMessage(content='Common ways of task decomposition include:\\n\\n1. Using Chain of Thought (CoT): CoT is a prompting technique that instructs the model to \"think step by step\" and decompose complex tasks into smaller and simpler steps. This approach utilizes more computation at test-time and sheds light on the model\\'s thinking process.\\n\\n2. Prompting with LLM: Language Model (LLM) can be used to prompt the model with simple instructions like \"Steps for XYZ\" or \"What are the subgoals for achieving XYZ?\" This method guides the model to break down the task into manageable steps.\\n\\n3. Task-specific instructions: For certain tasks, task-specific instructions can be provided to guide the model in decomposing the task. For example, for writing a novel, the instruction \"Write a story outline\" can be given to help the model break down the task into smaller components.\\n\\n4. Human inputs: In some cases, human inputs can be used to assist in task decomposition. Humans can provide insights, expertise, and domain knowledge to help break down complex tasks into smaller subtasks.\\n\\nThese approaches aim to simplify complex tasks and enable more effective problem-solving and planning.')" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "chat_history = []\n", + "\n", + "question = \"What is Task Decomposition?\"\n", + "ai_msg = rag_chain.invoke({\"question\": question, \"chat_history\": chat_history})\n", + "chat_history.extend([HumanMessage(content=question), ai_msg])\n", + "\n", + "second_question = \"What are common ways of doing it?\"\n", + "rag_chain.invoke({\"question\": second_question, \"chat_history\": chat_history})" + ] + }, + { + "cell_type": "markdown", + "id": "53263a65-4de2-4dd8-9291-6a8169ab6f1d", + "metadata": {}, + "source": [ + ":::tip\n", + "\n", + "Check out the [LangSmith trace](https://smith.langchain.com/public/b3001782-bb30-476a-886b-12da17ec258f/r) \n", + "\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "id": "fdf6c7e0-84f8-4747-b2ae-e84315152bd9", + "metadata": {}, + "source": [ + "Here we've gone over how to add application logic for incorporating historical outputs, but we're still manually updating the chat history and inserting it into each input. 
In a real Q&A application we'll want some way of persisting chat history and some way of automatically inserting and updating it.\n", + "\n", + "For this we can use:\n", + "- [BaseChatMessageHistory](/docs/modules/memory/chat_messages/): Store chat history.\n", + "- [RunnableWithMessageHistory](/docs/expression_language/how_to/message_history): Wrapper for an LCEL chain and a `BaseChatMessageHistory` that handles injecting chat history into inputs and updating it after each invocation.\n", + "\n", + "For a detailed walkthrough of how to use these classes together to create a stateful conversational chain, head to the [How to add message history (memory)](/docs/expression_language/how_to/message_history) LCEL page." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "poetry-venv", + "language": "python", + "name": "poetry-venv" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/docs/use_cases/question_answering/conversational_retrieval_agents.ipynb b/docs/docs/use_cases/question_answering/conversational_retrieval_agents.ipynb index 46b0fa002f..5d0fb9f81d 100644 --- a/docs/docs/use_cases/question_answering/conversational_retrieval_agents.ipynb +++ b/docs/docs/use_cases/question_answering/conversational_retrieval_agents.ipynb @@ -5,13 +5,23 @@ "id": "839f3c76", "metadata": {}, "source": [ - "# RAG with Agents\n", + "# Using agents\n", "\n", "This is an agent specifically optimized for doing retrieval when necessary and also holding a conversation.\n", "\n", "To start, we will set up the retriever we want to use, and then turn it into a retriever tool. Next, we will use the high level constructor for this type of agent. Finally, we will walk through how to construct a conversational retrieval agent from components." 
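The `RunnableWithMessageHistory` wrapper referenced above can be layered onto the `rag_chain` built in this guide so that history is stored and injected automatically rather than passed by hand. A minimal sketch, assuming a throwaway in-memory session store (the `store` dict, `get_session_history` helper, and `session_id` value are illustrative and not part of the original notebook):

```python
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}  # illustrative in-memory map of session_id -> ChatMessageHistory


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    # Lazily create one message history per session.
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="chat_history",
)

# Each call reads and updates the history for the given session id.
conversational_rag_chain.invoke(
    {"question": "What is Task Decomposition?"},
    config={"configurable": {"session_id": "demo"}},
)
```

After each invocation the wrapper appends the incoming question and the model's reply to the stored history, which is the bookkeeping done manually with `chat_history.extend(...)` above.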
] }, + { + "cell_type": "code", + "execution_count": null, + "id": "756e6cc8-e268-4831-b707-b56537e405f7", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -U langchain langchain-community langchainhub openai faiss-cpu" + ] + }, { "cell_type": "markdown", "id": "dc66a262", @@ -29,9 +39,10 @@ "metadata": {}, "outputs": [], "source": [ - "from langchain.document_loaders import TextLoader\n", + "from langchain_community.document_loaders import TextLoader\n", "\n", - "loader = TextLoader(\"../../../../../docs/docs/modules/state_of_the_union.txt\")" + "loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n", + "documents = loader.load()" ] }, { @@ -41,11 +52,10 @@ "metadata": {}, "outputs": [], "source": [ - "from langchain.embeddings import OpenAIEmbeddings\n", "from langchain.text_splitter import CharacterTextSplitter\n", - "from langchain.vectorstores import FAISS\n", + "from langchain_community.embeddings import OpenAIEmbeddings\n", + "from langchain_community.vectorstores import FAISS\n", "\n", - "documents = loader.load()\n", "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", "texts = text_splitter.split_documents(documents)\n", "embeddings = OpenAIEmbeddings()\n", @@ -75,24 +85,16 @@ { "cell_type": "code", "execution_count": 4, - "id": "9a82f72a", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.agents.agent_toolkits import create_retriever_tool" - ] - }, - { - "cell_type": "code", - "execution_count": 5, "id": "dfbd92a2", "metadata": {}, "outputs": [], "source": [ + "from langchain.tools.retriever import create_retriever_tool\n", + "\n", "tool = create_retriever_tool(\n", " retriever,\n", " \"search_state_of_union\",\n", - " \"Searches and returns documents regarding the state-of-the-union.\",\n", + " \"Searches and returns excerpts from the 2022 State of the Union.\",\n", ")\n", "tools = [tool]" ] @@ -104,25 +106,42 @@ "source": [ "## Agent Constructor\n", "\n", - "Here, we will use the high level `create_conversational_retrieval_agent` API to construct the agent.\n", + "Here, we will use the high level `create_openai_tools_agent` API to construct the agent.\n", "\n", "Notice that beside the list of tools, the only thing we need to pass in is a language model to use.\n", - "Under the hood, this agent is using the OpenAIFunctionsAgent, so we need to use an ChatOpenAI model." + "Under the hood, this agent is using the OpenAI tool-calling capabilities, so we need to use a ChatOpenAI model." 
] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 5, "id": "0cd147eb", "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are a helpful assistant')),\n", + " MessagesPlaceholder(variable_name='chat_history', optional=True),\n", + " HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], template='{input}')),\n", + " MessagesPlaceholder(variable_name='agent_scratchpad')]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "from langchain.agents.agent_toolkits import create_conversational_retrieval_agent" + "from langchain import hub\n", + "\n", + "prompt = hub.pull(\"hwchase17/openai-tools-agent\")\n", + "prompt.messages" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 6, "id": "9fa4661b", "metadata": {}, "outputs": [], @@ -134,12 +153,15 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 7, "id": "da2ad764", "metadata": {}, "outputs": [], "source": [ - "agent_executor = create_conversational_retrieval_agent(llm, tools, verbose=True)" + "from langchain.agents import AgentExecutor, create_openai_tools_agent\n", + "\n", + "agent = create_openai_tools_agent(llm, tools, prompt)\n", + "agent_executor = AgentExecutor(agent=agent, tools=tools)" ] }, { @@ -152,30 +174,17 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 8, "id": "03059322", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", - "\u001b[32;1m\u001b[1;3mHello Bob! How can I assist you today?\u001b[0m\n", - "\n", - "\u001b[1m> Finished chain.\u001b[0m\n" - ] - } - ], + "outputs": [], "source": [ - "result = agent_executor({\"input\": \"hi, im bob\"})" + "result = agent_executor.invoke({\"input\": \"hi, im bob\"})" ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 9, "id": "33073ff0", "metadata": {}, "outputs": [ @@ -185,59 +194,7 @@ "'Hello Bob! 
How can I assist you today?'" ] }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "result[\"output\"]" - ] - }, - { - "cell_type": "markdown", - "id": "5f1f0b79", - "metadata": {}, - "source": [ - "Notice that it remembers your name" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "4ad92bc7", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", - "\u001b[32;1m\u001b[1;3mYour name is Bob.\u001b[0m\n", - "\n", - "\u001b[1m> Finished chain.\u001b[0m\n" - ] - } - ], - "source": [ - "result = agent_executor({\"input\": \"whats my name?\"})" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "7ae62ecd", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'Your name is Bob.'" - ] - }, - "execution_count": 12, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } @@ -256,31 +213,14 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 12, "id": "6cd17d67", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", - "\u001b[32;1m\u001b[1;3m\n", - "Invoking: `search_state_of_union` with `{'query': 'Kentaji Brown Jackson'}`\n", - "\n", - "\n", - "\u001b[0m\u001b[36;1m\u001b[1;3m[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../../../docs/docs/modules/state_of_the_union.txt'}), Document(page_content='One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more. \\n\\nWhen they came home, many of the world’s fittest and best trained warriors were never the same. \\n\\nHeadaches. Numbness. Dizziness. \\n\\nA cancer that would put them in a flag-draped coffin. \\n\\nI know. \\n\\nOne of those soldiers was my son Major Beau Biden. \\n\\nWe don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. \\n\\nBut I’m committed to finding out everything we can. \\n\\nCommitted to military families like Danielle Robinson from Ohio. \\n\\nThe widow of Sergeant First Class Heath Robinson. \\n\\nHe was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq. \\n\\nStationed near Baghdad, just yards from burn pits the size of football fields. \\n\\nHeath’s widow Danielle is here with us tonight. They loved going to Ohio State football games. 
He loved building Legos with their daughter.', metadata={'source': '../../../../../docs/docs/modules/state_of_the_union.txt'}), Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../../../docs/docs/modules/state_of_the_union.txt'}), Document(page_content='We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \\n\\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \\n\\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \\n\\nOfficer Mora was 27 years old. \\n\\nOfficer Rivera was 22. \\n\\nBoth Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \\n\\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \\n\\nI’ve worked on these issues a long time. \\n\\nI know what works: Investing in crime prevention and community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.', metadata={'source': '../../../../../docs/docs/modules/state_of_the_union.txt'})]\u001b[0m\u001b[32;1m\u001b[1;3mIn the most recent state of the union, the President mentioned Kentaji Brown Jackson. The President nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court. The President described Judge Ketanji Brown Jackson as one of our nation's top legal minds who will continue Justice Breyer's legacy of excellence.\u001b[0m\n", - "\n", - "\u001b[1m> Finished chain.\u001b[0m\n" - ] - } - ], + "outputs": [], "source": [ - "result = agent_executor(\n", + "result = agent_executor.invoke(\n", " {\n", - " \"input\": \"what did the president say about kentaji brown jackson in the most recent state of the union?\"\n", + " \"input\": \"what did the president say about ketanji brown jackson in the most recent state of the union?\"\n", " }\n", ")" ] @@ -334,7 +274,9 @@ } ], "source": [ - "result = agent_executor({\"input\": \"how long ago did he nominate her?\"})" + "result = agent_executor.invoke(\n", + " {\"input\": \"how long ago did the president nominate ketanji brown jackson?\"}\n", + ")" ] }, { @@ -360,213 +302,16 @@ }, { "cell_type": "markdown", - "id": "e599dbd3", - "metadata": {}, - "source": [ - "## Creating from components\n", - "\n", - "What actually is going on underneath the hood? 
Let's take a look so we can understand how to modify going forward.\n", - "\n", - "There are a few components:\n", - "\n", - "- The memory\n", - "- The prompt template\n", - "- The agent\n", - "- The agent executor" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "id": "1b21be1d", - "metadata": {}, - "outputs": [], - "source": [ - "# This is needed for both the memory and the prompt\n", - "memory_key = \"history\"" - ] - }, - { - "cell_type": "markdown", - "id": "f827f95f", - "metadata": {}, - "source": [ - "### The Memory\n", - "\n", - "In this example, we want the agent to remember not only previous conversations, but also previous intermediate steps. For that, we can use `AgentTokenBufferMemory`. Note that if you want to change whether the agent remembers intermediate steps, or how the long the buffer is, or anything like that you should change this part." - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "id": "138b0675", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.agents.openai_functions_agent.agent_token_buffer_memory import (\n", - " AgentTokenBufferMemory,\n", - ")\n", - "\n", - "memory = AgentTokenBufferMemory(memory_key=memory_key, llm=llm)" - ] - }, - { - "cell_type": "markdown", - "id": "4827993f", - "metadata": {}, - "source": [ - "## The Prompt Template\n", - "\n", - "For the prompt template, we will use the `OpenAIFunctionsAgent` default way of creating one, but pass in a system prompt and a placeholder for memory." - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "id": "779272dd", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.agents.openai_functions_agent.base import OpenAIFunctionsAgent\n", - "from langchain.prompts import MessagesPlaceholder\n", - "from langchain_core.messages import SystemMessage" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "id": "824bc74e", - "metadata": {}, - "outputs": [], - "source": [ - "system_message = SystemMessage(\n", - " content=(\n", - " \"Do your best to answer the questions. 
\"\n", - " \"Feel free to use any tools available to look up \"\n", - " \"relevant information, only if necessary\"\n", - " )\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "id": "07e41722", - "metadata": {}, - "outputs": [], - "source": [ - "prompt = OpenAIFunctionsAgent.create_prompt(\n", - " system_message=system_message,\n", - " extra_prompt_messages=[MessagesPlaceholder(variable_name=memory_key)],\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "001e455e", - "metadata": {}, - "source": [ - "## The Agent\n", - "\n", - "We will use the OpenAIFunctionsAgent" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "id": "adf4b5b3", - "metadata": {}, - "outputs": [], - "source": [ - "agent = OpenAIFunctionsAgent(llm=llm, tools=tools, prompt=prompt)" - ] - }, - { - "cell_type": "markdown", - "id": "2c5c321e", - "metadata": {}, - "source": [ - "## The Agent Executor\n", - "\n", - "Importantly, we pass in `return_intermediate_steps=True` since we are recording that with our memory object" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "id": "2e7ffe96", + "id": "51af2123-f63d-429a-aa58-fe24e3ce22a1", "metadata": {}, - "outputs": [], - "source": [ - "from langchain.agents import AgentExecutor" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "id": "e39a095f", - "metadata": {}, - "outputs": [], - "source": [ - "agent_executor = AgentExecutor(\n", - " agent=agent,\n", - " tools=tools,\n", - " memory=memory,\n", - " verbose=True,\n", - " return_intermediate_steps=True,\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "id": "96136958", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", - "\u001b[32;1m\u001b[1;3mHello Bob! How can I assist you today?\u001b[0m\n", - "\n", - "\u001b[1m> Finished chain.\u001b[0m\n" - ] - } - ], - "source": [ - "result = agent_executor({\"input\": \"hi, im bob\"})" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "id": "8de674cb", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", - "\u001b[32;1m\u001b[1;3mYour name is Bob.\u001b[0m\n", - "\n", - "\u001b[1m> Finished chain.\u001b[0m\n" - ] - } - ], "source": [ - "result = agent_executor({\"input\": \"whats my name\"})" + "For more on how to use agents with retrievers and other tools, head to the [Agents](/docs/modules/agents) section." 
] }, { "cell_type": "code", "execution_count": null, - "id": "bf655a48", + "id": "ada378d6-0616-4f61-923c-f5c1fbeb04a2", "metadata": {}, "outputs": [], "source": [] @@ -574,9 +319,9 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "poetry-venv", "language": "python", - "name": "python3" + "name": "poetry-venv" }, "language_info": { "codemirror_mode": { diff --git a/docs/docs/use_cases/question_answering/document-context-aware-QA.ipynb b/docs/docs/use_cases/question_answering/document-context-aware-QA.ipynb deleted file mode 100644 index 3f16c8dc1b..0000000000 --- a/docs/docs/use_cases/question_answering/document-context-aware-QA.ipynb +++ /dev/null @@ -1,340 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "88d7cc8c", - "metadata": {}, - "source": [ - "# Text splitting by header\n", - "\n", - "Text splitting for vector storage often uses sentences or other delimiters [to keep related text together](https://www.pinecone.io/learn/chunking-strategies/). \n", - "\n", - "But many documents (such as `Markdown` files) have structure (headers) that can be explicitly used in splitting. \n", - "\n", - "The `MarkdownHeaderTextSplitter` lets a user split `Markdown` files files based on specified headers. \n", - "\n", - "This results in chunks that retain the header(s) that it came from in the metadata.\n", - "\n", - "This works nicely w/ `SelfQueryRetriever`.\n", - "\n", - "First, tell the retriever about our splits.\n", - "\n", - "Then, query based on the doc structure (e.g., \"summarize the doc introduction\"). \n", - "\n", - "Chunks only from that section of the Document will be filtered and used in chat / Q+A.\n", - "\n", - "Let's test this out on an [example Notion page](https://rlancemartin.notion.site/Auto-Evaluation-of-Metadata-Filtering-18502448c85240828f33716740f9574b?pvs=4)!\n", - "\n", - "First, I download the page to Markdown as explained [here](https://python.langchain.com/docs/ecosystem/integrations/notion)." - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "2e587f65", - "metadata": {}, - "outputs": [], - "source": [ - "# Load Notion page as a markdownfile file\n", - "from langchain.document_loaders import NotionDirectoryLoader\n", - "\n", - "path = \"../Notion_DB/\"\n", - "loader = NotionDirectoryLoader(path)\n", - "docs = loader.load()\n", - "md_file = docs[0].page_content" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "1cd3fd7e", - "metadata": {}, - "outputs": [], - "source": [ - "# Let's create groups based on the section headers in our page\n", - "from langchain.text_splitter import MarkdownHeaderTextSplitter\n", - "\n", - "headers_to_split_on = [\n", - " (\"###\", \"Section\"),\n", - "]\n", - "markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)\n", - "md_header_splits = markdown_splitter.split_text(md_file)" - ] - }, - { - "cell_type": "markdown", - "id": "4f73a609", - "metadata": {}, - "source": [ - "Now, perform text splitting on the header grouped documents. 
" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "id": "7fbff95f", - "metadata": {}, - "outputs": [], - "source": [ - "# Define our text splitter\n", - "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", - "\n", - "chunk_size = 500\n", - "chunk_overlap = 0\n", - "text_splitter = RecursiveCharacterTextSplitter(\n", - " chunk_size=chunk_size, chunk_overlap=chunk_overlap\n", - ")\n", - "all_splits = text_splitter.split_documents(md_header_splits)" - ] - }, - { - "cell_type": "markdown", - "id": "5bd72546", - "metadata": {}, - "source": [ - "This sets us up well do perform metadata filtering based on the document structure.\n", - "\n", - "Let's bring this all together by building a vectorstore first." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b050b4de", - "metadata": {}, - "outputs": [], - "source": [ - "! pip install chromadb" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "id": "01d59c39", - "metadata": {}, - "outputs": [], - "source": [ - "# Build vectorstore and keep the metadata\n", - "from langchain.embeddings import OpenAIEmbeddings\n", - "from langchain.vectorstores import Chroma\n", - "\n", - "vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())" - ] - }, - { - "cell_type": "markdown", - "id": "310346dd", - "metadata": {}, - "source": [ - "Let's create a `SelfQueryRetriever` that can filter based upon metadata we defined." - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "id": "7fd4d283", - "metadata": {}, - "outputs": [], - "source": [ - "# Create retriever\n", - "from langchain.chains.query_constructor.base import AttributeInfo\n", - "from langchain.llms import OpenAI\n", - "from langchain.retrievers.self_query.base import SelfQueryRetriever\n", - "\n", - "# Define our metadata\n", - "metadata_field_info = [\n", - " AttributeInfo(\n", - " name=\"Section\",\n", - " description=\"Part of the document that the text comes from\",\n", - " type=\"string or list[string]\",\n", - " ),\n", - "]\n", - "document_content_description = \"Major sections of the document\"\n", - "\n", - "# Define self query retriever\n", - "llm = OpenAI(temperature=0)\n", - "retriever = SelfQueryRetriever.from_llm(\n", - " llm, vectorstore, document_content_description, metadata_field_info, verbose=True\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "218b9820", - "metadata": {}, - "source": [ - "We can see that we can query *only for texts* in the `Introduction` of the document!" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "id": "d688db6e", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "query='Introduction' filter=Comparison(comparator=, attribute='Section', value='Introduction') limit=None\n" - ] - }, - { - "data": { - "text/plain": [ - "[Document(page_content='![Untitled](Auto-Evaluation%20of%20Metadata%20Filtering%2018502448c85240828f33716740f9574b/Untitled.png)', metadata={'Section': 'Introduction'}),\n", - " Document(page_content='Q+A systems often use a two-step approach: retrieve relevant text chunks and then synthesize them into an answer. There many ways to approach this. For example, we recently [discussed](https://blog.langchain.dev/auto-evaluation-of-anthropic-100k-context-window/) the Retriever-Less option (at bottom in the below diagram), highlighting the Anthropic 100k context window model. 
Metadata filtering is an alternative approach that pre-filters chunks based on a user-defined criteria in a VectorDB using', metadata={'Section': 'Introduction'}),\n", - " Document(page_content='metadata tags prior to semantic search.', metadata={'Section': 'Introduction'})]" - ] - }, - "execution_count": 29, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Test\n", - "retriever.get_relevant_documents(\"Summarize the Introduction section of the document\")" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "id": "f8064987", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "query='Introduction' filter=Comparison(comparator=, attribute='Section', value='Introduction') limit=None\n" - ] - }, - { - "data": { - "text/plain": [ - "[Document(page_content='![Untitled](Auto-Evaluation%20of%20Metadata%20Filtering%2018502448c85240828f33716740f9574b/Untitled.png)', metadata={'Section': 'Introduction'}),\n", - " Document(page_content='Q+A systems often use a two-step approach: retrieve relevant text chunks and then synthesize them into an answer. There many ways to approach this. For example, we recently [discussed](https://blog.langchain.dev/auto-evaluation-of-anthropic-100k-context-window/) the Retriever-Less option (at bottom in the below diagram), highlighting the Anthropic 100k context window model. Metadata filtering is an alternative approach that pre-filters chunks based on a user-defined criteria in a VectorDB using', metadata={'Section': 'Introduction'}),\n", - " Document(page_content='metadata tags prior to semantic search.', metadata={'Section': 'Introduction'})]" - ] - }, - "execution_count": 29, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Test\n", - "retriever.get_relevant_documents(\"Summarize the Introduction section of the document\")" - ] - }, - { - "cell_type": "markdown", - "id": "f35999b3", - "metadata": {}, - "source": [ - "We can also look at other parts of the document." - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "id": "47929be4", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "query='Testing' filter=Comparison(comparator=, attribute='Section', value='Testing') limit=None\n" - ] - }, - { - "data": { - "text/plain": [ - "[Document(page_content='![Untitled](Auto-Evaluation%20of%20Metadata%20Filtering%2018502448c85240828f33716740f9574b/Untitled%202.png)', metadata={'Section': 'Testing'}),\n", - " Document(page_content='`SelfQueryRetriever` works well in [many cases](https://twitter.com/hwchase17/status/1656791488569954304/photo/1). 
For example, given [this test case](https://twitter.com/hwchase17/status/1656791488569954304?s=20): \\n![Untitled](Auto-Evaluation%20of%20Metadata%20Filtering%2018502448c85240828f33716740f9574b/Untitled%201.png) \\nThe query can be nicely broken up into semantic query and metadata filter: \\n```python\\nsemantic query: \"prompt injection\"', metadata={'Section': 'Testing'}),\n", - " Document(page_content='Below, we can see detailed results from the app: \\n- Kor extraction is above to perform the transformation between query and metadata format ✅\\n- Self-querying attempts to filter using the episode ID (`252`) in the query and fails 🚫\\n- Baseline returns docs from 3 different episodes (one from `252`), confusing the answer 🚫', metadata={'Section': 'Testing'}),\n", - " Document(page_content='will use in retrieval [here](https://github.com/langchain-ai/auto-evaluator/blob/main/streamlit/kor_retriever_lex.py).', metadata={'Section': 'Testing'})]" - ] - }, - "execution_count": 30, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "retriever.get_relevant_documents(\"Summarize the Testing section of the document\")" - ] - }, - { - "cell_type": "markdown", - "id": "1af7720f", - "metadata": {}, - "source": [ - "Now, we can create chat or Q+A apps that are aware of the explicit document structure. \n", - "\n", - "The ability to retain document structure for metadata filtering can be helpful for complicated or longer documents." - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "id": "565822a1", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "query='Testing' filter=Comparison(comparator=, attribute='Section', value='Testing') limit=None\n" - ] - }, - { - "data": { - "text/plain": [ - "'The Testing section of the document describes the evaluation of the `SelfQueryRetriever` component in comparison to a baseline model. The evaluation was performed on a test case where the query was broken down into a semantic query and a metadata filter. The results showed that the `SelfQueryRetriever` component was able to perform the transformation between query and metadata format, but failed to filter using the episode ID in the query. The baseline model returned documents from three different episodes, which confused the answer. 
The `SelfQueryRetriever` component was deemed to work well in many cases and will be used in retrieval.'" - ] - }, - "execution_count": 31, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from langchain.chains import RetrievalQA\n", - "from langchain.chat_models import ChatOpenAI\n", - "\n", - "llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)\n", - "qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever)\n", - "qa_chain.run(\"Summarize the Testing section of the document\")" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.1" - }, - "vscode": { - "interpreter": { - "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" - } - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/docs/docs/use_cases/question_answering/index.ipynb b/docs/docs/use_cases/question_answering/index.ipynb index 0232cae24b..5bfbbd73e6 100644 --- a/docs/docs/use_cases/question_answering/index.ipynb +++ b/docs/docs/use_cases/question_answering/index.ipynb @@ -1,19 +1,22 @@ { "cells": [ { - "cell_type": "markdown", - "id": "86fc5bb2-017f-434e-8cd6-53ab214a5604", + "cell_type": "raw", + "id": "3434dfe3-cdd1-4715-b3ec-d2a5ca7b0b35", "metadata": {}, "source": [ - "# Retrieval-augmented generation (RAG)" + "---\n", + "sidebar_position: 0\n", + "collapsed: true\n", + "---" ] }, { "cell_type": "markdown", - "id": "de913d6d-c57f-4927-82fe-18902a636861", + "id": "86fc5bb2-017f-434e-8cd6-53ab214a5604", "metadata": {}, "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/question_answering/index.ipynb)" + "# Q&A with RAG" ] }, { @@ -23,20 +26,20 @@ "source": [ "## Overview\n", "\n", + "One of the most powerful applications enabled by LLMs is sophisticated question-answering (Q&A) chatbots. These are applications that can answer questions about specific source information. These applications use a technique known as Retrieval Augmented Generation, or RAG.\n", + "\n", "### What is RAG?\n", "\n", - "RAG is a technique for augmenting LLM knowledge with additional, often private or real-time, data.\n", + "RAG is a technique for augmenting LLM knowledge with additional data.\n", "\n", "LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data up to a specific point in time that they were trained on. If you want to build AI applications that can reason about private data or data introduced after a model's cutoff date, you need to augment the knowledge of the model with the specific information it needs. The process of bringing the appropriate information and inserting it into the model prompt is known as Retrieval Augmented Generation (RAG).\n", "\n", - "### What's in this guide?\n", + "LangChain has a number of components designed to help build Q&A applications, and RAG applications more generally. \n", "\n", - "LangChain has a number of components specifically designed to help build RAG applications. To familiarize ourselves with these, we'll build a simple question-answering application over a text data source. 
Specifically, we'll build a QA bot over the [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post by Lilian Weng. Along the way we'll go over a typical QA architecture, discuss the relevant LangChain components, and highlight additional resources for more advanced QA techniques. We'll also see how LangSmith can help us trace and understand our application. LangSmith will become increasingly helpful as our application grows in complexity.\n", + "**Note**: Here we focus on Q&A for unstructured data. Two RAG use cases which we cover elsewhere are:\n", "\n", - "**Note**\n", - "Here we focus on RAG for unstructured data. Two RAG use cases which we cover elsewhere are:\n", - "- [QA over structured data](/docs/use_cases/qa_structured/sql) (e.g., SQL)\n", - "- [QA over code](/docs/use_cases/question_answering/code_understanding) (e.g., Python)" + "- [Q&A over structured data](/docs/use_cases/qa_structured/sql) (e.g., SQL)\n", + "- [Q&A over code](/docs/use_cases/question_answering/code_understanding) (e.g., Python)" ] }, { @@ -44,7 +47,7 @@ "id": "2f25cbbd-0938-4e3d-87e4-17a204a03ffb", "metadata": {}, "source": [ - "## Architecture\n", + "## RAG Architecture\n", "A typical RAG application has two main components:\n", "\n", "**Indexing**: a pipeline for ingesting data from a source and indexing it. *This usually happen offline.*\n", @@ -54,7 +57,7 @@ "The most common full sequence from raw data to answer looks like:\n", "\n", "#### Indexing\n", - "1. **Load**: First we need to load our data. We'll use [DocumentLoaders](/docs/modules/data_connection/document_loaders/) for this.\n", + "1. **Load**: First we need to load our data. This is done with [DocumentLoaders](/docs/modules/data_connection/document_loaders/).\n", "2. **Split**: [Text splitters](/docs/modules/data_connection/document_transformers/) break large `Documents` into smaller chunks. This is useful both for indexing data and for passing it in to a model, since large chunks are harder to search over and won't fit in a model's finite context window.\n", "3. **Store**: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a [VectorStore](/docs/modules/data_connection/vectorstores/) and [Embeddings](/docs/modules/data_connection/text_embedding/) model.\n", "\n", @@ -69,1027 +72,18 @@ }, { "cell_type": "markdown", - "id": "487d8d79-5ee9-4aa4-9fdf-cd5f4303e099", - "metadata": {}, - "source": [ - "## Setup\n", - "\n", - "### Dependencies\n", - "\n", - "We'll use an OpenAI chat model and embeddings and a Chroma vector store in this walkthrough, but everything shown here works with any [ChatModel](/docs/integrations/chat/) or [LLM](/docs/integrations/llms/), [Embeddings](/docs/integrations/text_embedding/), and [VectorStore](/docs/integrations/vectorstores/) or [Retriever](/docs/integrations/retrievers). 
\n", - "\n", - "We'll use the following packages:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "28d272cd-4e31-40aa-bbb4-0be0a1f49a14", - "metadata": {}, - "outputs": [], - "source": [ - "!pip install -U langchain openai chromadb langchainhub bs4" - ] - }, - { - "cell_type": "markdown", - "id": "51ef48de-70b6-4f43-8e0b-ab9b84c9c02a", - "metadata": {}, - "source": [ - "We need to set environment variable `OPENAI_API_KEY`, which can be done directly or loaded from a `.env` file like so:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "143787ca-d8e6-4dc9-8281-4374f4d71720", - "metadata": {}, - "outputs": [], - "source": [ - "import getpass\n", - "import os\n", - "\n", - "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n", - "\n", - "# import dotenv\n", - "\n", - "# dotenv.load_dotenv()" - ] - }, - { - "cell_type": "markdown", - "id": "1665e740-ce01-4f09-b9ed-516db0bd326f", - "metadata": {}, - "source": [ - "### LangSmith\n", - "\n", - "Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with [LangSmith](https://smith.langchain.com).\n", - "\n", - "Note that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "07411adb-3722-4f65-ab7f-8f6f57663d11", - "metadata": {}, - "outputs": [], - "source": [ - "os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n", - "os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()" - ] - }, - { - "cell_type": "markdown", - "id": "fa6ba684-26cf-4860-904e-a4d51380c134", - "metadata": {}, - "source": [ - "## Quickstart\n", - "\n", - "Suppose we want to build a QA app over the [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post by Lilian Weng. 
We can create a simple pipeline for this in ~20 lines of code:" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "d8a913b1-0eea-442a-8a64-ec73333f104b", - "metadata": {}, - "outputs": [], - "source": [ - "import bs4\n", - "from langchain import hub\n", - "from langchain.chat_models import ChatOpenAI\n", - "from langchain.document_loaders import WebBaseLoader\n", - "from langchain.embeddings import OpenAIEmbeddings\n", - "from langchain.schema import StrOutputParser\n", - "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", - "from langchain.vectorstores import Chroma\n", - "from langchain_core.runnables import RunnablePassthrough" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "820244ae-74b4-4593-b392-822979dd91b8", - "metadata": {}, - "outputs": [], - "source": [ - "loader = WebBaseLoader(\n", - " web_paths=(\"https://lilianweng.github.io/posts/2023-06-23-agent/\",),\n", - " bs_kwargs=dict(\n", - " parse_only=bs4.SoupStrainer(\n", - " class_=(\"post-content\", \"post-title\", \"post-header\")\n", - " )\n", - " ),\n", - ")\n", - "docs = loader.load()\n", - "\n", - "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n", - "splits = text_splitter.split_documents(docs)\n", - "\n", - "vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())\n", - "retriever = vectorstore.as_retriever()\n", - "\n", - "prompt = hub.pull(\"rlm/rag-prompt\")\n", - "llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)\n", - "\n", - "\n", - "def format_docs(docs):\n", - " return \"\\n\\n\".join(doc.page_content for doc in docs)\n", - "\n", - "\n", - "rag_chain = (\n", - " {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n", - " | prompt\n", - " | llm\n", - " | StrOutputParser()\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "0d3b0f36-7b56-49c0-8e40-a1aa9ebcbf24", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done through prompting techniques like Chain of Thought or Tree of Thoughts, or by using task-specific instructions or human inputs. Task decomposition helps agents plan ahead and manage complicated tasks more effectively.'" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "rag_chain.invoke(\"What is Task Decomposition?\")" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "7cb344e0-c423-400c-a079-964c08e07e32", - "metadata": {}, - "outputs": [], - "source": [ - "# cleanup\n", - "vectorstore.delete_collection()" - ] - }, - { - "cell_type": "markdown", - "id": "639dc31a-7f16-40f6-ba2a-20e7c2ecfe60", - "metadata": {}, - "source": [ - ":::tip\n", - "\n", - "Check out the [LangSmith trace](https://smith.langchain.com/public/1c6ca97e-445b-4d00-84b4-c7befcbc59fe/r) \n", - "\n", - ":::" - ] - }, - { - "cell_type": "markdown", - "id": "842cf72d-abbc-468e-a2eb-022470347727", - "metadata": {}, - "source": [ - "## Detailed walkthrough\n", - "\n", - "Let's go through the above code step-by-step to really understand what's going on." - ] - }, - { - "cell_type": "markdown", - "id": "ba5daed6", - "metadata": {}, - "source": [ - "## Step 1. Load\n", - "\n", - "We need to first load the blog post contents. We can use `DocumentLoader`s for this, which are objects that load in data from a source as `Documents`. 
A `Document` is an object with `page_content` (str) and `metadata` (dict) attributes. \n", - "\n", - "In this case we'll use the `WebBaseLoader`, which uses `urllib` and `BeautifulSoup` to load and parse the passed in web urls, returning one `Document` per url. We can customize the html -> text parsing by passing in parameters to the `BeautifulSoup` parser via `bs_kwargs` (see [BeautifulSoup docs](https://beautiful-soup-4.readthedocs.io/en/latest/#beautifulsoup)). In this case only HTML tags with class \"post-content\", \"post-title\", or \"post-header\" are relevant, so we'll remove all others." - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "cf4d5c72", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.document_loaders import WebBaseLoader\n", - "\n", - "loader = WebBaseLoader(\n", - " web_paths=(\"https://lilianweng.github.io/posts/2023-06-23-agent/\",),\n", - " bs_kwargs={\n", - " \"parse_only\": bs4.SoupStrainer(\n", - " class_=(\"post-content\", \"post-title\", \"post-header\")\n", - " )\n", - " },\n", - ")\n", - "docs = loader.load()" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "207f87a3-effa-4457-b013-6d233bc7a088", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "42824" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "len(docs[0].page_content)" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "52469796-5ce4-4c12-bd2a-a903872dac33", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - " LLM Powered Autonomous Agents\n", - " \n", - "Date: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng\n", - "\n", - "\n", - "Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\n", - "Agent System Overview#\n", - "In\n" - ] - } - ], - "source": [ - "print(docs[0].page_content[:500])" - ] - }, - { - "cell_type": "markdown", - "id": "ee5c6556-56be-4067-adbc-98b5aa19ef6e", - "metadata": {}, - "source": [ - "### Go deeper\n", - "`DocumentLoader`: Object that load data from a source as `Documents`.\n", - "- [Docs](/docs/modules/data_connection/document_loaders/): Further documentation on how to use `DocumentLoader`s.\n", - "- [Integrations](/docs/integrations/document_loaders/): Find the relevant `DocumentLoader` integration (of the > 160 of them) for your use case." - ] - }, - { - "cell_type": "markdown", - "id": "fd2cc9a7", - "metadata": {}, - "source": [ - "## Step 2. Split\n", - "\n", - "Our loaded document is over 42k characters long. This is too long to fit in the context window of many models. And even for those models that could fit the full post in their context window, empirically models struggle to find the relevant context in very long prompts. \n", - "\n", - "So we'll split the `Document` into chunks for embedding and vector storage. This should help us retrieve only the most relevant bits of the blog post at run time.\n", - "\n", - "In this case we'll split our documents into chunks of 1000 characters with 200 characters of overlap between chunks. The overlap helps mitigate the possibility of separating a statement from important context related to it. 
We use the `RecursiveCharacterTextSplitter`, which will (recursively) split the document using common separators (like new lines) until each chunk is the appropriate size." - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "4b11c01d", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", - "\n", - "text_splitter = RecursiveCharacterTextSplitter(\n", - " chunk_size=1000, chunk_overlap=200, add_start_index=True\n", - ")\n", - "all_splits = text_splitter.split_documents(docs)" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "3741eb67-9caf-40f2-a001-62f49349bff5", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "66" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "len(all_splits)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "f868d0e5-5670-4d54-b562-f50265e907f4", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "969" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "len(all_splits[0].page_content)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "5c9e5f27-c8e3-4ca7-8a8e-45c5de2901cc", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',\n", - " 'start_index': 7056}" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "all_splits[10].metadata" - ] - }, - { - "cell_type": "markdown", - "id": "0a33bd4d", - "metadata": {}, - "source": [ - "### Go deeper\n", - "\n", - "`DocumentSplitter`: Object that splits a list of `Document`s into smaller chunks. Subclass of `DocumentTransformer`s.\n", - "- Explore `Context-aware splitters`, which keep the location (\"context\") of each split in the original `Document`:\n", - " - [Markdown files](/docs/use_cases/question_answering/document-context-aware-QA)\n", - " - [Code (py or js)](docs/integrations/document_loaders/source_code)\n", - " - [Scientific papers](/docs/integrations/document_loaders/grobid)\n", - "\n", - "`DocumentTransformer`: Object that performs a transformation on a list of `Document`s.\n", - "- [Docs](/docs/modules/data_connection/document_transformers/): Further documentation on how to use `DocumentTransformer`s\n", - "- [Integrations](/docs/integrations/document_transformers/)\n" - ] - }, - { - "cell_type": "markdown", - "id": "46547031-2352-4321-9970-d6ea27285c2e", - "metadata": {}, - "source": [ - "## Step 3. Store\n", - "\n", - "Now that we've got 66 text chunks in memory, we need to store and index them so that we can search them later in our RAG app. The most common way to do this is to embed the contents of each document split and upload those embeddings to a vector store. \n", - "\n", - "Then, when we want to search over our splits, we take the search query, embed it as well, and perform some sort of \"similarity\" search to identify the stored splits with the most similar embeddings to our query embedding. The simplest similarity measure is cosine similarity — we measure the cosine of the angle between each pair of embeddings (which are just very high dimensional vectors).\n", - "\n", - "We can embed and store all of our document splits in a single command using the `Chroma` vector store and `OpenAIEmbeddings` model." 
- ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "e9c302c8", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.embeddings import OpenAIEmbeddings\n", - "from langchain.vectorstores import Chroma\n", - "\n", - "vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())" - ] - }, - { - "cell_type": "markdown", - "id": "dc6f22b0", - "metadata": {}, - "source": [ - "### Go deeper\n", - "`Embeddings`: Wrapper around a text embedding model, used for converting text to embeddings.\n", - "- [Docs](/docs/modules/data_connection/text_embedding): Further documentation on the interface.\n", - "- [Integrations](/docs/integrations/text_embedding/): Browse the > 30 text embedding integrations\n", - "\n", - "`VectorStore`: Wrapper around a vector database, used for storing and querying embeddings.\n", - "- [Docs](/docs/modules/data_connection/vectorstores/): Further documentation on the interface.\n", - "- [Integrations](/docs/integrations/vectorstores/): Browse the > 40 `VectorStore` integrations.\n", - "\n", - "This completes the **Indexing** portion of the pipeline. At this point we have an query-able vector store containing the chunked contents of our blog post. Given a user question, we should ideally be able to return the snippets of the blog post that answer the question:" - ] - }, - { - "cell_type": "markdown", - "id": "70d64d40-e475-43d9-b64c-925922bb5ef7", - "metadata": {}, - "source": [ - "## Step 4. Retrieve\n", - "\n", - "Now let's write the actual application logic. We want to create a simple application that let's the user ask a question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and finally returns an answer.\n", - "\n", - "LangChain defines a `Retriever` interface which wraps an index that can return relevant documents given a string query. All retrievers implement a common method `get_relevant_documents()` (and its asynchronous variant `aget_relevant_documents()`).\n", - "\n", - "The most common type of `Retriever` is the `VectorStoreRetriever`, which uses the similarity search capabilities of a vector store to facillitate retrieval. Any `VectorStore` can easily be turned into a `Retriever`:" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "4414df0d-5d43-46d0-85a9-5f47be0dd099", - "metadata": {}, - "outputs": [], - "source": [ - "retriever = vectorstore.as_retriever(search_type=\"similarity\", search_kwargs={\"k\": 6})" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "id": "e2c26b7d", - "metadata": {}, - "outputs": [], - "source": [ - "retrieved_docs = retriever.get_relevant_documents(\n", - " \"What are the approaches to Task Decomposition?\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "id": "8684291d-0f5e-453a-8d3e-ff9feea765d0", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "6" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "len(retrieved_docs)" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "id": "9a5dc074-816d-409a-b005-ab4eddfd76af", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. 
The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\n", - "Task decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\n" - ] - } - ], - "source": [ - "print(retrieved_docs[0].page_content)" - ] - }, - { - "cell_type": "markdown", - "id": "5d5a113b", - "metadata": {}, - "source": [ - "### Go deeper\n", - "Vector stores are commonly used for retrieval, but there are plenty of other ways to do retrieval. \n", - "\n", - "`Retriever`: An object that returns `Document`s given a text query\n", - "- [Docs](/docs/modules/data_connection/retrievers/): Further documentation on the interface and built-in retrieval techniques. Some of which include:\n", - " - `MultiQueryRetriever` [generates variants of the input question](/docs/modules/data_connection/retrievers/MultiQueryRetriever) to improve retrieval hit rate.\n", - " - `MultiVectorRetriever` (diagram below) instead generates [variants of the embeddings](/docs/modules/data_connection/retrievers/multi_vector), also in order to improve retrieval hit rate.\n", - " - `Max marginal relevance` selects for [relevance and diversity](https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf) among the retrieved documents to avoid passing in duplicate context.\n", - " - Documents can be filtered during vector store retrieval using [`metadata` filters](/docs/use_cases/question_answering/document-context-aware-QA).\n", - "- [Integrations](/docs/integrations/retrievers/): Integrations with retrieval services." - ] - }, - { - "cell_type": "markdown", - "id": "415d6824", - "metadata": {}, - "source": [ - "## Step 5. Generate\n", - "\n", - "Let's put it all together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes that to a model, and parses the output.\n", - "\n", - "We'll use the gpt-3.5-turbo OpenAI chat model, but any LangChain `LLM` or `ChatModel` could be substituted in." - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "id": "d34d998c-9abf-4e01-a4ad-06dadfcf131c", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.chat_models import ChatOpenAI\n", - "\n", - "llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)" - ] - }, - { - "cell_type": "markdown", - "id": "bc826723-36fc-45d1-a3ef-df8c2c8471a8", - "metadata": {}, - "source": [ - "We'll use a prompt for RAG that is checked into the LangChain prompt hub ([here](https://smith.langchain.com/hub/rlm/rag-prompt))." - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "id": "bede955b-9aeb-4fd3-964d-8e43f214ce70", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain import hub\n", - "\n", - "prompt = hub.pull(\"rlm/rag-prompt\")" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "id": "11c35354-f275-47ec-9f72-ebd5c23731eb", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Human: You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.\n", - "Question: filler question \n", - "Context: filler context \n", - "Answer:\n" - ] - } - ], - "source": [ - "print(\n", - " prompt.invoke(\n", - " {\"context\": \"filler context\", \"question\": \"filler question\"}\n", - " ).to_string()\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "51f9a210-1eee-4054-99d7-9d9ddf7e3593", - "metadata": {}, - "source": [ - "We'll use the [LCEL Runnable](https://python.langchain.com/docs/expression_language/) protocol to define the chain, allowing us to \n", - "- pipe together components and functions in a transparent way\n", - "- automatically trace our chain in LangSmith\n", - "- get streaming, async, and batched calling out of the box" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "id": "99fa1aec", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.schema import StrOutputParser\n", - "from langchain_core.runnables import RunnablePassthrough\n", - "\n", - "\n", - "def format_docs(docs):\n", - " return \"\\n\\n\".join(doc.page_content for doc in docs)\n", - "\n", - "\n", - "rag_chain = (\n", - " {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n", - " | prompt\n", - " | llm\n", - " | StrOutputParser()\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "id": "8655a152-d7cf-466f-b1bc-fbff9ae2b889", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done through methods like Chain of Thought (CoT) or Tree of Thoughts, which involve dividing the task into manageable subtasks and exploring multiple reasoning possibilities at each step. Task decomposition can be performed by AI models with prompting, task-specific instructions, or human inputs." - ] - } - ], - "source": [ - "for chunk in rag_chain.stream(\"What is Task Decomposition?\"):\n", - " print(chunk, end=\"\", flush=True)" - ] - }, - { - "cell_type": "markdown", - "id": "2c000e5f-2b7f-4eb9-8876-9f4b186b4a08", - "metadata": {}, - "source": [ - ":::tip\n", - "\n", - "Check out the [LangSmith trace](https://smith.langchain.com/public/1799e8db-8a6d-4eb2-84d5-46e8d7d5a99b/r) \n", - "\n", - ":::" - ] - }, - { - "cell_type": "markdown", - "id": "f7d52c84", + "id": "97b3fb4e-ad76-4ccf-b779-075697119bff", "metadata": {}, "source": [ - "### Go deeper\n", - "\n", - "#### Choosing LLMs\n", - "`ChatModel`: An LLM-backed chat model wrapper. Takes in a sequence of messages and returns a message.\n", - "- [Docs](/docs/modules/model_io/chat/)\n", - "- [Integrations](/docs/integrations/chat/): Explore over 25 `ChatModel` integrations.\n", - "\n", - "`LLM`: A text-in-text-out LLM. Takes in a string and returns a string.\n", - "- [Docs](/docs/modules/model_io/llms)\n", - "- [Integrations](/docs/integrations/llms): Explore over 75 `LLM` integrations.\n", - "\n", - "See a guide on RAG with locally-running models [here](/docs/use_cases/question_answering/local_retrieval_qa)." - ] - }, - { - "cell_type": "markdown", - "id": "fa82f437", - "metadata": {}, - "source": [ - "#### Customizing the prompt\n", - "\n", - "As shown above, we can load prompts (e.g., [this RAG prompt](https://smith.langchain.com/hub/rlm/rag-prompt)) from the prompt hub. 
The prompt can also be easily customized:" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "id": "e4fee704", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'Task decomposition is the process of breaking down a complex task into smaller and simpler steps. It can be done through techniques like Chain of Thought (CoT) or Tree of Thoughts, which involve dividing the problem into multiple thought steps and generating multiple thoughts per step. Task decomposition helps in enhancing model performance and understanding the thinking process of the model. Thanks for asking!'" - ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from langchain.prompts import PromptTemplate\n", - "\n", - "template = \"\"\"Use the following pieces of context to answer the question at the end.\n", - "If you don't know the answer, just say that you don't know, don't try to make up an answer.\n", - "Use three sentences maximum and keep the answer as concise as possible.\n", - "Always say \"thanks for asking!\" at the end of the answer.\n", - "{context}\n", - "Question: {question}\n", - "Helpful Answer:\"\"\"\n", - "rag_prompt_custom = PromptTemplate.from_template(template)\n", - "\n", - "rag_chain = (\n", - " {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n", - " | rag_prompt_custom\n", - " | llm\n", - " | StrOutputParser()\n", - ")\n", - "\n", - "rag_chain.invoke(\"What is Task Decomposition?\")" - ] - }, - { - "cell_type": "markdown", - "id": "94b952e6-dc4b-415b-9cf3-1ad333e48366", - "metadata": {}, - "source": [ - ":::tip\n", - "\n", - "Check out the [LangSmith trace](https://smith.langchain.com/public/da23c4d8-3b33-47fd-84df-a3a582eedf84/r) \n", - "\n", - ":::" - ] - }, - { - "cell_type": "markdown", - "id": "1c2f99b5-80b4-4178-bf30-c1c0a152638f", - "metadata": {}, - "source": [ - "### Adding sources\n", - "\n", - "With LCEL it's easy to return the retrieved documents or certain source metadata from the documents:" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "id": "ded41680-b749-4e2a-9daa-b1165d74783b", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'documents': [{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',\n", - " 'start_index': 1585},\n", - " {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',\n", - " 'start_index': 2192},\n", - " {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',\n", - " 'start_index': 17804},\n", - " {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',\n", - " 'start_index': 17414},\n", - " {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',\n", - " 'start_index': 29630},\n", - " {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',\n", - " 'start_index': 19373}],\n", - " 'answer': 'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It involves transforming big tasks into multiple manageable tasks, allowing for a more systematic and organized approach to problem-solving. 
Thanks for asking!'}" - ] - }, - "execution_count": 25, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from operator import itemgetter\n", - "\n", - "from langchain_core.runnables import RunnableParallel\n", - "\n", - "rag_chain_from_docs = (\n", - " {\n", - " \"context\": lambda input: format_docs(input[\"documents\"]),\n", - " \"question\": itemgetter(\"question\"),\n", - " }\n", - " | rag_prompt_custom\n", - " | llm\n", - " | StrOutputParser()\n", - ")\n", - "rag_chain_with_source = RunnableParallel(\n", - " {\"documents\": retriever, \"question\": RunnablePassthrough()}\n", - ") | {\n", - " \"documents\": lambda input: [doc.metadata for doc in input[\"documents\"]],\n", - " \"answer\": rag_chain_from_docs,\n", - "}\n", - "\n", - "rag_chain_with_source.invoke(\"What is Task Decomposition\")" - ] - }, - { - "cell_type": "markdown", - "id": "b437da5d-ca09-4d15-9be2-c35e5a1ace77", - "metadata": {}, - "source": [ - ":::tip\n", - "\n", - "Check out the [LangSmith trace](https://smith.langchain.com/public/007d7e01-cb62-4a84-8b71-b24767f953ee/r)\n", - "\n", - ":::" - ] - }, - { - "cell_type": "markdown", - "id": "776ae958-cbdc-4471-8669-c6087436f0b5", - "metadata": {}, - "source": [ - "### Adding memory\n", - "\n", - "Suppose we want to create a stateful application that remembers past user inputs. There are two main things we need to do to support this.\n", - "1. Add a messages placeholder to our chain which allows us to pass in historical messages\n", - "2. Add a chain that takes the latest user query and reformulates it in the context of the chat history into a standalone question that can be passed to our retriever.\n", - "\n", - "Let's start with 2. We can build a \"condense question\" chain that looks something like this:" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "id": "2b685428-8b82-4af1-be4f-7232c5d55b73", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder\n", - "\n", - "condense_q_system_prompt = \"\"\"Given a chat history and the latest user question \\\n", - "which might reference the chat history, formulate a standalone question \\\n", - "which can be understood without the chat history. 
Do NOT answer the question, \\\n", - "just reformulate it if needed and otherwise return it as is.\"\"\"\n", - "condense_q_prompt = ChatPromptTemplate.from_messages(\n", - " [\n", - " (\"system\", condense_q_system_prompt),\n", - " MessagesPlaceholder(variable_name=\"chat_history\"),\n", - " (\"human\", \"{question}\"),\n", - " ]\n", - ")\n", - "condense_q_chain = condense_q_prompt | llm | StrOutputParser()" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "id": "46ee9aa1-16f1-4509-8dae-f8c71f4ad47d", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'What is the definition of \"large\" in the context of a language model?'" - ] - }, - "execution_count": 27, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from langchain_core.messages import AIMessage, HumanMessage\n", - "\n", - "condense_q_chain.invoke(\n", - " {\n", - " \"chat_history\": [\n", - " HumanMessage(content=\"What does LLM stand for?\"),\n", - " AIMessage(content=\"Large language model\"),\n", - " ],\n", - " \"question\": \"What is meant by large\",\n", - " }\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "id": "31ee8481-ce37-41ae-8ca5-62196619d4b3", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'How do transformer models function?'" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "condense_q_chain.invoke(\n", - " {\n", - " \"chat_history\": [\n", - " HumanMessage(content=\"What does LLM stand for?\"),\n", - " AIMessage(content=\"Large language model\"),\n", - " ],\n", - " \"question\": \"How do transformers work\",\n", - " }\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "42a47168-4a1f-4e39-bd2d-d5b03609a243", - "metadata": {}, - "source": [ - "And now we can build our full QA chain. Notice we add some routing functionality to only run the \"condense question chain\" when our chat history isn't empty." - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "id": "66f275f3-ddef-4678-b90d-ee64576878f9", - "metadata": {}, - "outputs": [], - "source": [ - "qa_system_prompt = \"\"\"You are an assistant for question-answering tasks. \\\n", - "Use the following pieces of retrieved context to answer the question. \\\n", - "If you don't know the answer, just say that you don't know. \\\n", - "Use three sentences maximum and keep the answer concise.\\\n", - "\n", - "{context}\"\"\"\n", - "qa_prompt = ChatPromptTemplate.from_messages(\n", - " [\n", - " (\"system\", qa_system_prompt),\n", - " MessagesPlaceholder(variable_name=\"chat_history\"),\n", - " (\"human\", \"{question}\"),\n", - " ]\n", - ")\n", - "\n", - "\n", - "def condense_question(input: dict):\n", - " if input.get(\"chat_history\"):\n", - " return condense_q_chain\n", - " else:\n", - " return input[\"question\"]\n", - "\n", - "\n", - "rag_chain = (\n", - " RunnablePassthrough.assign(context=condense_question | retriever | format_docs)\n", - " | qa_prompt\n", - " | llm\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "id": "51fd0e54-5bb4-4a9a-b012-87a18ebe2bef", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "AIMessage(content='Common ways of task decomposition include:\\n\\n1. Using Chain of Thought (CoT): CoT is a prompting technique that instructs the model to \"think step by step\" and decompose complex tasks into smaller and simpler steps. 
It utilizes more test-time computation and sheds light on the model\\'s thinking process.\\n\\n2. Prompting with LLM: Language Model (LLM) can be used to prompt the model with simple instructions like \"Steps for XYZ\" or \"What are the subgoals for achieving XYZ?\" This allows the model to generate a sequence of subtasks or thought steps.\\n\\n3. Task-specific instructions: For certain tasks, task-specific instructions can be provided to guide the model in decomposing the task. For example, for writing a novel, the instruction \"Write a story outline\" can be given to break down the task into manageable steps.\\n\\n4. Human inputs: In some cases, human inputs can be used to assist in task decomposition. Humans can provide their expertise and knowledge to identify and break down complex tasks into smaller subtasks.')" - ] - }, - "execution_count": 30, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "chat_history = []\n", - "\n", - "question = \"What is Task Decomposition?\"\n", - "ai_msg = rag_chain.invoke({\"question\": question, \"chat_history\": chat_history})\n", - "chat_history.extend([HumanMessage(content=question), ai_msg])\n", - "\n", - "second_question = \"What are common ways of doing it?\"\n", - "rag_chain.invoke({\"question\": second_question, \"chat_history\": chat_history})" - ] - }, - { - "cell_type": "markdown", - "id": "53263a65-4de2-4dd8-9291-6a8169ab6f1d", - "metadata": {}, - "source": [ - ":::tip\n", - "\n", - "Check out the [LangSmith trace](https://smith.langchain.com/public/b3001782-bb30-476a-886b-12da17ec258f/r) \n", - "\n", - ":::" - ] - }, - { - "cell_type": "markdown", - "id": "fdf6c7e0-84f8-4747-b2ae-e84315152bd9", - "metadata": {}, - "source": [ - "Here we've gone over how to add chain logic for incorporating historical outputs. But how do we actually store and retrieve historical outputs for different sessions? For that check out the LCEL [How to add message history (memory)](/docs/expression_language/how_to/message_history) page." - ] - }, - { - "cell_type": "markdown", - "id": "580e18de-132d-4009-ba67-4aaf2c7717a2", - "metadata": {}, - "source": [ - "## Next steps\n", - "\n", - "That's a lot of content we've covered in a short amount of time. There's plenty of nuances, features, integrations, etc to explore in each of the above sections. Aside from the sources mentioned above, good next steps include:\n", + "## Table of contents\n", "\n", - "- Reading up on more advanced retrieval techniques in the [Retrievers](/docs/modules/data_connection/retrievers/) section.\n", - "- Learning about the LangChain [Indexing API](/docs/modules/data_connection/indexing), which helps repeatedly sync data sources and vector stores without redundant computation or storage.\n", - "- Exploring RAG [LangChain Templates](/docs/templates/#-advanced-retrieval), which are reference applications that can easily be deployed with [LangServe](/docs/langserve).\n", - "- Learning about [evaluating RAG applications with LangSmith](https://github.com/langchain-ai/langsmith-cookbook/blob/main/testing-examples/qa-correctness/qa-correctness.ipynb)." + "- [Quickstart](/docs/use_cases/question_answering/quickstart): We recommend starting here. 
Many of the following guides assume you fully understand the architecture shown in the Quickstart.\n", + "- [Returning sources](/docs/use_cases/question_answering/sources): How to return the source documents used in a particular generation.\n", + "- [Streaming](/docs/use_cases/question_answering/streaming): How to stream final answers as well as intermediate steps.\n", + "- [Adding chat history](/docs/use_cases/question_answering/chat_history): How to add chat history to a Q&A app.\n", + "- [Per-user retrieval](/docs/use_cases/question_answering/per_user): How to do retrieval when each user has their own private data.\n", + "- [Using agents](/docs/use_cases/question_answering/conversational_retrieval_agents): How to use agents for Q&A.\n", + "- [Using local models](/docs/use_cases/question_answering/local_retrieval_qa): How to use local models for Q&A." ] } ], diff --git a/docs/docs/use_cases/question_answering/local_retrieval_qa.ipynb b/docs/docs/use_cases/question_answering/local_retrieval_qa.ipynb index 319a9ef0b4..7f7c0875e5 100644 --- a/docs/docs/use_cases/question_answering/local_retrieval_qa.ipynb +++ b/docs/docs/use_cases/question_answering/local_retrieval_qa.ipynb @@ -5,7 +5,7 @@ "id": "3ea857b1", "metadata": {}, "source": [ - "# RAG using local models\n", + "# Using local models\n", "\n", "The popularity of projects like [PrivateGPT](https://github.com/imartinez/privateGPT), [llama.cpp](https://github.com/ggerganov/llama.cpp), and [GPT4All](https://github.com/nomic-ai/gpt4all) underscore the importance of running LLMs locally.\n", "\n", @@ -27,7 +27,7 @@ "metadata": {}, "outputs": [], "source": [ - "pip install gpt4all chromadb langchainhub" + "!pip install -U langchain langchain-community langchainhub gpt4all chromadb " ] }, { @@ -47,13 +47,12 @@ "metadata": {}, "outputs": [], "source": [ - "from langchain.document_loaders import WebBaseLoader\n", + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "from langchain_community.document_loaders import WebBaseLoader\n", "\n", "loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n", "data = loader.load()\n", "\n", - "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", - "\n", "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n", "all_splits = text_splitter.split_documents(data)" ] @@ -68,28 +67,13 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "id": "fdce8923", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Found model file at /Users/rlm/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "objc[49534]: Class GGMLMetalClass is implemented in both /Users/rlm/miniforge3/envs/llama2/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libreplit-mainline-metal.dylib (0x131614208) and /Users/rlm/miniforge3/envs/llama2/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libllamamodel-mainline-metal.dylib (0x131988208). One of the two will be used. 
Which one is undefined.\n" - ] - } - ], + "outputs": [], "source": [ - "from langchain.embeddings import GPT4AllEmbeddings\n", - "from langchain.vectorstores import Chroma\n", + "from langchain_community.embeddings import GPT4AllEmbeddings\n", + "from langchain_community.vectorstores import Chroma\n", "\n", "vectorstore = Chroma.from_documents(documents=all_splits, embedding=GPT4AllEmbeddings())" ] @@ -171,7 +155,7 @@ "metadata": {}, "outputs": [], "source": [ - "pip install llama-cpp-python" + "!pip install llama-cpp-python" ] }, { @@ -209,9 +193,7 @@ "metadata": {}, "outputs": [], "source": [ - "from langchain.callbacks.manager import CallbackManager\n", - "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n", - "from langchain.llms import LlamaCpp" + "from langchain_community.llms import LlamaCpp" ] }, { @@ -231,7 +213,6 @@ "source": [ "n_gpu_layers = 1 # Metal set to 1 is enough.\n", "n_batch = 512 # Should be between 1 and n_ctx, consider the amount of RAM of your Apple Silicon Chip.\n", - "callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])\n", "\n", "# Make sure the model path is correct for your system!\n", "llm = LlamaCpp(\n", @@ -240,7 +221,6 @@ " n_batch=n_batch,\n", " n_ctx=2048,\n", " f16_kv=True, # MUST set to True, otherwise you will run into problem after a couple of calls\n", - " callback_manager=callback_manager,\n", " verbose=True,\n", ")" ] @@ -312,7 +292,7 @@ } ], "source": [ - "llm(\"Simulate a rap battle between Stephen Colbert and John Oliver\")" + "llm.invoke(\"Simulate a rap battle between Stephen Colbert and John Oliver\")" ] }, { @@ -342,9 +322,9 @@ "metadata": {}, "outputs": [], "source": [ - "from langchain.llms import GPT4All\n", + "from langchain_community.llms import GPT4All\n", "\n", - "llm = GPT4All(\n", + "gpt4all = GPT4All(\n", " model=\"/Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin\",\n", " max_tokens=2048,\n", ")" @@ -355,13 +335,11 @@ "id": "d58838ae", "metadata": {}, "source": [ - "## LLMChain\n", + "## Using in a chain\n", "\n", - "Run an `LLMChain` (see [here](https://python.langchain.com/docs/modules/chains/foundational/llm_chain)) with either model by passing in the retrieved docs and a simple prompt.\n", + "We can create a summarization chain with either model by passing in the retrieved docs and a simple prompt.\n", "\n", - "It formats the prompt template using the input key values provided and passes the formatted string to `GPT4All`, `LLama-V2`, or another specified LLM.\n", - " \n", - "In this case, the list of retrieved documents (`docs`) above are pass into `{context}`." + "It formats the prompt template using the input key values provided and passes the formatted string to `GPT4All`, `LLama-V2`, or another specified LLM." 
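To make the step just described concrete — filling in the prompt template and handing the resulting string to the local model — here is a minimal illustrative sketch (not part of the notebook diff; `llm` stands for whichever local model, GPT4All or LlamaCpp, was loaded above):

```python
from langchain_core.prompts import PromptTemplate

# Simplified view of what the chain below does when invoked:
# the template is filled in with the retrieved docs, producing a plain string.
prompt = PromptTemplate.from_template(
    "Summarize the main themes in these retrieved docs: {docs}"
)
formatted = prompt.format(docs="<text of the retrieved documents>")

# The formatted string would then be passed to the local model, e.g.:
# summary = llm.invoke(formatted)  # `llm` is the GPT4All / LlamaCpp instance defined earlier
```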
] }, { @@ -413,36 +391,26 @@ } ], "source": [ - "from langchain.chains import LLMChain\n", - "from langchain.prompts import PromptTemplate\n", + "from langchain_core.output_parsers import StrOutputParser\n", + "from langchain_core.prompts import PromptTemplate\n", "\n", "# Prompt\n", "prompt = PromptTemplate.from_template(\n", " \"Summarize the main themes in these retrieved docs: {docs}\"\n", ")\n", "\n", + "\n", "# Chain\n", - "llm_chain = LLMChain(llm=llm, prompt=prompt)\n", + "def format_docs(docs):\n", + " return \"\\n\\n\".join(doc.page_content for doc in docs)\n", + "\n", + "\n", + "chain = {\"docs\": format_docs} | prompt | llm | StrOutputParser()\n", "\n", "# Run\n", "question = \"What are the approaches to Task Decomposition?\"\n", "docs = vectorstore.similarity_search(question)\n", - "result = llm_chain(docs)\n", - "\n", - "# Output\n", - "result[\"text\"]" - ] - }, - { - "cell_type": "markdown", - "id": "ed9cecf8", - "metadata": {}, - "source": [ - "## QA Chain\n", - "\n", - "We can use a `QA chain` to handle our question above.\n", - "\n", - "`chain_type=\"stuff\"` (see [here](https://python.langchain.com/docs/modules/chains/document/stuff)) means that all the docs will be added (stuffed) into a prompt." + "chain.invoke(docs)" ] }, { @@ -450,21 +418,35 @@ "id": "3cce6977-52e7-4944-89b4-c161d04f6698", "metadata": {}, "source": [ - "We can also use the LangChain Prompt Hub to store and fetch prompts that are model-specific.\n", + "## Q&A \n", "\n", - "This will work with your [LangSmith API key](https://docs.smith.langchain.com/).\n", + "We can also use the LangChain Prompt Hub to store and fetch prompts that are model-specific.\n", "\n", "Let's try with a default RAG prompt, [here](https://smith.langchain.com/hub/rlm/rag-prompt)." ] }, { "cell_type": "code", - "execution_count": null, - "id": "cc638992-0924-41c0-8dae-8cf683e72b16", + "execution_count": 3, + "id": "59ed5f0d-7089-41cc-8486-af37b690dd33", "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template=\"You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.\\nQuestion: {question} \\nContext: {context} \\nAnswer:\"))]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "pip install langchainhub" + "from langchain import hub\n", + "\n", + "rag_prompt = hub.pull(\"rlm/rag-prompt\")\n", + "rag_prompt.messages" ] }, { @@ -512,16 +494,18 @@ } ], "source": [ - "# Prompt\n", - "from langchain import hub\n", - "\n", - "rag_prompt = hub.pull(\"rlm/rag-prompt\")\n", - "from langchain.chains.question_answering import load_qa_chain\n", + "from langchain_core.runnables import RunnablePassthrough, RunnablePick\n", "\n", "# Chain\n", - "chain = load_qa_chain(llm, chain_type=\"stuff\", prompt=rag_prompt)\n", + "chain = (\n", + " RunnablePassthrough.assign(context=RunnablePick(\"context\") | format_docs)\n", + " | rag_prompt\n", + " | llm\n", + " | StrOutputParser()\n", + ")\n", + "\n", "# Run\n", - "chain({\"input_documents\": docs, \"question\": question}, return_only_outputs=True)" + "chain.invoke({\"context\": docs, \"question\": question})" ] }, { @@ -552,7 +536,7 @@ "source": [ "# Prompt\n", "rag_prompt_llama = hub.pull(\"rlm/rag-prompt-llama\")\n", - "rag_prompt_llama" + "rag_prompt_llama.messages" ] }, { @@ -606,9 +590,15 @@ ], "source": [ "# Chain\n", - "chain = load_qa_chain(llm, chain_type=\"stuff\", prompt=rag_prompt_llama)\n", + "chain = (\n", + " RunnablePassthrough.assign(context=RunnablePick(\"context\") | format_docs)\n", + " | rag_prompt_llama\n", + " | llm\n", + " | StrOutputParser()\n", + ")\n", + "\n", "# Run\n", - "chain({\"input_documents\": docs, \"question\": question}, return_only_outputs=True)" + "chain.invoke({\"context\": docs, \"question\": question})" ] }, { @@ -616,13 +606,11 @@ "id": "821729cb", "metadata": {}, "source": [ - "## RetrievalQA\n", - "\n", - "For an even simpler flow, use `RetrievalQA`.\n", + "## Q&A with retrieval\n", "\n", - "This will use a QA default prompt (shown [here](https://github.com/langchain-ai/langchain/blob/275b926cf745b5668d3ea30236635e20e7866442/langchain/chains/retrieval_qa/prompt.py#L4)) and will retrieve from the vectorDB.\n", + "Instead of manually passing in docs, we can automatically retrieve them from our vector store based on the user question.\n", "\n", - "But, you can still pass in a prompt, as before, if desired." + "This will use a QA default prompt (shown [here](https://github.com/langchain-ai/langchain/blob/275b926cf745b5668d3ea30236635e20e7866442/langchain/chains/retrieval_qa/prompt.py#L4)) and will retrieve from the vectorDB." 
] }, { @@ -632,12 +620,12 @@ "metadata": {}, "outputs": [], "source": [ - "from langchain.chains import RetrievalQA\n", - "\n", - "qa_chain = RetrievalQA.from_chain_type(\n", - " llm,\n", - " retriever=vectorstore.as_retriever(),\n", - " chain_type_kwargs={\"prompt\": rag_prompt_llama},\n", + "retriever = vectorstore.as_retriever()\n", + "qa_chain = (\n", + " {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n", + " | rag_prompt\n", + " | llm\n", + " | StrOutputParser()\n", ")" ] }, @@ -694,7 +682,7 @@ } ], "source": [ - "qa_chain({\"query\": question})" + "qa_chain.invoke(question)" ] } ], diff --git a/docs/docs/use_cases/question_answering/per_user.ipynb b/docs/docs/use_cases/question_answering/per_user.ipynb index 58f9fab85a..424da86e91 100644 --- a/docs/docs/use_cases/question_answering/per_user.ipynb +++ b/docs/docs/use_cases/question_answering/per_user.ipynb @@ -1,5 +1,15 @@ { "cells": [ + { + "cell_type": "raw", + "id": "0e77c293-4049-43be-ba49-ff9daeefeee7", + "metadata": {}, + "source": [ + "---\n", + "sidebar_position: 4\n", + "---" + ] + }, { "cell_type": "markdown", "id": "14d3fd06", @@ -315,7 +325,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.1" + "version": "3.9.1" } }, "nbformat": 4, diff --git a/docs/docs/use_cases/question_answering/quickstart.ipynb b/docs/docs/use_cases/question_answering/quickstart.ipynb new file mode 100644 index 0000000000..36f36f6966 --- /dev/null +++ b/docs/docs/use_cases/question_answering/quickstart.ipynb @@ -0,0 +1,874 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "34814bdb-d05b-4dd3-adf1-ca5779882d7e", + "metadata": {}, + "source": [ + "---\n", + "sidebar_position: 0\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "86fc5bb2-017f-434e-8cd6-53ab214a5604", + "metadata": {}, + "source": [ + "# Quickstart" + ] + }, + { + "cell_type": "markdown", + "id": "de913d6d-c57f-4927-82fe-18902a636861", + "metadata": {}, + "source": [ + "[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/question_answering/quickstart.ipynb)" + ] + }, + { + "cell_type": "markdown", + "id": "5151afed", + "metadata": {}, + "source": [ + "LangChain has a number of components designed to help build question-answering applications, and RAG applications more generally. To familiarize ourselves with these, we'll build a simple Q&A application over a text data source. Along the way we'll go over a typical Q&A architecture, discuss the relevant LangChain components, and highlight additional resources for more advanced Q&A techniques. We'll also see how LangSmith can help us trace and understand our application. LangSmith will become increasingly helpful as our application grows in complexity." + ] + }, + { + "cell_type": "markdown", + "id": "2f25cbbd-0938-4e3d-87e4-17a204a03ffb", + "metadata": {}, + "source": [ + "## Architecture\n", + "We'll create a typical RAG application as outlined in the [Q&A introduction](/docs/use_cases/question_answering/), which has two main components:\n", + "\n", + "**Indexing**: a pipeline for ingesting data from a source and indexing it. 
*This usually happen offline.*\n", + "\n", + "**Retrieval and generation**: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.\n", + "\n", + "The full sequence from raw data to answer will look like:\n", + "\n", + "#### Indexing\n", + "1. **Load**: First we need to load our data. We'll use [DocumentLoaders](/docs/modules/data_connection/document_loaders/) for this.\n", + "2. **Split**: [Text splitters](/docs/modules/data_connection/document_transformers/) break large `Documents` into smaller chunks. This is useful both for indexing data and for passing it in to a model, since large chunks are harder to search over and won't fit in a model's finite context window.\n", + "3. **Store**: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a [VectorStore](/docs/modules/data_connection/vectorstores/) and [Embeddings](/docs/modules/data_connection/text_embedding/) model.\n", + "\n", + "#### Retrieval and generation\n", + "4. **Retrieve**: Given a user input, relevant splits are retrieved from storage using a [Retriever](/docs/modules/data_connection/retrievers/).\n", + "5. **Generate**: A [ChatModel](/docs/modules/model_io/chat_models) / [LLM](/docs/modules/model_io/llms/) produces an answer using a prompt that includes the question and the retrieved data" + ] + }, + { + "cell_type": "markdown", + "id": "487d8d79-5ee9-4aa4-9fdf-cd5f4303e099", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "### Dependencies\n", + "\n", + "We'll use an OpenAI chat model and embeddings and a Chroma vector store in this walkthrough, but everything shown here works with any [ChatModel](/docs/modules/model_io/chat/) or [LLM](/docs/modules/model_io/llms/), [Embeddings](/docs/modules/data_connection/text_embedding/), and [VectorStore](/docs/modules/data_connection/vectorstores/) or [Retriever](/docs/modules/data_connection/retrievers/). \n", + "\n", + "We'll use the following packages:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "28d272cd-4e31-40aa-bbb4-0be0a1f49a14", + "metadata": {}, + "outputs": [], + "source": [ + "#!pip install -U langchain langchain-community langchainhub openai chromadb bs4" + ] + }, + { + "cell_type": "markdown", + "id": "51ef48de-70b6-4f43-8e0b-ab9b84c9c02a", + "metadata": {}, + "source": [ + "We need to set environment variable `OPENAI_API_KEY`, which can be done directly or loaded from a `.env` file like so:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "143787ca-d8e6-4dc9-8281-4374f4d71720", + "metadata": {}, + "outputs": [], + "source": [ + "import getpass\n", + "import os\n", + "\n", + "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n", + "\n", + "# import dotenv\n", + "\n", + "# dotenv.load_dotenv()" + ] + }, + { + "cell_type": "markdown", + "id": "1665e740-ce01-4f09-b9ed-516db0bd326f", + "metadata": {}, + "source": [ + "### LangSmith\n", + "\n", + "Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with [LangSmith](https://smith.langchain.com).\n", + "\n", + "Note that LangSmith is not needed, but it is helpful. 
If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "07411adb-3722-4f65-ab7f-8f6f57663d11", + "metadata": {}, + "outputs": [], + "source": [ + "os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n", + "os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()" + ] + }, + { + "cell_type": "markdown", + "id": "fa6ba684-26cf-4860-904e-a4d51380c134", + "metadata": {}, + "source": [ + "## Preview\n", + "\n", + "In this guide we'll build a QA app over the [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post by Lilian Weng, which allows us to ask questions about the contents of the post. \n", + "\n", + "We can create a simple indexing pipeline and RAG chain to do this in ~20 lines of code:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "d8a913b1-0eea-442a-8a64-ec73333f104b", + "metadata": {}, + "outputs": [], + "source": [ + "import bs4\n", + "from langchain import hub\n", + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "from langchain_community.chat_models import ChatOpenAI\n", + "from langchain_community.document_loaders import WebBaseLoader\n", + "from langchain_community.embeddings import OpenAIEmbeddings\n", + "from langchain_community.vectorstores import Chroma\n", + "from langchain_core.output_parsers import StrOutputParser\n", + "from langchain_core.runnables import RunnablePassthrough" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "820244ae-74b4-4593-b392-822979dd91b8", + "metadata": {}, + "outputs": [], + "source": [ + "# Load, chunk and index the contents of the blog.\n", + "loader = WebBaseLoader(\n", + " web_paths=(\"https://lilianweng.github.io/posts/2023-06-23-agent/\",),\n", + " bs_kwargs=dict(\n", + " parse_only=bs4.SoupStrainer(\n", + " class_=(\"post-content\", \"post-title\", \"post-header\")\n", + " )\n", + " ),\n", + ")\n", + "docs = loader.load()\n", + "\n", + "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n", + "splits = text_splitter.split_documents(docs)\n", + "vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())\n", + "\n", + "# Retrieve and generate using the relevant snippets of the blog.\n", + "retriever = vectorstore.as_retriever()\n", + "prompt = hub.pull(\"rlm/rag-prompt\")\n", + "llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)\n", + "\n", + "\n", + "def format_docs(docs):\n", + " return \"\\n\\n\".join(doc.page_content for doc in docs)\n", + "\n", + "\n", + "rag_chain = (\n", + " {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n", + " | prompt\n", + " | llm\n", + " | StrOutputParser()\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "0d3b0f36-7b56-49c0-8e40-a1aa9ebcbf24", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done through prompting techniques like Chain of Thought or Tree of Thoughts, or by using task-specific instructions or human inputs. 
Task decomposition helps agents plan ahead and manage complicated tasks more effectively.'" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rag_chain.invoke(\"What is Task Decomposition?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "7cb344e0-c423-400c-a079-964c08e07e32", + "metadata": {}, + "outputs": [], + "source": [ + "# cleanup\n", + "vectorstore.delete_collection()" + ] + }, + { + "cell_type": "markdown", + "id": "639dc31a-7f16-40f6-ba2a-20e7c2ecfe60", + "metadata": {}, + "source": [ + ":::tip\n", + "\n", + "Check out the [LangSmith trace](https://smith.langchain.com/public/1c6ca97e-445b-4d00-84b4-c7befcbc59fe/r) \n", + "\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "id": "842cf72d-abbc-468e-a2eb-022470347727", + "metadata": {}, + "source": [ + "## Detailed walkthrough\n", + "\n", + "Let's go through the above code step-by-step to really understand what's going on." + ] + }, + { + "cell_type": "markdown", + "id": "ba5daed6", + "metadata": {}, + "source": [ + "## 1. Indexing: Load\n", + "\n", + "We need to first load the blog post contents. We can use [DocumentLoaders](/docs/modules/data_connection/document_loaders/) for this, which are objects that load in data from a source and return a list of [Documents](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html). A `Document` is an object with some `page_content` (str) and `metadata` (dict).\n", + "\n", + "In this case we'll use the [WebBaseLoader](/docs/integrations/document_loaders/web_base), which uses `urllib` to load HTML form web URLs and `BeautifulSoup` to parse it to text. We can customize the HTML -> text parsing by passing in parameters to the `BeautifulSoup` parser via `bs_kwargs` (see [BeautifulSoup docs](https://beautiful-soup-4.readthedocs.io/en/latest/#beautifulsoup)). In this case only HTML tags with class \"post-content\", \"post-title\", or \"post-header\" are relevant, so we'll remove all others." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "cf4d5c72", + "metadata": {}, + "outputs": [], + "source": [ + "import bs4\n", + "from langchain_community.document_loaders import WebBaseLoader\n", + "\n", + "# Only keep post title, headers, and content from the full HTML.\n", + "bs4_strainer = bs4.SoupStrainer(class_=(\"post-title\", \"post-header\", \"post-content\"))\n", + "loader = WebBaseLoader(\n", + " web_paths=(\"https://lilianweng.github.io/posts/2023-06-23-agent/\",),\n", + " bs_kwargs={\"parse_only\": bs4_strainer},\n", + ")\n", + "docs = loader.load()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "207f87a3-effa-4457-b013-6d233bc7a088", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "42824" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(docs[0].page_content)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "52469796-5ce4-4c12-bd2a-a903872dac33", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n", + " LLM Powered Autonomous Agents\n", + " \n", + "Date: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng\n", + "\n", + "\n", + "Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\n", + "Agent System Overview#\n", + "In\n" + ] + } + ], + "source": [ + "print(docs[0].page_content[:500])" + ] + }, + { + "cell_type": "markdown", + "id": "ee5c6556-56be-4067-adbc-98b5aa19ef6e", + "metadata": {}, + "source": [ + "### Go deeper\n", + "`DocumentLoader`: Object that loads data from a source as list of `Documents`.\n", + "- [Docs](/docs/modules/data_connection/document_loaders/): Detailed documentation on how to use `DocumentLoaders`.\n", + "- [Integrations](/docs/integrations/document_loaders/): 160+ integrations to choose from.\n", + "- [Interface](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.base.BaseLoader.html): API reference  for the base interface." + ] + }, + { + "cell_type": "markdown", + "id": "fd2cc9a7", + "metadata": {}, + "source": [ + "## 2. Indexing: Split\n", + "\n", + "Our loaded document is over 42k characters long. This is too long to fit in the context window of many models. Even for those models that could fit the full post in their context window, models can struggle to find information in very long inputs. \n", + "\n", + "To handle this we'll split the `Document` into chunks for embedding and vector storage. This should help us retrieve only the most relevant bits of the blog post at run time.\n", + "\n", + "In this case we'll split our documents into chunks of 1000 characters with 200 characters of overlap between chunks. The overlap helps mitigate the possibility of separating a statement from important context related to it. We use the [RecursiveCharacterTextSplitter](/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter), which will recursively split the document using common separators like new lines until each chunk is the appropriate size. This is the recommended text splitter for generic text use cases.\n", + "\n", + "We set `add_start_index=True` so that the character index at which each split Document starts within the initial Document is preserved as metadata attribute \"start_index\"." 
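As a toy illustration of how `chunk_size` and `chunk_overlap` interact (this snippet is illustrative only and not part of the quickstart), splitting a short string with deliberately small values shows consecutive chunks sharing some text:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Deliberately tiny chunk size and overlap so the effect is easy to see.
toy_splitter = RecursiveCharacterTextSplitter(chunk_size=20, chunk_overlap=5)
chunks = toy_splitter.split_text(
    "LangChain makes it simple to split long documents into overlapping chunks."
)
print(chunks)  # consecutive chunks repeat up to ~5 characters of the previous one
```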
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "4b11c01d", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "\n", + "text_splitter = RecursiveCharacterTextSplitter(\n", + " chunk_size=1000, chunk_overlap=200, add_start_index=True\n", + ")\n", + "all_splits = text_splitter.split_documents(docs)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "3741eb67-9caf-40f2-a001-62f49349bff5", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "66" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(all_splits)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "f868d0e5-5670-4d54-b562-f50265e907f4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "969" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(all_splits[0].page_content)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "5c9e5f27-c8e3-4ca7-8a8e-45c5de2901cc", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',\n", + " 'start_index': 7056}" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "all_splits[10].metadata" + ] + }, + { + "cell_type": "markdown", + "id": "0a33bd4d", + "metadata": {}, + "source": [ + "### Go deeper\n", + "\n", + "`TextSplitter`: Object that splits a list of `Document`s into smaller chunks. Subclass of `DocumentTransformer`s.\n", + "- Explore `Context-aware splitters`, which keep the location (\"context\") of each split in the original `Document`:\n", + " - [Markdown files](/docs/use_cases/question_answering/document-context-aware-QA)\n", + " - [Code (py or js)](docs/integrations/document_loaders/source_code)\n", + " - [Scientific papers](/docs/integrations/document_loaders/grobid)\n", + "- [Interface](https://api.python.langchain.com/en/latest/text_splitter/langchain.text_splitter.TextSplitter.html): API reference for the base interface.\n", + "\n", + "`DocumentTransformer`: Object that performs a transformation on a list of `Document`s.\n", + "- [Docs](/docs/modules/data_connection/document_transformers/): Detailed documentation on how to use `DocumentTransformers`\n", + "- [Integrations](/docs/integrations/document_transformers/)\n", + "- [Interface](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.transformers.BaseDocumentTransformer.html): API reference for the base interface.\n" + ] + }, + { + "cell_type": "markdown", + "id": "46547031-2352-4321-9970-d6ea27285c2e", + "metadata": {}, + "source": [ + "## 3. Indexing: Store\n", + "\n", + "Now we need to index our 66 text chunks so that we can search over them at runtime. The most common way to do this is to embed the contents of each document split and insert these embeddings into a vector database (or vector store). When we want to search over our splits, we take a text search query, embed it, and perform some sort of \"similarity\" search to identify the stored splits with the most similar embeddings to our query embedding. 
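As a small self-contained sketch of such a similarity computation (illustrative only, not part of the quickstart), cosine similarity between two toy "embedding" vectors can be computed like this:

```python
import math

# Toy stand-ins for two embedding vectors (real embeddings have hundreds of dimensions).
a = [0.1, 0.3, 0.6]
b = [0.2, 0.1, 0.7]

dot = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(y * y for y in b))
print(dot / (norm_a * norm_b))  # values closer to 1.0 indicate more similar vectors
```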
The simplest similarity measure is cosine similarity — we measure the cosine of the angle between each pair of embeddings (which are high dimensional vectors).\n", + "\n", + "We can embed and store all of our document splits in a single command using the [Chroma](/docs/integrations/vectorstores/chroma) vector store and [OpenAIEmbeddings](/docs/integrations/text_embedding/openai) model." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "e9c302c8", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_community.embeddings import OpenAIEmbeddings\n", + "from langchain_community.vectorstores import Chroma\n", + "\n", + "vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())" + ] + }, + { + "cell_type": "markdown", + "id": "dc6f22b0", + "metadata": {}, + "source": [ + "### Go deeper\n", + "`Embeddings`: Wrapper around a text embedding model, used for converting text to embeddings.\n", + "- [Docs](/docs/modules/data_connection/text_embedding): Detailed documentation on how to use embeddings.\n", + "- [Integrations](/docs/integrations/text_embedding/): 30+ integrations to choose from.\n", + "- [Interface](https://api.python.langchain.com/en/latest/embeddings/langchain_core.embeddings.Embeddings.html): API reference for the base interface.\n", + "\n", + "`VectorStore`: Wrapper around a vector database, used for storing and querying embeddings.\n", + "- [Docs](/docs/modules/data_connection/vectorstores/): Detailed documentation on how to use vector stores.\n", + "- [Integrations](/docs/integrations/vectorstores/): 40+ integrations to choose from.\n", + "- [Interface](https://api.python.langchain.com/en/latest/vectorstores/langchain_core.vectorstores.VectorStore.html): API reference for the base interface.\n", + "\n", + "This completes the **Indexing** portion of the pipeline. At this point we have a query-able vector store containing the chunked contents of our blog post. Given a user question, we should ideally be able to return the snippets of the blog post that answer the question." + ] + }, + { + "cell_type": "markdown", + "id": "70d64d40-e475-43d9-b64c-925922bb5ef7", + "metadata": {}, + "source": [ + "## 4. Retrieval and Generation: Retrieve\n", + "\n", + "Now let's write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and returns an answer.\n", + "\n", + "First we need to define our logic for searching over documents. LangChain defines a [Retriever](/docs/modules/data_connection/retrievers/) interface which wraps an index that can return relevant `Documents` given a string query.\n", + "\n", + "The most common type of `Retriever` is the [VectorStoreRetriever](/docs/modules/data_connection/retrievers/vectorstore), which uses the similarity search capabilities of a vector store to facilitate retrieval. 
Any `VectorStore` can easily be turned into a `Retriever` with `VectorStore.as_retriever()`:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "4414df0d-5d43-46d0-85a9-5f47be0dd099", + "metadata": {}, + "outputs": [], + "source": [ + "retriever = vectorstore.as_retriever(search_type=\"similarity\", search_kwargs={\"k\": 6})" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "e2c26b7d", + "metadata": {}, + "outputs": [], + "source": [ + "retrieved_docs = retriever.invoke(\"What are the approaches to Task Decomposition?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "8684291d-0f5e-453a-8d3e-ff9feea765d0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "6" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(retrieved_docs)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "9a5dc074-816d-409a-b005-ab4eddfd76af", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\n", + "Task decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\n" + ] + } + ], + "source": [ + "print(retrieved_docs[0].page_content)" + ] + }, + { + "cell_type": "markdown", + "id": "5d5a113b", + "metadata": {}, + "source": [ + "### Go deeper\n", + "Vector stores are commonly used for retrieval, but there are other ways to do retrieval, too.\n", + "\n", + "`Retriever`: An object that returns `Document`s given a text query\n", + "- [Docs](/docs/modules/data_connection/retrievers/): Further documentation on the interface and built-in retrieval techniques. Some of which include:\n", + " - `MultiQueryRetriever` [generates variants of the input question](/docs/modules/data_connection/retrievers/MultiQueryRetriever) to improve retrieval hit rate.\n", + " - `MultiVectorRetriever` (diagram below) instead generates [variants of the embeddings](/docs/modules/data_connection/retrievers/multi_vector), also in order to improve retrieval hit rate.\n", + " - `Max marginal relevance` selects for [relevance and diversity](https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf) among the retrieved documents to avoid passing in duplicate context.\n", + " - Documents can be filtered during vector store retrieval using [`metadata` filters](/docs/use_cases/question_answering/document-context-aware-QA).\n", + "- [Integrations](/docs/integrations/retrievers/): Integrations with retrieval services.\n", + "- [Interface](https://api.python.langchain.com/en/latest/retrievers/langchain_core.retrievers.BaseRetriever.html): API reference for the base interface." + ] + }, + { + "cell_type": "markdown", + "id": "415d6824", + "metadata": {}, + "source": [ + "## 5. 
Retrieval and Generation: Generate\n", + "\n", + "Let's put it all together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes that to a model, and parses the output.\n", + "\n", + "We'll use the gpt-3.5-turbo OpenAI chat model, but any LangChain `LLM` or `ChatModel` could be substituted in." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "d34d998c-9abf-4e01-a4ad-06dadfcf131c", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_community.chat_models import ChatOpenAI\n", + "\n", + "llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)" + ] + }, + { + "cell_type": "markdown", + "id": "bc826723-36fc-45d1-a3ef-df8c2c8471a8", + "metadata": {}, + "source": [ + "We'll use a prompt for RAG that is checked into the LangChain prompt hub ([here](https://smith.langchain.com/hub/rlm/rag-prompt))." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "bede955b-9aeb-4fd3-964d-8e43f214ce70", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain import hub\n", + "\n", + "prompt = hub.pull(\"rlm/rag-prompt\")" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "11c35354-f275-47ec-9f72-ebd5c23731eb", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[HumanMessage(content=\"You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\\nQuestion: filler question \\nContext: filler context \\nAnswer:\")]" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "example_messages = prompt.invoke(\n", + " {\"context\": \"filler context\", \"question\": \"filler question\"}\n", + ").to_messages()\n", + "example_messages" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "2ccc50fa-5fa2-4f80-8685-58ec2255523a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.\n", + "Question: filler question \n", + "Context: filler context \n", + "Answer:\n" + ] + } + ], + "source": [ + "print(example_messages[0].content)" + ] + }, + { + "cell_type": "markdown", + "id": "51f9a210-1eee-4054-99d7-9d9ddf7e3593", + "metadata": {}, + "source": [ + "We'll use the [LCEL Runnable](/docs/expression_language/) protocol to define the chain, allowing us to \n", + "- pipe together components and functions in a transparent way\n", + "- automatically trace our chain in LangSmith\n", + "- get streaming, async, and batched calling out of the box" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "99fa1aec", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_core.output_parsers import StrOutputParser\n", + "from langchain_core.runnables import RunnablePassthrough\n", + "\n", + "\n", + "def format_docs(docs):\n", + " return \"\\n\\n\".join(doc.page_content for doc in docs)\n", + "\n", + "\n", + "rag_chain = (\n", + " {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n", + " | prompt\n", + " | llm\n", + " | StrOutputParser()\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "8655a152-d7cf-466f-b1bc-fbff9ae2b889", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It involves transforming big tasks into multiple manageable tasks, allowing for easier interpretation and execution by autonomous agents or models. Task decomposition can be done through various methods, such as using prompting techniques, task-specific instructions, or human inputs." + ] + } + ], + "source": [ + "for chunk in rag_chain.stream(\"What is Task Decomposition?\"):\n", + " print(chunk, end=\"\", flush=True)" + ] + }, + { + "cell_type": "markdown", + "id": "2c000e5f-2b7f-4eb9-8876-9f4b186b4a08", + "metadata": {}, + "source": [ + ":::tip\n", + "\n", + "Check out the [LangSmith trace](https://smith.langchain.com/public/1799e8db-8a6d-4eb2-84d5-46e8d7d5a99b/r) \n", + "\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "id": "f7d52c84", + "metadata": {}, + "source": [ + "### Go deeper\n", + "\n", + "#### Choosing a model\n", + "`ChatModel`: An LLM-backed chat model. Takes in a sequence of messages and returns a message.\n", + "- [Docs](/docs/modules/model_io/chat/): Detailed documentation on how to use chat models.\n", + "- [Integrations](/docs/integrations/chat/): 25+ integrations to choose from.\n", + "- [Interface](https://api.python.langchain.com/en/latest/language_models/langchain_core.language_models.chat_models.BaseChatModel.html): API reference for the base interface.\n", + "\n", + "`LLM`: A text-in-text-out LLM. Takes in a string and returns a string.\n", + "- [Docs](/docs/modules/model_io/llms)\n", + "- [Integrations](/docs/integrations/llms): 75+ integrations to choose from.\n", + "- [Interface](https://api.python.langchain.com/en/latest/language_models/langchain_core.language_models.llms.BaseLLM.html): API reference for the base interface.\n", + "\n", + "See a guide on RAG with locally-running models [here](/docs/use_cases/question_answering/local_retrieval_qa)." + ] + }, + { + "cell_type": "markdown", + "id": "fa82f437", + "metadata": {}, + "source": [ + "#### Customizing the prompt\n", + "\n", + "As shown above, we can load prompts (e.g., [this RAG prompt](https://smith.langchain.com/hub/rlm/rag-prompt)) from the prompt hub. 
The prompt can also be easily customized:" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "e4fee704", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It involves transforming big tasks into multiple manageable tasks, allowing for a more systematic and organized approach to problem-solving. Thanks for asking!'" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from langchain_core.prompts import PromptTemplate\n", + "\n", + "template = \"\"\"Use the following pieces of context to answer the question at the end.\n", + "If you don't know the answer, just say that you don't know, don't try to make up an answer.\n", + "Use three sentences maximum and keep the answer as concise as possible.\n", + "Always say \"thanks for asking!\" at the end of the answer.\n", + "\n", + "{context}\n", + "\n", + "Question: {question}\n", + "\n", + "Helpful Answer:\"\"\"\n", + "custom_rag_prompt = PromptTemplate.from_template(template)\n", + "\n", + "rag_chain = (\n", + " {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n", + " | custom_rag_prompt\n", + " | llm\n", + " | StrOutputParser()\n", + ")\n", + "\n", + "rag_chain.invoke(\"What is Task Decomposition?\")" + ] + }, + { + "cell_type": "markdown", + "id": "94b952e6-dc4b-415b-9cf3-1ad333e48366", + "metadata": {}, + "source": [ + ":::tip\n", + "\n", + "Check out the [LangSmith trace](https://smith.langchain.com/public/da23c4d8-3b33-47fd-84df-a3a582eedf84/r) \n", + "\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "id": "580e18de-132d-4009-ba67-4aaf2c7717a2", + "metadata": {}, + "source": [ + "## Next steps\n", + "\n", + "That's a lot of content we've covered in a short amount of time. There are plenty of features, integrations, and extensions to explore in each of the above sections. Apart from the **Go deeper** sources mentioned above, good next steps include:\n", + "\n", + "- [Return sources](/docs/use_cases/question_answering/sources): Learn how to return source documents\n", + "- [Streaming](/docs/use_cases/question_answering/streaming): Learn how to stream outputs and intermediate steps\n", + "- [Add chat history](/docs/use_cases/question_answering/chat_history): Learn how to add chat history to your app" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "poetry-venv", + "language": "python", + "name": "poetry-venv" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/docs/use_cases/question_answering/sources.ipynb b/docs/docs/use_cases/question_answering/sources.ipynb new file mode 100644 index 0000000000..25f9dd587d --- /dev/null +++ b/docs/docs/use_cases/question_answering/sources.ipynb @@ -0,0 +1,268 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "dfbff033-6ba5-4326-ba8b-3f4bbe797b4d", + "metadata": {}, + "source": [ + "---\n", + "sidebar_position: 1\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "4ef893cf-eac1-45e6-9eb6-72e9ca043200", + "metadata": {}, + "source": [ + "# Returning sources\n", + "\n", + "Often in Q&A applications it's important to show users the sources that were used to generate the answer. 
The simplest way to do this is for the chain to return the Documents that were retrieved in each generation.\n", + "\n", + "We'll work off of the Q&A app we built over the [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post by Lilian Weng in the [Quickstart](/docs/use_cases/question_answering/quickstart)." + ] + }, + { + "cell_type": "markdown", + "id": "487d8d79-5ee9-4aa4-9fdf-cd5f4303e099", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "### Dependencies\n", + "\n", + "We'll use an OpenAI chat model and embeddings and a Chroma vector store in this walkthrough, but everything shown here works with any [ChatModel](/docs/modules/model_io/chat/) or [LLM](/docs/modules/model_io/llms/), [Embeddings](/docs/modules/data_connection/text_embedding/), and [VectorStore](/docs/modules/data_connection/vectorstores/) or [Retriever](/docs/modules/data_connection/retrievers/). \n", + "\n", + "We'll use the following packages:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "28d272cd-4e31-40aa-bbb4-0be0a1f49a14", + "metadata": {}, + "outputs": [], + "source": [ + "#!pip install -U langchain langchain-community langchainhub openai chromadb bs4" + ] + }, + { + "cell_type": "markdown", + "id": "51ef48de-70b6-4f43-8e0b-ab9b84c9c02a", + "metadata": {}, + "source": [ + "We need to set environment variable `OPENAI_API_KEY`, which can be done directly or loaded from a `.env` file like so:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "143787ca-d8e6-4dc9-8281-4374f4d71720", + "metadata": {}, + "outputs": [], + "source": [ + "import getpass\n", + "import os\n", + "\n", + "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n", + "\n", + "# import dotenv\n", + "\n", + "# dotenv.load_dotenv()" + ] + }, + { + "cell_type": "markdown", + "id": "1665e740-ce01-4f09-b9ed-516db0bd326f", + "metadata": {}, + "source": [ + "### LangSmith\n", + "\n", + "Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with [LangSmith](https://smith.langchain.com).\n", + "\n", + "Note that LangSmith is not needed, but it is helpful. 
If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "07411adb-3722-4f65-ab7f-8f6f57663d11", + "metadata": {}, + "outputs": [], + "source": [ + "os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n", + "os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()" + ] + }, + { + "cell_type": "markdown", + "id": "fa6ba684-26cf-4860-904e-a4d51380c134", + "metadata": {}, + "source": [ + "## Chain without sources\n", + "\n", + "Here is the Q&A app we built over the [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post by Lilian Weng in the [Quickstart](/docs/use_cases/question_answering/quickstart):" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "d8a913b1-0eea-442a-8a64-ec73333f104b", + "metadata": {}, + "outputs": [], + "source": [ + "import bs4\n", + "from langchain import hub\n", + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "from langchain_community.chat_models import ChatOpenAI\n", + "from langchain_community.document_loaders import WebBaseLoader\n", + "from langchain_community.embeddings import OpenAIEmbeddings\n", + "from langchain_community.vectorstores import Chroma\n", + "from langchain_core.output_parsers import StrOutputParser\n", + "from langchain_core.runnables import RunnablePassthrough" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "820244ae-74b4-4593-b392-822979dd91b8", + "metadata": {}, + "outputs": [], + "source": [ + "# Load, chunk and index the contents of the blog.\n", + "bs_strainer = bs4.SoupStrainer(class_=(\"post-content\", \"post-title\", \"post-header\"))\n", + "loader = WebBaseLoader(\n", + " web_paths=(\"https://lilianweng.github.io/posts/2023-06-23-agent/\",),\n", + " bs_kwargs={\"parse_only\": bs_strainer},\n", + ")\n", + "docs = loader.load()\n", + "\n", + "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n", + "splits = text_splitter.split_documents(docs)\n", + "vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())\n", + "\n", + "# Retrieve and generate using the relevant snippets of the blog.\n", + "retriever = vectorstore.as_retriever()\n", + "prompt = hub.pull(\"rlm/rag-prompt\")\n", + "llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)\n", + "\n", + "\n", + "def format_docs(docs):\n", + " return \"\\n\\n\".join(doc.page_content for doc in docs)\n", + "\n", + "\n", + "rag_chain = (\n", + " {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n", + " | prompt\n", + " | llm\n", + " | StrOutputParser()\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "0d3b0f36-7b56-49c0-8e40-a1aa9ebcbf24", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done through prompting techniques like Chain of Thought or Tree of Thoughts, or by using task-specific instructions or human inputs. 
Task decomposition helps agents plan ahead and manage complicated tasks more effectively.'" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rag_chain.invoke(\"What is Task Decomposition?\")" + ] + }, + { + "cell_type": "markdown", + "id": "1c2f99b5-80b4-4178-bf30-c1c0a152638f", + "metadata": {}, + "source": [ + "## Adding sources\n", + "\n", + "With LCEL it's easy to return the retrieved documents:" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "ded41680-b749-4e2a-9daa-b1165d74783b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'context': [Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 1585}),\n", + " Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 2192}),\n", + " Document(page_content='The AI assistant can parse user input to several tasks: [{\"task\": task, \"id\", task_id, \"dep\": dependency_task_ids, \"args\": {\"text\": text, \"image\": URL, \"audio\": URL, \"video\": URL}}]. The \"dep\" field denotes the id of the previous task which generates a new resource that the current task relies on. A special tag \"-task_id\" refers to the generated text image, audio and video in the dependency task with id as task_id. The task MUST be selected from the following options: {{ Available Task List }}. There is a logical relationship between tasks, please note their order. If the user input can\\'t be parsed, you need to reply empty JSON. Here are several cases for your reference: {{ Demonstrations }}. The chat history is recorded as {{ Chat History }}. From this chat history, you can find the path of the user-mentioned resources for your task planning.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 17804}),\n", + " Document(page_content='Fig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\\nThe system comprises of 4 stages:\\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. 
They use few-shot examples to guide LLM to do task parsing and planning.\\nInstruction:', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 17414}),\n", + " Document(page_content='Resources:\\n1. Internet access for searches and information gathering.\\n2. Long Term memory management.\\n3. GPT-3.5 powered Agents for delegation of simple tasks.\\n4. File output.\\n\\nPerformance Evaluation:\\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\\n2. Constructively self-criticize your big-picture behavior constantly.\\n3. Reflect on past decisions and strategies to refine your approach.\\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 29630}),\n", + " Document(page_content=\"(3) Task execution: Expert models execute on the specific tasks and log results.\\nInstruction:\\n\\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.\", metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 19373})],\n", + " 'question': 'What is Task Decomposition',\n", + " 'answer': 'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It involves transforming big tasks into multiple manageable tasks, allowing for a more systematic and organized approach to problem-solving. 
Thanks for asking!'}" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from langchain_core.runnables import RunnableParallel\n", + "\n", + "rag_chain_from_docs = (\n", + " RunnablePassthrough.assign(context=(lambda x: format_docs(x[\"context\"])))\n", + " | prompt\n", + " | llm\n", + " | StrOutputParser()\n", + ")\n", + "\n", + "rag_chain_with_source = RunnableParallel(\n", + " {\"context\": retriever, \"question\": RunnablePassthrough()}\n", + ").assign(answer=rag_chain_from_docs)\n", + "\n", + "rag_chain_with_source.invoke(\"What is Task Decomposition\")" + ] + }, + { + "cell_type": "markdown", + "id": "b437da5d-ca09-4d15-9be2-c35e5a1ace77", + "metadata": {}, + "source": [ + ":::tip\n", + "\n", + "Check out the [LangSmith trace](https://smith.langchain.com/public/007d7e01-cb62-4a84-8b71-b24767f953ee/r)\n", + "\n", + ":::" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "poetry-venv", + "language": "python", + "name": "poetry-venv" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/docs/use_cases/question_answering/streaming.ipynb b/docs/docs/use_cases/question_answering/streaming.ipynb new file mode 100644 index 0000000000..0e3504ffde --- /dev/null +++ b/docs/docs/use_cases/question_answering/streaming.ipynb @@ -0,0 +1,893 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "88b8b973-9320-44b6-8753-626d5ccc9247", + "metadata": {}, + "source": [ + "---\n", + "sidebar_position: 3\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "4ef893cf-eac1-45e6-9eb6-72e9ca043200", + "metadata": {}, + "source": [ + "# Streaming\n", + "\n", + "Often in Q&A applications it's important to show users the answer as it is being generated, along with intermediate steps such as the retrieved source documents. Below we show how to stream both final outputs and intermediate steps of a chain.\n", + "\n", + "We'll work off of the Q&A app with sources we built over the [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post by Lilian Weng in the [Returning sources](/docs/use_cases/question_answering/sources) guide." + ] + }, + { + "cell_type": "markdown", + "id": "487d8d79-5ee9-4aa4-9fdf-cd5f4303e099", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "### Dependencies\n", + "\n", + "We'll use an OpenAI chat model and embeddings and a Chroma vector store in this walkthrough, but everything shown here works with any [ChatModel](/docs/modules/model_io/chat/) or [LLM](/docs/modules/model_io/llms/), [Embeddings](/docs/modules/data_connection/text_embedding/), and [VectorStore](/docs/modules/data_connection/vectorstores/) or [Retriever](/docs/modules/data_connection/retrievers/). 
\n", + "\n", + "We'll use the following packages:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "28d272cd-4e31-40aa-bbb4-0be0a1f49a14", + "metadata": {}, + "outputs": [], + "source": [ + "#!pip install -U langchain langchain-community langchainhub openai chromadb bs4" + ] + }, + { + "cell_type": "markdown", + "id": "51ef48de-70b6-4f43-8e0b-ab9b84c9c02a", + "metadata": {}, + "source": [ + "We need to set environment variable `OPENAI_API_KEY`, which can be done directly or loaded from a `.env` file like so:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "143787ca-d8e6-4dc9-8281-4374f4d71720", + "metadata": {}, + "outputs": [], + "source": [ + "import getpass\n", + "import os\n", + "\n", + "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n", + "\n", + "# import dotenv\n", + "\n", + "# dotenv.load_dotenv()" + ] + }, + { + "cell_type": "markdown", + "id": "1665e740-ce01-4f09-b9ed-516db0bd326f", + "metadata": {}, + "source": [ + "### LangSmith\n", + "\n", + "Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with [LangSmith](https://smith.langchain.com).\n", + "\n", + "Note that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "07411adb-3722-4f65-ab7f-8f6f57663d11", + "metadata": {}, + "outputs": [], + "source": [ + "os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n", + "os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()" + ] + }, + { + "cell_type": "markdown", + "id": "fa6ba684-26cf-4860-904e-a4d51380c134", + "metadata": {}, + "source": [ + "## Chain with sources\n", + "\n", + "Here is Q&A app with sources we built over the [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post by Lilian Weng in the [Returning sources](/docs/use_cases/question_answering/sources) guide:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "d8a913b1-0eea-442a-8a64-ec73333f104b", + "metadata": {}, + "outputs": [], + "source": [ + "import bs4\n", + "from langchain import hub\n", + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "from langchain_community.chat_models import ChatOpenAI\n", + "from langchain_community.document_loaders import WebBaseLoader\n", + "from langchain_community.embeddings import OpenAIEmbeddings\n", + "from langchain_community.vectorstores import Chroma\n", + "from langchain_core.output_parsers import StrOutputParser\n", + "from langchain_core.runnables import RunnableParallel, RunnablePassthrough" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "820244ae-74b4-4593-b392-822979dd91b8", + "metadata": {}, + "outputs": [], + "source": [ + "# Load, chunk and index the contents of the blog.\n", + "bs_strainer = bs4.SoupStrainer(class_=(\"post-content\", \"post-title\", \"post-header\"))\n", + "loader = WebBaseLoader(\n", + " web_paths=(\"https://lilianweng.github.io/posts/2023-06-23-agent/\",),\n", + " bs_kwargs={\"parse_only\": bs_strainer},\n", + ")\n", + "docs = loader.load()\n", + "\n", + "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n", + "splits = 
text_splitter.split_documents(docs)\n", + "vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())\n", + "\n", + "# Retrieve and generate using the relevant snippets of the blog.\n", + "retriever = vectorstore.as_retriever()\n", + "prompt = hub.pull(\"rlm/rag-prompt\")\n", + "llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)\n", + "\n", + "\n", + "def format_docs(docs):\n", + " return \"\\n\\n\".join(doc.page_content for doc in docs)\n", + "\n", + "\n", + "rag_chain_from_docs = (\n", + " RunnablePassthrough.assign(context=(lambda x: format_docs(x[\"context\"])))\n", + " | prompt\n", + " | llm\n", + " | StrOutputParser()\n", + ")\n", + "\n", + "rag_chain_with_source = RunnableParallel(\n", + " {\"context\": retriever, \"question\": RunnablePassthrough()}\n", + ").assign(answer=rag_chain_from_docs)" + ] + }, + { + "cell_type": "markdown", + "id": "1c2f99b5-80b4-4178-bf30-c1c0a152638f", + "metadata": {}, + "source": [ + "## Streaming final outputs\n", + "\n", + "With LCEL it's easy to stream final outputs:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "ded41680-b749-4e2a-9daa-b1165d74783b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'question': 'What is Task Decomposition'}\n", + "{'context': [Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}), Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}), Document(page_content='The AI assistant can parse user input to several tasks: [{\"task\": task, \"id\", task_id, \"dep\": dependency_task_ids, \"args\": {\"text\": text, \"image\": URL, \"audio\": URL, \"video\": URL}}]. The \"dep\" field denotes the id of the previous task which generates a new resource that the current task relies on. A special tag \"-task_id\" refers to the generated text image, audio and video in the dependency task with id as task_id. The task MUST be selected from the following options: {{ Available Task List }}. There is a logical relationship between tasks, please note their order. If the user input can\\'t be parsed, you need to reply empty JSON. Here are several cases for your reference: {{ Demonstrations }}. 
The chat history is recorded as {{ Chat History }}. From this chat history, you can find the path of the user-mentioned resources for your task planning.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}), Document(page_content='Fig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\\nThe system comprises of 4 stages:\\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\\nInstruction:', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'})]}\n", + "{'answer': ''}\n", + "{'answer': 'Task'}\n", + "{'answer': ' decomposition'}\n", + "{'answer': ' is'}\n", + "{'answer': ' a'}\n", + "{'answer': ' technique'}\n", + "{'answer': ' used'}\n", + "{'answer': ' to'}\n", + "{'answer': ' break'}\n", + "{'answer': ' down'}\n", + "{'answer': ' complex'}\n", + "{'answer': ' tasks'}\n", + "{'answer': ' into'}\n", + "{'answer': ' smaller'}\n", + "{'answer': ' and'}\n", + "{'answer': ' simpler'}\n", + "{'answer': ' steps'}\n", + "{'answer': '.'}\n", + "{'answer': ' It'}\n", + "{'answer': ' can'}\n", + "{'answer': ' be'}\n", + "{'answer': ' done'}\n", + "{'answer': ' through'}\n", + "{'answer': ' methods'}\n", + "{'answer': ' like'}\n", + "{'answer': ' Chain'}\n", + "{'answer': ' of'}\n", + "{'answer': ' Thought'}\n", + "{'answer': ' ('}\n", + "{'answer': 'Co'}\n", + "{'answer': 'T'}\n", + "{'answer': ')'}\n", + "{'answer': ' or'}\n", + "{'answer': ' Tree'}\n", + "{'answer': ' of'}\n", + "{'answer': ' Thoughts'}\n", + "{'answer': ','}\n", + "{'answer': ' which'}\n", + "{'answer': ' involve'}\n", + "{'answer': ' dividing'}\n", + "{'answer': ' the'}\n", + "{'answer': ' task'}\n", + "{'answer': ' into'}\n", + "{'answer': ' manageable'}\n", + "{'answer': ' sub'}\n", + "{'answer': 'tasks'}\n", + "{'answer': ' and'}\n", + "{'answer': ' exploring'}\n", + "{'answer': ' multiple'}\n", + "{'answer': ' reasoning'}\n", + "{'answer': ' possibilities'}\n", + "{'answer': ' at'}\n", + "{'answer': ' each'}\n", + "{'answer': ' step'}\n", + "{'answer': '.'}\n", + "{'answer': ' Task'}\n", + "{'answer': ' decomposition'}\n", + "{'answer': ' can'}\n", + "{'answer': ' be'}\n", + "{'answer': ' performed'}\n", + "{'answer': ' by'}\n", + "{'answer': ' using'}\n", + "{'answer': ' simple'}\n", + "{'answer': ' prompts'}\n", + "{'answer': ','}\n", + "{'answer': ' task'}\n", + "{'answer': '-specific'}\n", + "{'answer': ' instructions'}\n", + "{'answer': ','}\n", + "{'answer': ' or'}\n", + "{'answer': ' human'}\n", + "{'answer': ' inputs'}\n", + "{'answer': '.'}\n", + "{'answer': ''}\n" + ] + } + ], + "source": [ + "for chunk in rag_chain_with_source.stream(\"What is Task Decomposition\"):\n", + " print(chunk)" + ] + }, + { + "cell_type": "markdown", + "id": "893e830b-9372-43c2-a700-a823d31de2fc", + "metadata": {}, + "source": [ + "We can add some logic to compile our stream as it's being returned:" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "b2724496-5f5a-438d-be6f-795adc27ed1c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n", + "question: What is Task Decomposition\n", + "\n", + "context: [Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. 
An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}), Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}), Document(page_content='The AI assistant can parse user input to several tasks: [{\"task\": task, \"id\", task_id, \"dep\": dependency_task_ids, \"args\": {\"text\": text, \"image\": URL, \"audio\": URL, \"video\": URL}}]. The \"dep\" field denotes the id of the previous task which generates a new resource that the current task relies on. A special tag \"-task_id\" refers to the generated text image, audio and video in the dependency task with id as task_id. The task MUST be selected from the following options: {{ Available Task List }}. There is a logical relationship between tasks, please note their order. If the user input can\\'t be parsed, you need to reply empty JSON. Here are several cases for your reference: {{ Demonstrations }}. The chat history is recorded as {{ Chat History }}. From this chat history, you can find the path of the user-mentioned resources for your task planning.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}), Document(page_content='Fig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\\nThe system comprises of 4 stages:\\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\\nInstruction:', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'})]\n", + "\n", + "answer: Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done through methods like Chain of Thought (CoT) or Tree of Thoughts, which involve dividing the task into manageable subtasks and exploring multiple reasoning possibilities at each step. Task decomposition can be performed by using simple prompts, task-specific instructions, or human inputs." + ] + }, + { + "data": { + "text/plain": [ + "{'question': 'What is Task Decomposition',\n", + " 'context': [Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. 
An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n", + " Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n", + " Document(page_content='The AI assistant can parse user input to several tasks: [{\"task\": task, \"id\", task_id, \"dep\": dependency_task_ids, \"args\": {\"text\": text, \"image\": URL, \"audio\": URL, \"video\": URL}}]. The \"dep\" field denotes the id of the previous task which generates a new resource that the current task relies on. A special tag \"-task_id\" refers to the generated text image, audio and video in the dependency task with id as task_id. The task MUST be selected from the following options: {{ Available Task List }}. There is a logical relationship between tasks, please note their order. If the user input can\\'t be parsed, you need to reply empty JSON. Here are several cases for your reference: {{ Demonstrations }}. The chat history is recorded as {{ Chat History }}. From this chat history, you can find the path of the user-mentioned resources for your task planning.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n", + " Document(page_content='Fig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\\nThe system comprises of 4 stages:\\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\\nInstruction:', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'})],\n", + " 'answer': 'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done through methods like Chain of Thought (CoT) or Tree of Thoughts, which involve dividing the task into manageable subtasks and exploring multiple reasoning possibilities at each step. 
Task decomposition can be performed by using simple prompts, task-specific instructions, or human inputs.'}" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "output = {}\n", + "curr_key = None\n", + "for chunk in rag_chain_with_source.stream(\"What is Task Decomposition\"):\n", + " for key in chunk:\n", + " if key not in output:\n", + " output[key] = chunk[key]\n", + " else:\n", + " output[key] += chunk[key]\n", + " if key != curr_key:\n", + " print(f\"\\n\\n{key}: {chunk[key]}\", end=\"\", flush=True)\n", + " else:\n", + " print(chunk[key], end=\"\", flush=True)\n", + " curr_key = key\n", + "output" + ] + }, + { + "cell_type": "markdown", + "id": "fdee7ae6-4a81-46ab-8efd-d2310b596f8c", + "metadata": {}, + "source": [ + "## Streaming intermediate steps\n", + "\n", + "Suppose we want to stream not only the final outputs of the chain, but also some intermediate steps. As an example let's take our [Chat history](/docs/use_cases/question_answering/chat_history) chain. Here we reformulate the user question before passing it to the retriever. This reformulated question is not returned as part of the final output. We could modify our chain to return the new question, but for demonstration purposes we'll leave it as is." + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "id": "f4d7714e-bdca-419d-a6c6-7c1a70a69297", + "metadata": {}, + "outputs": [], + "source": [ + "from operator import itemgetter\n", + "\n", + "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n", + "from langchain_core.tracers.log_stream import LogStreamCallbackHandler\n", + "\n", + "contextualize_q_system_prompt = \"\"\"Given a chat history and the latest user question \\\n", + "which might reference context in the chat history, formulate a standalone question \\\n", + "which can be understood without the chat history. Do NOT answer the question, \\\n", + "just reformulate it if needed and otherwise return it as is.\"\"\"\n", + "contextualize_q_prompt = ChatPromptTemplate.from_messages(\n", + " [\n", + " (\"system\", contextualize_q_system_prompt),\n", + " MessagesPlaceholder(variable_name=\"chat_history\"),\n", + " (\"human\", \"{question}\"),\n", + " ]\n", + ")\n", + "contextualize_q_chain = (contextualize_q_prompt | llm | StrOutputParser()).with_config(\n", + " tags=[\"contextualize_q_chain\"]\n", + ")\n", + "\n", + "qa_system_prompt = \"\"\"You are an assistant for question-answering tasks. \\\n", + "Use the following pieces of retrieved context to answer the question. \\\n", + "If you don't know the answer, just say that you don't know. \\\n", + "Use three sentences maximum and keep the answer concise.\\\n", + "\n", + "{context}\"\"\"\n", + "qa_prompt = ChatPromptTemplate.from_messages(\n", + " [\n", + " (\"system\", qa_system_prompt),\n", + " MessagesPlaceholder(variable_name=\"chat_history\"),\n", + " (\"human\", \"{question}\"),\n", + " ]\n", + ")\n", + "\n", + "\n", + "def contextualized_question(input: dict):\n", + " if input.get(\"chat_history\"):\n", + " return contextualize_q_chain\n", + " else:\n", + " return input[\"question\"]\n", + "\n", + "\n", + "rag_chain = (\n", + " RunnablePassthrough.assign(context=contextualize_q_chain | retriever | format_docs)\n", + " | qa_prompt\n", + " | llm\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "a3b074bc-c856-4767-93fd-15e66119548c", + "metadata": {}, + "source": [ + "To stream intermediate steps we'll use the `astream_log` method. 
This is an async method that yields JSONPatch ops that when applied in the same order as received build up the RunState:\n", + "\n", + "```python\n", + "class RunState(TypedDict):\n", + " id: str\n", + " \"\"\"ID of the run.\"\"\"\n", + " streamed_output: List[Any]\n", + " \"\"\"List of output chunks streamed by Runnable.stream()\"\"\"\n", + " final_output: Optional[Any]\n", + " \"\"\"Final output of the run, usually the result of aggregating (`+`) streamed_output.\n", + " Only available after the run has finished successfully.\"\"\"\n", + "\n", + " logs: Dict[str, LogEntry]\n", + " \"\"\"Map of run names to sub-runs. If filters were supplied, this list will\n", + " contain only the runs that matched the filters.\"\"\"\n", + "```\n", + "\n", + "You can stream all steps (default) or include/exclude steps by name, tags or metadata. In this case we'll only stream intermediate steps that are part of the `contextualize_q_chain` and the final output. Notice that when defining the `contextualize_q_chain` we gave it a corresponding tag, which we can now filter on. \n", + "\n", + "We only show the first 20 chunks of the stream for readability:" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "id": "7ec8127b-0e6d-4633-9523-bd9daaf0264a", + "metadata": {}, + "outputs": [], + "source": [ + "# Needed for running async functions in Jupyter notebook:\n", + "import nest_asyncio\n", + "\n", + "nest_asyncio.apply()" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "id": "b8fb304b-46b0-424b-814b-499e1d80e700", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "RunLogPatch({'op': 'replace',\n", + " 'path': '',\n", + " 'value': {'final_output': None,\n", + " 'id': 'df0938b3-3ff2-451b-a233-6c882b640e4d',\n", + " 'logs': {},\n", + " 'streamed_output': []}})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/RunnableSequence',\n", + " 'value': {'end_time': None,\n", + " 'final_output': None,\n", + " 'id': '2e2af851-9e1f-4260-b004-c30dea4affe9',\n", + " 'metadata': {},\n", + " 'name': 'RunnableSequence',\n", + " 'start_time': '2023-12-29T20:08:28.923',\n", + " 'streamed_output': [],\n", + " 'streamed_output_str': [],\n", + " 'tags': ['seq:step:1', 'contextualize_q_chain'],\n", + " 'type': 'chain'}})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatPromptTemplate',\n", + " 'value': {'end_time': None,\n", + " 'final_output': None,\n", + " 'id': '7ad34564-337c-4362-ae7a-655d79cf0ab0',\n", + " 'metadata': {},\n", + " 'name': 'ChatPromptTemplate',\n", + " 'start_time': '2023-12-29T20:08:28.926',\n", + " 'streamed_output': [],\n", + " 'streamed_output_str': [],\n", + " 'tags': ['seq:step:1', 'contextualize_q_chain'],\n", + " 'type': 'prompt'}})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatPromptTemplate/final_output',\n", + " 'value': ChatPromptValue(messages=[SystemMessage(content='Given a chat history and the latest user question which might reference context in the chat history, formulate a standalone question which can be understood without the chat history. Do NOT answer the question, just reformulate it if needed and otherwise return it as is.'), HumanMessage(content='What is Task Decomposition?'), AIMessage(content='Task decomposition is a technique used to break down complex tasks into smaller and more manageable subtasks. 
It involves dividing a task into multiple steps or subgoals, allowing an agent or model to better understand and plan for the overall task. Task decomposition can be done through various methods, such as using prompting techniques like Chain of Thought or Tree of Thoughts, task-specific instructions, or human inputs.'), HumanMessage(content='What are common ways of doing it?')])},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatPromptTemplate/end_time',\n", + " 'value': '2023-12-29T20:08:28.926'})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatOpenAI',\n", + " 'value': {'end_time': None,\n", + " 'final_output': None,\n", + " 'id': '228792d6-1d76-4209-8d25-08c484b6df57',\n", + " 'metadata': {},\n", + " 'name': 'ChatOpenAI',\n", + " 'start_time': '2023-12-29T20:08:28.931',\n", + " 'streamed_output': [],\n", + " 'streamed_output_str': [],\n", + " 'tags': ['seq:step:2', 'contextualize_q_chain'],\n", + " 'type': 'llm'}})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/StrOutputParser',\n", + " 'value': {'end_time': None,\n", + " 'final_output': None,\n", + " 'id': 'f740f235-2b14-412d-9f54-53bbc4fa8fd8',\n", + " 'metadata': {},\n", + " 'name': 'StrOutputParser',\n", + " 'start_time': '2023-12-29T20:08:29.487',\n", + " 'streamed_output': [],\n", + " 'streamed_output_str': [],\n", + " 'tags': ['seq:step:3', 'contextualize_q_chain'],\n", + " 'type': 'parser'}})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add', 'path': '/logs/ChatOpenAI/streamed_output_str/-', 'value': ''},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output/-',\n", + " 'value': AIMessageChunk(content='')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output_str/-',\n", + " 'value': 'What'},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output/-',\n", + " 'value': AIMessageChunk(content='What')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output_str/-',\n", + " 'value': ' are'},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output/-',\n", + " 'value': AIMessageChunk(content=' are')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output_str/-',\n", + " 'value': ' some'},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output/-',\n", + " 'value': AIMessageChunk(content=' some')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output_str/-',\n", + " 'value': ' commonly'},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output/-',\n", + " 'value': AIMessageChunk(content=' commonly')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output_str/-',\n", + " 'value': ' used'},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output/-',\n", + " 'value': AIMessageChunk(content=' used')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output_str/-',\n", + " 'value': ' methods'},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output/-',\n", + " 'value': 
AIMessageChunk(content=' methods')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output_str/-',\n", + " 'value': ' or'},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output/-',\n", + " 'value': AIMessageChunk(content=' or')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output_str/-',\n", + " 'value': ' approaches'},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output/-',\n", + " 'value': AIMessageChunk(content=' approaches')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output_str/-',\n", + " 'value': ' for'},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output/-',\n", + " 'value': AIMessageChunk(content=' for')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output_str/-',\n", + " 'value': ' task'},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output/-',\n", + " 'value': AIMessageChunk(content=' task')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output_str/-',\n", + " 'value': ' decomposition'},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output/-',\n", + " 'value': AIMessageChunk(content=' decomposition')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add', 'path': '/logs/ChatOpenAI/streamed_output_str/-', 'value': '?'},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output/-',\n", + " 'value': AIMessageChunk(content='?')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add', 'path': '/logs/ChatOpenAI/streamed_output_str/-', 'value': ''},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/streamed_output/-',\n", + " 'value': AIMessageChunk(content='')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/final_output',\n", + " 'value': {'generations': [[{'generation_info': {'finish_reason': 'stop'},\n", + " 'message': AIMessageChunk(content='What are some commonly used methods or approaches for task decomposition?'),\n", + " 'text': 'What are some commonly used methods or '\n", + " 'approaches for task decomposition?',\n", + " 'type': 'ChatGenerationChunk'}]],\n", + " 'llm_output': None,\n", + " 'run': None}},\n", + " {'op': 'add',\n", + " 'path': '/logs/ChatOpenAI/end_time',\n", + " 'value': '2023-12-29T20:08:29.688'})\n", + "\n", + "------------------------------\n", + "\n" + ] + } + ], + "source": [ + "from langchain_core.messages import HumanMessage\n", + "\n", + "chat_history = []\n", + "\n", + "question = \"What is Task Decomposition?\"\n", + "ai_msg = rag_chain.invoke({\"question\": question, \"chat_history\": chat_history})\n", + "chat_history.extend([HumanMessage(content=question), ai_msg])\n", + "\n", + "second_question = \"What are common ways of doing it?\"\n", + "ct = 0\n", + "async for jsonpatch_op in rag_chain.astream_log(\n", + " {\"question\": second_question, \"chat_history\": chat_history},\n", + " include_tags=[\"contextualize_q_chain\"],\n", + "):\n", + " print(jsonpatch_op)\n", + " print(\"\\n\" + \"-\" * 30 + \"\\n\")\n", + " ct += 1\n", + " if ct > 20:\n", + " break" + ] + }, + { + 
"cell_type": "markdown", + "id": "32ba6cfe-43c8-4d75-a068-2eb2a1371ad3", + "metadata": {}, + "source": [ + "If we wanted to get our retrieved docs, we could filter on name \"Retriever\":" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "id": "ad8aff35-28c4-4a99-a581-88750a63dad4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "RunLogPatch({'op': 'replace',\n", + " 'path': '',\n", + " 'value': {'final_output': None,\n", + " 'id': '9d122c72-378c-41f8-96fe-3fd9a214e9bc',\n", + " 'logs': {},\n", + " 'streamed_output': []}})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/Retriever',\n", + " 'value': {'end_time': None,\n", + " 'final_output': None,\n", + " 'id': 'c83481fb-7ca3-4125-9280-96da0c14eee9',\n", + " 'metadata': {},\n", + " 'name': 'Retriever',\n", + " 'start_time': '2023-12-29T20:10:13.794',\n", + " 'streamed_output': [],\n", + " 'streamed_output_str': [],\n", + " 'tags': ['seq:step:2', 'Chroma', 'OpenAIEmbeddings'],\n", + " 'type': 'retriever'}})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'add',\n", + " 'path': '/logs/Retriever/final_output',\n", + " 'value': {'documents': [Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n", + " Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n", + " Document(page_content='Resources:\\n1. Internet access for searches and information gathering.\\n2. Long Term memory management.\\n3. GPT-3.5 powered Agents for delegation of simple tasks.\\n4. File output.\\n\\nPerformance Evaluation:\\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\\n2. Constructively self-criticize your big-picture behavior constantly.\\n3. Reflect on past decisions and strategies to refine your approach.\\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n", + " Document(page_content='Fig. 9. Comparison of MIPS algorithms, measured in recall@10. 
(Image source: Google Blog, 2020)\\nCheck more MIPS algorithms and performance comparison in ann-benchmarks.com.\\nComponent Three: Tool Use#\\nTool use is a remarkable and distinguishing characteristic of human beings. We create, modify and utilize external objects to do things that go beyond our physical and cognitive limits. Equipping LLMs with external tools can significantly extend the model capabilities.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'})]}},\n", + " {'op': 'add',\n", + " 'path': '/logs/Retriever/end_time',\n", + " 'value': '2023-12-29T20:10:14.234'})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of task')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of task decomposition')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of task decomposition include')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of task decomposition include:\\n')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of task decomposition include:\\n1')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of task decomposition include:\\n1.')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of task decomposition include:\\n1. Using')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of task decomposition include:\\n1. Using prompting')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of task decomposition include:\\n1. Using prompting techniques')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of task decomposition include:\\n1. 
Using prompting techniques like')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of task decomposition include:\\n1. Using prompting techniques like Chain')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of task decomposition include:\\n1. Using prompting techniques like Chain of')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of task decomposition include:\\n1. Using prompting techniques like Chain of Thought')})\n", + "\n", + "------------------------------\n", + "\n", + "RunLogPatch({'op': 'replace',\n", + " 'path': '/final_output',\n", + " 'value': AIMessageChunk(content='Common ways of task decomposition include:\\n1. Using prompting techniques like Chain of Thought (')})\n", + "\n", + "------------------------------\n", + "\n" + ] + } + ], + "source": [ + "ct = 0\n", + "async for jsonpatch_op in rag_chain.astream_log(\n", + " {\"question\": second_question, \"chat_history\": chat_history},\n", + " include_names=[\"Retriever\"],\n", + " with_streamed_output_list=False,\n", + "):\n", + " print(jsonpatch_op)\n", + " print(\"\\n\" + \"-\" * 30 + \"\\n\")\n", + " ct += 1\n", + " if ct > 20:\n", + " break" + ] + }, + { + "cell_type": "markdown", + "id": "c5470a79-258a-4108-8ceb-dfe8180160ca", + "metadata": {}, + "source": [ + "For more on how to stream intermediate steps check out the [LCEL Interface](https://python.langchain.com/docs/expression_language/interface#async-stream-intermediate-steps) docs." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "poetry-venv", + "language": "python", + "name": "poetry-venv" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/vercel.json b/docs/vercel.json index e80e5f8972..b697b48f19 100644 --- a/docs/vercel.json +++ b/docs/vercel.json @@ -1,8 +1,16 @@ { "redirects": [ { - "source": "docs/docs/integrations/providers/alibabacloud_opensearch", - "destination": "docs/docs/integrations/providers/alibaba_cloud" + "source": "/docs/use_cases/question_answering/code_understanding", + "destination": "/docs/use_cases/code_understanding" + }, + { + "source": "/docs/use_cases/question_answering/document-context-aware-QA", + "destination": "/docs/modules/data_connection/document_transformers/" + }, + { + "source": "/docs/integrations/providers/alibabacloud_opensearch", + "destination": "/docs/integrations/providers/alibaba_cloud" }, { "source": "/docs/integrations/chat/pai_eas_chat_endpoint",