Update RAG use case (move to ntbk) (#9340)

pull/9951/head
Lance Martin 1 year ago committed by GitHub
parent 709a67d9bf
commit 985873c497

@ -25,8 +25,7 @@
"metadata": {},
"outputs": [],
"source": [
"! pip install gpt4all\n",
"! pip install chromadb"
"pip install gpt4all chromadb"
]
},
{
@ -157,7 +156,7 @@
"metadata": {},
"outputs": [],
"source": [
"! pip install llama-cpp-python"
"pip install llama-cpp-python"
]
},
{
@ -736,7 +735,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
"version": "3.9.16"
}
},
"nbformat": 4,

@ -1,342 +0,0 @@
---
sidebar_position: -1
---
# QA over Documents
## Use case
Suppose you have some text documents (PDF, blog, Notion pages, etc.) and want to ask questions related to the contents of those documents. LLMs, given their proficiency in understanding text, are a great tool for this.
In this walkthrough we'll go over how to build an application that answers questions over documents using LLMs. Two closely related use cases, covered elsewhere, are:
- [QA over structured data](/docs/use_cases/tabular) (e.g., SQL)
- [QA over code](/docs/use_cases/code) (e.g., Python)
![intro.png](/img/qa_intro.png)
## Overview
The pipeline for converting raw unstructured data into a QA chain looks like this:
1. `Loading`: First we need to load our data. Unstructured data can be loaded from many sources. Use the [LangChain integration hub](https://integrations.langchain.com/) to browse the full set of loaders.
Each loader returns data as a LangChain [`Document`](https://docs.langchain.com/docs/components/schema/document).
2. `Splitting`: [Text splitters](/docs/modules/data_connection/document_transformers/) break `Documents` into splits of a specified size.
3. `Storage`: Storage (often a [vectorstore](/docs/modules/data_connection/vectorstores/)) will house [and often embed](https://www.pinecone.io/learn/vector-embeddings/) the splits.
4. `Retrieval`: The app retrieves splits from storage (often those [with similar embeddings](https://www.pinecone.io/learn/k-nearest-neighbor/) to the input question).
5. `Generation`: An [LLM](/docs/modules/model_io/models/llms/) produces an answer using a prompt that includes the question and the retrieved data.
6. `Conversation` (Extension): Hold a multi-turn conversation by adding [Memory](/docs/modules/memory/) to your QA chain.
![flow.jpeg](/img/qa_flow.jpeg)
## Quickstart
To give you a sneak preview, the above pipeline can all be wrapped in a single object: `VectorstoreIndexCreator`. Suppose we want a QA app over this [blog post](https://lilianweng.github.io/posts/2023-06-23-agent/). We can create this in a few lines of code:
First set environment variables and install packages:
```bash
pip install openai chromadb
export OPENAI_API_KEY="..."
```
Then run:
```python
from langchain.document_loaders import WebBaseLoader
from langchain.indexes import VectorstoreIndexCreator
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
index = VectorstoreIndexCreator().from_loaders([loader])
```
And now ask your questions:
```python
index.query("What is Task Decomposition?")
```
' Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done using LLM with simple prompting, task-specific instructions, or human inputs. Tree of Thoughts (Yao et al. 2023) is an example of a task decomposition technique that explores multiple reasoning possibilities at each step and generates multiple thoughts per step, creating a tree structure.'
Ok, but what's going on under the hood, and how could we customize this for our specific use case? For that, let's take a look at how we can construct this pipeline piece by piece.
## Step 1. Load
Specify a `DocumentLoader` to load in your unstructured data as `Documents`. A `Document` is a piece of text (the `page_content`) and associated metadata.
```python
from langchain.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()
```
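To make the shape of a `Document` concrete, here is a minimal sketch using only the standard library. The `Document` dataclass and the `load_text` helper below are hypothetical stand-ins written for illustration; they are not LangChain's classes, and a real loader does much more (fetching, parsing, cleaning).

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Illustrative stand-in: a piece of text plus associated metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(text: str, source: str) -> list:
    """Hypothetical loader: wrap raw text in a single Document tagged with its source."""
    return [Document(page_content=text, metadata={"source": source})]


data = load_text("Task decomposition breaks a task into steps.", "https://example.com/post")
print(data[0].metadata["source"])
```

Keeping the source in `metadata` is what later makes it possible to report where an answer came from.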
### Go deeper
- Browse the > 120 data loader integrations [here](https://integrations.langchain.com/).
- See further documentation on loaders [here](/docs/modules/data_connection/document_loaders/).
## Step 2. Split
Split the `Document` into chunks for embedding and vector storage.
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0)
all_splits = text_splitter.split_documents(data)
```
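The roles of `chunk_size` and `chunk_overlap` can be sketched with a plain sliding window. This is a simplification under stated assumptions: `RecursiveCharacterTextSplitter` first tries to split on separators such as paragraphs and newlines, whereas the hypothetical `split_text` helper below just cuts fixed-width character windows.

```python
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=2))
```

A non-zero overlap repeats some text across neighbouring chunks, which can help when a relevant sentence would otherwise be cut in half at a chunk boundary.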
### Go deeper
- `DocumentSplitters` are just one type of the more generic `DocumentTransformers`, which can all be useful in this preprocessing step.
- See further documentation on transformers [here](/docs/modules/data_connection/document_transformers/).
- `Context-aware splitters` keep the location ("context") of each split in the original `Document`:
- [Markdown files](/docs/use_cases/question_answering/how_to/document-context-aware-QA)
- [Code (py or js)](/docs/modules/data_connection/document_loaders/integrations/source_code)
- [Documents](/docs/modules/data_connection/document_loaders/integrations/grobid)
## Step 3. Store
To look up our document splits later, we first need to store them.
The most common way to do this is to embed the contents of each document then store the embedding and document in a vector store, with the embedding being used to index the document.
```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())
```
### Go deeper
- Browse the > 40 vectorstores integrations [here](https://integrations.langchain.com/).
- See further documentation on vectorstores [here](/docs/modules/data_connection/vectorstores/).
- Browse the > 30 text embedding integrations [here](https://integrations.langchain.com/).
- See further documentation on embedding models [here](/docs/modules/data_connection/text_embedding/).
Here are Steps 1-3:
![lc.png](/img/qa_data_load.png)
## Step 4. Retrieve
Retrieve relevant splits for any question using [similarity search](https://www.pinecone.io/learn/what-is-similarity-search/).
```python
question = "What are the approaches to Task Decomposition?"
docs = vectorstore.similarity_search(question)
len(docs)
```
4
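Conceptually, Steps 3 and 4 amount to "embed every split, then rank the splits by similarity to the embedded question". The sketch below illustrates that idea with the standard library only. It assumes a toy bag-of-words embedding in place of `OpenAIEmbeddings`, and the `ToyVectorStore` class is a hypothetical stand-in for a real vector store such as Chroma.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (embedding, text) pairs."""

    def __init__(self, texts):
        self.entries = [(embed(t), t) for t in texts]

    def similarity_search(self, query: str, k: int = 4) -> list:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = ToyVectorStore([
    "task decomposition breaks tasks into steps",
    "memory stores past observations",
    "tools call external apis",
])
print(store.similarity_search("what is task decomposition", k=1))
```

Real embedding models capture meaning rather than exact word overlap, so they can match a question to a split that shares no words with it; the ranking logic stays the same.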
### Go deeper
Vectorstores are commonly used for retrieval, but they are not the only option. For example, SVMs (see thread [here](https://twitter.com/karpathy/status/1647025230546886658?s=20)) can also be used.
LangChain [has many retrievers](/docs/modules/data_connection/retrievers/) including, but not limited to, vectorstores. All retrievers implement a common method `get_relevant_documents()` (and its asynchronous variant `aget_relevant_documents()`).
```python
from langchain.retrievers import SVMRetriever
svm_retriever = SVMRetriever.from_documents(all_splits,OpenAIEmbeddings())
docs_svm=svm_retriever.get_relevant_documents(question)
len(docs_svm)
```
4
Some common ways to improve on vector similarity search include:
- `MultiQueryRetriever` [generates variants of the input question](/docs/modules/data_connection/retrievers/MultiQueryRetriever) to improve retrieval.
- `Max marginal relevance` selects for [relevance and diversity](https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf) among the retrieved documents.
- Documents can be filtered during retrieval using [`metadata` filters](/docs/use_cases/question_answering/how_to/document-context-aware-QA).
```python
import logging
from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever
logging.basicConfig()
logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)
retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectorstore.as_retriever(),
llm=ChatOpenAI(temperature=0))
unique_docs = retriever_from_llm.get_relevant_documents(query=question)
len(unique_docs)
```
INFO:langchain.retrievers.multi_query:Generated queries: ['1. How can Task Decomposition be approached?', '2. What are the different methods for Task Decomposition?', '3. What are the various approaches to decomposing tasks?']
5
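Max marginal relevance, listed above, can be sketched as a greedy loop that trades relevance to the query against similarity to documents already selected. This is an illustrative sketch over plain Python lists, not the implementation used by any particular vector store; the `mmr` function and its `lambda_mult` weighting parameter are named here for illustration.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5) -> list:
    """Greedily pick k document indices.

    lambda_mult=1.0 ranks by relevance only; lower values favour diversity.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalise similarity to the closest already-selected document.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.6, 0.8]]  # docs 0 and 1 are near-duplicates
print(mmr(query, docs, k=2, lambda_mult=1.0))
print(mmr(query, docs, k=2, lambda_mult=0.3))
```

With `lambda_mult=1.0` the two near-duplicates are both returned; lowering it lets the less similar but more novel third document displace the duplicate.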
## Step 5. Generate
Distill the retrieved documents into an answer using an LLM/Chat model (e.g., `gpt-3.5-turbo`) with the `RetrievalQA` chain.
```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever())
qa_chain({"query": question})
```
{
'query': 'What are the approaches to Task Decomposition?',
'result': 'The approaches to task decomposition include:\n\n1. Simple prompting: This approach involves using simple prompts or questions to guide the agent in breaking down a task into smaller subgoals. For example, the agent can be prompted with "Steps for XYZ" and asked to list the subgoals for achieving XYZ.\n\n2. Task-specific instructions: In this approach, task-specific instructions are provided to the agent to guide the decomposition process. For example, if the task is to write a novel, the agent can be instructed to "Write a story outline" as a subgoal.\n\n3. Human inputs: This approach involves incorporating human inputs in the task decomposition process. Humans can provide guidance, feedback, and suggestions to help the agent break down complex tasks into manageable subgoals.\n\nThese approaches aim to enable efficient handling of complex tasks by breaking them down into smaller, more manageable parts.'
}
Note, you can pass in an `LLM` or a `ChatModel` (like we did here) to the `RetrievalQA` chain.
### Go deeper
#### Choosing LLMs
- Browse the > 55 LLM and chat model integrations [here](https://integrations.langchain.com/).
- See further documentation on LLMs and chat models [here](/docs/modules/model_io/models/).
- Use local LLMs: The popularity of [PrivateGPT](https://github.com/imartinez/privateGPT) and [GPT4All](https://github.com/nomic-ai/gpt4all) underscores the importance of running LLMs locally.
Using `GPT4All` is as simple as [downloading the binary](/docs/integrations/llms/gpt4all) and then:
from langchain.llms import GPT4All
from langchain.chains import RetrievalQA
llm = GPT4All(model="/Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin",max_tokens=2048)
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever())
#### Customizing the prompt
The prompt in the `RetrievalQA` chain can be easily customized.
```python
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
llm,
retriever=vectorstore.as_retriever(),
chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)
result = qa_chain({"query": question})
result["result"]
```
'The approaches to Task Decomposition are (1) using simple prompting by LLM, (2) using task-specific instructions, and (3) with human inputs. Thanks for asking!'
#### Return source documents
The full set of retrieved documents used for answer distillation can be returned using `return_source_documents=True`.
```python
from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever(),
return_source_documents=True)
result = qa_chain({"query": question})
print(len(result['source_documents']))
result['source_documents'][0]
```
4
Document(page_content='Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agents brain, complemented by several key components:', 'language': 'en'})
#### Return citations
Answer citations can be returned using `RetrievalQAWithSourcesChain`.
```python
from langchain.chains import RetrievalQAWithSourcesChain
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm,retriever=vectorstore.as_retriever())
result = qa_chain({"question": question})
result
```
{
'question': 'What are the approaches to Task Decomposition?',
'answer': 'The approaches to Task Decomposition include (1) using LLM with simple prompting, (2) using task-specific instructions, and (3) incorporating human inputs.\n',
'sources': 'https://lilianweng.github.io/posts/2023-06-23-agent/'
}
#### Customizing retrieved document processing
Retrieved documents can be fed to an LLM for answer distillation in a few different ways.
`stuff`, `refine`, `map-reduce`, and `map-rerank` chains for passing documents to an LLM prompt are well summarized [here](/docs/modules/chains/document/).
`stuff` is commonly used because it simply "stuffs" all retrieved documents into the prompt.
The [load_qa_chain](/docs/use_cases/question_answering/how_to/question_answering.html) is an easy way to pass documents to an LLM using these various approaches (e.g., see `chain_type`).
```python
from langchain.chains.question_answering import load_qa_chain
chain = load_qa_chain(llm, chain_type="stuff")
chain({"input_documents": unique_docs, "question": question},return_only_outputs=True)
```
{'output_text': 'The approaches to task decomposition include (1) using simple prompting to break down tasks into subgoals, (2) providing task-specific instructions to guide the decomposition process, and (3) incorporating human inputs for task decomposition.'}
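What "stuffing" means can be shown without any LLM call: the retrieved texts are joined and substituted into one prompt. The sketch below is an assumption-laden illustration, with a hypothetical `stuff_prompt` helper and a shortened template; it is not the code behind `load_qa_chain`.

```python
TEMPLATE = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know.

{context}

Question: {question}
Helpful Answer:"""


def stuff_prompt(docs: list, question: str, template: str = TEMPLATE) -> str:
    """Join all retrieved texts into one context block and fill the template."""
    context = "\n\n".join(docs)
    return template.format(context=context, question=question)


prompt = stuff_prompt(
    ["Task decomposition can be done by LLM with simple prompting.",
     "It can also use task-specific instructions or human inputs."],
    "What are the approaches to Task Decomposition?",
)
print(prompt)
```

Because every retrieved text goes into a single prompt, this approach only works while the combined text fits in the model's context window; the other chain types exist for the cases where it does not.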
We can also pass the `chain_type` to `RetrievalQA`.
```python
qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever(),
chain_type="stuff")
result = qa_chain({"query": question})
```
In summary, the user can choose the desired level of abstraction for QA:
![summary_chains.png](/img/summary_chains.png)
## Step 6. Converse (Extension)
To hold a conversation, a chain needs to be able to refer to past interactions. Chain `Memory` allows us to do this. To keep chat history, we can specify a Memory buffer to track the conversation inputs / outputs.
```python
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
```
The `ConversationalRetrievalChain` uses the chat history stored in the `Memory` buffer.
```python
from langchain.chains import ConversationalRetrievalChain
retriever = vectorstore.as_retriever()
chat = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory)
```
```python
result = chat({"question": "What are some of the main ideas in self-reflection?"})
result['answer']
```
"Some of the main ideas in self-reflection include:\n1. Iterative improvement: Self-reflection allows autonomous agents to improve by refining past action decisions and correcting mistakes.\n2. Trial and error: Self-reflection is crucial in real-world tasks where trial and error are inevitable.\n3. Two-shot examples: Self-reflection is created by showing pairs of failed trajectories and ideal reflections for guiding future changes in the plan.\n4. Working memory: Reflections are added to the agent's working memory, up to three, to be used as context for querying.\n5. Performance evaluation: Self-reflection involves continuously reviewing and analyzing actions, self-criticizing behavior, and reflecting on past decisions and strategies to refine approaches.\n6. Efficiency: Self-reflection encourages being smart and efficient, aiming to complete tasks in the least number of steps."
The `Memory` buffer has the context needed to resolve `"it"` ("self-reflection") in the question below.
```python
result = chat({"question": "How does the Reflexion paper handle it?"})
result['answer']
```
"The Reflexion paper handles self-reflection by showing two-shot examples to the Learning Language Model (LLM). Each example consists of a failed trajectory and an ideal reflection that guides future changes in the agent's plan. These reflections are then added to the agent's working memory, up to a maximum of three, to be used as context for querying the LLM. This allows the agent to iteratively improve its reasoning skills by refining past action decisions and correcting previous mistakes."
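Why the buffer lets the chain resolve "it" can be sketched as follows: earlier turns are kept as text and placed in front of the follow-up question, so a model can rewrite it as a standalone question before retrieval. The `BufferMemory` class and `build_condense_prompt` helper are hypothetical names for this illustration, and the prompt wording is an assumption rather than the one LangChain uses.

```python
class BufferMemory:
    """Hypothetical buffer: stores (question, answer) turns in order."""

    def __init__(self):
        self.turns = []

    def save(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_text(self) -> str:
        """Render the history as alternating Human/AI lines."""
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)


def build_condense_prompt(memory: BufferMemory, follow_up: str) -> str:
    """Build a prompt asking a model to rewrite the follow-up as a standalone question."""
    return (
        "Given the conversation below, rephrase the follow-up question "
        "as a standalone question.\n\n"
        f"Chat history:\n{memory.as_text()}\n\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )


memory = BufferMemory()
memory.save(
    "What are some of the main ideas in self-reflection?",
    "Iterative improvement, trial and error, and working memory.",
)
print(build_condense_prompt(memory, "How does the Reflexion paper handle it?"))
```

The rewritten question, rather than the raw follow-up, is what gets embedded for retrieval, which is why a pronoun-only follow-up can still fetch relevant splits.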
### Go deeper
The [documentation](/docs/use_cases/question_answering/how_to/chat_vector_db) on `ConversationalRetrievalChain` offers a few extensions, such as streaming and source documents.
## Further reading
- Check out the [How to](/docs/use_cases/question_answering/how_to/) section for all the variations of chains that can be used for QA over docs in different settings.
- Check out the [Integrations-specific](/docs/use_cases/question_answering/integrations/) section for chains that use specific integrations.

@ -0,0 +1,686 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "5151afed",
"metadata": {},
"source": [
"# Question Answering\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/question_answering/qa.ipynb)\n",
"\n",
"## Use case\n",
"Suppose you have some text documents (PDF, blog, Notion pages, etc.) and want to ask questions related to the contents of those documents. LLMs, given their proficiency in understanding text, are a great tool for this.\n",
"\n",
"In this walkthrough we'll go over how to build an application that answers questions over documents using LLMs. Two closely related use cases, covered elsewhere, are:\n",
"- [QA over structured data](/docs/use_cases/sql) (e.g., SQL)\n",
"- [QA over code](/docs/use_cases/code) (e.g., Python)\n",
"\n",
"![intro.png](/img/qa_intro.png)\n",
"\n",
"## Overview\n",
"The pipeline for converting raw unstructured data into a QA chain looks like this:\n",
"1. `Loading`: First we need to load our data. Unstructured data can be loaded from many sources. Use the [LangChain integration hub](https://integrations.langchain.com/) to browse the full set of loaders.\n",
"Each loader returns data as a LangChain [`Document`](/docs/components/schema/document).\n",
"2. `Splitting`: [Text splitters](/docs/modules/data_connection/document_transformers/) break `Documents` into splits of a specified size.\n",
"3. `Storage`: Storage (often a [vectorstore](/docs/modules/data_connection/vectorstores/)) will house [and often embed](https://www.pinecone.io/learn/vector-embeddings/) the splits.\n",
"4. `Retrieval`: The app retrieves splits from storage (often those [with similar embeddings](https://www.pinecone.io/learn/k-nearest-neighbor/) to the input question).\n",
"5. `Generation`: An [LLM](/docs/modules/model_io/models/llms/) produces an answer using a prompt that includes the question and the retrieved data.\n",
"6. `Conversation` (Extension): Hold a multi-turn conversation by adding [Memory](/docs/modules/memory/) to your QA chain.\n",
"\n",
"![flow.jpeg](/img/qa_flow.jpeg)\n",
"\n",
"## Quickstart\n",
"\n",
"To give you a sneak preview, the above pipeline can all be wrapped in a single object: `VectorstoreIndexCreator`. Suppose we want a QA app over this [blog post](https://lilianweng.github.io/posts/2023-06-23-agent/). We can create this in a few lines of code. First set environment variables and install packages:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e14b744b",
"metadata": {},
"outputs": [],
"source": [
"pip install openai chromadb\n",
"\n",
"# Set env var OPENAI_API_KEY or load from a .env file\n",
"# import dotenv\n",
"\n",
"# dotenv.load_dotenv()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "046cefc0",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import WebBaseLoader\n",
"from langchain.indexes import VectorstoreIndexCreator\n",
"\n",
"loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
"index = VectorstoreIndexCreator().from_loaders([loader])"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f4bf8740",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done using LLM with simple prompting, task-specific instructions, or with human inputs. Tree of Thoughts (Yao et al. 2023) is an extension of Chain of Thought (Wei et al. 2022) which explores multiple reasoning possibilities at each step.'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"index.query(\"What is Task Decomposition?\")"
]
},
{
"cell_type": "markdown",
"id": "8224aad6",
"metadata": {},
"source": [
"Ok, but what's going on under the hood, and how could we customize this for our specific use case? For that, let's take a look at how we can construct this pipeline piece by piece."
]
},
{
"cell_type": "markdown",
"id": "ba5daed6",
"metadata": {},
"source": [
"## Step 1. Load\n",
"\n",
"Specify a `DocumentLoader` to load in your unstructured data as `Documents`. A `Document` is a piece of text (the `page_content`) and associated metadata."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "cf4d5c72",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import WebBaseLoader\n",
"\n",
"loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
"data = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "fd2cc9a7",
"metadata": {},
"source": [
"### Go deeper\n",
"- Browse the > 120 data loader integrations [here](https://integrations.langchain.com/).\n",
"- See further documentation on loaders [here](/docs/modules/data_connection/document_loaders/).\n",
"\n",
"## Step 2. Split\n",
"\n",
"Split the `Document` into chunks for embedding and vector storage."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4b11c01d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0)\n",
"all_splits = text_splitter.split_documents(data)"
]
},
{
"cell_type": "markdown",
"id": "0a33bd4d",
"metadata": {},
"source": [
"### Go deeper\n",
"\n",
"- `DocumentSplitters` are just one type of the more generic `DocumentTransformers`, which can all be useful in this preprocessing step.\n",
"- See further documentation on transformers [here](/docs/modules/data_connection/document_transformers/).\n",
"- `Context-aware splitters` keep the location (\"context\") of each split in the original `Document`:\n",
" - [Markdown files](/docs/use_cases/question_answering/how_to/document-context-aware-QA)\n",
" - [Code (py or js)](/docs/integrations/document_loaders/source_code)\n",
" - [Documents](/docs/integrations/document_loaders/grobid)\n",
"\n",
"## Step 3. Store\n",
"\n",
"To look up our document splits later, we first need to store them.\n",
"The most common way to do this is to embed the contents of each document then store the embedding and document in a vector store, with the embedding being used to index the document."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "e9c302c8",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"\n",
"vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())"
]
},
{
"cell_type": "markdown",
"id": "dc6f22b0",
"metadata": {},
"source": [
"### Go deeper\n",
"- Browse the > 40 vectorstores integrations [here](https://integrations.langchain.com/).\n",
"- See further documentation on vectorstores [here](/docs/modules/data_connection/vectorstores/).\n",
"- Browse the > 30 text embedding integrations [here](https://integrations.langchain.com/).\n",
"- See further documentation on embedding models [here](/docs/modules/data_connection/text_embedding/).\n",
"\n",
"Here are Steps 1-3:\n",
"\n",
"![lc.png](/img/qa_data_load.png)\n",
"\n",
"## Step 4. Retrieve\n",
"\n",
"Retrieve relevant splits for any question using [similarity search](https://www.pinecone.io/learn/what-is-similarity-search/)."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "e2c26b7d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"question = \"What are the approaches to Task Decomposition?\"\n",
"docs = vectorstore.similarity_search(question)\n",
"len(docs)"
]
},
{
"cell_type": "markdown",
"id": "5d5a113b",
"metadata": {},
"source": [
"### Go deeper\n",
"\n",
"Vectorstores are commonly used for retrieval, but they are not the only option. For example, SVMs (see thread [here](https://twitter.com/karpathy/status/1647025230546886658?s=20)) can also be used.\n",
"\n",
"LangChain [has many retrievers](/docs/modules/data_connection/retrievers/) including, but not limited to, vectorstores. All retrievers implement a common method `get_relevant_documents()` (and its asynchronous variant `aget_relevant_documents()`)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c901eaee",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.retrievers import SVMRetriever\n",
"\n",
"svm_retriever = SVMRetriever.from_documents(all_splits,OpenAIEmbeddings())\n",
"docs_svm=svm_retriever.get_relevant_documents(question)\n",
"len(docs_svm)"
]
},
{
"cell_type": "markdown",
"id": "69de3d54",
"metadata": {},
"source": [
"Some common ways to improve on vector similarity search include:\n",
"- `MultiQueryRetriever` [generates variants of the input question](/docs/modules/data_connection/retrievers/MultiQueryRetriever) to improve retrieval.\n",
"- `Max marginal relevance` selects for [relevance and diversity](https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf) among the retrieved documents.\n",
"- Documents can be filtered during retrieval using [`metadata` filters](/docs/use_cases/question_answering/how_to/document-context-aware-QA)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "c690f01a",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:langchain.retrievers.multi_query:Generated queries: ['1. How can Task Decomposition be approached?', '2. What are the different methods for Task Decomposition?', '3. What are the various approaches to decomposing tasks?']\n"
]
},
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import logging\n",
"\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.retrievers.multi_query import MultiQueryRetriever\n",
"\n",
"logging.basicConfig()\n",
"logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)\n",
"\n",
"retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectorstore.as_retriever(),\n",
" llm=ChatOpenAI(temperature=0))\n",
"unique_docs = retriever_from_llm.get_relevant_documents(query=question)\n",
"len(unique_docs)"
]
},
{
"cell_type": "markdown",
"id": "415d6824",
"metadata": {},
"source": [
"## Step 5. Generate\n",
"\n",
"Distill the retrieved documents into an answer using an LLM/Chat model (e.g., `gpt-3.5-turbo`) with the `RetrievalQA` chain.\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "99fa1aec",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'query': 'What are the approaches to Task Decomposition?',\n",
" 'result': 'There are three approaches to task decomposition:\\n\\n1. Using Language Model with simple prompting: This approach involves using a Language Model (LLM) with simple prompts like \"Steps for XYZ\" or \"What are the subgoals for achieving XYZ?\" to guide the task decomposition process.\\n\\n2. Using task-specific instructions: In this approach, task-specific instructions are provided to guide the task decomposition. For example, for the task of writing a novel, an instruction like \"Write a story outline\" can be given to help decompose the task into smaller subtasks.\\n\\n3. Human inputs: Task decomposition can also be done with the help of human inputs. This involves getting input and guidance from humans to break down a complex task into smaller, more manageable subtasks.'}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import RetrievalQA\n",
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)\n",
"qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever())\n",
"qa_chain({\"query\": question})"
]
},
{
"cell_type": "markdown",
"id": "f7d52c84",
"metadata": {},
"source": [
"Note, you can pass in an `LLM` or a `ChatModel` (like we did here) to the `RetrievalQA` chain.\n",
"\n",
"### Go deeper\n",
"\n",
"#### Choosing LLMs\n",
"- Browse the > 55 LLM and chat model integrations [here](https://integrations.langchain.com/).\n",
"- See further documentation on LLMs and chat models [here](/docs/modules/model_io/models/).\n",
"- Use local LLMs: The popularity of [PrivateGPT](https://github.com/imartinez/privateGPT) and [GPT4All](https://github.com/nomic-ai/gpt4all) underscores the importance of running LLMs locally.\n",
"Using `GPT4All` is as simple as [downloading the binary](/docs/integrations/llms/gpt4all) and then:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "02d6c9dc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found model file at /Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"objc[61331]: Class GGMLMetalClass is implemented in both /Users/rlm/miniforge3/envs/llama/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libreplit-mainline-metal.dylib (0x2e3384208) and /Users/rlm/miniforge3/envs/llama/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libllamamodel-mainline-metal.dylib (0x2e37b0208). One of the two will be used. Which one is undefined.\n",
"llama.cpp: using Metal\n",
"llama.cpp: loading model from /Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin\n",
"llama_model_load_internal: format = ggjt v3 (latest)\n",
"llama_model_load_internal: n_vocab = 32001\n",
"llama_model_load_internal: n_ctx = 2048\n",
"llama_model_load_internal: n_embd = 5120\n",
"llama_model_load_internal: n_mult = 256\n",
"llama_model_load_internal: n_head = 40\n",
"llama_model_load_internal: n_layer = 40\n",
"llama_model_load_internal: n_rot = 128\n",
"llama_model_load_internal: ftype = 2 (mostly Q4_0)\n",
"llama_model_load_internal: n_ff = 13824\n",
"llama_model_load_internal: n_parts = 1\n",
"llama_model_load_internal: model size = 13B\n",
"llama_model_load_internal: ggml ctx size = 0.09 MB\n",
"llama_model_load_internal: mem required = 9031.71 MB (+ 1608.00 MB per state)\n",
"llama_new_context_with_model: kv self size = 1600.00 MB\n",
"ggml_metal_init: allocating\n",
"ggml_metal_init: using MPS\n",
"ggml_metal_init: loading '/Users/rlm/miniforge3/envs/llama/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/ggml-metal.metal'\n",
"ggml_metal_init: loaded kernel_add 0x2bbbbc2f0\n",
"ggml_metal_init: loaded kernel_mul 0x2bbbba840\n",
"ggml_metal_init: loaded kernel_mul_row 0x2bb917dd0\n",
"ggml_metal_init: loaded kernel_scale 0x2bb918150\n",
"ggml_metal_init: loaded kernel_silu 0x2bb9184d0\n",
"ggml_metal_init: loaded kernel_relu 0x2bb918850\n",
"ggml_metal_init: loaded kernel_gelu 0x2bbbc3f10\n",
"ggml_metal_init: loaded kernel_soft_max 0x2bbbc5840\n",
"ggml_metal_init: loaded kernel_diag_mask_inf 0x2bbbc4c70\n",
"ggml_metal_init: loaded kernel_get_rows_f16 0x2bbbc5fc0\n",
"ggml_metal_init: loaded kernel_get_rows_q4_0 0x2bbbc6720\n",
"ggml_metal_init: loaded kernel_get_rows_q4_1 0x2bb918c10\n",
"ggml_metal_init: loaded kernel_get_rows_q2_k 0x2bbbc51b0\n",
"ggml_metal_init: loaded kernel_get_rows_q3_k 0x2bbbc7630\n",
"ggml_metal_init: loaded kernel_get_rows_q4_k 0x2d4394e30\n",
"ggml_metal_init: loaded kernel_get_rows_q5_k 0x2bbbc7890\n",
"ggml_metal_init: loaded kernel_get_rows_q6_k 0x2d4395210\n",
"ggml_metal_init: loaded kernel_rms_norm 0x2bbbc8740\n",
"ggml_metal_init: loaded kernel_norm 0x2bbbc8b30\n",
"ggml_metal_init: loaded kernel_mul_mat_f16_f32 0x2d4395470\n",
"ggml_metal_init: loaded kernel_mul_mat_q4_0_f32 0x2d4395a70\n",
"ggml_metal_init: loaded kernel_mul_mat_q4_1_f32 0x1242b1a00\n",
"ggml_metal_init: loaded kernel_mul_mat_q2_k_f32 0x29f17d1c0\n",
"ggml_metal_init: loaded kernel_mul_mat_q3_k_f32 0x2d4396050\n",
"ggml_metal_init: loaded kernel_mul_mat_q4_k_f32 0x2bbbc98a0\n",
"ggml_metal_init: loaded kernel_mul_mat_q5_k_f32 0x2bbbca4a0\n",
"ggml_metal_init: loaded kernel_mul_mat_q6_k_f32 0x2bbbcae90\n",
"ggml_metal_init: loaded kernel_rope 0x2bbbca700\n",
"ggml_metal_init: loaded kernel_alibi_f32 0x2bbbcc6e0\n",
"ggml_metal_init: loaded kernel_cpy_f32_f16 0x2bbbccf90\n",
"ggml_metal_init: loaded kernel_cpy_f32_f32 0x2bbbcd900\n",
"ggml_metal_init: loaded kernel_cpy_f16_f16 0x2bbbce1f0\n",
"ggml_metal_init: recommendedMaxWorkingSetSize = 21845.34 MB\n",
"ggml_metal_init: hasUnifiedMemory = true\n",
"ggml_metal_init: maxTransferRate = built-in GPU\n",
"ggml_metal_add_buffer: allocated 'data ' buffer, size = 6984.06 MB, ( 6984.45 / 21845.34)\n",
"ggml_metal_add_buffer: allocated 'eval ' buffer, size = 1024.00 MB, ( 8008.45 / 21845.34)\n",
"ggml_metal_add_buffer: allocated 'kv ' buffer, size = 1602.00 MB, ( 9610.45 / 21845.34)\n",
"ggml_metal_add_buffer: allocated 'scr0 ' buffer, size = 512.00 MB, (10122.45 / 21845.34)\n",
"ggml_metal_add_buffer: allocated 'scr1 ' buffer, size = 512.00 MB, (10634.45 / 21845.34)\n"
]
}
],
"source": [
"from langchain.llms import GPT4All\n",
"from langchain.chains import RetrievalQA\n",
"\n",
"llm = GPT4All(model=\"/Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin\",max_tokens=2048)\n",
"qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever())"
]
},
{
"cell_type": "markdown",
"id": "fa82f437",
"metadata": {},
"source": [
"#### Customizing the prompt\n",
"\n",
"The prompt in `RetrievalQA` chain can be easily customized."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "e4fee704",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"ggml_metal_free: deallocating\n"
]
},
{
"data": {
"text/plain": [
"'The approaches to task decomposition include using LLM with simple prompting, task-specific instructions, or human inputs. Thanks for asking!'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import RetrievalQA\n",
"from langchain.prompts import PromptTemplate\n",
"\n",
"template = \"\"\"Use the following pieces of context to answer the question at the end. \n",
"If you don't know the answer, just say that you don't know, don't try to make up an answer. \n",
"Use three sentences maximum and keep the answer as concise as possible. \n",
"Always say \"thanks for asking!\" at the end of the answer. \n",
"{context}\n",
"Question: {question}\n",
"Helpful Answer:\"\"\"\n",
"QA_CHAIN_PROMPT = PromptTemplate.from_template(template)\n",
"\n",
"llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)\n",
"qa_chain = RetrievalQA.from_chain_type(\n",
" llm,\n",
" retriever=vectorstore.as_retriever(),\n",
" chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT}\n",
")\n",
"result = qa_chain({\"query\": question})\n",
"result[\"result\"]"
]
},
{
"cell_type": "markdown",
"id": "ff40e8db",
"metadata": {},
"source": [
"#### Return source documents\n",
"\n",
"The full set of retrieved documents used for answer distillation can be returned using `return_source_documents=True`."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "60004293",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4\n"
]
},
{
"data": {
"text/plain": [
"Document(page_content='Task decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', metadata={'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agents brain, complemented by several key components:', 'language': 'en', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\"})"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import RetrievalQA\n",
"\n",
"qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever(),\n",
" return_source_documents=True)\n",
"result = qa_chain({\"query\": question})\n",
"print(len(result['source_documents']))\n",
"result['source_documents'][0]"
]
},
{
"cell_type": "markdown",
"id": "1b600236",
"metadata": {},
"source": [
"#### Return citations\n",
"\n",
"Answer citations can be returned using `RetrievalQAWithSourcesChain`."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "948f6d19",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'What are the approaches to Task Decomposition?',\n",
" 'answer': 'The approaches to Task Decomposition include:\\n1. Using LLM with simple prompting, such as providing steps or subgoals for achieving a task.\\n2. Using task-specific instructions, such as providing a specific instruction like \"Write a story outline\" for writing a novel.\\n3. Using human inputs to decompose the task.\\nAnother approach is the Tree of Thoughts, which extends the Chain of Thought (CoT) technique by exploring multiple reasoning possibilities at each step and generating multiple thoughts per step, creating a tree structure. The search process can be BFS or DFS, and each state can be evaluated by a classifier or majority vote.\\nSources: https://lilianweng.github.io/posts/2023-06-23-agent/',\n",
" 'sources': ''}"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import RetrievalQAWithSourcesChain\n",
"\n",
"qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm,retriever=vectorstore.as_retriever())\n",
"\n",
"result = qa_chain({\"question\": question})\n",
"result"
]
},
{
"cell_type": "markdown",
"id": "73d0b138",
"metadata": {},
"source": [
"#### Customizing retrieved document processing\n",
"\n",
"Retrieved documents can be fed to an LLM for answer distillation in a few different ways.\n",
"\n",
"`stuff`, `refine`, `map-reduce`, and `map-rerank` chains for passing documents to an LLM prompt are well summarized [here](/docs/modules/chains/document/).\n",
" \n",
"`stuff` is commonly used because it simply \"stuffs\" all retrieved documents into the prompt.\n",
"\n",
"The [load_qa_chain](/docs/use_cases/question_answering/how_to/question_answering.html) is an easy way to pass documents to an LLM using these various approaches (e.g., see `chain_type`)."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "29aa139f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output_text': 'The approaches to task decomposition mentioned in the provided context are:\\n\\n1. Chain of thought (CoT): This approach involves instructing the language model to \"think step by step\" and decompose complex tasks into smaller and simpler steps. It enhances model performance on complex tasks by utilizing more test-time computation.\\n\\n2. Tree of Thoughts: This approach extends CoT by exploring multiple reasoning possibilities at each step. It decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS or DFS, and each state is evaluated by a classifier or majority vote.\\n\\n3. LLM with simple prompting: This approach involves using a language model with simple prompts like \"Steps for XYZ\" or \"What are the subgoals for achieving XYZ?\" to perform task decomposition.\\n\\n4. Task-specific instructions: This approach involves providing task-specific instructions to guide the language model in decomposing the task. For example, providing the instruction \"Write a story outline\" for the task of writing a novel.\\n\\n5. Human inputs: Task decomposition can also be done with human inputs, where humans provide guidance and input to break down the task into smaller subtasks.'}"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains.question_answering import load_qa_chain\n",
"\n",
"chain = load_qa_chain(llm, chain_type=\"stuff\")\n",
"chain({\"input_documents\": unique_docs, \"question\": question},return_only_outputs=True)"
]
},
{
"cell_type": "markdown",
"id": "a8cb8cd1",
"metadata": {},
"source": [
"We can also pass the `chain_type` to `RetrievalQA`."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "f68574bd",
"metadata": {},
"outputs": [],
"source": [
"qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever(),\n",
" chain_type=\"stuff\")\n",
"result = qa_chain({\"query\": question})"
]
},
{
"cell_type": "markdown",
"id": "b33aeb5f",
"metadata": {},
"source": [
"In summary, the user can choose the desired level of abstraction for QA:\n",
"\n",
"![summary_chains.png](/img/summary_chains.png)\n",
"\n",
"## Step 6. Chat\n",
"\n",
"See our [use-case on chat](/docs/use_cases/chatbots) for detail on this!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}