langchain/docs/examples/data_augmented_generation/qa_with_sources.ipynb

349 lines
14 KiB
Plaintext
Raw Normal View History

2022-12-08 06:56:26 +00:00
{
"cells": [
{
"cell_type": "markdown",
"id": "74148cee",
"metadata": {},
"source": [
"# Question Answering with Sources\n",
"\n",
"This notebook walks through how to use LangChain for question answering with sources over a list of documents. It covers three different chain types: `stuff`, `map_reduce`, and `refine`. For a more in depth explanation of what these chain types are, see [here](../../explanation/combine_docs.md)."
]
},
{
"cell_type": "markdown",
"id": "ca2f0efc",
"metadata": {},
"source": [
"### Prepare Data\n",
"First we prepare the data. For this example we do similarity search over a vector database, but these documents could be fetched in any manner (the point of this notebook to highlight what to do AFTER you fetch the documents)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "78f28130",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.embeddings.cohere import CohereEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch\n",
"from langchain.vectorstores.faiss import FAISS\n",
"from langchain.docstore.document import Document"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4da195a3",
"metadata": {},
"outputs": [],
"source": [
"with open('../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_text(state_of_the_union)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
2022-12-08 15:59:58 +00:00
"execution_count": 3,
2022-12-08 06:56:26 +00:00
"id": "5ec2b55b",
"metadata": {},
"outputs": [],
"source": [
"docsearch = FAISS.from_texts(texts, embeddings, metadatas=[{\"source\": i} for i in range(len(texts))])"
]
},
{
"cell_type": "code",
2022-12-08 15:59:58 +00:00
"execution_count": 4,
2022-12-08 06:56:26 +00:00
"id": "5286f58f",
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
2022-12-08 15:59:58 +00:00
"execution_count": 5,
2022-12-08 06:56:26 +00:00
"id": "005a47e9",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.qa_with_sources import load_qa_with_sources_chain\n",
"from langchain.llms import OpenAI"
]
},
{
"cell_type": "markdown",
"id": "d82f899a",
"metadata": {},
"source": [
"### The `stuff` Chain\n",
"\n",
"This sections shows results of using the `stuff` Chain to do question answering with sources."
]
},
{
"cell_type": "code",
2022-12-08 15:59:58 +00:00
"execution_count": 6,
2022-12-08 06:56:26 +00:00
"id": "fc1a5ed6",
"metadata": {},
"outputs": [],
"source": [
"chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type=\"stuff\")"
]
},
{
"cell_type": "code",
2022-12-08 15:59:58 +00:00
"execution_count": 7,
2022-12-08 06:56:26 +00:00
"id": "e239964b",
"metadata": {},
"outputs": [],
"source": [
"docs = [Document(page_content=t, metadata={\"source\": i}) for i, t in enumerate(texts[:3])]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "7d766417",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output_text': ' The president did not mention Justice Breyer.\\nSOURCES: 0-pl, 1-pl, 2-pl'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
"chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
]
},
{
"cell_type": "markdown",
"id": "c5dbb304",
"metadata": {},
"source": [
"### The `map_reduce` Chain\n",
"\n",
"This sections shows results of using the `map_reduce` Chain to do question answering with sources."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "921db0a4",
"metadata": {},
"outputs": [],
"source": [
"chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type=\"map_reduce\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "e417926a",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.\n",
"Token indices sequence length is longer than the specified maximum sequence length for this model (1546 > 1024). Running this sequence through the model will result in indexing errors\n"
]
},
2022-12-08 06:56:26 +00:00
{
"data": {
"text/plain": [
"{'output_text': ' The president did not mention Justice Breyer.\\nSOURCES: 0, 1, 2'}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
"chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
]
},
{
"cell_type": "markdown",
"id": "ae2f6d97",
"metadata": {},
"source": [
"**Intermediate Steps**\n",
"\n",
"We can also return the intermediate steps for `map_reduce` chains, should we want to inspect them. This is done with the `return_map_steps` variable."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "15af265f",
"metadata": {},
"outputs": [],
"source": [
"chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type=\"map_reduce\", return_map_steps=True)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "21b136e5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'map_steps': [{'text': ' \"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.\"'},\n",
" {'text': ' None'},\n",
" {'text': ' None'},\n",
" {'text': ' None'}],\n",
" 'output_text': ' The president thanked Justice Breyer for his service.\\nSOURCES: 30-pl'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
]
},
2022-12-08 06:56:26 +00:00
{
"cell_type": "markdown",
"id": "5bf0e1ab",
"metadata": {},
"source": [
"### The `refine` Chain\n",
"\n",
"This sections shows results of using the `refine` Chain to do question answering with sources."
]
},
{
"cell_type": "code",
2022-12-08 15:59:58 +00:00
"execution_count": 11,
2022-12-08 06:56:26 +00:00
"id": "904835c8",
"metadata": {},
"outputs": [],
"source": [
"chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type=\"refine\")"
]
},
{
"cell_type": "code",
2022-12-11 07:16:32 +00:00
"execution_count": 12,
2022-12-08 06:56:26 +00:00
"id": "f60875c6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output_text': \"\\n\\nThe president did not mention Justice Breyer in his speech to the European Parliament, which focused on building a coalition of freedom-loving nations to confront Putin, unifying European allies, countering Russia's lies with truth, and enforcing powerful economic sanctions. Source: 2\"}"
2022-12-08 06:56:26 +00:00
]
},
2022-12-11 07:16:32 +00:00
"execution_count": 12,
2022-12-08 06:56:26 +00:00
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
2022-12-08 15:59:58 +00:00
"chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
2022-12-08 06:56:26 +00:00
]
},
{
"cell_type": "markdown",
"id": "ac357530",
"metadata": {},
"source": [
"**Intermediate Steps**\n",
"\n",
"We can also return the intermediate steps for `refine` chains, should we want to inspect them. This is done with the `return_refine_steps` variable."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3396a773",
"metadata": {},
"outputs": [],
"source": [
"chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type=\"refine\", return_refine_steps=True)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "be5739ef",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'refine_steps': ['\\nThe president said that he was honoring Justice Breyer for his dedication to serving the country and that he was a retiring Justice of the United States Supreme Court.',\n",
" \"\\n\\nThe president said that he was honoring Justice Breyer for his dedication to serving the country, his career as a top litigator in private practice, a former federal public defender, and his family of public school educators and police officers. He also noted Justice Breyer's consensus-building skills and the broad range of support he has received from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. He also highlighted the importance of advancing liberty and justice by securing the border and fixing the immigration system, noting the new technology and joint patrols with Mexico and Guatemala to catch more human traffickers, as well as the dedicated immigration judges and commitments to support partners in South and Central America to host more refugees and secure their own borders. \\nSource: 31\",\n",
" \"\\n\\nThe president said that he was honoring Justice Breyer for his dedication to serving the country, his career as a top litigator in private practice, a former federal public defender, and his family of public school educators and police officers. He also noted Justice Breyer's consensus-building skills and the broad range of support he has received from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. He also highlighted the importance of advancing liberty and justice by securing the border and fixing the immigration system, noting the new technology and joint patrols with Mexico and Guatemala to catch more human traffickers, as well as the dedicated immigration judges and commitments to support partners in South and Central America to host more refugees and secure their own borders. Additionally, he mentioned the need for the bipartisan Equality Act to be passed and signed into law, and the importance of strengthening the Violence Against Women Act. He also offered a Unity Agenda for the Nation, which includes beating the opioid epidemic. \\nSource: 31, 33\",\n",
" \"\\n\\nThe president said that he was honoring Justice Breyer for his dedication to serving the country, his career as a top litigator in private practice, a former federal public defender, and his family of public school educators and police officers. He also noted Justice Breyer's consensus-building skills and the broad range of support he has received from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. He also highlighted the importance of advancing liberty and justice by securing the border and fixing the immigration system, noting the new technology and joint patrols with Mexico and Guatemala to catch more human traffickers, as well as the dedicated immigration judges and commitments to support partners in South and Central America to host more refugees and secure their own borders. Additionally, he mentioned the need for the bipartisan Equality Act to be passed and signed into law, and the importance of strengthening the Violence Against Women Act. He also offered a Unity Agenda for the Nation, which includes beating the opioid epidemic, and announced that the Justice Department will name a chief prosecutor for pandemic fraud. Source: 31, 33, 20\"],\n",
" 'output_text': \"\\n\\nThe president said that he was honoring Justice Breyer for his dedication to serving the country, his career as a top litigator in private practice, a former federal public defender, and his family of public school educators and police officers. He also noted Justice Breyer's consensus-building skills and the broad range of support he has received from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. He also highlighted the importance of advancing liberty and justice by securing the border and fixing the immigration system, noting the new technology and joint patrols with Mexico and Guatemala to catch more human traffickers, as well as the dedicated immigration judges and commitments to support partners in South and Central America to host more refugees and secure their own borders. Additionally, he mentioned the need for the bipartisan Equality Act to be passed and signed into law, and the importance of strengthening the Violence Against Women Act. He also offered a Unity Agenda for the Nation, which includes beating the opioid epidemic, and announced that the Justice Department will name a chief prosecutor for pandemic fraud. Source: 31, 33, 20\"}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
]
},
2022-12-08 06:56:26 +00:00
{
"cell_type": "code",
"execution_count": null,
"id": "7355fedd",
2022-12-08 06:56:26 +00:00
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}