# Question Answering with Sources

This notebook walks through how to use LangChain for question answering with sources over a list of documents. It covers three different chain types: `stuff`, `map_reduce`, and `refine`. For a more in-depth explanation of these chain types, see [here](../../explanation/combine_docs.md).

### Prepare Data
First we prepare the data. For this example we do a similarity search over a vector database, but these documents could be fetched in any manner (the point of this notebook is to highlight what to do AFTER you fetch the documents).

In [1]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.cohere import CohereEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch
from langchain.vectorstores.faiss import FAISS
from langchain.docstore.document import Document

In [2]:
with open('../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)

embeddings = OpenAIEmbeddings()

In [3]:
docsearch = FAISS.from_texts(texts, embeddings, metadatas=[{"source": i} for i in range(len(texts))])

In [4]:
query = "What did the president say about Justice Breyer"
docs = docsearch.similarity_search(query)
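
To give a rough intuition for what a similarity search does, here is a toy sketch that needs no vector database. It is not how FAISS or `OpenAIEmbeddings` work internally: word counts stand in for real embeddings, and the names `bag_of_words`, `cosine`, and `toy_similarity_search` as well as the sample texts are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn text into a sparse word-count vector (a crude stand-in for an embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_similarity_search(query, texts, k=4):
    """Return the k texts most similar to the query, each tagged with its index as the source."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(t)), i) for i, t in enumerate(texts)]
    # Highest score first; ties are broken by the original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [
        {"page_content": texts[i], "metadata": {"source": i}}
        for _, i in scored[:k]
    ]


# Hypothetical sample texts, not taken from the speech.
toy_texts = [
    "The economy added jobs this year",
    "I honor Justice Breyer for his service",
    "We will lower the cost of prescription drugs",
]
top = toy_similarity_search(
    "What did the president say about Justice Breyer", toy_texts, k=1
)
print(top[0]["metadata"]["source"])
```

The key point carried over from the real setup is that each returned item keeps a `source` entry in its metadata, which is what the chains below rely on to cite where an answer came from.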

In [5]:
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.llms import OpenAI

### The `stuff` Chain

This section shows the results of using the `stuff` chain to do question answering with sources.

In [6]:
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff")

In [7]:
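# Build Document objects from the first three chunks; note that this replaces the similarity-search results held in `docs`.
docs = [Document(page_content=t, metadata={"source": i}) for i, t in enumerate(texts[:3])]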

In [8]:
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': ' The president did not mention Justice Breyer.\nSOURCES: 0-pl, 1-pl, 2-pl'}
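
Conceptually, the `stuff` strategy places every document into a single prompt and makes one LLM call. The sketch below illustrates that idea only; it is not LangChain's implementation. The prompt wording and the helper name `build_stuff_prompt` are hypothetical, and documents are represented as plain dictionaries rather than `Document` objects.

```python
def build_stuff_prompt(docs, question):
    """Concatenate every document, labelled with its source, into one prompt."""
    sections = [
        f"Content: {d['page_content']}\nSource: {d['metadata']['source']}"
        for d in docs
    ]
    context = "\n\n".join(sections)
    # Hypothetical instruction text; the real chain uses its own prompt template.
    return (
        "Answer the question using only the extracts below, "
        "and list the sources you used after 'SOURCES:'.\n\n"
        f"{context}\n\nQUESTION: {question}\nFINAL ANSWER:"
    )


# Hypothetical stand-ins for the first chunks of the speech.
toy_docs = [
    {"page_content": "First chunk of the speech.", "metadata": {"source": 0}},
    {"page_content": "Second chunk of the speech.", "metadata": {"source": 1}},
]
prompt = build_stuff_prompt(
    toy_docs, "What did the president say about Justice Breyer"
)
print(prompt)
```

Because everything goes into one prompt, this approach is simple but is limited by the model's context length once the documents become long or numerous.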

### The `map_reduce` Chain

This section shows the results of using the `map_reduce` chain to do question answering with sources.

In [9]:
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="map_reduce")

In [10]:
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': ' The president did not mention Justice Breyer.\nSOURCES: 0, 1, 2'}
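
Conceptually, the `map_reduce` strategy runs the LLM once per document (the map step) and then once more over the combined intermediate results (the reduce step). The sketch below shows only that control flow and is not LangChain's implementation. The function `map_reduce_answer`, the prompt wording, and the `fake_llm` stub, which records its calls instead of contacting a model, are all hypothetical.

```python
def map_reduce_answer(docs, question, llm):
    """Run one LLM call per document, then one call to combine the results."""
    # Map step: extract the relevant part of each document independently.
    partials = []
    for d in docs:
        map_prompt = (
            "Extract any text relevant to the question.\n"
            f"QUESTION: {question}\nDOCUMENT: {d['page_content']}"
        )
        partials.append((d["metadata"]["source"], llm(map_prompt)))

    # Reduce step: combine the per-document extracts, keeping their sources.
    summaries = "\n\n".join(
        f"Content: {text}\nSource: {src}" for src, text in partials
    )
    reduce_prompt = (
        "Answer the question from the extracts below and cite sources.\n\n"
        f"{summaries}\n\nQUESTION: {question}\nFINAL ANSWER:"
    )
    return llm(reduce_prompt)


calls = []


def fake_llm(prompt):
    """Stub that records each prompt and returns a numbered placeholder."""
    calls.append(prompt)
    return f"response {len(calls)}"


toy_docs = [
    {"page_content": f"Chunk {i} of the speech.", "metadata": {"source": i}}
    for i in range(3)
]
result = map_reduce_answer(toy_docs, "What about Justice Breyer", fake_llm)
print(len(calls), result)
```

With three documents the stub is called four times: three map calls plus one reduce call. The trade-off relative to `stuff` is more LLM calls in exchange for handling inputs that would not fit into a single prompt.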

### The `refine` Chain

This section shows the results of using the `refine` chain to do question answering with sources.

In [11]:
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="refine")

In [12]:
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': "\n\nThe president did not mention Justice Breyer in his speech to the European Parliament, which focused on building a coalition of freedom-loving nations to confront Putin, unifying European allies, countering Russia's lies with truth, and enforcing powerful economic sanctions. Source: 2"}
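
Conceptually, the `refine` strategy produces an initial answer from the first document and then revisits that answer once for each remaining document. The sketch below shows only that loop and is not LangChain's implementation. The function `refine_answer`, the prompt wording, and the `fake_llm` stub are hypothetical.

```python
def refine_answer(docs, question, llm):
    """Answer from the first document, then refine with each later document."""
    if not docs:
        raise ValueError("at least one document is required")

    first, rest = docs[0], docs[1:]
    answer = llm(
        f"Content: {first['page_content']}\n"
        f"Source: {first['metadata']['source']}\n"
        f"QUESTION: {question}\nANSWER WITH SOURCES:"
    )

    # Each later document sees the current answer and may improve it.
    for d in rest:
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New content: {d['page_content']}\n"
            f"Source: {d['metadata']['source']}\n"
            f"QUESTION: {question}\n"
            "Refine the existing answer if the new content helps:"
        )
    return answer


calls = []


def fake_llm(prompt):
    """Stub that records each prompt and returns a numbered placeholder."""
    calls.append(prompt)
    return f"response {len(calls)}"


toy_docs = [
    {"page_content": f"Chunk {i} of the speech.", "metadata": {"source": i}}
    for i in range(3)
]
result = refine_answer(toy_docs, "What about Justice Breyer", fake_llm)
print(len(calls), result)
```

Here the stub is called once per document, and every call after the first receives the previous answer. This sequential dependence may help explain why the output above reads more like running prose, ending in `Source: 2`, than the `SOURCES:` list produced by the other two chains.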