# VectorDB Question Answering with Sources

This notebook shows how to do question answering with sources over a vector database. It uses the `VectorDBQAWithSourcesChain`, which handles looking up the relevant documents in the vector database.

In [1]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.cohere import CohereEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch
from langchain.vectorstores import Chroma

In [2]:
with open('../../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)

embeddings = OpenAIEmbeddings()

In [5]:
docsearch = Chroma.from_texts(texts, embeddings, metadatas=[{"source": f"{i}-pl"} for i in range(len(texts))])

Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.
Exiting: Cleaning up .chroma directory
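The `metadatas` argument pairs each text chunk with a source label by position, and that label is what later appears in the `sources` field of the answer. A minimal sketch in plain Python of how the list lines up with `texts`; the three chunks here are placeholders, not the real speech, and `source_to_text` is a hypothetical lookup added only for illustration:

```python
# Placeholder chunks standing in for text_splitter.split_text(state_of_the_union)
texts = ["first chunk", "second chunk", "third chunk"]

# Same comprehension as in the cell above: one metadata dict per chunk
metadatas = [{"source": f"{i}-pl"} for i in range(len(texts))]

# Hypothetical lookup from a source label back to the chunk it describes
source_to_text = {meta["source"]: text for meta, text in zip(metadatas, texts)}

print(metadatas)
print(source_to_text["1-pl"])
```

Because the labels are derived from the chunk index, a source such as `1-pl` refers to `texts[1]`.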


In [6]:
from langchain.chains import VectorDBQAWithSourcesChain

In [7]:
from langchain import OpenAI

chain = VectorDBQAWithSourcesChain.from_chain_type(OpenAI(temperature=0), chain_type="stuff", vectorstore=docsearch)

In [8]:
chain({"question": "What did the president say about Justice Breyer"}, return_only_outputs=True)

{'answer': ' The president thanked Justice Breyer for his service and mentioned his legacy of excellence.\n',
 'sources': '30-pl'}
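The chain returns `sources` as a single string rather than a list. The sketch below shows one way to post-process the result. It assumes that multiple sources would be comma-separated, which the output above does not confirm since it contains only one source. `split_sources` and `chunk_index` are hypothetical helpers, not part of LangChain:

```python
from typing import List


def split_sources(sources: str) -> List[str]:
    """Split a sources string on commas and drop empty entries (assumed format)."""
    return [s.strip() for s in sources.split(",") if s.strip()]


def chunk_index(source: str) -> int:
    """Recover the chunk index from a label of the form '<i>-pl' used in this notebook."""
    return int(source.split("-")[0])


# The result dict printed above, copied verbatim
result = {
    "answer": " The president thanked Justice Breyer for his service and mentioned his legacy of excellence.\n",
    "sources": "30-pl",
}

labels = split_sources(result["sources"])
print(result["answer"].strip())
print(labels)                            # ['30-pl']
print([chunk_index(s) for s in labels])  # [30]
```

With the index in hand, `texts[30]` would be the chunk the answer was drawn from.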

## Chain Type
You can easily specify different chain types to load and use in the `VectorDBQAWithSourcesChain`. For a more detailed walkthrough of these types, please see [this notebook](qa_with_sources.ipynb).

There are two ways to load different chain types. First, you can specify the `chain_type` argument in the `from_chain_type` method. This allows you to pass in the name of the chain type you want to use. For example, the cell below changes the chain type to `map_reduce`.

In [8]:
chain = VectorDBQAWithSourcesChain.from_chain_type(OpenAI(temperature=0), chain_type="map_reduce", vectorstore=docsearch)

In [9]:
chain({"question": "What did the president say about Justice Breyer"}, return_only_outputs=True)

{'answer': ' The president honored Justice Stephen Breyer for his service.\n',
 'sources': '30-pl'}
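As a rough mental model of the difference between the two chain types used so far: `stuff` places all retrieved documents into a single prompt, while `map_reduce` queries the model once per document and then combines the partial answers. The sketch below illustrates that idea in plain Python. It is not LangChain's implementation; `fake_llm` is a stand-in that only records how often it is called, and the prompt layout is an assumption:

```python
from typing import Callable, List

calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: records the prompt and returns a dummy answer."""
    calls.append(prompt)
    return f"answer based on {len(prompt)} characters"


def stuff(docs: List[str], question: str, llm: Callable[[str], str]) -> str:
    # One call: every document is concatenated into a single prompt
    prompt = "\n\n".join(docs) + f"\n\nQuestion: {question}"
    return llm(prompt)


def map_reduce(docs: List[str], question: str, llm: Callable[[str], str]) -> str:
    # Map step: one call per document
    partials = [llm(f"{doc}\n\nQuestion: {question}") for doc in docs]
    # Reduce step: one more call to combine the partial answers
    return llm("\n\n".join(partials) + f"\n\nQuestion: {question}")


docs = ["doc one", "doc two", "doc three"]

stuff(docs, "What was said?", fake_llm)
stuff_calls = len(calls)

calls.clear()
map_reduce(docs, "What was said?", fake_llm)
map_reduce_calls = len(calls)

print(stuff_calls, map_reduce_calls)  # 1 4
```

The trade-off this suggests is that `stuff` needs fewer model calls but must fit every document into one prompt, whereas `map_reduce` makes more calls with smaller prompts.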

Passing `chain_type` to `from_chain_type` makes it simple to switch chain types, but it does not provide much flexibility over the parameters of that chain type. If you want to control those parameters, you can load the chain directly (as you did in [this notebook](qa_with_sources.ipynb)) and then pass it to the `VectorDBQAWithSourcesChain` with the `combine_documents_chain` parameter. For example:

In [12]:
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
qa_chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff")
qa = VectorDBQAWithSourcesChain(combine_documents_chain=qa_chain, vectorstore=docsearch)

In [11]:
qa({"question": "What did the president say about Justice Breyer"}, return_only_outputs=True)

{'answer': ' The president honored Justice Stephen Breyer for his service.\n',
 'sources': '30-pl'}