
# Question answering over documents

Question answering in this context refers to question answering over your document data.
For question answering over other types of data, see other documentation such as [SQL database Question Answering](../tabular) or [Interacting with APIs](../apis).

For question answering over many documents, you almost always want to create an index over the data.
An index lets you retrieve only the documents most relevant to a given question, so you avoid passing every document to the LLM (saving you time and money).
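
To make the idea of an index concrete, here is a minimal sketch that uses only the Python standard library. It ranks documents against a question by cosine similarity over word counts, which is a crude stand-in for the learned embeddings a real vector store would use. `ToyIndex`, `tokenize`, `cosine`, and the sample documents are hypothetical names and data for illustration; none of them are part of LangChain.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Hypothetical stand-in for a vector index (not a LangChain class)."""

    def __init__(self, documents):
        self.documents = list(documents)
        # Word counts play the role of embeddings in this sketch.
        self.vectors = [Counter(tokenize(doc)) for doc in self.documents]

    def top_k(self, question, k=2):
        """Return the k documents most similar to the question."""
        q = Counter(tokenize(question))
        ranked = sorted(
            zip(self.documents, self.vectors),
            key=lambda pair: cosine(q, pair[1]),
            reverse=True,
        )
        return [doc for doc, _ in ranked[:k]]


# Made-up sample documents, for illustration only.
documents = [
    "The president nominated Ketanji Brown Jackson to the Supreme Court.",
    "The speech covered inflation and the cost of prescription drugs.",
    "Infrastructure funding will repair roads and bridges.",
]
toy_index = ToyIndex(documents)
relevant = toy_index.top_k(
    "What did the president say about Ketanji Brown Jackson", k=1
)
```

Only the documents in `relevant` would then be passed to the LLM, rather than the whole collection.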

**Load Your Documents**

```python
from langchain.document_loaders import TextLoader
loader = TextLoader('../state_of_the_union.txt')
```

See [here](/docs/modules/data_connection/document_loaders) for more information on how to get started with document loading.

**Create Your Index**

```python
from langchain.indexes import VectorstoreIndexCreator
index = VectorstoreIndexCreator().from_loaders([loader])
```

By far the most popular index at the moment is the VectorStore index, which is what `VectorstoreIndexCreator` builds.

**Query Your Index**

```python
query = "What did the president say about Ketanji Brown Jackson"
index.query(query)
```

Alternatively, use `query_with_sources` to also get back the sources involved:

```python
query = "What did the president say about Ketanji Brown Jackson"
index.query_with_sources(query)
```

These high-level interfaces hide a lot of what is going on under the hood, so please see [this notebook](/docs/modules/data_connection/getting_started.html) for a lower-level walkthrough.

## Document Question Answering

Question answering involves fetching multiple documents, and then asking a question of them.
The LLM response will contain the answer to your question, based on the content of the documents.

The recommended way to get started using a question answering chain is:

```python
from langchain.chains.question_answering import load_qa_chain

# `llm` is any LangChain LLM and `docs` is a list of Document objects, both created elsewhere.
chain = load_qa_chain(llm, chain_type="stuff")
chain.run(input_documents=docs, question=query)
```
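
As a rough illustration of what the `"stuff"` chain type means, the sketch below places every document into a single prompt alongside the question. The function name `build_stuff_prompt` and the prompt wording are assumptions made for this example; they are not LangChain's actual prompt template, and no LLM is called.

```python
def build_stuff_prompt(docs, question):
    """Concatenate every document into one prompt string (hypothetical helper)."""
    context = "\n\n".join(docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up sample documents, for illustration only.
example_docs = [
    "The president nominated Ketanji Brown Jackson to the Supreme Court.",
    "The speech covered inflation and the cost of prescription drugs.",
]
stuff_prompt = build_stuff_prompt(
    example_docs, "What did the president say about Ketanji Brown Jackson"
)
```

Because all documents end up in one prompt, this approach only works while their combined length fits in the model's context window.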

The following resources exist:

- [Question Answering Notebook](/docs/modules/chains/index_examples/question_answering.html): A notebook walking through how to accomplish this task.
- [VectorDB Question Answering Notebook](/docs/modules/chains/index_examples/vector_db_qa.html): A notebook walking through how to do question answering over a vector database. This is often useful when you have a lot of documents and would rather run a semantic search over embeddings first than pass them all to the LLM.

## Adding in sources

There is also a variant of this in which, in addition to responding with the answer, the language model also cites its sources (e.g., which of the documents passed in it used).

The recommended way to get started using a question answering with sources chain is:

```python
from langchain.chains.qa_with_sources import load_qa_with_sources_chain

# `llm` is any LangChain LLM and `docs` is a list of Document objects, both created elsewhere.
chain = load_qa_with_sources_chain(llm, chain_type="stuff")
chain({"input_documents": docs, "question": query}, return_only_outputs=True)
```
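
One way to picture how sources can be cited is to label each document with its origin before it goes into the prompt, so the model has something to refer back to. The sketch below does that with plain dictionaries. The `content` and `source` keys, the function name, and the prompt wording are all assumptions for illustration and do not reflect LangChain's internal prompt.

```python
def build_prompt_with_sources(docs, question):
    """Format each document with its source label (hypothetical helper)."""
    parts = [
        f"Content: {doc['content']}\nSource: {doc['source']}" for doc in docs
    ]
    context = "\n\n".join(parts)
    return (
        "Answer the question using the extracts below and list the sources "
        "you used after 'SOURCES:'.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up documents and source labels, for illustration only.
sourced_docs = [
    {"content": "The president nominated Ketanji Brown Jackson.", "source": "doc-0"},
    {"content": "The speech covered the cost of prescription drugs.", "source": "doc-1"},
]
sources_prompt = build_prompt_with_sources(
    sourced_docs, "What did the president say about Ketanji Brown Jackson"
)
```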

The following resources exist:

- [QA With Sources Notebook](/docs/modules/chains/index_examples/qa_with_sources.html): A notebook walking through how to accomplish this task.
- [VectorDB QA With Sources Notebook](/docs/modules/chains/index_examples/vector_db_qa_with_sources.html): A notebook walking through how to do question answering with sources over a vector database. This is often useful when you have a lot of documents and would rather run a semantic search over embeddings first than pass them all to the LLM.

## Additional Related Resources

Additional related resources include:

- [Building blocks for working with Documents](/docs/modules/data_connection): Guides on how to use several of the utilities which will prove helpful for this task, including Text Splitters (for splitting up long documents) and Embeddings & Vectorstores (useful for the above Vector DB example).
- [CombineDocuments Chains](/docs/modules/chains/documents): A conceptual overview of specific types of chains by which you can accomplish this task.
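
To show what splitting up a long document can look like, here is a minimal character-based sketch with overlapping chunks. `split_text` and its `chunk_size` and `chunk_overlap` parameters are hypothetical names chosen for this example; LangChain's own text splitters are more sophisticated (for instance, they can split on separators), so treat this only as an outline of the idea.

```python
def split_text(text, chunk_size=100, chunk_overlap=20):
    """Split text into fixed-size character chunks that overlap (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


sample_chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
```

The overlap means that text near a chunk boundary appears in both neighbouring chunks, which can help retrieval when an answer straddles a boundary.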

## End-to-end examples

For examples of this done end to end, please see the following resources:

- [Semantic search over a group chat with Sources Notebook](./semantic-search-over-chat.html): A notebook that semantically searches over a group chat conversation.