# Question Answering over Docs

> [Conceptual Guide](https://docs.langchain.com/docs/use-cases/qa-docs)

Question answering in this context refers to question answering over your document data.
For question answering over other types of data, please see the relevant documentation, such as [SQL database Question Answering](./tabular.md) or [Interacting with APIs](./apis.md).

For question answering over many documents, you almost always want to create an index over the data.
This can be used to smartly access the most relevant documents for a given question, allowing you to avoid having to pass all the documents to the LLM (saving you time and money).
See [this notebook](../modules/indexes/getting_started.ipynb) for a more detailed introduction to this, but for a super quick start the steps involved are:

**Load Your Documents**
```python
from langchain.document_loaders import TextLoader
# point this at your own file
loader = TextLoader('../state_of_the_union.txt')
```
See [here](../modules/indexes/document_loaders.rst) for more information on how to get started with document loading.

**Create Your Index**
```python
from langchain.indexes import VectorstoreIndexCreator
index = VectorstoreIndexCreator().from_loaders([loader])
```
The best and most popular index by far at the moment is the VectorStore index.

**Query Your Index**
```python
query = "What did the president say about Ketanji Brown Jackson"
index.query(query)
```
Alternatively, use `query_with_sources` to also get back the sources involved:
```python
query = "What did the president say about Ketanji Brown Jackson"
index.query_with_sources(query)
```
Again, these high-level interfaces obfuscate a lot of what is going on under the hood, so please see [this notebook](../modules/indexes/getting_started.ipynb) for a lower-level walkthrough.
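Under the hood, the index quickstart is roughly doing the following: loading the documents, splitting them into chunks, embedding the chunks into a vector store, and answering queries with a retrieval QA chain. A minimal hand-rolled sketch of that pipeline (assuming OpenAI embeddings and a Chroma vector store, and reusing the `loader` from above) might look like:

```python
# A rough sketch of what the index quickstart does under the hood.
# Assumes OpenAI embeddings and a Chroma vector store; adjust to your setup.
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

documents = loader.load()  # load the raw documents
texts = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(documents)
db = Chroma.from_documents(texts, OpenAIEmbeddings())  # embed and store the chunks
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=db.as_retriever())
qa.run("What did the president say about Ketanji Brown Jackson")
```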
## Document Question Answering
Question answering involves fetching multiple documents, and then asking a question of them.
The LLM response will contain the answer to your question, based on the content of the documents.

The recommended way to get started using a question answering chain is:
```python
from langchain.chains.question_answering import load_qa_chain
# `llm` is any LLM wrapper, `docs` a list of Documents, and `query` your question string
chain = load_qa_chain(llm, chain_type="stuff")
chain.run(input_documents=docs, question=query)
```
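The `"stuff"` chain type simply stuffs all of the documents into a single prompt. `load_qa_chain` also supports the `map_reduce`, `refine`, and `map_rerank` chain types, which are better suited to larger numbers of documents at the cost of extra LLM calls.
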
The following resources exist:
- [Question Answering Notebook](../modules/chains/index_examples/question_answering.ipynb): A notebook walking through how to accomplish this task.
- [VectorDB Question Answering Notebook](../modules/chains/index_examples/vector_db_qa.ipynb): A notebook walking through how to do question answering over a vector database. This is often useful when you have a lot of documents and don't want to pass them all to the LLM, but rather want to first do semantic search over embeddings.
## Adding in sources
There is also a variant of this, where in addition to responding with the answer, the language model will also cite its sources (e.g., which of the documents passed in it used).

The recommended way to get started using a question answering with sources chain is:
```python
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
chain = load_qa_with_sources_chain(llm, chain_type="stuff")
chain({"input_documents": docs, "question": query}, return_only_outputs=True)
```
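With `return_only_outputs=True`, the chain returns just its output dictionary; for the `"stuff"` chain this is an `output_text` field containing the answer followed by a `SOURCES:` line naming the documents it drew on.
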
The following resources exist:
- [QA With Sources Notebook](../modules/chains/index_examples/qa_with_sources.ipynb): A notebook walking through how to accomplish this task.
- [VectorDB QA With Sources Notebook](../modules/chains/index_examples/vector_db_qa_with_sources.ipynb): A notebook walking through how to do question answering with sources over a vector database. This is often useful when you have a lot of documents and don't want to pass them all to the LLM, but rather want to first do semantic search over embeddings (a minimal sketch of this is shown just below).
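As a rough sketch of that last pattern, question answering with sources over a vector store can be wired up as follows (this assumes an existing vector store such as the `db` built earlier and an OpenAI LLM; it is just one possible setup):

```python
# A sketch of question answering with sources over a vector store.
# Assumes `db` is an existing vector store (e.g. the Chroma store above).
from langchain.llms import OpenAI
from langchain.chains import RetrievalQAWithSourcesChain

chain = RetrievalQAWithSourcesChain.from_chain_type(
    OpenAI(temperature=0), chain_type="stuff", retriever=db.as_retriever()
)
chain({"question": "What did the president say about Ketanji Brown Jackson"}, return_only_outputs=True)
```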
## Additional Related Resources
Additional related resources include:
- [Utilities for working with Documents](/modules/utils/how_to_guides.rst): Guides on how to use several of the utilities which will prove helpful for this task, including Text Splitters (for splitting up long documents) and Embeddings & Vectorstores (useful for the above Vector DB example); a small sketch of these utilities follows this list.
- [CombineDocuments Chains](/modules/indexes/combine_docs.md): A conceptual overview of specific types of chains by which you can accomplish this task.
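A typical pattern with these utilities is to split a long document into chunks, embed the chunks into a vector store, and run a similarity search to pull back the most relevant chunks, which can then be passed as the `input_documents` to the chains above. A small sketch (assuming OpenAI embeddings and a FAISS vector store, both of which are just illustrative choices):

```python
# Split, embed, and search: one way to produce `docs` for the chains above.
# OpenAI embeddings and FAISS are assumptions here; swap in your own choices.
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

with open('../state_of_the_union.txt') as f:
    raw_text = f.read()
texts = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_text(raw_text)
db = FAISS.from_texts(texts, OpenAIEmbeddings())  # embed chunks into the vector store
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)  # the most relevant chunks for the question
```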
## End-to-end examples
For examples of this done in an end-to-end manner, please see the following resources:
- [Semantic search over a group chat with Sources Notebook](question_answering/semantic-search-over-chat.ipynb): A notebook that semantically searches over a group chat conversation.