Mirror of https://github.com/hwchase17/langchain, synced 2024-10-31 15:20:26 +00:00 (commit 0adc282d70).
Co-authored-by: Bytestorm <31070777+Bytestorm5@users.noreply.github.com>
```python
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
```
```python
loader = TextLoader("../../state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)

qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=docsearch.as_retriever())
```
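As a rough mental model of what a character splitter with `chunk_size=1000, chunk_overlap=0` does, here is a simplified, framework-free sketch (not `CharacterTextSplitter`'s actual algorithm, which splits on a separator before packing chunks):

```python
def naive_split(text, chunk_size=1000, chunk_overlap=0):
    # Slide a window of chunk_size characters over the text,
    # stepping by chunk_size - chunk_overlap each time.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = naive_split("x" * 2500, chunk_size=1000, chunk_overlap=0)
print([len(c) for c in chunks])  # [1000, 1000, 500]
```

With a non-zero `chunk_overlap`, consecutive chunks share their boundary characters, which helps keep a sentence that straddles a boundary retrievable from either chunk.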
```python
query = "What did the president say about Ketanji Brown Jackson"
qa.run(query)
```
<CodeOutputBlock lang="python">

```
" The president said that she is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support, from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."
```

</CodeOutputBlock>
## Chain Type
You can easily specify different chain types to load and use in the RetrievalQA chain. For a more detailed walkthrough of these types, please see [this notebook](/docs/modules/chains/additional/question_answering.html).

There are two ways to load different chain types. First, you can specify the chain type argument in the `from_chain_type` method. This allows you to pass in the name of the chain type you want to use. For example, below we change the chain type to `map_reduce`.
```python
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="map_reduce", retriever=docsearch.as_retriever())
```
```python
query = "What did the president say about Ketanji Brown Jackson"
qa.run(query)
```
<CodeOutputBlock lang="python">

```
" The president said that Judge Ketanji Brown Jackson is one of our nation's top legal minds, a former top litigator in private practice and a former federal public defender, from a family of public school educators and police officers, a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."
```

</CodeOutputBlock>
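Conceptually, `stuff` packs all retrieved documents into one prompt, while `map_reduce` runs the LLM over each document separately and then combines the partial answers. A rough sketch of that difference (the `fake_llm` helper and the chain functions here are illustrative stand-ins, not LangChain internals):

```python
calls = []

def fake_llm(prompt):
    # Stand-in for a real LLM call; records each prompt it receives.
    calls.append(prompt)
    return f"answer({len(prompt)} chars)"

def stuff_chain(docs, question):
    # "stuff": one call, with every document concatenated into the context.
    context = "\n\n".join(docs)
    return fake_llm(f"{context}\n\nQuestion: {question}")

def map_reduce_chain(docs, question):
    # "map_reduce": one call per document...
    partials = [fake_llm(f"{d}\n\nQuestion: {question}") for d in docs]
    # ...then one final call to combine the partial answers.
    return fake_llm("\n".join(partials) + f"\nQuestion: {question}")

docs = ["The sky is blue.", "Grass is green."]
stuff_chain(docs, "What color is the sky?")       # 1 LLM call
map_reduce_chain(docs, "What color is the sky?")  # 3 LLM calls (2 map + 1 reduce)
```

`map_reduce` costs more LLM calls, but keeps each prompt small, which matters when the retrieved documents together would exceed the model's context window.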
The above approach lets you easily change the chain_type, but it doesn't provide much flexibility over the parameters of that chain type. If you want to control those parameters, you can load the chain directly (as shown in [this notebook](/docs/modules/chains/additional/question_answering.html)) and then pass it to the RetrievalQA chain with the `combine_documents_chain` parameter. For example:
```python
from langchain.chains.question_answering import load_qa_chain

qa_chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff")
qa = RetrievalQA(combine_documents_chain=qa_chain, retriever=docsearch.as_retriever())
```
```python
query = "What did the president say about Ketanji Brown Jackson"
qa.run(query)
```
<CodeOutputBlock lang="python">

```
" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."
```

</CodeOutputBlock>
## Custom Prompts
You can pass in custom prompts to do question answering. These prompts are the same prompts that you can pass into the [base question answering chain](/docs/modules/chains/additional/question_answering.html).
```python
from langchain.prompts import PromptTemplate

prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Answer in Italian:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
```
```python
chain_type_kwargs = {"prompt": PROMPT}
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=docsearch.as_retriever(), chain_type_kwargs=chain_type_kwargs)
```
```python
query = "What did the president say about Ketanji Brown Jackson"
qa.run(query)
```
<CodeOutputBlock lang="python">

```
" Il presidente ha detto che Ketanji Brown Jackson è una delle menti legali più importanti del paese, che continuerà l'eccellenza di Justice Breyer e che ha ricevuto un ampio sostegno, da Fraternal Order of Police a ex giudici nominati da democratici e repubblicani."
```

</CodeOutputBlock>
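Filling the template amounts to named-placeholder substitution over the chain's `{context}` and `{question}` variables. A rough, framework-free equivalent using Python's built-in `str.format` (a sketch of the idea, not `PromptTemplate`'s implementation):

```python
prompt_template = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer in Italian:"
)
# The chain substitutes the retrieved documents for {context}
# and the user's query for {question} before calling the LLM.
filled = prompt_template.format(
    context="Il cielo è blu.",
    question="Di che colore è il cielo?",
)
print(filled)
```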
## Vectorstore Retriever Options
You can adjust how documents are retrieved from your vectorstore depending on the specific task.

There are two main ways to retrieve documents relevant to a query: Similarity Search and Max Marginal Relevance (MMR) Search. Similarity Search is the default, but you can use MMR by adding the `search_type` parameter:
```python
docsearch.as_retriever(search_type="mmr")
```
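For intuition, MMR greedily trades off relevance to the query against similarity to the documents already picked, weighted by `lambda_mult`. A minimal, framework-free sketch of the idea (not Chroma's or LangChain's actual implementation):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def mmr_select(query, docs, k=2, lambda_mult=0.5):
    """Greedily pick k doc indices, balancing relevance to the query
    against redundancy with documents already selected."""
    selected = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query, docs[i])
            redundancy = max(
                (cosine(docs[i], docs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=mmr_score)
        remaining.remove(best)
        selected.append(best)
    return selected

docs = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0]]
# Low lambda_mult favors diversity: the near-duplicate of doc 0 is skipped.
print(mmr_select([1.0, 0.0], docs, k=2, lambda_mult=0.1))  # [0, 2]
# lambda_mult=1.0 is pure relevance: same ranking as similarity search.
print(mmr_select([1.0, 0.0], docs, k=2, lambda_mult=1.0))  # [0, 1]
```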
You can also modify the search by passing specific search arguments through the retriever to the search function, using the `search_kwargs` keyword argument.
- `k` defines how many documents are returned; defaults to 4.
- `score_threshold` allows you to set a minimum relevance for documents returned by the retriever, if you are using the "similarity_score_threshold" search type.
- `fetch_k` determines the number of documents to pass to the MMR algorithm; defaults to 20.
- `lambda_mult` controls the diversity of results returned by the MMR algorithm, with 1 being minimum diversity and 0 being maximum. Defaults to 0.5.
- `filter` allows you to define a filter on which documents should be retrieved, based on the documents' metadata. This has no effect if the vectorstore doesn't store any metadata.
Some examples of how these parameters can be used:
```python
# Retrieve more documents with higher diversity; useful if your dataset has many similar documents
docsearch.as_retriever(search_type="mmr", search_kwargs={'k': 6, 'lambda_mult': 0.25})

# Fetch more documents for the MMR algorithm to consider, but only return the top 5
docsearch.as_retriever(search_type="mmr", search_kwargs={'k': 5, 'fetch_k': 50})

# Only retrieve documents that have a relevance score above a certain threshold
docsearch.as_retriever(search_type="similarity_score_threshold", search_kwargs={'score_threshold': 0.8})

# Only get the single most similar document from the dataset
docsearch.as_retriever(search_kwargs={'k': 1})

# Use a filter to only retrieve documents from a specific paper
docsearch.as_retriever(search_kwargs={'filter': {'paper_title': 'GPT-4 Technical Report'}})
```
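As a framework-free illustration of what `k` and `score_threshold` mean, here is a sketch of the retrieval semantics (not Chroma's implementation; `retrieve` and its arguments are hypothetical names for this example):

```python
def retrieve(scored_docs, k=4, score_threshold=None):
    # scored_docs: (document, relevance_score) pairs, higher score = more relevant.
    if score_threshold is not None:
        # Drop anything below the minimum relevance first.
        scored_docs = [(d, s) for d, s in scored_docs if s >= score_threshold]
    # Rank by score and keep at most k documents.
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [d for d, _ in ranked[:k]]

scored = [("doc_a", 0.9), ("doc_b", 0.5), ("doc_c", 0.85), ("doc_d", 0.2)]
print(retrieve(scored, k=1))                  # ['doc_a']
print(retrieve(scored, score_threshold=0.8))  # ['doc_a', 'doc_c']
```

Note that `score_threshold` can return fewer than `k` documents, or none at all, when nothing clears the bar.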