langchain/docs/use_cases/evaluation/data_augmented_question_answering.ipynb
2023-02-12 23:02:01 -08:00

298 lines
11 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

{
"cells": [
{
"cell_type": "markdown",
"id": "e78b7bb1",
"metadata": {},
"source": [
"# Data Augmented Question Answering\n",
"\n",
"This notebook uses some generic prompts/language models to evaluate an question answering system that uses other sources of data besides what is in the model. For example, this can be used to evaluate a question answering system over your propritary data.\n",
"\n",
"## Setup\n",
"Let's set up an example with our favorite example - the state of the union address."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ab4a6931",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain import OpenAI, VectorDBQA"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "4fdc211d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../modules/state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"docsearch = Chroma.from_documents(texts, embeddings)\n",
"qa = VectorDBQA.from_llm(llm=OpenAI(), vectorstore=docsearch)"
]
},
{
"cell_type": "markdown",
"id": "30fd72f2",
"metadata": {},
"source": [
"## Examples\n",
"Now we need some examples to evaluate. We can do this in two ways:\n",
"\n",
"1. Hard code some examples ourselves\n",
"2. Generate examples automatically, using a language model"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "3459b001",
"metadata": {},
"outputs": [],
"source": [
"# Hard-coded examples\n",
"examples = [\n",
" {\n",
" \"query\": \"What did the president say about Ketanji Brown Jackson\",\n",
" \"answer\": \"He praised her legal ability and said he nominated her for the supreme court.\"\n",
" },\n",
" {\n",
" \"query\": \"What did the president say about Michael Jackson\",\n",
" \"answer\": \"Nothing\"\n",
" }\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b9c3fa75",
"metadata": {},
"outputs": [],
"source": [
"# Generated examples\n",
"from langchain.evaluation.qa import QAGenerateChain\n",
"example_gen_chain = QAGenerateChain.from_llm(OpenAI())"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c24543a9",
"metadata": {},
"outputs": [],
"source": [
"new_examples = example_gen_chain.apply_and_parse([{\"doc\": t} for t in texts[:5]])"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a2d27560",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'query': 'What did Vladimir Putin miscalculate when he sought to shake the foundations of the free world? ',\n",
" 'answer': 'He miscalculated that the world would roll over and that he could roll into Ukraine without facing resistance.'},\n",
" {'query': 'What is the purpose of NATO?',\n",
" 'answer': 'The purpose of NATO is to secure peace and stability in Europe after World War 2.'},\n",
" {'query': \"What did the author do to prepare for Putin's attack on Ukraine?\",\n",
" 'answer': \"The author spent months building a coalition of freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin, shared with the world in advance what they knew Putin was planning, and countered Russia's lies with truth.\"},\n",
" {'query': 'What are the US and its allies doing to isolate Russia from the world?',\n",
" 'answer': \"Enforcing powerful economic sanctions, cutting off Russia's largest banks from the international financial system, preventing Russia's central bank from defending the Russian Ruble, choking off Russia's access to technology, and joining with European allies to find and seize assets of Russian oligarchs.\"},\n",
" {'query': 'How much direct assistance is the U.S. providing to Ukraine?',\n",
" 'answer': 'The U.S. is providing more than $1 Billion in direct assistance to Ukraine.'}]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_examples"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "558da6f3",
"metadata": {},
"outputs": [],
"source": [
"# Combine examples\n",
"examples += new_examples"
]
},
{
"cell_type": "markdown",
"id": "443dc34e",
"metadata": {},
"source": [
"## Evaluate\n",
"Now that we have examples, we can use the question answering evaluator to evaluate our question answering chain."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "782169a5",
"metadata": {},
"outputs": [],
"source": [
"from langchain.evaluation.qa import QAEvalChain"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "1bb77416",
"metadata": {},
"outputs": [],
"source": [
"predictions = qa.apply(examples)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "bcd0ad7f",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
"eval_chain = QAEvalChain.from_llm(llm)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "2e6af79a",
"metadata": {},
"outputs": [],
"source": [
"graded_outputs = eval_chain.evaluate(examples, predictions)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "32fac2dc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Example 0:\n",
"Question: What did the president say about Ketanji Brown Jackson\n",
"Real Answer: He praised her legal ability and said he nominated her for the supreme court.\n",
"Predicted Answer: The president said that Ketanji Brown Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\n",
"Predicted Grade: CORRECT\n",
"\n",
"Example 1:\n",
"Question: What did the president say about Michael Jackson\n",
"Real Answer: Nothing\n",
"Predicted Answer: \n",
"The president did not mention Michael Jackson in this context.\n",
"Predicted Grade: CORRECT\n",
"\n",
"Example 2:\n",
"Question: What did Vladimir Putin miscalculate when he sought to shake the foundations of the free world? \n",
"Real Answer: He miscalculated that the world would roll over and that he could roll into Ukraine without facing resistance.\n",
"Predicted Answer: Putin miscalculated that the West and NATO wouldn't respond to his attack on Ukraine and that he could divide the US and its allies.\n",
"Predicted Grade: CORRECT\n",
"\n",
"Example 3:\n",
"Question: What is the purpose of NATO?\n",
"Real Answer: The purpose of NATO is to secure peace and stability in Europe after World War 2.\n",
"Predicted Answer: The purpose of NATO is to secure peace and stability in Europe after World War 2.\n",
"Predicted Grade: CORRECT\n",
"\n",
"Example 4:\n",
"Question: What did the author do to prepare for Putin's attack on Ukraine?\n",
"Real Answer: The author spent months building a coalition of freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin, shared with the world in advance what they knew Putin was planning, and countered Russia's lies with truth.\n",
"Predicted Answer: The author prepared extensively and carefully. They spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin, and they spent countless hours unifying their European allies. They also shared with the world in advance what they knew Putin was planning and precisely how he would try to falsely justify his aggression. They countered Russias lies with truth.\n",
"Predicted Grade: CORRECT\n",
"\n",
"Example 5:\n",
"Question: What are the US and its allies doing to isolate Russia from the world?\n",
"Real Answer: Enforcing powerful economic sanctions, cutting off Russia's largest banks from the international financial system, preventing Russia's central bank from defending the Russian Ruble, choking off Russia's access to technology, and joining with European allies to find and seize assets of Russian oligarchs.\n",
"Predicted Answer: The US and its allies are enforcing economic sanctions on Russia, cutting off its largest banks from the international financial system, preventing its central bank from defending the Russian Ruble, choking off Russia's access to technology, closing American airspace to all Russian flights, and providing support to Ukraine.\n",
"Predicted Grade: CORRECT\n",
"\n",
"Example 6:\n",
"Question: How much direct assistance is the U.S. providing to Ukraine?\n",
"Real Answer: The U.S. is providing more than $1 Billion in direct assistance to Ukraine.\n",
"Predicted Answer: The U.S. is providing more than $1 Billion in direct assistance to Ukraine.\n",
"Predicted Grade: CORRECT\n",
"\n"
]
}
],
"source": [
"for i, eg in enumerate(examples):\n",
" print(f\"Example {i}:\")\n",
" print(\"Question: \" + predictions[i]['query'])\n",
" print(\"Real Answer: \" + predictions[i]['answer'])\n",
" print(\"Predicted Answer: \" + predictions[i]['result'])\n",
" print(\"Predicted Grade: \" + graded_outputs[i]['text'])\n",
" print()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bd0b01dc",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}