mv popular and additional chains to use cases (#8242)

Bagatur committed via GitHub
parent ff98fad2d9
commit 68763bd25f

@@ -0,0 +1,6 @@
# Preventing harmful outputs
One of the key concerns with using LLMs is that they may generate harmful or unethical text. This is an area of active research in the field. Here we present some built-in chains inspired by this research, which are intended to make the outputs of LLMs safer.
- [Moderation chain](/docs/use_cases/safety/moderation): Explicitly check if any output text is harmful and flag it.
- [Constitutional chain](/docs/use_cases/safety/constitutional_chain): Prompt the model with a set of principles which should guide its behavior.
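For orientation, a minimal sketch of the moderation chain listed above (assumes the `langchain` package and an `OPENAI_API_KEY`; the sample text is illustrative):

```python
from langchain.chains import OpenAIModerationChain

# Runs the input through OpenAI's moderation endpoint; assumes OPENAI_API_KEY is set.
moderation_chain = OpenAIModerationChain()

# Non-violating text is returned unchanged; flagged text triggers the chain's error message.
print(moderation_chain.run("This is okay"))
```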

@@ -1,8 +0,0 @@
---
sidebar_position: 4
---
# Additional
import DocCardList from "@theme/DocCardList";
<DocCardList />

@@ -1,7 +0,0 @@
# Dynamically selecting from multiple prompts
This notebook demonstrates how to use the `RouterChain` paradigm to create a chain that dynamically selects the prompt to use for a given input. Specifically we show how to use the `MultiPromptChain` to create a question-answering chain that selects the prompt which is most relevant for a given question, and then answers the question using that prompt.
import Example from "@snippets/modules/chains/additional/multi_prompt_router.mdx"
<Example/>
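A minimal sketch of the `MultiPromptChain` pattern described above (the prompt names and templates are illustrative assumptions, not taken from the snippet):

```python
from langchain.chains.router import MultiPromptChain
from langchain.llms import OpenAI

# Each entry gives the router a name, a description to route on, and a prompt template.
prompt_infos = [
    {
        "name": "physics",
        "description": "Good for answering questions about physics",
        "prompt_template": "You are a physics professor. Answer this question:\n{input}",
    },
    {
        "name": "history",
        "description": "Good for answering questions about history",
        "prompt_template": "You are a historian. Answer this question:\n{input}",
    },
]

# The router selects the most relevant prompt, then answers the question with it.
chain = MultiPromptChain.from_prompts(OpenAI(), prompt_infos, verbose=True)
print(chain.run("What is black body radiation?"))
```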

@@ -1,8 +0,0 @@
---
sidebar_position: 3
---
# Popular
import DocCardList from "@theme/DocCardList";
<DocCardList />

@@ -2,7 +2,7 @@
sidebar_position: 2
---
- # Conversational Retrieval QA
+ # Store and reference chat history
The ConversationalRetrievalQA chain builds on RetrievalQAChain to provide a chat history component.
It first combines the chat history (either explicitly passed in or retrieved from the provided memory) and the question into a standalone question, then looks up relevant documents from the retriever, and finally passes those documents and the question to a question answering chain to return a response.
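As a rough sketch of that flow (assuming an existing `vectorstore` built elsewhere; the query is illustrative):

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

# `vectorstore` is assumed to be an existing vector store (e.g. Chroma or FAISS).
qa = ConversationalRetrievalChain.from_llm(ChatOpenAI(temperature=0), vectorstore.as_retriever())

chat_history = []
query = "What did the president say about Ketanji Brown Jackson?"
result = qa({"question": query, "chat_history": chat_history})

# Keep the exchange so follow-up questions can be condensed into standalone questions.
chat_history.append((query, result["answer"]))
```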

@@ -1,4 +1,4 @@
- # Dynamically selecting from multiple retrievers
+ # Dynamically select from multiple retrievers
This notebook demonstrates how to use the `RouterChain` paradigm to create a chain that dynamically selects which Retrieval system to use. Specifically we show how to use the `MultiRetrievalQAChain` to create a question-answering chain that selects the retrieval QA chain which is most relevant for a given question, and then answers the question using it.
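A hedged sketch of that setup (`docs_retriever` and `blog_retriever` are placeholders for retrievers you have already built):

```python
from langchain.chains.router import MultiRetrievalQAChain
from langchain.chat_models import ChatOpenAI

# The descriptions are what the router uses to pick the most relevant retrieval QA chain.
retriever_infos = [
    {"name": "docs", "description": "Good for questions about the product docs", "retriever": docs_retriever},
    {"name": "blog", "description": "Good for questions about blog posts", "retriever": blog_retriever},
]

chain = MultiRetrievalQAChain.from_retrievers(ChatOpenAI(), retriever_infos, verbose=True)
print(chain.run("What did the latest blog post cover?"))
```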

@@ -1,4 +1,4 @@
- # Document QA
+ # QA over in-memory documents
Here we walk through how to use LangChain for question answering over a list of documents. Under the hood we'll be using our [Document chains](/docs/modules/chains/document/).
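A minimal sketch of that pattern (assumes `docs` is a list of `Document` objects loaded elsewhere):

```python
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

# "stuff" places all documents into a single prompt; other chain types
# (map_reduce, refine, map_rerank) trade off cost and context length differently.
chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff")
answer = chain.run(input_documents=docs, question="What did the author work on?")
```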

@@ -1,7 +1,7 @@
---
sidebar_position: 1
---
- # Retrieval QA
+ # QA using a Retriever
This example showcases question answering over an index.
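For reference, a minimal sketch of that example (assumes an existing index `docsearch`; the question is illustrative):

```python
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Wraps a retriever and a QA chain: retrieve relevant chunks, then answer over them.
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=docsearch.as_retriever(),
)
print(qa.run("What did the president say about Ketanji Brown Jackson?"))
```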

@@ -1610,59 +1610,59 @@
},
{
"source": "/en/latest/modules/chains/examples/flare.html",
- "destination": "/docs/modules/chains/additional/flare"
+ "destination": "/docs/use_cases/question_answering/how_to/flare"
},
{
"source": "/en/latest/modules/chains/examples/graph_cypher_qa.html",
- "destination": "/docs/modules/chains/additional/graph_cypher_qa"
+ "destination": "/docs/use_cases/graph/graph_cypher_qa"
},
{
"source": "/en/latest/modules/chains/examples/graph_nebula_qa.html",
- "destination": "/docs/modules/chains/additional/graph_nebula_qa"
+ "destination": "/docs/use_cases/graph/graph_nebula_qa"
},
{
"source": "/en/latest/modules/chains/index_examples/graph_qa.html",
- "destination": "/docs/modules/chains/additional/graph_qa"
+ "destination": "/docs/use_cases/graph/graph_qa"
},
{
"source": "/en/latest/modules/chains/index_examples/hyde.html",
- "destination": "/docs/modules/chains/additional/hyde"
+ "destination": "/docs/use_cases/question_answering/how_to/hyde"
},
{
"source": "/en/latest/modules/chains/examples/llm_bash.html",
- "destination": "/docs/modules/chains/additional/llm_bash"
+ "destination": "/docs/use_cases/code_writing/llm_bash"
},
{
"source": "/en/latest/modules/chains/examples/llm_checker.html",
- "destination": "/docs/modules/chains/additional/llm_checker"
+ "destination": "/docs/use_cases/self_check/llm_checker"
},
{
"source": "/en/latest/modules/chains/examples/llm_math.html",
- "destination": "/docs/modules/chains/additional/llm_math"
+ "destination": "/docs/use_cases/code_writing/llm_math"
},
{
"source": "/en/latest/modules/chains/examples/llm_requests.html",
- "destination": "/docs/modules/chains/additional/llm_requests"
+ "destination": "/docs/use_cases/apis/llm_requests"
},
{
"source": "/en/latest/modules/chains/examples/llm_summarization_checker.html",
- "destination": "/docs/modules/chains/additional/llm_summarization_checker"
+ "destination": "/docs/use_cases/self_check/llm_summarization_checker"
},
{
"source": "/en/latest/modules/chains/examples/openapi.html",
- "destination": "/docs/modules/chains/additional/openapi"
+ "destination": "/docs/use_cases/apis/openapi"
},
{
"source": "/en/latest/modules/chains/examples/pal.html",
- "destination": "/docs/modules/chains/additional/pal"
+ "destination": "/docs/use_cases/code_writing/pal"
},
{
"source": "/en/latest/modules/chains/examples/tagging.html",
- "destination": "/docs/modules/chains/additional/tagging"
+ "destination": "/docs/use_cases/tagging"
},
{
"source": "/en/latest/modules/chains/index_examples/vector_db_text_generation.html",
- "destination": "/docs/modules/chains/additional/vector_db_text_generation"
+ "destination": "/docs/use_cases/question_answering/how_to/vector_db_text_generation"
},
{
"source": "/en/latest/modules/chains/generic/router.html",
@@ -3771,6 +3771,170 @@
{
"source": "/en/latest/:path*",
"destination": "/docs/:path*"
},
{
"source": "/docs/modules/chains/additional/constitutional_chain",
"destination": "/docs/guides/safety/constitutional_chain"
},
{
"source": "/docs/modules/chains/additional/moderation",
"destination": "/docs/guides/safety/moderation"
},
{
"source": "/docs/modules/chains/popular/api",
"destination": "/docs/use_cases/apis/api"
},
{
"source": "/docs/modules/chains/additional/analyze_document",
"destination": "/docs/use_cases/question_answering/how_to/analyze_document"
},
{
"source": "/docs/modules/chains/popular/chat_vector_db",
"destination": "/docs/use_cases/question_answering/how_to/chat_vector_db"
},
{
"source": "/docs/modules/chains/additional/multi_retrieval_qa_router",
"destination": "/docs/use_cases/question_answering/how_to/multi_retrieval_qa_router"
},
{
"source": "/docs/modules/chains/additional/question_answering",
"destination": "/docs/use_cases/question_answering/how_to/question_answering"
},
{
"source": "/docs/modules/chains/popular/vector_db_qa",
"destination": "/docs/use_cases/question_answering/how_to/vector_db_qa"
},
{
"source": "/docs/modules/chains/popular/summarize",
"destination": "/docs/use_cases/summarization/summarize"
},
{
"source": "/docs/modules/chains/popular/sqlite",
"destination": "/docs/use_cases/tabular/sqlite"
},
{
"source": "/docs/modules/chains/popular/openai_functions",
"destination": "/docs/modules/chains/how_to/openai_functions"
},
{
"source": "/docs/modules/chains/additional/llm_requests",
"destination": "/docs/use_cases/apis/llm_requests"
},
{
"source": "/docs/modules/chains/additional/openai_openapi",
"destination": "/docs/use_cases/apis/openai_openapi"
},
{
"source": "/docs/modules/chains/additional/openapi",
"destination": "/docs/use_cases/apis/openapi"
},
{
"source": "/docs/modules/chains/additional/openapi_openai",
"destination": "/docs/use_cases/apis/openapi_openai"
},
{
"source": "/docs/modules/chains/additional/cpal",
"destination": "/docs/use_cases/code_writing/cpal"
},
{
"source": "/docs/modules/chains/additional/llm_bash",
"destination": "/docs/use_cases/code_writing/llm_bash"
},
{
"source": "/docs/modules/chains/additional/llm_math",
"destination": "/docs/use_cases/code_writing/llm_math"
},
{
"source": "/docs/modules/chains/additional/llm_symbolic_math",
"destination": "/docs/use_cases/code_writing/llm_symbolic_math"
},
{
"source": "/docs/modules/chains/additional/pal",
"destination": "/docs/use_cases/code_writing/pal"
},
{
"source": "/docs/modules/chains/additional/graph_arangodb_qa",
"destination": "/docs/use_cases/graph/graph_arangodb_qa"
},
{
"source": "/docs/modules/chains/additional/graph_cypher_qa",
"destination": "/docs/use_cases/graph/graph_cypher_qa"
},
{
"source": "/docs/modules/chains/additional/graph_hugegraph_qa",
"destination": "/docs/use_cases/graph/graph_hugegraph_qa"
},
{
"source": "/docs/modules/chains/additional/graph_kuzu_qa",
"destination": "/docs/use_cases/graph/graph_kuzu_qa"
},
{
"source": "/docs/modules/chains/additional/graph_nebula_qa",
"destination": "/docs/use_cases/graph/graph_nebula_qa"
},
{
"source": "/docs/modules/chains/additional/graph_qa",
"destination": "/docs/use_cases/graph/graph_qa"
},
{
"source": "/docs/modules/chains/additional/graph_sparql_qa",
"destination": "/docs/use_cases/graph/graph_sparql_qa"
},
{
"source": "/docs/modules/chains/additional/neptune_cypher_qa",
"destination": "/docs/use_cases/graph/neptune_cypher_qa"
},
{
"source": "/docs/modules/chains/additional/tot",
"destination": "/docs/use_cases/graph/tot"
},
{
"source": "/docs/use_cases/question_answering//document-context-aware-QA",
"destination": "/docs/use_cases/question_answering/how_to/document-context-aware-QA"
},
{
"source": "/docs/modules/chains/additional/flare",
"destination": "/docs/use_cases/question_answering/how_to/flare"
},
{
"source": "/docs/modules/chains/additional/hyde",
"destination": "/docs/use_cases/question_answering/how_to/hyde"
},
{
"source": "/docs/use_cases/question_answering//local_retrieval_qa",
"destination": "/docs/use_cases/question_answering/how_to/local_retrieval_qa"
},
{
"source": "/docs/modules/chains/additional/qa_citations",
"destination": "/docs/use_cases/question_answering/how_to/qa_citations"
},
{
"source": "/docs/modules/chains/additional/vector_db_text_generation",
"destination": "/docs/use_cases/question_answering/how_to/vector_db_text_generation"
},
{
"source": "/docs/modules/chains/additional/openai_functions_retrieval_qa",
"destination": "/docs/use_cases/question_answering/integrations/openai_functions_retrieval_qa"
},
{
"source": "/docs/use_cases/question_answering//semantic-search-over-chat",
"destination": "/docs/use_cases/question_answering/integrations/semantic-search-over-chat"
},
{
"source": "/docs/modules/chains/additional/llm_checker",
"destination": "/docs/use_cases/self_check/llm_checker"
},
{
"source": "/docs/modules/chains/additional/llm_summarization_checker",
"destination": "/docs/use_cases/self_check/llm_summarization_checker"
},
{
"source": "/docs/modules/chains/additional/elasticsearch_database",
"destination": "/docs/use_cases/tabular/elasticsearch_database"
},
{
"source": "/docs/modules/chains/additional/tagging",
"destination": "/docs/use_cases/tagging"
}
]
}

@@ -5,7 +5,7 @@
"id": "920a3c1a",
"metadata": {},
"source": [
- "# Model Comparison\n",
+ "# Model comparison\n",
"\n",
"Constructing your language model application will likely involve choosing between many different options of prompts, models, and even chains to use. When doing so, you will want to compare these different options on different inputs in an easy, flexible, and intuitive way. \n",
"\n",
@@ -254,7 +254,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.10.9"
+ "version": "3.11.3"
}
},
"nbformat": 4,

@@ -14,7 +14,7 @@
"> using both human and machine feedback. We provide support for each step in the MLOps cycle, \n",
"> from data labeling to model monitoring.\n",
"\n",
- "<a target=\"_blank\" href=\"https://colab.research.google.com/github/hwchase17/langchain/blob/master/docs/modules/callbacks/integrations/argilla.html\">\n",
+ "<a target=\"_blank\" href=\"https://colab.research.google.com/github/hwchase17/langchain/blob/master/docs/integrations/callbacks/argilla.html\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a>"
]

@@ -113,7 +113,7 @@
"\n",
"The modules are (from least to most complex):\n",
"\n",
- "- [Models](https://python.langchain.com/en/latest/modules/models.html): Supported model types and integrations.\n",
+ "- [Models](https://python.langchain.com/docs/modules/model_io/models/): Supported model types and integrations.\n",
"\n",
"- [Prompts](https://python.langchain.com/en/latest/modules/prompts.html): Prompt management, optimization, and serialization.\n",
"\n",

@@ -13,7 +13,7 @@ pip install python-arango
Connect your ArangoDB Database with a Chat Model to get insights on your data.
- See the notebook example [here](/docs/modules/chains/additional/graph_arangodb_qa.html).
+ See the notebook example [here](/docs/use_cases/graph/graph_arangodb_qa.html).
```python
from arango import ArangoClient

@@ -22,7 +22,7 @@ If you don't you can refer to [Argilla - 🚀 Quickstart](https://docs.argilla.i
## Tracking
- See a [usage example of `ArgillaCallbackHandler`](/docs/modules/callbacks/integrations/argilla.html).
+ See a [usage example of `ArgillaCallbackHandler`](/docs/integrations/callbacks/argilla.html).
```python
from langchain.callbacks import ArgillaCallbackHandler

@@ -28,7 +28,7 @@ from langchain.memory import CassandraChatMessageHistory
## Memory
- See a [usage example](/docs/modules/memory/integrations/cassandra_chat_message_history).
+ See a [usage example](/docs/integrations/memory/cassandra_chat_message_history).
```python
from langchain.memory import CassandraChatMessageHistory

@@ -166,7 +166,7 @@
"source": [
"### SQL Database Agent example\n",
"\n",
- "This example demonstrates the use of the [SQL Database Agent](/docs/modules/agents/toolkits/sql_database.html) for answering questions over a Databricks database."
+ "This example demonstrates the use of the [SQL Database Agent](/docs/integrations/toolkits/sql_database.html) for answering questions over a Databricks database."
]
},
{

@@ -32,11 +32,11 @@ See [MLflow AI Gateway](/docs/ecosystem/integrations/mlflow_ai_gateway).
Databricks as an LLM provider
-----------------------------
- The notebook [Wrap Databricks endpoints as LLMs](/docs/modules/model_io/models/llms/integrations/databricks.html) illustrates the method to wrap Databricks endpoints as LLMs in LangChain. It supports two types of endpoints: the serving endpoint, which is recommended for both production and development, and the cluster driver proxy app, which is recommended for interactive development.
+ The notebook [Wrap Databricks endpoints as LLMs](/docs/integrations/llms/databricks.html) illustrates the method to wrap Databricks endpoints as LLMs in LangChain. It supports two types of endpoints: the serving endpoint, which is recommended for both production and development, and the cluster driver proxy app, which is recommended for interactive development.
Databricks endpoints support Dolly, but are also great for hosting models like MPT-7B or any other models from the Hugging Face ecosystem. Databricks endpoints can also be used with proprietary models like OpenAI to provide a governance layer for enterprises.
Databricks Dolly
----------------
- Databricks Dolly is an instruction-following large language model trained on the Databricks machine learning platform that is licensed for commercial use. The model is available on Hugging Face Hub as databricks/dolly-v2-12b. See the notebook [Hugging Face Hub](/docs/modules/model_io/models/llms/integrations/huggingface_hub.html) for instructions to access it through the Hugging Face Hub integration with LangChain.
+ Databricks Dolly is an instruction-following large language model trained on the Databricks machine learning platform that is licensed for commercial use. The model is available on Hugging Face Hub as databricks/dolly-v2-12b. See the notebook [Hugging Face Hub](/docs/integrations/llms/huggingface_hub.html) for instructions to access it through the Hugging Face Hub integration with LangChain.

@@ -51,4 +51,4 @@ Momento can be used as a distributed memory store for LLMs.
### Chat Message History Memory
- See [this notebook](/docs/modules/memory/integrations/momento_chat_message_history.html) for a walkthrough of how to use Momento as a memory store for chat message history.
+ See [this notebook](/docs/integrations/memory/momento_chat_message_history.html) for a walkthrough of how to use Momento as a memory store for chat message history.

@@ -31,7 +31,7 @@ db = SQLDatabase.from_uri(conn_str)
db_chain = SQLDatabaseChain.from_llm(OpenAI(temperature=0), db, verbose=True)
```
- From here, see the [SQL Chain](/docs/modules/chains/popular/sqlite.html) documentation on how to use.
+ From here, see the [SQL Chain](/docs/use_cases/tabular/sqlite.html) documentation on how to use.
## LLMCache

@@ -58,7 +58,7 @@ For a more detailed walkthrough of this, see [this notebook](/docs/modules/data_
## Chain
- See a [usage example](/docs/modules/chains/additional/moderation).
+ See a [usage example](/docs/guides/safety/moderation).
```python
from langchain.chains import OpenAIModerationChain

@@ -106,4 +106,4 @@ Redis can be used to persist LLM conversations.
For a more detailed walkthrough of the `VectorStoreRetrieverMemory` wrapper, see [this notebook](/docs/modules/memory/integrations/vectorstore_retriever_memory.html).
#### Chat Message History Memory
- For a detailed example of Redis to cache conversation message history, see [this notebook](/docs/modules/memory/integrations/redis_chat_message_history.html).
+ For a detailed example of Redis to cache conversation message history, see [this notebook](/docs/integrations/memory/redis_chat_message_history.html).

@@ -9,7 +9,7 @@
"\n",
"Natural Language API Toolkits (NLAToolkits) permit LangChain Agents to efficiently plan and combine calls across endpoints. This notebook demonstrates a sample composition of the Speak, Klarna, and Spoonacular APIs.\n",
"\n",
- "For a detailed walkthrough of the OpenAPI chains wrapped within the NLAToolkit, see the [OpenAPI Operation Chain](/docs/modules/chains/additional/openapi.html) notebook.\n",
+ "For a detailed walkthrough of the OpenAPI chains wrapped within the NLAToolkit, see the [OpenAPI Operation Chain](/docs/use_cases/apis/openapi.html) notebook.\n",
"\n",
"### First, import dependencies and load the LLM"
]

@@ -6,7 +6,7 @@
"source": [
"# Spark SQL Agent\n",
"\n",
- "This notebook shows how to use agents to interact with a Spark SQL. Similar to [SQL Database Agent](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/sql_database.html), it is designed to address general inquiries about Spark SQL and facilitate error recovery.\n",
+ "This notebook shows how to use agents to interact with Spark SQL. Similar to the [SQL Database Agent](https://python.langchain.com/docs/integrations/toolkits/sql_database), it is designed to address general inquiries about Spark SQL and facilitate error recovery.\n",
"\n",
"**NOTE: As this agent is in active development, all answers might not be correct. Additionally, it is not guaranteed that the agent won't perform DML statements on your Spark cluster given certain questions. Be careful running it on sensitive data!**"
]

@@ -8,7 +8,7 @@
"source": [
"# SQL Database Agent\n",
"\n",
- "This notebook showcases an agent designed to interact with a sql databases. The agent builds off of [SQLDatabaseChain](https://python.langchain.com/docs/modules/chains/popular/sqlite) and is designed to answer more general questions about a database, as well as recover from errors.\n",
+ "This notebook showcases an agent designed to interact with SQL databases. The agent builds off of [SQLDatabaseChain](https://python.langchain.com/docs/use_cases/tabular/sqlite) and is designed to answer more general questions about a database, as well as recover from errors.\n",
"\n",
"Note that, as this agent is in active development, all answers might not be correct. Additionally, it is not guaranteed that the agent won't perform DML statements on your database given certain questions. Be careful running it on sensitive data!\n",
"\n",

@@ -1,566 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "6605e7f7",
"metadata": {},
"source": [
"# Extraction\n",
"\n",
"The extraction chain uses the OpenAI `functions` parameter to specify a schema to extract entities from a document. This helps us make sure that the model outputs exactly the schema of entities and properties that we want, with their appropriate types.\n",
"\n",
"The extraction chain is to be used when we want to extract several entities with their properties from the same passage (i.e. what people were mentioned in this passage?)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "34f04daf",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.4) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
" warnings.warn(\n"
]
}
],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import create_extraction_chain, create_extraction_chain_pydantic\n",
"from langchain.prompts import ChatPromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a2648974",
"metadata": {},
"outputs": [],
"source": [
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")"
]
},
{
"cell_type": "markdown",
"id": "5ef034ce",
"metadata": {},
"source": [
"## Extracting entities"
]
},
{
"cell_type": "markdown",
"id": "78ff9df9",
"metadata": {},
"source": [
"To extract entities, we need to create a schema where we specify all the properties we want to find and the type we expect them to have. We can also specify which of these properties are required and which are optional."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4ac43eba",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"name\": {\"type\": \"string\"},\n",
" \"height\": {\"type\": \"integer\"},\n",
" \"hair_color\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [\"name\", \"height\"],\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "640bd005",
"metadata": {},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
" \"\"\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "64313214",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "17c48adb",
"metadata": {},
"source": [
"As we can see, we extracted the required entities and their properties in the required format (it even calculated Claudia's height before returning!)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "cc5436ed",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},\n",
" {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "8d51fcdc",
"metadata": {},
"source": [
"## Several entity types"
]
},
{
"cell_type": "markdown",
"id": "5813affe",
"metadata": {},
"source": [
"Notice that we are using OpenAI functions under the hood and thus the model can only call one function per request (with one, unique schema)"
]
},
{
"cell_type": "markdown",
"id": "511b9838",
"metadata": {},
"source": [
"If we want to extract more than one entity type, we need to introduce a little hack - we will define our properties with an included entity type. \n",
"\n",
"Following we have an example where we also want to extract dog attributes from the passage. Notice the 'person_' and 'dog_' prefixes we use for each property; this tells the model which entity type the property refers to. In this way, the model can return properties from several entity types in one single call."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cf243a26",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [\"person_name\", \"person_height\"],\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "52841fb3",
"metadata": {},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"Alex's dog Frosty is a labrador and likes to play hide and seek.\n",
" \"\"\""
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "93f904ab",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "eb074f7b",
"metadata": {},
"source": [
"People attributes and dog attributes were correctly extracted from the text in the same call"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "db3e9e17",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex',\n",
" 'person_height': 5,\n",
" 'person_hair_color': 'blonde',\n",
" 'dog_name': 'Frosty',\n",
" 'dog_breed': 'labrador'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'}]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "0273e0e2",
"metadata": {},
"source": [
"## Unrelated entities"
]
},
{
"cell_type": "markdown",
"id": "c07b3480",
"metadata": {},
"source": [
"What if our entities are unrelated? In that case, the model will return the unrelated entities in different dictionaries, allowing us to successfully extract several unrelated entity types in the same call."
]
},
{
"cell_type": "markdown",
"id": "01d98af0",
"metadata": {},
"source": [
"Notice that we use `required: []`: we need to allow the model to return **only** person attributes or **only** dog attributes for a single entity (person or dog)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "e584c993",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [],\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "ad6b105f",
"metadata": {},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"\n",
"Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "6bfe5a33",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "24fe09af",
"metadata": {},
"source": [
"We have each entity in its own separate dictionary, with only the appropriate attributes being returned"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "f6e1fd89",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'},\n",
" {'dog_name': 'Willow', 'dog_breed': 'German Shepherd'},\n",
" {'dog_name': 'Milo', 'dog_breed': 'border collie'}]"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "0ac466d1",
"metadata": {},
"source": [
"## Extra info for an entity"
]
},
{
"cell_type": "markdown",
"id": "d240ffc1",
"metadata": {},
"source": [
"What if.. _we don't know what we want?_ More specifically, say we know a few properties we want to extract for a given entity but we also want to know if there's any extra information in the passage. Fortunately, we don't need to structure everything - we can have unstructured extraction as well. \n",
"\n",
"We can do this by introducing another hack, namely the *extra_info* attribute - let's see an example."
]
},
{
"cell_type": "code",
"execution_count": 68,
"id": "f19685f6",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" \"dog_extra_info\": {\"type\": \"string\"},\n",
" },\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 81,
"id": "200c3477",
"metadata": {},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"\n",
"Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 82,
"id": "ddad7dc6",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "e5c0dbbc",
"metadata": {},
"source": [
"It is nice to know more about Willow and Milo!"
]
},
{
"cell_type": "code",
"execution_count": 83,
"id": "c22cfd30",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'},\n",
" {'dog_name': 'Willow',\n",
" 'dog_breed': 'German Shepherd',\n",
" 'dog_extra_information': 'likes to play with other dogs'},\n",
" {'dog_name': 'Milo',\n",
" 'dog_breed': 'border collie',\n",
" 'dog_extra_information': 'lives close by'}]"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "698b4c4d",
"metadata": {},
"source": [
"## Pydantic example"
]
},
{
"cell_type": "markdown",
"id": "6504a6d9",
"metadata": {},
"source": [
"We can also use a Pydantic schema to choose the required properties and types and we will set as 'Optional' those that are not strictly required.\n",
"\n",
"By using the `create_extraction_chain_pydantic` function, we can send a Pydantic schema as input and the output will be an instantiated object that respects our desired schema. \n",
"\n",
"In this way, we can specify our schema in the same manner that we would a new class or function in Python - with purely Pythonic types."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6792866b",
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional, List\n",
"from pydantic import BaseModel, Field"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "36a63761",
"metadata": {},
"outputs": [],
"source": [
"class Properties(BaseModel):\n",
" person_name: str\n",
" person_height: int\n",
" person_hair_color: str\n",
" dog_breed: Optional[str]\n",
" dog_name: Optional[str]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "8ffd1e57",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain_pydantic(pydantic_schema=Properties, llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "24baa954",
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"Alex's dog Frosty is a labrador and likes to play hide and seek.\n",
" \"\"\""
]
},
{
"cell_type": "markdown",
"id": "84e0a241",
"metadata": {},
"source": [
"As we can see, we extracted the required entities and their properties in the required format:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "f771df58",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Properties(person_name='Alex', person_height=5, person_hair_color='blonde', dog_breed='labrador', dog_name='Frosty'),\n",
" Properties(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0df61283",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@@ -9,7 +9,7 @@
"\n",
"LangChain provides async support for Chains by leveraging the [asyncio](https://docs.python.org/3/library/asyncio.html) library.\n",
"\n",
- "Async methods are currently supported in `LLMChain` (through `arun`, `apredict`, `acall`) and `LLMMathChain` (through `arun` and `acall`), `ChatVectorDBChain`, and [QA chains](/docs/modules/chains/additional/question_answering.html). Async support for other chains is on the roadmap."
+ "Async methods are currently supported in `LLMChain` (through `arun`, `apredict`, `acall`) and `LLMMathChain` (through `arun` and `acall`), `ChatVectorDBChain`, and [QA chains](/docs/use_cases/question_answering/how_to/question_answering.html). Async support for other chains is on the roadmap."
]
},
{
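A small sketch of the async usage that passage describes (assumes an `OPENAI_API_KEY`; run from a script, since notebooks already have a running event loop):

```python
import asyncio

from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

async def main():
    prompt = PromptTemplate(
        input_variables=["product"],
        template="What is a good name for a company that makes {product}?",
    )
    chain = LLMChain(llm=OpenAI(temperature=0.9), prompt=prompt)
    # `arun` is the async counterpart of `run`.
    print(await chain.arun(product="colorful socks"))

asyncio.run(main())
```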

@@ -494,9 +494,9 @@
"\n",
"There are a number of more specific chains that use OpenAI functions.\n",
"- [Extraction](/docs/modules/chains/additional/extraction): very similar to structured output chain, intended for information/entity extraction specifically.\n",
- "- [Tagging](/docs/modules/chains/additional/tagging): tag inputs.\n",
- "- [OpenAPI](/docs/modules/chains/additional/openapi_openai): take an OpenAPI spec and create + execute valid requests against the API, using OpenAI functions under the hood.\n",
- "- [QA with citations](/docs/modules/chains/additional/qa_citations): use OpenAI functions ability to extract citations from text."
+ "- [Tagging](/docs/use_cases/tagging): tag inputs.\n",
+ "- [OpenAPI](/docs/use_cases/apis/openapi_openai): take an OpenAPI spec and create + execute valid requests against the API, using OpenAI functions under the hood.\n",
+ "- [QA with citations](/docs/use_cases/question_answering/how_to/qa_citations): use OpenAI functions ability to extract citations from text."
]
}
],

@@ -10,7 +10,7 @@
"This notebook combines two concepts in order to build a custom agent that can interact with AI Plugins:\n",
"\n",
"1. [Custom Agent with Tool Retrieval](/docs/modules/agents/how_to/custom_agent_with_tool_retrieval.html): This introduces the concept of retrieving many tools, which is useful when trying to work with arbitrarily many plugins.\n",
- "2. [Natural Language API Chains](/docs/modules/chains/additional/openapi.html): This creates Natural Language wrappers around OpenAPI endpoints. This is useful because (1) plugins use OpenAPI endpoints under the hood, (2) wrapping them in an NLAChain allows the router agent to call it more easily.\n",
+ "2. [Natural Language API Chains](/docs/use_cases/apis/openapi.html): This creates Natural Language wrappers around OpenAPI endpoints. This is useful because (1) plugins use OpenAPI endpoints under the hood, (2) wrapping them in an NLAChain allows the router agent to call it more easily.\n",
"\n",
"The novel idea introduced in this notebook is the idea of using retrieval to select not the tools explicitly, but the set of OpenAPI specs to use. We can then generate tools from those OpenAPI specs. The use case for this is when trying to get agents to use plugins. It may be more efficient to choose plugins first, then the endpoints, rather than the endpoints directly. This is because the plugins may contain more useful information for selection."
]

@@ -13,7 +13,7 @@ If you are just getting started, and you have relatively simple apis, you should
Chains are a sequence of predetermined steps, so they are good to get started with as they give you more control and let you
understand what is happening better.
- - [API Chain](/docs/modules/chains/popular/api.html)
+ - [API Chain](/docs/use_cases/apis/api.html)
## Agents
@@ -21,4 +21,4 @@ Agents are more complex, and involve multiple queries to the LLM to understand w
The downside of agents is that you have less control. The upside is that they are more powerful,
which allows you to use them on larger and more complex schemas.
- - [OpenAPI Agent](/docs/modules/agents/toolkits/openapi.html)
+ - [OpenAPI Agent](/docs/integrations/toolkits/openapi.html)

@@ -2,7 +2,7 @@
sidebar_position: 6
---
- # Code Understanding
+ # Code understanding
Overview

@@ -0,0 +1,14 @@
# Code writing
:::warning
All program-writing chains should be treated as *VERY* experimental and should not be used in any environment where sensitive/important data is stored, as there is arbitrary code execution involved in using these.
:::
Much like humans, LLMs are great at writing out programs, but not always great at executing them. For example, they can write down complex mathematical equations far better than they can compute the results. In such cases, it is useful to combine an LLM with a program runtime, so that the LLM converts unstructured text to a program and then a simpler tool (like a calculator) actually executes the program.
In other cases, only a program can be used to access the desired information (e.g., the contents of a directory on your computer). In such cases it is again useful to let an LLM generate the code and a separate tool to execute it.
import DocCardList from "@theme/DocCardList";
<DocCardList />
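As one concrete instance of this pattern, a minimal sketch of `LLMMathChain`, where the model writes a math expression and a separate evaluator executes it (the question text is illustrative):

```python
from langchain.chains import LLMMathChain
from langchain.llms import OpenAI

# The LLM translates the question into an expression; a numeric evaluator computes it.
llm_math = LLMMathChain.from_llm(OpenAI(temperature=0), verbose=True)
print(llm_math.run("What is 13 raised to the .3432 power?"))
```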

@@ -0,0 +1,566 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "6605e7f7",
"metadata": {},
"source": [
"# Extraction with OpenAI Functions\n",
"\n",
"The extraction chain uses the OpenAI `functions` parameter to specify a schema to extract entities from a document. This helps us make sure that the model outputs exactly the schema of entities and properties that we want, with their appropriate types.\n",
"\n",
"The extraction chain is to be used when we want to extract several entities with their properties from the same passage (i.e. what people were mentioned in this passage?)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "34f04daf",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.4) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
" warnings.warn(\n"
]
}
],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import create_extraction_chain, create_extraction_chain_pydantic\n",
"from langchain.prompts import ChatPromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a2648974",
"metadata": {},
"outputs": [],
"source": [
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")"
]
},
{
"cell_type": "markdown",
"id": "5ef034ce",
"metadata": {},
"source": [
"## Extracting entities"
]
},
{
"cell_type": "markdown",
"id": "78ff9df9",
"metadata": {},
"source": [
"To extract entities, we need to create a schema where we specify all the properties we want to find and the type we expect them to have. We can also specify which of these properties are required and which are optional."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4ac43eba",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"name\": {\"type\": \"string\"},\n",
" \"height\": {\"type\": \"integer\"},\n",
" \"hair_color\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [\"name\", \"height\"],\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "640bd005",
"metadata": {},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
" \"\"\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "64313214",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "17c48adb",
"metadata": {},
"source": [
"As we can see, we extracted the required entities and their properties in the required format (it even calculated Claudia's height before returning!)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "cc5436ed",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},\n",
" {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "8d51fcdc",
"metadata": {},
"source": [
"## Several entity types"
]
},
{
"cell_type": "markdown",
"id": "5813affe",
"metadata": {},
"source": [
"Notice that we are using OpenAI functions under the hood and thus the model can only call one function per request (with one, unique schema)"
]
},
{
"cell_type": "markdown",
"id": "511b9838",
"metadata": {},
"source": [
"If we want to extract more than one entity type, we need to introduce a little hack - we will define our properties with an included entity type. \n",
"\n",
"Following we have an example where we also want to extract dog attributes from the passage. Notice the 'person_' and 'dog_' prefixes we use for each property; this tells the model which entity type the property refers to. In this way, the model can return properties from several entity types in one single call."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cf243a26",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [\"person_name\", \"person_height\"],\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "52841fb3",
"metadata": {},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"Alex's dog Frosty is a labrador and likes to play hide and seek.\n",
" \"\"\""
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "93f904ab",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "eb074f7b",
"metadata": {},
"source": [
"People attributes and dog attributes were correctly extracted from the text in the same call"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "db3e9e17",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex',\n",
" 'person_height': 5,\n",
" 'person_hair_color': 'blonde',\n",
" 'dog_name': 'Frosty',\n",
" 'dog_breed': 'labrador'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'}]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "0273e0e2",
"metadata": {},
"source": [
"## Unrelated entities"
]
},
{
"cell_type": "markdown",
"id": "c07b3480",
"metadata": {},
"source": [
"What if our entities are unrelated? In that case, the model will return the unrelated entities in different dictionaries, allowing us to successfully extract several unrelated entity types in the same call."
]
},
{
"cell_type": "markdown",
"id": "01d98af0",
"metadata": {},
"source": [
"Notice that we use `required: []`: we need to allow the model to return **only** person attributes or **only** dog attributes for a single entity (person or dog)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "e584c993",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [],\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "ad6b105f",
"metadata": {},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"\n",
"Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "6bfe5a33",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "24fe09af",
"metadata": {},
"source": [
"We have each entity in its own separate dictionary, with only the appropriate attributes being returned"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "f6e1fd89",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'},\n",
" {'dog_name': 'Willow', 'dog_breed': 'German Shepherd'},\n",
" {'dog_name': 'Milo', 'dog_breed': 'border collie'}]"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "0ac466d1",
"metadata": {},
"source": [
"## Extra info for an entity"
]
},
{
"cell_type": "markdown",
"id": "d240ffc1",
"metadata": {},
"source": [
"What if.. _we don't know what we want?_ More specifically, say we know a few properties we want to extract for a given entity but we also want to know if there's any extra information in the passage. Fortunately, we don't need to structure everything - we can have unstructured extraction as well. \n",
"\n",
"We can do this by introducing another hack, namely the *extra_info* attribute - let's see an example."
]
},
{
"cell_type": "code",
"execution_count": 68,
"id": "f19685f6",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" \"dog_extra_info\": {\"type\": \"string\"},\n",
" },\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 81,
"id": "200c3477",
"metadata": {},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"\n",
"Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 82,
"id": "ddad7dc6",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "e5c0dbbc",
"metadata": {},
"source": [
"It is nice to know more about Willow and Milo!"
]
},
{
"cell_type": "code",
"execution_count": 83,
"id": "c22cfd30",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'},\n",
" {'dog_name': 'Willow',\n",
" 'dog_breed': 'German Shepherd',\n",
" 'dog_extra_information': 'likes to play with other dogs'},\n",
" {'dog_name': 'Milo',\n",
" 'dog_breed': 'border collie',\n",
" 'dog_extra_information': 'lives close by'}]"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "698b4c4d",
"metadata": {},
"source": [
"## Pydantic example"
]
},
{
"cell_type": "markdown",
"id": "6504a6d9",
"metadata": {},
"source": [
"We can also use a Pydantic schema to choose the required properties and types and we will set as 'Optional' those that are not strictly required.\n",
"\n",
"By using the `create_extraction_chain_pydantic` function, we can send a Pydantic schema as input and the output will be an instantiated object that respects our desired schema. \n",
"\n",
"In this way, we can specify our schema in the same manner that we would a new class or function in Python - with purely Pythonic types."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6792866b",
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional, List\n",
"from pydantic import BaseModel, Field"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "36a63761",
"metadata": {},
"outputs": [],
"source": [
"class Properties(BaseModel):\n",
" person_name: str\n",
" person_height: int\n",
" person_hair_color: str\n",
" dog_breed: Optional[str]\n",
" dog_name: Optional[str]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "8ffd1e57",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain_pydantic(pydantic_schema=Properties, llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "24baa954",
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"Alex's dog Frosty is a labrador and likes to play hide and seek.\n",
" \"\"\""
]
},
{
"cell_type": "markdown",
"id": "84e0a241",
"metadata": {},
"source": [
"As we can see, we extracted the required entities and their properties in the required format:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "f771df58",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Properties(person_name='Alex', person_height=5, person_hair_color='blonde', dog_breed='labrador', dog_name='Frosty'),\n",
" Properties(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0df61283",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,7 @@
# Analyzing graph data
Graph databases give us a powerful way to represent and query real-world relationships. There are a number of chains that make it easy to use LLMs to interact with various graph DBs.
import DocCardList from "@theme/DocCardList";
<DocCardList />
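
As a rough sketch of what these chains look like in practice (assuming a locally running Neo4j instance; the connection details below are placeholders), a graph QA chain can be wired up like this:

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import GraphCypherQAChain
from langchain.graphs import Neo4jGraph

# Connect to an existing graph database (placeholder credentials)
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")

# The LLM translates the question into a Cypher query and answers from the results
chain = GraphCypherQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)
chain.run("Who played in Top Gun?")
```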

@ -5,7 +5,7 @@
"id": "88d7cc8c", "id": "88d7cc8c",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Context aware text splitting and QA / Chat\n", "# Perform context-aware text splitting\n",
"\n", "\n",
"Text splitting for vector storage often uses sentences or other delimiters [to keep related text together](https://www.pinecone.io/learn/chunking-strategies/). \n", "Text splitting for vector storage often uses sentences or other delimiters [to keep related text together](https://www.pinecone.io/learn/chunking-strategies/). \n",
"\n", "\n",
@ -327,7 +327,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.9.16" "version": "3.11.3"
}, },
"vscode": { "vscode": {
"interpreter": { "interpreter": {

@ -5,7 +5,7 @@
"id": "0f0b9afa", "id": "0f0b9afa",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# FLARE\n", "# Retrieve as you generate with FLARE\n",
"\n", "\n",
"This notebook is an implementation of Forward-Looking Active REtrieval augmented generation (FLARE).\n", "This notebook is an implementation of Forward-Looking Active REtrieval augmented generation (FLARE).\n",
"\n", "\n",
@ -56,8 +56,7 @@
"source": [ "source": [
"import os\n", "import os\n",
"\n", "\n",
"os.environ[\"SERPER_API_KEY\"] = \"\"", "os.environ[\"SERPER_API_KEY\"] = \"\"os.environ[\"OPENAI_API_KEY\"] = \"\""
"os.environ[\"OPENAI_API_KEY\"] = \"\""
] ]
}, },
{ {
@ -490,7 +489,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.9.1" "version": "3.11.3"
} }
}, },
"nbformat": 4, "nbformat": 4,

@ -5,7 +5,7 @@
"id": "ccb74c9b", "id": "ccb74c9b",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Hypothetical Document Embeddings\n", "# Improve document indexing with HyDE\n",
"This notebook goes over how to use Hypothetical Document Embeddings (HyDE), as described in [this paper](https://arxiv.org/abs/2212.10496). \n", "This notebook goes over how to use Hypothetical Document Embeddings (HyDE), as described in [this paper](https://arxiv.org/abs/2212.10496). \n",
"\n", "\n",
"At a high level, HyDE is an embedding technique that takes queries, generates a hypothetical answer, and then embeds that generated document and uses that as the final example. \n", "At a high level, HyDE is an embedding technique that takes queries, generates a hypothetical answer, and then embeds that generated document and uses that as the final example. \n",
@ -255,7 +255,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.9.1" "version": "3.11.3"
}, },
"vscode": { "vscode": {
"interpreter": { "interpreter": {

@ -5,7 +5,7 @@
"id": "3ea857b1", "id": "3ea857b1",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Running LLMs locally\n", "# Use local LLMs\n",
"\n", "\n",
"The popularity of projects like [PrivateGPT](https://github.com/imartinez/privateGPT), [llama.cpp](https://github.com/ggerganov/llama.cpp), and [GPT4All](https://github.com/nomic-ai/gpt4all) underscore the importance of running LLMs locally.\n", "The popularity of projects like [PrivateGPT](https://github.com/imartinez/privateGPT), [llama.cpp](https://github.com/ggerganov/llama.cpp), and [GPT4All](https://github.com/nomic-ai/gpt4all) underscore the importance of running LLMs locally.\n",
"\n", "\n",
@ -736,7 +736,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.9.16" "version": "3.11.3"
} }
}, },
"nbformat": 4, "nbformat": 4,

@ -5,7 +5,7 @@
"id": "9b5c258f", "id": "9b5c258f",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Question-Answering Citations\n", "# Cite sources\n",
"\n", "\n",
"This notebook shows how to use OpenAI functions ability to extract citations from text." "This notebook shows how to use OpenAI functions ability to extract citations from text."
] ]
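A minimal sketch using the citation chain (the question and context strings are placeholders):

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import create_citation_fuzzy_match_chain

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
chain = create_citation_fuzzy_match_chain(llm)

question = "What did the author study in university?"
context = "My name is Jason. I grew up in Toronto and studied Computational Mathematics in university."
result = chain.run(question=question, context=context)
print(result)  # the answer with citation spans pointing back into the context
```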
@ -171,7 +171,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.9.1" "version": "3.11.3"
} }
}, },
"nbformat": 4, "nbformat": 4,

@ -4,7 +4,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Vector store-augmented text generation\n", "# Retrieve from vector stores directly\n",
"\n", "\n",
"This notebook walks through how to use LangChain for text generation over a vector index. This is useful if we want to generate text that is able to draw from a large body of custom text, for example, generating blog posts that have an understanding of previous blog posts written, or product tutorials that can refer to product documentation." "This notebook walks through how to use LangChain for text generation over a vector index. This is useful if we want to generate text that is able to draw from a large body of custom text, for example, generating blog posts that have an understanding of previous blog posts written, or product tutorials that can refer to product documentation."
] ]

@ -2,160 +2,116 @@
sidebar_position: 0 sidebar_position: 0
--- ---
# QA and Chat over Documents # QA over Documents
Chat and Question-Answering (QA) over `data` are popular LLM use-cases. ## Use case
Suppose you have some text documents (PDF, blog, Notion pages, etc.) and want to ask questions related to the contents of those documents. LLMs, given their proficiency in understanding text, are a great tool for this.
`data` can include many things, including: In this walkthrough we'll go over how to build a question-answering over documents application using LLMs. Two very related use cases which we cover elsewhere are:
- [QA over structured data](/docs/use_cases/tabular) (e.g., SQL)
* `Unstructured data` (e.g., PDFs) - [QA over code](/docs/use_cases/code) (e.g., Python)
* `Structured data` (e.g., SQL)
* `Code` (e.g., Python)
LangChain supports Chat and QA on various `data` types:
* See [here](https://python.langchain.com/docs/use_cases/code/) and [here](https://twitter.com/cristobal_dev/status/1675745314592915456?s=20) for `Code`
* See [here](https://python.langchain.com/docs/use_cases/tabular) for `Structured data`
Below we will review Chat and QA on `Unstructured data`.
![intro.png](/img/qa_intro.png) ![intro.png](/img/qa_intro.png)
`Unstructured data` can be loaded from many sources. ## Overview
The pipeline for converting raw unstructured data into a QA chain looks like this:
Use the [LangChain integration hub](https://integrations.langchain.com/) to browse the full set of loaders. 1. `Loading`: First we need to load our data. Unstructured data can be loaded from many sources. Use the [LangChain integration hub](https://integrations.langchain.com/) to browse the full set of loaders.
Each loader returns data as a LangChain [`Document`](https://docs.langchain.com/docs/components/schema/document). Each loader returns data as a LangChain [`Document`](https://docs.langchain.com/docs/components/schema/document).
2. `Splitting`: [Text splitters](/docs/modules/data_connection/document_transformers/) break `Documents` into splits of specified size
`Documents` are turned into a Chat or QA app following the general steps below: 3. `Storage`: Storage (e.g., often a [vectorstore](/docs/modules/data_connection/vectorstores/)) will house [and often embed](https://www.pinecone.io/learn/vector-embeddings/) the splits
4. `Retrieval`: The app retrieves splits from storage (e.g., often [with similar embeddings](https://www.pinecone.io/learn/k-nearest-neighbor/) to the input question)
* `Splitting`: [Text splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/) break `Documents` into splits of specified size 5. `Generation`: An [LLM](/docs/modules/model_io/models/llms/) produces an answer using a prompt that includes the question and the retrieved data
* `Storage`: Storage (e.g., often a [vectorstore](https://python.langchain.com/docs/modules/data_connection/vectorstores/)) will house [and often embed](https://www.pinecone.io/learn/vector-embeddings/) the splits 6. `Conversation` (Extension): Hold a multi-turn conversation by adding [Memory](/docs/modules/memory/) to your QA chain.
* `Retrieval`: The app retrieves splits from storage (e.g., often [with similar embeddings](https://www.pinecone.io/learn/k-nearest-neighbor/) to the input question)
* `Output`: An [LLM](https://python.langchain.com/docs/modules/model_io/models/llms/) produces an answer using a prompt that includes the question and the retrieved splits
![flow.jpeg](/img/qa_flow.jpeg) ![flow.jpeg](/img/qa_flow.jpeg)
## Quickstart ## Quickstart
To give you a sneak preview, the above pipeline can all be wrapped in a single object: `VectorstoreIndexCreator`. Suppose we want a QA app over this [blog post](https://lilianweng.github.io/posts/2023-06-23-agent/). We can create this in a few lines of code:
The above pipeline can be wrapped with a `VectorstoreIndexCreator`.
In particular:
* Specify a `Document` loader First set environment variables and install packages:
* The `splitting`, `storage`, `retrieval`, and `output` generation stages are wrapped ```bash
pip install openai chromadb
Let's load this [blog post](https://lilianweng.github.io/posts/2023-06-23-agent/) on agents as an example `Document`.
We have a QA app in a few lines of code.
Set environment variables and get packages:
```python
pip install openai
pip install chromadb
export OPENAI_API_KEY="..." export OPENAI_API_KEY="..."
``` ```
Run: Then run:
```python ```python
from langchain.document_loaders import WebBaseLoader from langchain.document_loaders import WebBaseLoader
from langchain.indexes import VectorstoreIndexCreator from langchain.indexes import VectorstoreIndexCreator
# Document loader
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/") loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
# Index that wraps above steps
index = VectorstoreIndexCreator().from_loaders([loader]) index = VectorstoreIndexCreator().from_loaders([loader])
# Question-answering
question = "What is Task Decomposition?"
index.query(question)
``` ```
And now ask your questions:
```python
index.query("What is Task Decomposition?")
```
' Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done using LLM with simple prompting, task-specific instructions, or human inputs. Tree of Thoughts (Yao et al. 2023) is an example of a task decomposition technique that explores multiple reasoning possibilities at each step and generates multiple thoughts per step, creating a tree structure.' ' Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done using LLM with simple prompting, task-specific instructions, or human inputs. Tree of Thoughts (Yao et al. 2023) is an example of a task decomposition technique that explores multiple reasoning possibilities at each step and generates multiple thoughts per step, creating a tree structure.'
Ok, but what's going on under the hood, and how could we customize this for our specific use case? For that, let's take a look at how we can construct this pipeline piece by piece.
## Step 1. Load
Of course, some users do not want this level of abstraction. Specify a `DocumentLoader` to load in your unstructured data as `Documents`. A `Document` is a piece of text (the `page_content`) and associated metadata.
Below, we will discuss each stage in more detail.
## 1. Loading, Splitting, Storage
### 1.1 Getting started
Specify a `Document` loader.
```python ```python
# Document loader
from langchain.document_loaders import WebBaseLoader from langchain.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/") loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load() data = loader.load()
``` ```
Split the `Document` into chunks for embedding and vector storage. ### Go deeper
- Browse the > 120 data loader integrations [here](https://integrations.langchain.com/).
- See further documentation on loaders [here](/docs/modules/data_connection/document_loaders/).
## Step 2. Split
Split the `Document` into chunks for embedding and vector storage.
```python ```python
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0) text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0)
all_splits = text_splitter.split_documents(data) all_splits = text_splitter.split_documents(data)
``` ```
Embed and store the splits in a vector database ([Chroma](https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/chroma)). ### Go deeper
```python
# Store
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
vectorstore = Chroma.from_documents(documents=all_splits,embedding=OpenAIEmbeddings())
```
Here are the three pieces together:
![lc.png](/img/qa_data_load.png) - `DocumentSplitters` are just one type of the more generic `DocumentTransformers`, which can all be useful in this preprocessing step.
- See further documentation on transformers [here](/docs/modules/data_connection/document_transformers/).
### 1.2 Going Deeper - `Context-aware splitters` keep the location ("context") of each split in the original `Document`:
- [Markdown files](/docs/use_cases/question_answering/document-context-aware-QA)
#### 1.2.1 Integrations - [Code (py or js)](/docs/modules/data_connection/document_loaders/integrations/source_code)
- [Documents](/docs/modules/data_connection/document_loaders/integrations/grobid)
`Document Loaders`
* Browse the > 120 data loader integrations [here](https://integrations.langchain.com/).
* See further documentation on loaders [here](https://python.langchain.com/docs/modules/data_connection/document_loaders/). ## Step 3. Store
`Document Transformers` To be able to look up our document splits, we first need to store them where we can later look them up.
The most common way to do this is to embed the contents of each document then store the embedding and document in a vector store, with the embedding being used to index the document.
* All can ingest loaded `Documents` and process them (e.g., split). ```python
from langchain.embeddings import OpenAIEmbeddings
* See further documentation on transformers [here](https://python.langchain.com/docs/modules/data_connection/document_transformers/). from langchain.vectorstores import Chroma
`Vectorstores` vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())
```
* Browse the > 35 vectorstores integrations [here](https://integrations.langchain.com/).
* See further documentation on vectorstores [here](https://python.langchain.com/docs/modules/data_connection/vectorstores/).
#### 1.2.2 Retaining metadata
`Context-aware splitters` keep the location ("context") of each split in the original `Document`: ### Go deeper
- Browse the > 40 vectorstores integrations [here](https://integrations.langchain.com/).
- See further documentation on vectorstores [here](/docs/modules/data_connection/vectorstores/).
- Browse the > 30 text embedding integrations [here](https://integrations.langchain.com/).
- See further documentation on embedding models [here](/docs/modules/data_connection/text_embedding/).
* [Markdown files](https://python.langchain.com/docs/use_cases/question_answering/document-context-aware-QA) Here are Steps 1-3:
* [Code (py or js)](https://python.langchain.com/docs/modules/data_connection/document_loaders/integrations/source_code)
* [Documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/integrations/grobid)
## 2. Retrieval ![lc.png](/img/qa_data_load.png)
### 2.1 Getting started ## Step 4. Retrieve
Retrieve [relevant splits](https://www.pinecone.io/learn/what-is-similarity-search/) for any question using `similarity_search`.
Retrieve relevant splits for any question using [similarity search](https://www.pinecone.io/learn/what-is-similarity-search/).
```python ```python
question = "What are the approaches to Task Decomposition?" question = "What are the approaches to Task Decomposition?"
@ -163,60 +119,39 @@ docs = vectorstore.similarity_search(question)
len(docs) len(docs)
``` ```
4 4
### Go deeper
Vectorstores are commonly used for retrieval, but they are not the only option. For example, SVMs (see thread [here](https://twitter.com/karpathy/status/1647025230546886658?s=20)) can also be used.
### 2.2 Going Deeper LangChain [has many retrievers](/docs/modules/data_connection/retrievers/) including, but not limited to, vectorstores. All retrievers implement a common method `get_relevant_documents()` (and its asynchronous variant `aget_relevant_documents()`).
#### 2.2.1 Retrieval
Vectorstores are commonly used for retrieval.
But, they are not the only option.
For example, SVMs (see thread [here](https://twitter.com/karpathy/status/1647025230546886658?s=20)) can also be used.
LangChain [has many retrievers](https://python.langchain.com/docs/modules/data_connection/retrievers/) including, but not limited to, vectorstores.
All retrievers implement some common methods, such as `get_relevant_documents()`.
```python ```python
from langchain.retrievers import SVMRetriever from langchain.retrievers import SVMRetriever
svm_retriever = SVMRetriever.from_documents(all_splits,OpenAIEmbeddings()) svm_retriever = SVMRetriever.from_documents(all_splits,OpenAIEmbeddings())
docs_svm=svm_retriever.get_relevant_documents(question) docs_svm=svm_retriever.get_relevant_documents(question)
len(docs_svm) len(docs_svm)
``` ```
4 4
Some common ways to improve on vector similarity search include:
- `MultiQueryRetriever` [generates variants of the input question](/docs/modules/data_connection/retrievers/how_to/MultiQueryRetriever) to improve retrieval.
#### 2.2.2 Advanced retrieval - `Max marginal relevance` selects for [relevance and diversity](https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf) among the retrieved documents.
- Documents can be filtered during retrieval using [`metadata` filters](/docs/use_cases/question_answering/document-context-aware-QA).
Improve on `similarity_search`:
* `MultiQueryRetriever` [generates variants of the input question](https://python.langchain.com/docs/modules/data_connection/retrievers/how_to/MultiQueryRetriever) to improve retrieval.
* `Max marginal relevance` selects for [relevance and diversity](https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf) among the retrieved documents.
* Documents can be filtered during retrieval using [`metadata` filters](https://python.langchain.com/docs/use_cases/question_answering/document-context-aware-QA).
```python ```python
# MultiQueryRetriever
import logging import logging
from langchain.chat_models import ChatOpenAI from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever from langchain.retrievers.multi_query import MultiQueryRetriever
logging.basicConfig() logging.basicConfig()
logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO) logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)
retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectorstore.as_retriever(), retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectorstore.as_retriever(),
llm=ChatOpenAI(temperature=0)) llm=ChatOpenAI(temperature=0))
unique_docs = retriever_from_llm.get_relevant_documents(query=question) unique_docs = retriever_from_llm.get_relevant_documents(query=question)
@ -226,79 +161,48 @@ len(unique_docs)
INFO:langchain.retrievers.multi_query:Generated queries: ['1. How can Task Decomposition be approached?', '2. What are the different methods for Task Decomposition?', '3. What are the various approaches to decomposing tasks?'] INFO:langchain.retrievers.multi_query:Generated queries: ['1. How can Task Decomposition be approached?', '2. What are the different methods for Task Decomposition?', '3. What are the various approaches to decomposing tasks?']
5 5
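The other two approaches listed above can be sketched as follows (the metadata key in the filter assumes the loader stored a `source` field, which `WebBaseLoader` does):

```python
# Max marginal relevance: trade off relevance and diversity at retrieval time
mmr_retriever = vectorstore.as_retriever(search_type="mmr")
mmr_docs = mmr_retriever.get_relevant_documents(question)

# Metadata filtering (Chroma): only consider splits whose metadata matches
filtered_docs = vectorstore.similarity_search(
    question,
    filter={"source": "https://lilianweng.github.io/posts/2023-06-23-agent/"},
)
```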
## Step 5. Generate
Distill the retrieved documents into an answer using an LLM/Chat model (e.g., `gpt-3.5-turbo`) with `RetrievalQA` chain.
## 3. QA
### 3.1 Getting started
Distill the retrieved documents into an answer using an LLM (e.g., `gpt-3.5-turbo`) with `RetrievalQA` chain.
```python ```python
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
from langchain.chains import RetrievalQA from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever()) from langchain.chat_models import ChatOpenAI
qa_chain({"query": question})
```
{'query': 'What are the approaches to Task Decomposition?',
'result': 'The approaches to task decomposition include:\n\n1. Simple prompting: This approach involves using simple prompts or questions to guide the agent in breaking down a task into smaller subgoals. For example, the agent can be prompted with "Steps for XYZ" and asked to list the subgoals for achieving XYZ.\n\n2. Task-specific instructions: In this approach, task-specific instructions are provided to the agent to guide the decomposition process. For example, if the task is to write a novel, the agent can be instructed to "Write a story outline" as a subgoal.\n\n3. Human inputs: This approach involves incorporating human inputs in the task decomposition process. Humans can provide guidance, feedback, and suggestions to help the agent break down complex tasks into manageable subgoals.\n\nThese approaches aim to enable efficient handling of complex tasks by breaking them down into smaller, more manageable parts.'}
### 3.2 Going Deeper
#### 3.2.1 Integrations
`LLMs`
* Browse the > 55 LLM integrations [here](https://integrations.langchain.com/).
* See further documentation on LLMs [here](https://python.langchain.com/docs/modules/model_io/models/).
#### 3.2.2 Running LLMs locally
The popularity of [PrivateGPT](https://github.com/imartinez/privateGPT) and [GPT4All](https://github.com/nomic-ai/gpt4all) underscore the importance of running LLMs locally.
LangChain has integrations with many open source LLMs that can be run locally.
Using `GPT4All` is as simple as [downloading the binary](https://python.langchain.com/docs/integrations/llms/gpt4all) and then:
```python llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
from langchain.llms import GPT4All
from langchain.chains import RetrievalQA
llm = GPT4All(model="/Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin",max_tokens=2048)
qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever()) qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever())
```
```python
qa_chain({"query": question}) qa_chain({"query": question})
``` ```
{
'query': 'What are the approaches to Task Decomposition?',
'result': 'The approaches to task decomposition include:\n\n1. Simple prompting: This approach involves using simple prompts or questions to guide the agent in breaking down a task into smaller subgoals. For example, the agent can be prompted with "Steps for XYZ" and asked to list the subgoals for achieving XYZ.\n\n2. Task-specific instructions: In this approach, task-specific instructions are provided to the agent to guide the decomposition process. For example, if the task is to write a novel, the agent can be instructed to "Write a story outline" as a subgoal.\n\n3. Human inputs: This approach involves incorporating human inputs in the task decomposition process. Humans can provide guidance, feedback, and suggestions to help the agent break down complex tasks into manageable subgoals.\n\nThese approaches aim to enable efficient handling of complex tasks by breaking them down into smaller, more manageable parts.'
}
Note, you can pass in an `LLM` or a `ChatModel` (like we did here) to the `RetrievalQA` chain.
### Go deeper
{'query': 'What are the approaches to Task Decomposition?', #### Choosing LLMs
'result': ' There are three main approaches to task decomposition: (1) using language models like GPT-3 for simple prompting such as "Steps for XYZ.\\n1.", (2) using task-specific instructions, and (3) with human inputs.'} - Browse the > 55 LLM and chat model integrations [here](https://integrations.langchain.com/).
- See further documentation on LLMs and chat models [here](/docs/modules/model_io/models/).
- Use local LLMs: The popularity of [PrivateGPT](https://github.com/imartinez/privateGPT) and [GPT4All](https://github.com/nomic-ai/gpt4all) underscore the importance of running LLMs locally.
Using `GPT4All` is as simple as [downloading the binary](/docs/integrations/llms/gpt4all) and then:
from langchain.llms import GPT4All
from langchain.chains import RetrievalQA
llm = GPT4All(model="/Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin",max_tokens=2048)
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever())
#### 3.2.2 Customizing the prompt #### Customizing the prompt
The prompt in `RetrievalQA` chain can be easily customized. The prompt in `RetrievalQA` chain can be easily customized.
```python ```python
# Build prompt from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate from langchain.prompts import PromptTemplate
template = """Use the following pieces of context to answer the question at the end. template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer. If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible. Use three sentences maximum and keep the answer as concise as possible.
@ -306,33 +210,28 @@ Always say "thanks for asking!" at the end of the answer.
{context} {context}
Question: {question} Question: {question}
Helpful Answer:""" Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"],template=template,) QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
# Run chain
from langchain.chains import RetrievalQA
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0) llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(llm, qa_chain = RetrievalQA.from_chain_type(
retriever=vectorstore.as_retriever(), llm,
chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}) retriever=vectorstore.as_retriever(),
chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)
result = qa_chain({"query": question}) result = qa_chain({"query": question})
result["result"] result["result"]
``` ```
'The approaches to Task Decomposition are (1) using simple prompting by LLM, (2) using task-specific instructions, and (3) with human inputs. Thanks for asking!' 'The approaches to Task Decomposition are (1) using simple prompting by LLM, (2) using task-specific instructions, and (3) with human inputs. Thanks for asking!'
#### Return source documents
#### 3.2.3 Returning source documents
The full set of retrieved documents used for answer distillation can be returned using `return_source_documents=True`. The full set of retrieved documents used for answer distillation can be returned using `return_source_documents=True`.
```python ```python
from langchain.chains import RetrievalQA from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever(), qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever(),
return_source_documents=True) return_source_documents=True)
result = qa_chain({"query": question}) result = qa_chain({"query": question})
@ -345,51 +244,46 @@ result['source_documents'][0]
#### 3.2.4 Citations #### Return citations
Answer citations can be returned using `RetrievalQAWithSourcesChain`. Answer citations can be returned using `RetrievalQAWithSourcesChain`.
```python ```python
from langchain.chains import RetrievalQAWithSourcesChain from langchain.chains import RetrievalQAWithSourcesChain
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm,retriever=vectorstore.as_retriever()) qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm,retriever=vectorstore.as_retriever())
result = qa_chain({"question": question}) result = qa_chain({"question": question})
result result
``` ```
{
'question': 'What are the approaches to Task Decomposition?',
'answer': 'The approaches to Task Decomposition include (1) using LLM with simple prompting, (2) using task-specific instructions, and (3) incorporating human inputs.\n',
'sources': 'https://lilianweng.github.io/posts/2023-06-23-agent/'
}
#### Customizing retrieved document processing
{'question': 'What are the approaches to Task Decomposition?',
'answer': 'The approaches to Task Decomposition include (1) using LLM with simple prompting, (2) using task-specific instructions, and (3) incorporating human inputs.\n',
'sources': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
#### 3.2.5 Customizing retrieved docs in the LLM prompt
Retrieved documents can be fed to an LLM for answer distillation in a few different ways. Retrieved documents can be fed to an LLM for answer distillation in a few different ways.
`stuff`, `refine`, `map-reduce`, and `map-rerank` chains for passing documents to an LLM prompt are well summarized [here](https://python.langchain.com/docs/modules/chains/document/). `stuff`, `refine`, `map-reduce`, and `map-rerank` chains for passing documents to an LLM prompt are well summarized [here](/docs/modules/chains/document/).
`stuff` is commonly used because it simply "stuffs" all retrieved documents into the prompt. `stuff` is commonly used because it simply "stuffs" all retrieved documents into the prompt.
The [load_qa_chain](https://python.langchain.com/docs/modules/chains/additional/question_answering.html) is an easy way to pass documents to an LLM using these various approaches (e.g., see `chain_type`). The [load_qa_chain](/docs/use_cases/question_answering/how_to/question_answering.html) is an easy way to pass documents to an LLM using these various approaches (e.g., see `chain_type`).
```python ```python
from langchain.chains.question_answering import load_qa_chain from langchain.chains.question_answering import load_qa_chain
chain = load_qa_chain(llm, chain_type="stuff") chain = load_qa_chain(llm, chain_type="stuff")
chain({"input_documents": unique_docs, "question": question},return_only_outputs=True) chain({"input_documents": unique_docs, "question": question},return_only_outputs=True)
``` ```
{'output_text': 'The approaches to task decomposition include (1) using simple prompting to break down tasks into subgoals, (2) providing task-specific instructions to guide the decomposition process, and (3) incorporating human inputs for task decomposition.'} {'output_text': 'The approaches to task decomposition include (1) using simple prompting to break down tasks into subgoals, (2) providing task-specific instructions to guide the decomposition process, and (3) incorporating human inputs for task decomposition.'}
We can also pass the `chain_type` to `RetrievalQA`. We can also pass the `chain_type` to `RetrievalQA`.
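For instance, a minimal sketch of swapping in a different document chain (reusing the `llm`, `vectorstore`, and `question` defined above):

```python
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type="map_reduce",  # or "refine", "map_rerank"
)
result = qa_chain({"query": question})
```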
@ -403,55 +297,46 @@ In summary, the user can choose the desired level of abstraction for QA:
![summary_chains.png](/img/summary_chains.png) ![summary_chains.png](/img/summary_chains.png)
## 4. Chat ## Step 6. Converse (Extension)
### 4.1 Getting started
To keep chat history, first specify a `Memory buffer` to track the conversation inputs / outputs.
To hold a conversation, a chain needs to be able to refer to past interactions. Chain `Memory` allows us to do this. To keep chat history, we can specify a Memory buffer to track the conversation inputs / outputs.
```python ```python
from langchain.memory import ConversationBufferMemory from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True) memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
``` ```
The `ConversationalRetrievalChain` uses chat in the `Memory buffer`. The `ConversationalRetrievalChain` uses chat in the `Memory buffer`.
```python ```python
from langchain.chains import ConversationalRetrievalChain from langchain.chains import ConversationalRetrievalChain
retriever=vectorstore.as_retriever()
chat = ConversationalRetrievalChain.from_llm(llm,retriever=retriever,memory=memory)
```
retriever = vectorstore.as_retriever()
chat = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory)
```
```python ```python
result = chat({"question": "What are some of the main ideas in self-reflection?"}) result = chat({"question": "What are some of the main ideas in self-reflection?"})
result['answer'] result['answer']
``` ```
"Some of the main ideas in self-reflection include:\n1. Iterative improvement: Self-reflection allows autonomous agents to improve by refining past action decisions and correcting mistakes.\n2. Trial and error: Self-reflection is crucial in real-world tasks where trial and error are inevitable.\n3. Two-shot examples: Self-reflection is created by showing pairs of failed trajectories and ideal reflections for guiding future changes in the plan.\n4. Working memory: Reflections are added to the agent's working memory, up to three, to be used as context for querying.\n5. Performance evaluation: Self-reflection involves continuously reviewing and analyzing actions, self-criticizing behavior, and reflecting on past decisions and strategies to refine approaches.\n6. Efficiency: Self-reflection encourages being smart and efficient, aiming to complete tasks in the least number of steps." "Some of the main ideas in self-reflection include:\n1. Iterative improvement: Self-reflection allows autonomous agents to improve by refining past action decisions and correcting mistakes.\n2. Trial and error: Self-reflection is crucial in real-world tasks where trial and error are inevitable.\n3. Two-shot examples: Self-reflection is created by showing pairs of failed trajectories and ideal reflections for guiding future changes in the plan.\n4. Working memory: Reflections are added to the agent's working memory, up to three, to be used as context for querying.\n5. Performance evaluation: Self-reflection involves continuously reviewing and analyzing actions, self-criticizing behavior, and reflecting on past decisions and strategies to refine approaches.\n6. Efficiency: Self-reflection encourages being smart and efficient, aiming to complete tasks in the least number of steps."
The Memory buffer has context to resolve `"it"` ("self-reflection") in the below question.
The `Memory buffer` has context to resolve `"it"` ("self-reflection") in the below question.
```python ```python
result = chat({"question": "How does the Reflexion paper handle it?"}) result = chat({"question": "How does the Reflexion paper handle it?"})
result['answer'] result['answer']
``` ```
"The Reflexion paper handles self-reflection by showing two-shot examples to the Learning Language Model (LLM). Each example consists of a failed trajectory and an ideal reflection that guides future changes in the agent's plan. These reflections are then added to the agent's working memory, up to a maximum of three, to be used as context for querying the LLM. This allows the agent to iteratively improve its reasoning skills by refining past action decisions and correcting previous mistakes." "The Reflexion paper handles self-reflection by showing two-shot examples to the Learning Language Model (LLM). Each example consists of a failed trajectory and an ideal reflection that guides future changes in the agent's plan. These reflections are then added to the agent's working memory, up to a maximum of three, to be used as context for querying the LLM. This allows the agent to iteratively improve its reasoning skills by refining past action decisions and correcting previous mistakes."
### Go deeper
The [documentation](/docs/use_cases/question_answering/how_to/chat_vector_db) on `ConversationalRetrievalChain` offers a few extensions, such as streaming and source documents.
### 4.2 Going deeper
The [documentation](https://python.langchain.com/docs/modules/chains/popular/chat_vector_db) on `ConversationalRetrievalChain` offers a few extensions, such as streaming and source documents. ## Further reading
- Check out the [How to](/docs/use_cases/question_answering/how_to/) section for all the variations of chains that can be used for QA over docs in different settings.
- Check out the [Integrations-specific](/docs/use_cases/question_answering/integrations/) section for chains that use specific integrations.

@ -5,7 +5,7 @@
"id": "71a43144", "id": "71a43144",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Retrieval QA using OpenAI functions\n", "# Structure answers with OpenAI functions\n",
"\n", "\n",
"OpenAI functions allows for structuring of response output. This is often useful in question answering when you want to not only get the final answer but also supporting evidence, citations, etc.\n", "OpenAI functions allows for structuring of response output. This is often useful in question answering when you want to not only get the final answer but also supporting evidence, citations, etc.\n",
"\n", "\n",
@ -337,7 +337,6 @@
] ]
}, },
{ {
"attachments": {},
"cell_type": "markdown", "cell_type": "markdown",
"id": "ac9e4626", "id": "ac9e4626",
"metadata": {}, "metadata": {},
@ -431,7 +430,7 @@
], ],
"metadata": { "metadata": {
"kernelspec": { "kernelspec": {
"display_name": "Python 3", "display_name": "Python 3 (ipykernel)",
"language": "python", "language": "python",
"name": "python3" "name": "python3"
}, },
@ -445,7 +444,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.10.5" "version": "3.11.3"
} }
}, },
"nbformat": 4, "nbformat": 4,

@ -1,18 +1,16 @@
{ {
"cells": [ "cells": [
{ {
"attachments": {},
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Question answering over a group chat messages using Activeloop's DeepLake\n", "# QA using Activeloop's DeepLake\n",
"In this tutorial, we are going to use Langchain + Activeloop's Deep Lake with GPT4 to semantically search and ask questions over a group chat.\n", "In this tutorial, we are going to use Langchain + Activeloop's Deep Lake with GPT4 to semantically search and ask questions over a group chat.\n",
"\n", "\n",
"View a working demo [here](https://twitter.com/thisissukh_/status/1647223328363679745)" "View a working demo [here](https://twitter.com/thisissukh_/status/1647223328363679745)"
] ]
}, },
{ {
"attachments": {},
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
@ -29,7 +27,6 @@
] ]
}, },
{ {
"attachments": {},
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
@ -37,7 +34,6 @@
] ]
}, },
{ {
"attachments": {},
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [] "source": []
@ -73,7 +69,6 @@
] ]
}, },
{ {
"attachments": {},
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
@ -83,7 +78,6 @@
] ]
}, },
{ {
"attachments": {},
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
@ -124,7 +118,6 @@
] ]
}, },
{ {
"attachments": {},
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
@ -155,7 +148,6 @@
] ]
}, },
{ {
"attachments": {},
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
@ -213,7 +205,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.9.1" "version": "3.11.3"
} }
}, },
"nbformat": 4, "nbformat": 4,

@ -0,0 +1,8 @@
# Self-checking
One of the main issues with using LLMs is that they can hallucinate and make false claims. A surprisingly effective way to mitigate this is to have the LLM check its own answers.
import DocCardList from "@theme/DocCardList";
<DocCardList />
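
For example, a minimal `LLMCheckerChain` sketch (the question is just an illustration):

```python
from langchain.llms import OpenAI
from langchain.chains import LLMCheckerChain

llm = OpenAI(temperature=0.7)

# The chain drafts an answer, lists its assumptions, checks them, and revises the answer
checker_chain = LLMCheckerChain.from_llm(llm, verbose=True)
checker_chain.run("What type of mammal lays the biggest eggs?")
```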

@ -16,7 +16,7 @@ chain.run(docs)
``` ```
The following resources exist: The following resources exist:
- [Summarization notebook](/docs/modules/chains/popular/summarize.html): A notebook walking through how to accomplish this task. - [Summarization notebook](/docs/use_cases/summarization/summarize.html): A notebook walking through how to accomplish this task.
Additional related resources include: Additional related resources include:
- [Modules for working with documents](/docs/modules/data_connection): Core components for working with documents. - [Modules for working with documents](/docs/modules/data_connection): Core components for working with documents.

@ -10,7 +10,7 @@ This page covers all resources available in LangChain for working with data in t
## Document loading ## Document loading
If you have text data stored in a tabular format, you may want to load the data into a Document and then index it as you would If you have text data stored in a tabular format, you may want to load the data into a Document and then index it as you would
other text/unstructured data. For this, you should use a document loader like the [CSVLoader](/docs/modules/data_connection/document_loaders/how_to/csv.html) other text/unstructured data. For this, you should use a document loader like the [CSVLoader](/docs/modules/data_connection/document_loaders/how_to/csv.html)
and then you should [create an index](/docs/modules/data_connection) over that data, and [query it that way](/docs/modules/chains/popular/vector_db_qa.html). and then you should [create an index](/docs/modules/data_connection) over that data, and [query it that way](/docs/use_cases/question_answering/how_to/vector_db_qa.html).
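For example, a minimal sketch of that flow (the CSV path is a placeholder):

```python
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator

loader = CSVLoader(file_path="titanic.csv")  # placeholder file
index = VectorstoreIndexCreator().from_loaders([loader])
index.query("How many passengers survived?")
```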
## Querying ## Querying
If you have more numeric tabular data, or have a large amount of data and don't want to index it, you should get started If you have more numeric tabular data, or have a large amount of data and don't want to index it, you should get started
@ -22,7 +22,7 @@ If you are just getting started, and you have relatively small/simple tabular da
Chains are a sequence of predetermined steps, so they are good to get started with as they give you more control and let you Chains are a sequence of predetermined steps, so they are good to get started with as they give you more control and let you
understand what is happening better. understand what is happening better.
- [SQL Database Chain](/docs/modules/chains/popular/sqlite.html) - [SQL Database Chain](/docs/use_cases/tabular/sqlite.html)
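A minimal sketch of the chain approach (the SQLite URI is a placeholder):

```python
from langchain.llms import OpenAI
from langchain.utilities import SQLDatabase
from langchain.chains import SQLDatabaseChain

db = SQLDatabase.from_uri("sqlite:///Chinook.db")  # placeholder database
db_chain = SQLDatabaseChain.from_llm(OpenAI(temperature=0), db, verbose=True)
db_chain.run("How many employees are there?")
```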
### Agents ### Agents
@ -30,6 +30,6 @@ Agents are more complex, and involve multiple queries to the LLM to understand w
The downside of agents are that you have less control. The upside is that they are more powerful, The downside of agents are that you have less control. The upside is that they are more powerful,
which allows you to use them on larger databases and more complex schemas. which allows you to use them on larger databases and more complex schemas.
- [SQL Agent](/docs/modules/agents/toolkits/sql_database.html) - [SQL Agent](/docs/integrations/toolkits/sql_database.html)
- [Pandas Agent](/docs/modules/agents/toolkits/pandas.html) - [Pandas Agent](/docs/integrations/toolkits/pandas.html)
- [CSV Agent](/docs/modules/agents/toolkits/csv.html) - [CSV Agent](/docs/integrations/toolkits/csv.html)
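
A minimal sketch of the agent approach on a DataFrame (the CSV path is a placeholder):

```python
import pandas as pd
from langchain.chat_models import ChatOpenAI
from langchain.agents import create_pandas_dataframe_agent

df = pd.read_csv("titanic.csv")  # placeholder file
agent = create_pandas_dataframe_agent(ChatOpenAI(temperature=0), df, verbose=True)
agent.run("How many rows are there?")
```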

@ -1,107 +0,0 @@
```python
from langchain.chains.router import MultiPromptChain
from langchain.llms import OpenAI
```
```python
physics_template = """You are a very smart physics professor. \
You are great at answering questions about physics in a concise and easy to understand manner. \
When you don't know the answer to a question you admit that you don't know.
Here is a question:
{input}"""
math_template = """You are a very good mathematician. You are great at answering math questions. \
You are so good because you are able to break down hard problems into their component parts, \
answer the component parts, and then put them together to answer the broader question.
Here is a question:
{input}"""
```
```python
prompt_infos = [
{
"name": "physics",
"description": "Good for answering questions about physics",
"prompt_template": physics_template
},
{
"name": "math",
"description": "Good for answering math questions",
"prompt_template": math_template
}
]
```
```python
chain = MultiPromptChain.from_prompts(OpenAI(), prompt_infos, verbose=True)
```
```python
print(chain.run("What is black body radiation?"))
```
<CodeOutputBlock lang="python">
```
> Entering new MultiPromptChain chain...
physics: {'input': 'What is black body radiation?'}
> Finished chain.
Black body radiation is the emission of electromagnetic radiation from a body due to its temperature. It is a type of thermal radiation that is emitted from the surface of all objects that are at a temperature above absolute zero. It is a spectrum of radiation that is influenced by the temperature of the body and is independent of the composition of the emitting material.
```
</CodeOutputBlock>
```python
print(chain.run("What is the first prime number greater than 40 such that one plus the prime number is divisible by 3"))
```
<CodeOutputBlock lang="python">
```
> Entering new MultiPromptChain chain...
math: {'input': 'What is the first prime number greater than 40 such that one plus the prime number is divisible by 3'}
> Finished chain.
?
The first prime number greater than 40 such that one plus the prime number is divisible by 3 is 43. To solve this problem, we can break down the question into two parts: finding the first prime number greater than 40, and then finding a number that is divisible by 3.
The first step is to find the first prime number greater than 40. A prime number is a number that is only divisible by 1 and itself. The next prime number after 40 is 41.
The second step is to find a number that is divisible by 3. To do this, we can add 1 to 41, which gives us 42. Now, we can check if 42 is divisible by 3. 42 divided by 3 is 14, so 42 is divisible by 3.
Therefore, the answer to the question is 43.
```
</CodeOutputBlock>
```python
print(chain.run("What is the name of the type of cloud that rins"))
```
<CodeOutputBlock lang="python">
```
> Entering new MultiPromptChain chain...
None: {'input': 'What is the name of the type of cloud that rains?'}
> Finished chain.
The type of cloud that typically produces rain is called a cumulonimbus cloud. This type of cloud is characterized by its large vertical extent and can produce thunderstorms and heavy precipitation. Is there anything else you'd like to know?
```
</CodeOutputBlock>