Add summarization use-case (#8376)

Co-authored-by: Bagatur <baskaryan@gmail.com>
pull/8650/head
Lance Martin 1 year ago committed by GitHub
parent ee1d13678e
commit 59194c2214

@ -1,8 +0,0 @@
# Summarization
A summarization chain can be used to summarize multiple documents. One approach is to split the input into smaller documents (chunks) and operate over them with a MapReduceDocumentsChain. Alternatively, the chain that does the summarization can be a StuffDocumentsChain or a RefineDocumentsChain.
import Example from "@snippets/modules/chains/popular/summarize.mdx"
<Example/>

Two binary image files added (196 KiB and 90 KiB).

@ -1,436 +1,440 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# OpenSearch\n",
"\n",
"> [OpenSearch](https://opensearch.org/) is a scalable, flexible, and extensible open-source software suite for search, analytics, and observability applications licensed under Apache 2.0. `OpenSearch` is a distributed search and analytics engine based on `Apache Lucene`.\n",
"\n",
"\n",
"This notebook shows how to use functionality related to the `OpenSearch` database.\n",
"\n",
"To run, you should have an OpenSearch instance up and running: [see here for an easy Docker installation](https://hub.docker.com/r/opensearchproject/opensearch).\n",
"\n",
"`similarity_search` by default performs the Approximate k-NN Search which uses one of the several algorithms like lucene, nmslib, faiss recommended for\n",
"large datasets. To perform brute force search we have other search methods known as Script Scoring and Painless Scripting.\n",
"Check [this](https://opensearch.org/docs/latest/search-plugins/knn/index/) for more details."
]
},
{
"cell_type": "markdown",
"id": "94963977-9dfc-48b7-872a-53f2947f46c6",
"metadata": {},
"source": [
"## Installation\n",
"Install the Python client."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e606066-9386-4427-8a87-1b93f435c57e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!pip install opensearch-py"
]
},
{
"cell_type": "markdown",
"id": "b1fa637e-4fbf-4d5a-9188-2cad826a193e",
"metadata": {},
"source": [
"We want to use OpenAIEmbeddings so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "28e5455e-322d-4010-9e3b-491d522ef5db",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aac9563e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import OpenSearchVectorSearch\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "markdown",
"id": "01a9a035",
"metadata": {},
"source": [
"## similarity_search using Approximate k-NN\n",
"\n",
"`similarity_search` using `Approximate k-NN` Search with Custom Parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "803fe12b",
"metadata": {},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_documents(\n",
" docs, embeddings, opensearch_url=\"http://localhost:9200\"\n",
")\n",
"\n",
"# If using the default Docker installation, use this instantiation instead:\n",
"# docsearch = OpenSearchVectorSearch.from_documents(\n",
"# docs,\n",
"# embeddings,\n",
"# opensearch_url=\"https://localhost:9200\",\n",
"# http_auth=(\"admin\", \"admin\"),\n",
"# use_ssl = False,\n",
"# verify_certs = False,\n",
"# ssl_assert_hostname = False,\n",
"# ssl_show_warn = False,\n",
"# )"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db3fa309",
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query, k=10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c160d5bb",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "96215c90",
"metadata": {},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_documents(\n",
" docs,\n",
" embeddings,\n",
" opensearch_url=\"http://localhost:9200\",\n",
" engine=\"faiss\",\n",
" space_type=\"innerproduct\",\n",
" ef_construction=256,\n",
" m=48,\n",
")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62a7cea0",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "0d0cd877",
"metadata": {},
"source": [
"## similarity_search using Script Scoring\n",
"\n",
"`similarity_search` using `Script Scoring` with Custom Parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a8e3c0e",
"metadata": {},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_documents(\n",
" docs, embeddings, opensearch_url=\"http://localhost:9200\", is_appx_search=False\n",
")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(\n",
" \"What did the president say about Ketanji Brown Jackson\",\n",
" k=1,\n",
" search_type=\"script_scoring\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92bc40db",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "a4af96cc",
"metadata": {},
"source": [
"## similarity_search using Painless Scripting\n",
"\n",
"`similarity_search` using `Painless Scripting` with Custom Parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d9f436e",
"metadata": {},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_documents(\n",
" docs, embeddings, opensearch_url=\"http://localhost:9200\", is_appx_search=False\n",
")\n",
"filter = {\"bool\": {\"filter\": {\"term\": {\"text\": \"smuggling\"}}}}\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(\n",
" \"What did the president say about Ketanji Brown Jackson\",\n",
" search_type=\"painless_scripting\",\n",
" space_type=\"cosineSimilarity\",\n",
" pre_filter=filter,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ca50bce",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "4f8fb0d0",
"metadata": {},
"source": [
"## Maximum marginal relevance search (MMR)\n",
"If youd like to look up for some similar documents, but youd also like to receive diverse results, MMR is method you should consider. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ba85e092",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10, lambda_param=0.5)"
]
},
{
"cell_type": "markdown",
"id": "73264864",
"metadata": {},
"source": [
"## Using a preexisting OpenSearch instance\n",
"\n",
"It's also possible to use a preexisting OpenSearch instance with documents that already have vectors present."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "82a23440",
"metadata": {},
"outputs": [],
"source": [
"# this is just an example, you would need to change these values to point to another opensearch instance\n",
"docsearch = OpenSearchVectorSearch(\n",
" index_name=\"index-*\",\n",
" embedding_function=embeddings,\n",
" opensearch_url=\"http://localhost:9200\",\n",
")\n",
"\n",
"# you can specify custom field names to match the fields you're using to store your embedding, document text value, and metadata\n",
"docs = docsearch.similarity_search(\n",
" \"Who was asking about getting lunch today?\",\n",
" search_type=\"script_scoring\",\n",
" space_type=\"cosinesimil\",\n",
" vector_field=\"message_embedding\",\n",
" text_field=\"message\",\n",
" metadata_field=\"message_metadata\",\n",
")"
]
},
{
"cell_type": "markdown",
"source": [
"## Using AOSS (Amazon OpenSearch Service Serverless)"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
"cells": [
{
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# OpenSearch\n",
"\n",
"> [OpenSearch](https://opensearch.org/) is a scalable, flexible, and extensible open-source software suite for search, analytics, and observability applications licensed under Apache 2.0. `OpenSearch` is a distributed search and analytics engine based on `Apache Lucene`.\n",
"\n",
"\n",
"This notebook shows how to use functionality related to the `OpenSearch` database.\n",
"\n",
"To run, you should have an OpenSearch instance up and running: [see here for an easy Docker installation](https://hub.docker.com/r/opensearchproject/opensearch).\n",
"\n",
"`similarity_search` by default performs the Approximate k-NN Search which uses one of the several algorithms like lucene, nmslib, faiss recommended for\n",
"large datasets. To perform brute force search we have other search methods known as Script Scoring and Painless Scripting.\n",
"Check [this](https://opensearch.org/docs/latest/search-plugins/knn/index/) for more details."
]
},
{
"cell_type": "markdown",
"id": "94963977-9dfc-48b7-872a-53f2947f46c6",
"metadata": {},
"source": [
"## Installation\n",
"Install the Python client."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e606066-9386-4427-8a87-1b93f435c57e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!pip install opensearch-py"
]
},
{
"cell_type": "markdown",
"id": "b1fa637e-4fbf-4d5a-9188-2cad826a193e",
"metadata": {},
"source": [
"We want to use OpenAIEmbeddings so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "28e5455e-322d-4010-9e3b-491d522ef5db",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aac9563e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import OpenSearchVectorSearch\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "markdown",
"id": "01a9a035",
"metadata": {},
"source": [
"## similarity_search using Approximate k-NN\n",
"\n",
"`similarity_search` using `Approximate k-NN` Search with Custom Parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "803fe12b",
"metadata": {},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_documents(\n",
" docs, embeddings, opensearch_url=\"http://localhost:9200\"\n",
")\n",
"\n",
"# If using the default Docker installation, use this instantiation instead:\n",
"# docsearch = OpenSearchVectorSearch.from_documents(\n",
"# docs,\n",
"# embeddings,\n",
"# opensearch_url=\"https://localhost:9200\",\n",
"# http_auth=(\"admin\", \"admin\"),\n",
"# use_ssl = False,\n",
"# verify_certs = False,\n",
"# ssl_assert_hostname = False,\n",
"# ssl_show_warn = False,\n",
"# )"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db3fa309",
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query, k=10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c160d5bb",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "96215c90",
"metadata": {},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_documents(\n",
" docs,\n",
" embeddings,\n",
" opensearch_url=\"http://localhost:9200\",\n",
" engine=\"faiss\",\n",
" space_type=\"innerproduct\",\n",
" ef_construction=256,\n",
" m=48,\n",
")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62a7cea0",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "0d0cd877",
"metadata": {},
"source": [
"## similarity_search using Script Scoring\n",
"\n",
"`similarity_search` using `Script Scoring` with Custom Parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a8e3c0e",
"metadata": {},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_documents(\n",
" docs, embeddings, opensearch_url=\"http://localhost:9200\", is_appx_search=False\n",
")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(\n",
" \"What did the president say about Ketanji Brown Jackson\",\n",
" k=1,\n",
" search_type=\"script_scoring\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92bc40db",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "a4af96cc",
"metadata": {},
"source": [
"## similarity_search using Painless Scripting\n",
"\n",
"`similarity_search` using `Painless Scripting` with Custom Parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d9f436e",
"metadata": {},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_documents(\n",
" docs, embeddings, opensearch_url=\"http://localhost:9200\", is_appx_search=False\n",
")\n",
"filter = {\"bool\": {\"filter\": {\"term\": {\"text\": \"smuggling\"}}}}\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(\n",
" \"What did the president say about Ketanji Brown Jackson\",\n",
" search_type=\"painless_scripting\",\n",
" space_type=\"cosineSimilarity\",\n",
" pre_filter=filter,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ca50bce",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "4f8fb0d0",
"metadata": {},
"source": [
"## Maximum marginal relevance search (MMR)\n",
"If you\u2019d like to look up for some similar documents, but you\u2019d also like to receive diverse results, MMR is method you should consider. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ba85e092",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10, lambda_param=0.5)"
]
},
{
"cell_type": "markdown",
"id": "73264864",
"metadata": {},
"source": [
"## Using a preexisting OpenSearch instance\n",
"\n",
"It's also possible to use a preexisting OpenSearch instance with documents that already have vectors present."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "82a23440",
"metadata": {},
"outputs": [],
"source": [
"# this is just an example, you would need to change these values to point to another opensearch instance\n",
"docsearch = OpenSearchVectorSearch(\n",
" index_name=\"index-*\",\n",
" embedding_function=embeddings,\n",
" opensearch_url=\"http://localhost:9200\",\n",
")\n",
"\n",
"# you can specify custom field names to match the fields you're using to store your embedding, document text value, and metadata\n",
"docs = docsearch.similarity_search(\n",
" \"Who was asking about getting lunch today?\",\n",
" search_type=\"script_scoring\",\n",
" space_type=\"cosinesimil\",\n",
" vector_field=\"message_embedding\",\n",
" text_field=\"message\",\n",
" metadata_field=\"message_metadata\",\n",
")"
]
},
{
"cell_type": "markdown",
"source": [
"## Using AOSS (Amazon OpenSearch Service Serverless)"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
},
"id": "5f590d35"
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"# This is just an example to show how to use AOSS with faiss engine and efficient_filter, you need to set proper values.\n",
"\n",
"service = 'aoss' # must set the service as 'aoss'\n",
"region = 'us-east-2'\n",
"credentials = boto3.Session(aws_access_key_id='xxxxxx',aws_secret_access_key='xxxxx').get_credentials()\n",
"awsauth = AWS4Auth('xxxxx', 'xxxxxx', region,service, session_token=credentials.token)\n",
"\n",
"docsearch = OpenSearchVectorSearch.from_documents(\n",
" docs,\n",
" embeddings,\n",
" opensearch_url=\"host url\",\n",
" http_auth=awsauth,\n",
" timeout = 300,\n",
" use_ssl = True,\n",
" verify_certs = True,\n",
" connection_class = RequestsHttpConnection,\n",
" index_name=\"test-index-using-aoss\",\n",
" engine=\"faiss\",\n",
")\n",
"\n",
"docs = docsearch.similarity_search(\n",
" \"What is feature selection\",\n",
" efficient_filter=filter,\n",
" k=200,\n",
")"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"id": "de397be7"
},
{
"cell_type": "markdown",
"source": [
"## Using AOS (Amazon OpenSearch Service)"
],
"metadata": {
"collapsed": false
},
"id": "0aa012c8"
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"# This is just an example to show how to use AOS , you need to set proper values.\n",
"\n",
"service = 'es' # must set the service as 'es'\n",
"region = 'us-east-2'\n",
"credentials = boto3.Session(aws_access_key_id='xxxxxx',aws_secret_access_key='xxxxx').get_credentials()\n",
"awsauth = AWS4Auth('xxxxx', 'xxxxxx', region,service, session_token=credentials.token)\n",
"\n",
"docsearch = OpenSearchVectorSearch.from_documents(\n",
" docs,\n",
" embeddings,\n",
" opensearch_url=\"host url\",\n",
" http_auth=awsauth,\n",
" timeout = 300,\n",
" use_ssl = True,\n",
" verify_certs = True,\n",
" connection_class = RequestsHttpConnection,\n",
" index_name=\"test-index\",\n",
")\n",
"\n",
"docs = docsearch.similarity_search(\n",
" \"What is feature selection\",\n",
" k=200,\n",
")"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"id": "2c47e408"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,519 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "cf13f702",
"metadata": {},
"source": [
"# Summarization\n",
"\n",
"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/summarization.ipynb)\n",
"\n",
"## Use case\n",
"\n",
"--- \n",
"\n",
"Suppose you have a set of documents (PDFs, Notion pages, customer questions, etc.) and you want to summarize the content. \n",
"\n",
"LLMs are a great tool for this given their proficiency in understanding and synthesizing text.\n",
"\n",
"In this walkthrough we'll go over how to perform document summarization using LLMs."
]
},
{
"cell_type": "markdown",
"id": "8e233997",
"metadata": {},
"source": [
"![Image description](/img/summarization_use_case_1.png)"
]
},
{
"cell_type": "markdown",
"id": "4715b4ff",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"## Overview\n",
"\n",
"--- \n",
"\n",
"A central question for building a summarizer is how to pass your documents into the LLM's context window. Two common approaches for this are:\n",
"\n",
"1. `Stuff`: Simply \"stuff\" all your documents into a single prompt. This is the simplest approach (see [here](/docs/modules/chains/document/stuff) for more on the `StuffDocumentsChains`, which is used for this method).\n",
"\n",
"2. `Map-reduce`: Summarize each document on it's own in a \"map\" step and then \"reduce\" the summaries into a final summary (see [here](/docs/modules/chains/document/map_reduce) for more on the `MapReduceDocumentsChain`, which is used for this method)."
]
},
{
"cell_type": "markdown",
"id": "08ec66bc",
"metadata": {},
"source": [
"![Image description](/img/summarization_use_case_2.png)"
]
},
{
"cell_type": "markdown",
"id": "bea785ac",
"metadata": {},
"source": [
"## Quickstart\n",
"\n",
"--- \n",
"\n",
"To give you a sneak preview, either pipeline can be wrapped in a single object: `load_summarize_chain`. \n",
"\n",
"Suppose we want to summarize a blog post. We can create this in a few lines of code.\n",
"\n",
"First set environment variables and install packages:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "578d6a90",
"metadata": {},
"outputs": [],
"source": [
"!pip install openai tiktoken chromadb langchain\n",
"\n",
"# Set env var OPENAI_API_KEY or load from a .env file\n",
"# import dotenv\n",
"\n",
"# dotenv.load_env()"
]
},
{
"cell_type": "markdown",
"id": "36138740",
"metadata": {},
"source": [
"We can use `chain_type=\"stuff\"`, especially if using larger context window models such as:\n",
"\n",
"* 16k token OpenAI `gpt-3.5-turbo-16k` \n",
"* 100k token Anthropic [Claude-2](https://www.anthropic.com/index/claude-2)\n",
"\n",
"We can also supply `chain_type=\"map_reduce\"` or `chain_type=\"refine\"` (read more [here](/docs/modules/chains/document/refine))."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "fd271681",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The article discusses the concept of building autonomous agents powered by large language models (LLMs). It explores the components of such agents, including planning, memory, and tool use. The article provides case studies and proof-of-concept examples of LLM-powered agents in various domains. It also highlights the challenges and limitations of using LLMs in agent systems.'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.document_loaders import WebBaseLoader\n",
"from langchain.chains.summarize import load_summarize_chain\n",
"\n",
"loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
"docs = loader.load()\n",
"\n",
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-3.5-turbo-16k\")\n",
"chain = load_summarize_chain(llm, chain_type=\"stuff\")\n",
"\n",
"chain.run(docs)"
]
},
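{
"cell_type": "markdown",
"id": "map-reduce-quickstart-note",
"metadata": {},
"source": [
"As noted above, `load_summarize_chain` also accepts `chain_type=\"map_reduce\"`. The next cell is a minimal sketch (not executed here) of that variant: the variable name `map_reduce_quickstart_chain` is illustrative, and it assumes the `llm` and `docs` objects defined in the quickstart cell above. For long inputs you would typically split the documents first, as done in Option 2 below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "map-reduce-quickstart-sketch",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: the same quickstart summarization, but with the map-reduce pipeline.\n",
"# Assumes `llm` and `docs` from the quickstart cell above.\n",
"map_reduce_quickstart_chain = load_summarize_chain(llm, chain_type=\"map_reduce\")\n",
"map_reduce_quickstart_chain.run(docs)"
]
},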
{
"cell_type": "markdown",
"id": "615b36e1",
"metadata": {},
"source": [
"## Option 1. Stuff\n",
"\n",
"--- \n",
"\n",
"When we use `load_summarize_chain` with `chain_type=\"stuff\"`, we will use the [StuffDocumentsChain](/docs/modules/chains/document/stuff).\n",
"\n",
"The chain will take a list of documents, inserts them all into a prompt, and passes that prompt to an LLM:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ef45585d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The article discusses the concept of building autonomous agents powered by large language models (LLMs). It explores the components of such agents, including planning, memory, and tool use. The article provides case studies and examples of proof-of-concept demos, highlighting the challenges and limitations of LLM-powered agents. It also includes references to related research papers and provides a citation for the article.\n"
]
}
],
"source": [
"from langchain.chains.llm import LLMChain\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.chains.combine_documents.stuff import StuffDocumentsChain\n",
"\n",
"# Define prompt\n",
"prompt_template = \"\"\"Write a concise summary of the following:\n",
"\"{text}\"\n",
"CONCISE SUMMARY:\"\"\"\n",
"prompt = PromptTemplate.from_template(prompt_template)\n",
"\n",
"# Define LLM chain\n",
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-3.5-turbo-16k\")\n",
"llm_chain = LLMChain(llm=llm, prompt=prompt)\n",
"\n",
"# Define StuffDocumentsChain\n",
"stuff_chain = StuffDocumentsChain(\n",
" llm_chain=llm_chain, document_variable_name=\"text\"\n",
")\n",
"\n",
"docs = loader.load()\n",
"print(stuff_chain.run(docs))"
]
},
{
"cell_type": "markdown",
"id": "4e4e4a43",
"metadata": {},
"source": [
"Great! We can see that we reproduce the earlier result using the `load_summarize_chain`.\n",
"\n",
"### Go deeper\n",
"\n",
"* You can easily customize the prompt. \n",
"* You can easily try different LLMs, (e.g., [Claude](/docs/integrations/chat/anthropic)) via the `llm` parameter."
]
},
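{
"cell_type": "markdown",
"id": "claude-swap-note",
"metadata": {},
"source": [
"As a sketch of the second bullet, the next cell (not executed here) swaps Claude in via the `llm` parameter. It assumes the `anthropic` package is installed and an `ANTHROPIC_API_KEY` environment variable is set; the variable names `claude_llm` and `claude_chain` are illustrative, and `docs` is reused from above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "claude-swap-sketch",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: the same stuff-style summarization, but with Claude as the LLM.\n",
"# Assumes ANTHROPIC_API_KEY is set in the environment.\n",
"from langchain.chat_models import ChatAnthropic\n",
"\n",
"claude_llm = ChatAnthropic(temperature=0)\n",
"claude_chain = load_summarize_chain(claude_llm, chain_type=\"stuff\")\n",
"claude_chain.run(docs)"
]
},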
{
"cell_type": "markdown",
"id": "ad6cabee",
"metadata": {},
"source": [
"## Option 2. Map-Reduce\n",
"\n",
"---\n",
"\n",
"Let's unpack the map reduce approach. For this, we'll first map each document to an individual summary using an `LLMChain`. Then we'll use a `ReduceDocumentsChain` to combine those summaries into a single global summary.\n",
" \n",
"First, we specfy the LLMChain to use for mapping each document to an individual summary:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a1e6773c",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.mapreduce import MapReduceChain\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.chains import ReduceDocumentsChain, MapReduceDocumentsChain\n",
"\n",
"llm = ChatOpenAI(temperature=0)\n",
"\n",
"# Map\n",
"map_template = \"\"\"The following is a set of documents\n",
"{docs}\n",
"Based on this list of docs, please identify the main themes \n",
"Helpful Answer:\"\"\"\n",
"map_prompt = PromptTemplate.from_template(map_template)\n",
"map_chain = LLMChain(llm=llm, prompt=map_prompt)"
]
},
{
"cell_type": "markdown",
"id": "bee3c331",
"metadata": {},
"source": [
"The `ReduceDocumentsChain` handles taking the document mapping results and reducing them into a single output. It wraps a generic `CombineDocumentsChain` (like `StuffDocumentsChain`) but adds the ability to collapse documents before passing it to the `CombineDocumentsChain` if their cumulative size exceeds `token_max`. In this example, we can actually re-use our chain for combining our docs to also collapse our docs.\n",
"\n",
"So if the cumulative number of tokens in our mapped documents exceeds 4000 tokens, then we'll recursively pass in the documents in batches of < 4000 tokens to our `StuffDocumentsChain` to create batched summaries. And once those batched summaries are cumulatively less than 4000 tokens, we'll pass them all one last time to the `StuffDocumentsChain` to create the final summary."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1edb1b0d",
"metadata": {},
"outputs": [],
"source": [
"# Reduce\n",
"reduce_template = \"\"\"The following is set of summaries:\n",
"{doc_summaries}\n",
"Take these and distill it into a final, consolidated summary of the main themes. \n",
"Helpful Answer:\"\"\"\n",
"reduce_prompt = PromptTemplate.from_template(reduce_template)\n",
"reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)\n",
"\n",
"# Takes a list of documents, combines them into a single string, and passes this to an LLMChain\n",
"combine_documents_chain = StuffDocumentsChain(\n",
" llm_chain=reduce_chain, document_variable_name=\"doc_summaries\"\n",
")\n",
"\n",
"# Combines and iteravely reduces the mapped documents\n",
"reduce_documents_chain = ReduceDocumentsChain(\n",
" # This is final chain that is called.\n",
" combine_documents_chain=combine_documents_chain,\n",
" # If documents exceed context for `StuffDocumentsChain`\n",
" collapse_documents_chain=combine_documents_chain,\n",
" # The maximum number of tokens to group documents into.\n",
" token_max=4000,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "fdb5ae1a",
"metadata": {},
"source": [
"Combining our map and reduce chains into one:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "22f1cdc2",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Created a chunk of size 1003, which is longer than the specified 1000\n"
]
}
],
"source": [
"# Combining documents by mapping a chain over them, then combining results\n",
"map_reduce_chain = MapReduceDocumentsChain(\n",
" # Map chain\n",
" llm_chain=map_chain,\n",
" # Reduce chain\n",
" reduce_documents_chain=reduce_documents_chain,\n",
" # The variable name in the llm_chain to put the documents in\n",
" document_variable_name=\"docs\",\n",
" # Return the results of the map steps in the output\n",
" return_intermediate_steps=False,\n",
")\n",
"\n",
"text_splitter = CharacterTextSplitter.from_tiktoken_encoder(\n",
" chunk_size=1000, chunk_overlap=0\n",
")\n",
"split_docs = text_splitter.split_documents(docs)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "c7afb8c3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The main themes identified in the provided set of documents are:\n",
"\n",
"1. LLM-powered autonomous agent systems: The documents discuss the concept of building autonomous agents with large language models (LLMs) as the core controller. They explore the potential of LLMs beyond content generation and present them as powerful problem solvers.\n",
"\n",
"2. Components of the agent system: The documents outline the key components of LLM-powered agent systems, including planning, memory, and tool use. Each component is described in detail, highlighting its role in enhancing the agent's capabilities.\n",
"\n",
"3. Planning and task decomposition: The planning component focuses on task decomposition and self-reflection. The agent breaks down complex tasks into smaller subgoals and learns from past actions to improve future results.\n",
"\n",
"4. Memory and learning: The memory component includes short-term memory for in-context learning and long-term memory for retaining and recalling information over extended periods. The use of external vector stores for fast retrieval is also mentioned.\n",
"\n",
"5. Tool use and external APIs: The agent learns to utilize external APIs for accessing additional information, code execution, and proprietary sources. This enhances the agent's knowledge and problem-solving abilities.\n",
"\n",
"6. Case studies and proof-of-concept examples: The documents provide case studies and examples to demonstrate the application of LLM-powered agents in scientific discovery, generative simulations, and other domains. These examples serve as proof-of-concept for the effectiveness of the agent system.\n",
"\n",
"7. Challenges and limitations: The documents mention challenges associated with building LLM-powered autonomous agents, such as the limitations of finite context length, difficulties in long-term planning, and reliability issues with natural language interfaces.\n",
"\n",
"8. Citation and references: The documents include a citation and reference section for acknowledging the sources and inspirations for the concepts discussed.\n",
"\n",
"Overall, the main themes revolve around the development and capabilities of LLM-powered autonomous agent systems, including their components, planning and task decomposition, memory and learning mechanisms, tool use and external APIs, case studies and proof-of-concept examples, challenges and limitations, and the importance of proper citation and references.\n"
]
}
],
"source": [
"print(map_reduce_chain.run(split_docs))"
]
},
{
"cell_type": "markdown",
"id": "e62c21cf",
"metadata": {},
"source": [
"### Go deeper\n",
" \n",
"**Customization** \n",
"\n",
"* As shown above, you can customize the LLMs and prompts for map and reduce stages.\n",
"\n",
"**Real-world use-case**\n",
"\n",
"* See [this blog post](https://blog.langchain.dev/llms-to-improve-documentation/) case-study on analyzing user interactions (questions about LangChain documentation)! "
]
},
{
"cell_type": "markdown",
"id": "f08ff365",
"metadata": {},
"source": [
"## Option 3. Refine\n",
"\n",
"--- \n",
" \n",
"[Refine](/docs/modules/chains/document/refine) is similar to map-reduce:\n",
"\n",
"> The refine documents chain constructs a response by looping over the input documents and iteratively updating its answer. For each document, it passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to get a new answer.\n",
"\n",
"This can be easily run with the `chain_type=\"refine\"` specified."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "de1dc10e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The GPT-Engineer project aims to create a repository of code for specific tasks specified in natural language. It involves breaking down tasks into smaller components and seeking clarification from the user when needed. The project emphasizes the importance of implementing every detail of the architecture as code and provides guidelines for file organization, code structure, and dependencies. However, there are challenges in long-term planning and task decomposition, as well as the reliability of the natural language interface. The system has limited communication bandwidth and struggles to adjust plans when faced with unexpected errors. The reliability of model outputs is questionable, as formatting errors and rebellious behavior can occur. The conversation also includes instructions for writing the code, including laying out the core classes, functions, and methods, and providing the code in a markdown code block format. The user is reminded to ensure that the code is fully functional and follows best practices for file naming, imports, and types. The project is powered by LLM (Large Language Models) and incorporates prompting techniques from various research papers.'"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
"chain.run(split_docs)"
]
},
{
"cell_type": "markdown",
"id": "5b46f44d",
"metadata": {},
"source": [
"It's also possible to supply a prompt and return intermediate steps."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "f86c8072",
"metadata": {},
"outputs": [],
"source": [
"prompt_template = \"\"\"Write a concise summary of the following:\n",
"{text}\n",
"CONCISE SUMMARY:\"\"\"\n",
"prompt = PromptTemplate.from_template(prompt_template)\n",
"\n",
"refine_template = (\n",
" \"Your job is to produce a final summary\\n\"\n",
" \"We have provided an existing summary up to a certain point: {existing_answer}\\n\"\n",
" \"We have the opportunity to refine the existing summary\"\n",
" \"(only if needed) with some more context below.\\n\"\n",
" \"------------\\n\"\n",
" \"{text}\\n\"\n",
" \"------------\\n\"\n",
" \"Given the new context, refine the original summary in Italian\"\n",
" \"If the context isn't useful, return the original summary.\"\n",
")\n",
"refine_prompt = PromptTemplate.from_template(refine_template)\n",
"chain = load_summarize_chain(\n",
" llm=llm,\n",
" chain_type=\"refine\",\n",
" question_prompt=prompt,\n",
" refine_prompt=refine_prompt,\n",
" return_intermediate_steps=True,\n",
" input_key=\"input_documents\",\n",
" output_key=\"output_text\",\n",
")\n",
"result = chain({\"input_documents\": split_docs}, return_only_outputs=True)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "d9600b67-79d4-4f85-aba2-9fe81fa29f49",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"L'articolo discute il concetto di costruire agenti autonomi utilizzando LLM (large language model) come controller principale. Esplora i diversi componenti di un sistema di agenti alimentato da LLM, inclusa la pianificazione, la memoria e l'uso di strumenti. Dimostrazioni di concetto come AutoGPT mostrano la possibilità di creare agenti autonomi con LLM come controller principale. Approcci come Chain of Thought, Tree of Thoughts, LLM+P, ReAct e Reflexion consentono agli agenti autonomi di pianificare, riflettere su se stessi e migliorare iterativamente. Tuttavia, ci sono sfide legate alla lunghezza del contesto, alla pianificazione a lungo termine e alla decomposizione delle attività. Inoltre, l'affidabilità dell'interfaccia di linguaggio naturale tra LLM e componenti esterni come la memoria e gli strumenti è incerta. Nonostante ciò, l'uso di LLM come router per indirizzare le richieste ai moduli esperti più adatti è stato proposto come architettura neuro-simbolica per agenti autonomi nel sistema MRKL. L'articolo fa riferimento a diverse pubblicazioni che approfondiscono l'argomento, tra cui Chain of Thought, Tree of Thoughts, LLM+P, ReAct, Reflexion, e MRKL Systems.\n"
]
}
],
"source": [
"print(result[\"output_text\"])"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "5f91a8eb-daa5-4191-ace4-01765801db3e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This article discusses the concept of building autonomous agents using LLM (large language model) as the core controller. The article explores the different components of an LLM-powered agent system, including planning, memory, and tool use. It also provides examples of proof-of-concept demos and highlights the potential of LLM as a general problem solver.\n",
"\n",
"Questo articolo discute del concetto di costruire agenti autonomi utilizzando LLM (large language model) come controller principale. L'articolo esplora i diversi componenti di un sistema di agenti alimentato da LLM, inclusa la pianificazione, la memoria e l'uso degli strumenti. Vengono anche forniti esempi di dimostrazioni di proof-of-concept e si evidenzia il potenziale di LLM come risolutore generale di problemi. Inoltre, vengono presentati approcci come Chain of Thought, Tree of Thoughts, LLM+P, ReAct e Reflexion che consentono agli agenti autonomi di pianificare, riflettere su se stessi e migliorare iterativamente.\n",
"\n",
"Questo articolo discute del concetto di costruire agenti autonomi utilizzando LLM (large language model) come controller principale. L'articolo esplora i diversi componenti di un sistema di agenti alimentato da LLM, inclusa la pianificazione, la memoria e l'uso degli strumenti. Vengono anche forniti esempi di dimostrazioni di proof-of-concept e si evidenzia il potenziale di LLM come risolutore generale di problemi. Inoltre, vengono presentati approcci come Chain of Thought, Tree of Thoughts, LLM+P, ReAct e Reflexion che consentono agli agenti autonomi di pianificare, riflettere su se stessi e migliorare iterativamente. Il nuovo contesto riguarda l'approccio Chain of Hindsight (CoH) che permette al modello di migliorare autonomamente i propri output attraverso un processo di apprendimento supervisionato. Viene anche presentato l'approccio Algorithm Distillation (AD) che applica lo stesso concetto alle traiettorie di apprendimento per compiti di reinforcement learning.\n"
]
}
],
"source": [
"print(\"\\n\\n\".join(result[\"intermediate_steps\"][:3]))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0ddd522e-30dc-4f6a-b993-c4f97e656c4f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -1,22 +0,0 @@
---
sidebar_position: 5
---
# Summarization
Summarization involves creating a smaller summary of multiple longer documents.
This can be useful for distilling long documents into the core pieces of information.
The recommended way to get started using a summarization chain is:
```python
from langchain.chains.summarize import load_summarize_chain
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(docs)
```
The following resources exist:
- [Summarization notebook](/docs/use_cases/summarization/summarize.html): A notebook walking through how to accomplish this task.
Additional related resources include:
- [Modules for working with documents](/docs/modules/data_connection): Core components for working with documents.

@ -1,385 +0,0 @@
## Prepare Data
First we prepare the data. For this example we create multiple documents from one long one, but these documents could be fetched in any manner (the point of this notebook is to highlight what to do AFTER you fetch the documents).
```python
from langchain import OpenAI, PromptTemplate, LLMChain
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.prompts import PromptTemplate
llm = OpenAI(temperature=0)
text_splitter = CharacterTextSplitter()
```
```python
with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)
```
```python
from langchain.docstore.document import Document
docs = [Document(page_content=t) for t in texts[:3]]
```
## Quickstart
If you just want to get started as quickly as possible, this is the recommended way to do it:
```python
from langchain.chains.summarize import load_summarize_chain
```
```python
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(docs)
```
<CodeOutputBlock lang="python">
```
' In response to Russian aggression in Ukraine, the United States and its allies are taking action to hold Putin accountable, including economic sanctions, asset seizures, and military assistance. The US is also providing economic and humanitarian aid to Ukraine, and has passed the American Rescue Plan and the Bipartisan Infrastructure Law to help struggling families and create jobs. The US remains unified and determined to protect Ukraine and the free world.'
```
</CodeOutputBlock>
If you want more control and understanding over what is happening, please see the information below.
## The `stuff` Chain
This section shows the results of using the `stuff` chain to do summarization.
```python
chain = load_summarize_chain(llm, chain_type="stuff")
```
```python
chain.run(docs)
```
<CodeOutputBlock lang="python">
```
' In his speech, President Biden addressed the crisis in Ukraine, the American Rescue Plan, and the Bipartisan Infrastructure Law. He discussed the need to invest in America, educate Americans, and build the economy from the bottom up. He also announced the release of 60 million barrels of oil from reserves around the world, and the creation of a dedicated task force to go after the crimes of Russian oligarchs. He concluded by emphasizing the need to Buy American and use taxpayer dollars to rebuild America.'
```
</CodeOutputBlock>
**Custom Prompts**
You can also use your own prompts with this chain. In this example, we will respond in Italian.
```python
prompt_template = """Write a concise summary of the following:
{text}
CONCISE SUMMARY IN ITALIAN:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
chain = load_summarize_chain(llm, chain_type="stuff", prompt=PROMPT)
chain.run(docs)
```
<CodeOutputBlock lang="python">
```
"\n\nIn questa serata, il Presidente degli Stati Uniti ha annunciato una serie di misure per affrontare la crisi in Ucraina, causata dall'aggressione di Putin. Ha anche annunciato l'invio di aiuti economici, militari e umanitari all'Ucraina. Ha anche annunciato che gli Stati Uniti e i loro alleati stanno imponendo sanzioni economiche a Putin e stanno rilasciando 60 milioni di barili di petrolio dalle riserve di tutto il mondo. Inoltre, ha annunciato che il Dipartimento di Giustizia degli Stati Uniti sta creando una task force dedicata ai crimini degli oligarchi russi. Il Presidente ha anche annunciato l'approvazione della legge bipartitica sull'infrastruttura, che prevede investimenti per la ricostruzione dell'America. Questo porterà a creare posti"
```
</CodeOutputBlock>
## The `map_reduce` Chain
This section shows the results of using the `map_reduce` chain to do summarization.
```python
chain = load_summarize_chain(llm, chain_type="map_reduce")
```
```python
chain.run(docs)
```
<CodeOutputBlock lang="python">
```
" In response to Russia's aggression in Ukraine, the United States and its allies have imposed economic sanctions and are taking other measures to hold Putin accountable. The US is also providing economic and military assistance to Ukraine, protecting NATO countries, and releasing oil from its Strategic Petroleum Reserve. President Biden and Vice President Harris have passed legislation to help struggling families and rebuild America's infrastructure."
```
</CodeOutputBlock>
**Intermediate Steps**
We can also return the intermediate steps for `map_reduce` chains, should we want to inspect them. This is done with the `return_intermediate_steps` variable.
```python
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="map_reduce", return_intermediate_steps=True)
```
```python
chain({"input_documents": docs}, return_only_outputs=True)
```
<CodeOutputBlock lang="python">
```
{'map_steps': [" In response to Russia's aggression in Ukraine, the United States has united with other freedom-loving nations to impose economic sanctions and hold Putin accountable. The U.S. Department of Justice is also assembling a task force to go after the crimes of Russian oligarchs and seize their ill-gotten gains.",
' The United States and its European allies are taking action to punish Russia for its invasion of Ukraine, including seizing assets, closing off airspace, and providing economic and military assistance to Ukraine. The US is also mobilizing forces to protect NATO countries and has released 30 million barrels of oil from its Strategic Petroleum Reserve to help blunt gas prices. The world is uniting in support of Ukraine and democracy, and the US stands with its Ukrainian-American citizens.',
" President Biden and Vice President Harris ran for office with a new economic vision for America, and have since passed the American Rescue Plan and the Bipartisan Infrastructure Law to help struggling families and rebuild America's infrastructure. This includes creating jobs, modernizing roads, airports, ports, and waterways, replacing lead pipes, providing affordable high-speed internet, and investing in American products to support American jobs."],
'output_text': " In response to Russia's aggression in Ukraine, the United States and its allies have imposed economic sanctions and are taking other measures to hold Putin accountable. The US is also providing economic and military assistance to Ukraine, protecting NATO countries, and passing legislation to help struggling families and rebuild America's infrastructure. The world is uniting in support of Ukraine and democracy, and the US stands with its Ukrainian-American citizens."}
```
</CodeOutputBlock>
**Custom Prompts**
You can also use your own prompts with this chain. In this example, we will respond in Italian.
```python
prompt_template = """Write a concise summary of the following:
{text}
CONCISE SUMMARY IN ITALIAN:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="map_reduce", return_intermediate_steps=True, map_prompt=PROMPT, combine_prompt=PROMPT)
chain({"input_documents": docs}, return_only_outputs=True)
```
<CodeOutputBlock lang="python">
```
{'intermediate_steps': ["\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Gli Stati Uniti e i loro alleati stanno ora imponendo sanzioni economiche a Putin e stanno tagliando l'accesso della Russia alla tecnologia. Il Dipartimento di Giustizia degli Stati Uniti sta anche creando una task force dedicata per andare dopo i crimini degli oligarchi russi.",
"\n\nStiamo unendo le nostre forze con quelle dei nostri alleati europei per sequestrare yacht, appartamenti di lusso e jet privati di Putin. Abbiamo chiuso lo spazio aereo americano ai voli russi e stiamo fornendo più di un miliardo di dollari in assistenza all'Ucraina. Abbiamo anche mobilitato le nostre forze terrestri, aeree e navali per proteggere i paesi della NATO. Abbiamo anche rilasciato 60 milioni di barili di petrolio dalle riserve di tutto il mondo, di cui 30 milioni dalla nostra riserva strategica di petrolio. Stiamo affrontando una prova reale e ci vorrà del tempo, ma alla fine Putin non riuscirà a spegnere l'amore dei popoli per la libertà.",
"\n\nIl Presidente Biden ha lottato per passare l'American Rescue Plan per aiutare le persone che soffrivano a causa della pandemia. Il piano ha fornito sollievo economico immediato a milioni di americani, ha aiutato a mettere cibo sulla loro tavola, a mantenere un tetto sopra le loro teste e a ridurre il costo dell'assicurazione sanitaria. Il piano ha anche creato più di 6,5 milioni di nuovi posti di lavoro, il più alto numero di posti di lavoro creati in un anno nella storia degli Stati Uniti. Il Presidente Biden ha anche firmato la legge bipartitica sull'infrastruttura, la più ampia iniziativa di ricostruzione della storia degli Stati Uniti. Il piano prevede di modernizzare le strade, gli aeroporti, i porti e le vie navigabili in"],
'output_text': "\n\nIl Presidente Biden sta lavorando per aiutare le persone che soffrono a causa della pandemia attraverso l'American Rescue Plan e la legge bipartitica sull'infrastruttura. Gli Stati Uniti e i loro alleati stanno anche imponendo sanzioni economiche a Putin e tagliando l'accesso della Russia alla tecnologia. Stanno anche sequestrando yacht, appartamenti di lusso e jet privati di Putin e fornendo più di un miliardo di dollari in assistenza all'Ucraina. Alla fine, Putin non riuscirà a spegnere l'amore dei popoli per la libertà."}
```
</CodeOutputBlock>
## The custom `MapReduceChain`
**Multi input prompt**
You can also use prompts with multiple inputs. In this example, we will use a MapReduce chain to answer a specific question about our code.
```python
from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains import ReduceDocumentsChain
map_template_string = """Give the following python code information, generate a description that explains what the code does and also mention the time complexity.
Code:
{code}
Return the description in the following format:
name of the function: description of the function
"""
reduce_template_string = """Given the following python function names and descriptions, answer the following question
{code_description}
Question: {question}
Answer:
"""
# Prompt to use in map and reduce stages
MAP_PROMPT = PromptTemplate(input_variables=["code"], template=map_template_string)
REDUCE_PROMPT = PromptTemplate(input_variables=["code_description", "question"], template=reduce_template_string)
# LLM to use in map and reduce stages
llm = OpenAI()
map_llm_chain = LLMChain(llm=llm, prompt=MAP_PROMPT)
reduce_llm_chain = LLMChain(llm=llm, prompt=REDUCE_PROMPT)
# Takes a list of documents and combines them into a single string
combine_documents_chain = StuffDocumentsChain(
llm_chain=reduce_llm_chain,
document_variable_name="code_description",
)
# Combines and iteratively reduces the mapped documents
reduce_documents_chain = ReduceDocumentsChain(
# This is the final chain that is called.
combine_documents_chain=combine_documents_chain,
# If documents exceed context for `combine_documents_chain`
collapse_documents_chain=combine_documents_chain,
# The maximum number of tokens to group documents into
token_max=3000)
# Combining documents by mapping a chain over them, then combining results with reduce chain
combine_documents = MapReduceDocumentsChain(
# Map chain
llm_chain=map_llm_chain,
# Reduce chain
reduce_documents_chain=reduce_documents_chain,
# The variable name in the llm_chain to put the documents in
document_variable_name="code",
)
map_reduce = MapReduceChain(
combine_documents_chain=combine_documents,
text_splitter=CharacterTextSplitter(separator="\n##\n", chunk_size=100, chunk_overlap=0),
)
```
```python
code = """
def bubblesort(list):
for iter_num in range(len(list)-1,0,-1):
for idx in range(iter_num):
if list[idx]>list[idx+1]:
temp = list[idx]
list[idx] = list[idx+1]
list[idx+1] = temp
return list
##
def insertion_sort(InputList):
for i in range(1, len(InputList)):
j = i-1
nxt_element = InputList[i]
while (InputList[j] > nxt_element) and (j >= 0):
InputList[j+1] = InputList[j]
j=j-1
InputList[j+1] = nxt_element
return InputList
##
def shellSort(input_list):
gap = len(input_list) // 2
while gap > 0:
for i in range(gap, len(input_list)):
temp = input_list[i]
j = i
while j >= gap and input_list[j - gap] > temp:
input_list[j] = input_list[j - gap]
j = j-gap
input_list[j] = temp
gap = gap//2
return input_list
"""
```
```python
map_reduce.run(input_text=code, question="Which function has a better time complexity?")
```
<CodeOutputBlock lang="python">
```
Created a chunk of size 247, which is longer than the specified 100
Created a chunk of size 267, which is longer than the specified 100
'shellSort has a better time complexity than both bubblesort and insertion_sort, as it has a time complexity of O(n^2), while the other two have a time complexity of O(n^2).'
```
</CodeOutputBlock>
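The "Created a chunk of size ..." warnings are expected here: each `##`-delimited function is longer than the configured `chunk_size=100`, so the splitter keeps every function intact at the separator boundaries and simply notes that the resulting chunks exceed the limit. If you want to inspect exactly what the map step receives, you can apply the same splitter configuration to the `code` string directly; this is purely an inspection step under that assumption, not part of the chain itself:

```python
from langchain.text_splitter import CharacterTextSplitter

# Same splitter configuration the chain uses above; splitting the input ourselves
# shows the chunks the map prompt is applied to (the same warnings may be logged again).
splitter = CharacterTextSplitter(separator="\n##\n", chunk_size=100, chunk_overlap=0)
chunks = splitter.split_text(code)

print(len(chunks))               # typically 3 here, one chunk per "##"-delimited function
print([len(c) for c in chunks])  # each chunk is longer than chunk_size=100, hence the warnings
```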
## The `refine` Chain
This section shows the results of using the `refine` chain to do summarization.
```python
chain = load_summarize_chain(llm, chain_type="refine")
chain.run(docs)
```
<CodeOutputBlock lang="python">
```
"\n\nIn response to Russia's aggression in Ukraine, the United States has united with other freedom-loving nations to impose economic sanctions and hold Putin accountable. The U.S. Department of Justice is also assembling a task force to go after the crimes of Russian oligarchs and seize their ill-gotten gains. We are joining with our European allies to find and seize the assets of Russian oligarchs, including yachts, luxury apartments, and private jets. The U.S. is also closing off American airspace to all Russian flights, further isolating Russia and adding an additional squeeze on their economy. The U.S. and its allies are providing support to the Ukrainians in their fight for freedom, including military, economic, and humanitarian assistance. The U.S. is also mobilizing ground forces, air squadrons, and ship deployments to protect NATO countries. The U.S. and its allies are also releasing 60 million barrels of oil from reserves around the world, with the U.S. contributing 30 million barrels from its own Strategic Petroleum Reserve. In addition, the U.S. has passed the American Rescue Plan to provide immediate economic relief for tens of millions of Americans, and the Bipartisan Infrastructure Law to rebuild America and create jobs. This investment will"
```
</CodeOutputBlock>
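Under the hood, the `refine` strategy works through the documents sequentially: it drafts a summary of the first document, then asks the model to revise that summary in light of each subsequent document. Because every step depends on the previous one, the calls run one after another rather than in parallel. The following is a rough, illustrative sketch of that control flow, not the library's implementation:

```python
# Illustrative sketch of the refine strategy -- not the library's implementation.
# `summarize_call` and `refine_call` stand in for LLM calls made with the
# initial summary prompt and the refine prompt, respectively.
def refine_summarize(summarize_call, refine_call, docs):
    # Draft a summary of the first document on its own.
    summary = summarize_call(text=docs[0].page_content)
    # Fold each remaining document into the running summary.
    for doc in docs[1:]:
        summary = refine_call(existing_answer=summary, text=doc.page_content)
    return summary
```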
**Intermediate Steps**
We can also return the intermediate steps for `refine` chains, should we want to inspect them. This is done with the `return_intermediate_steps` variable.
```python
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="refine", return_intermediate_steps=True)
chain({"input_documents": docs}, return_only_outputs=True)
```
<CodeOutputBlock lang="python">
```
{'refine_steps': [" In response to Russia's aggression in Ukraine, the United States has united with other freedom-loving nations to impose economic sanctions and hold Putin accountable. The U.S. Department of Justice is also assembling a task force to go after the crimes of Russian oligarchs and seize their ill-gotten gains.",
"\n\nIn response to Russia's aggression in Ukraine, the United States has united with other freedom-loving nations to impose economic sanctions and hold Putin accountable. The U.S. Department of Justice is also assembling a task force to go after the crimes of Russian oligarchs and seize their ill-gotten gains. We are joining with our European allies to find and seize the assets of Russian oligarchs, including yachts, luxury apartments, and private jets. The U.S. is also closing off American airspace to all Russian flights, further isolating Russia and adding an additional squeeze on their economy. The U.S. and its allies are providing support to the Ukrainians in their fight for freedom, including military, economic, and humanitarian assistance. The U.S. is also mobilizing ground forces, air squadrons, and ship deployments to protect NATO countries. The U.S. and its allies are also releasing 60 million barrels of oil from reserves around the world, with the U.S. contributing 30 million barrels from its own Strategic Petroleum Reserve. Putin's war on Ukraine has left Russia weaker and the rest of the world stronger, with the world uniting in support of democracy and peace.",
"\n\nIn response to Russia's aggression in Ukraine, the United States has united with other freedom-loving nations to impose economic sanctions and hold Putin accountable. The U.S. Department of Justice is also assembling a task force to go after the crimes of Russian oligarchs and seize their ill-gotten gains. We are joining with our European allies to find and seize the assets of Russian oligarchs, including yachts, luxury apartments, and private jets. The U.S. is also closing off American airspace to all Russian flights, further isolating Russia and adding an additional squeeze on their economy. The U.S. and its allies are providing support to the Ukrainians in their fight for freedom, including military, economic, and humanitarian assistance. The U.S. is also mobilizing ground forces, air squadrons, and ship deployments to protect NATO countries. The U.S. and its allies are also releasing 60 million barrels of oil from reserves around the world, with the U.S. contributing 30 million barrels from its own Strategic Petroleum Reserve. In addition, the U.S. has passed the American Rescue Plan to provide immediate economic relief for tens of millions of Americans, and the Bipartisan Infrastructure Law to rebuild America and create jobs. This includes investing"],
'output_text': "\n\nIn response to Russia's aggression in Ukraine, the United States has united with other freedom-loving nations to impose economic sanctions and hold Putin accountable. The U.S. Department of Justice is also assembling a task force to go after the crimes of Russian oligarchs and seize their ill-gotten gains. We are joining with our European allies to find and seize the assets of Russian oligarchs, including yachts, luxury apartments, and private jets. The U.S. is also closing off American airspace to all Russian flights, further isolating Russia and adding an additional squeeze on their economy. The U.S. and its allies are providing support to the Ukrainians in their fight for freedom, including military, economic, and humanitarian assistance. The U.S. is also mobilizing ground forces, air squadrons, and ship deployments to protect NATO countries. The U.S. and its allies are also releasing 60 million barrels of oil from reserves around the world, with the U.S. contributing 30 million barrels from its own Strategic Petroleum Reserve. In addition, the U.S. has passed the American Rescue Plan to provide immediate economic relief for tens of millions of Americans, and the Bipartisan Infrastructure Law to rebuild America and create jobs. This includes investing"}
```
</CodeOutputBlock>
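If you want to work with those intermediate summaries programmatically, they are ordinary entries in the returned dictionary. Note that the key name differs between the outputs shown on this page (`refine_steps` above, `intermediate_steps` further below), so this sketch checks for both; it assumes the `chain` and `docs` from the call above:

```python
result = chain({"input_documents": docs}, return_only_outputs=True)

# The per-step summaries live under "intermediate_steps" or "refine_steps",
# depending on the version; look for whichever key is present.
steps = result.get("intermediate_steps") or result.get("refine_steps") or []
for i, step in enumerate(steps):
    print(f"--- refine step {i} ---")
    print(step.strip())

print("--- final summary ---")
print(result["output_text"].strip())
```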
**Custom Prompts**
You can also use your own prompts with this chain. In this example, we will respond in Italian.
```python
prompt_template = """Write a concise summary of the following:
{text}
CONCISE SUMMARY IN ITALIAN:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
refine_template = (
"Your job is to produce a final summary\n"
"We have provided an existing summary up to a certain point: {existing_answer}\n"
"We have the opportunity to refine the existing summary"
"(only if needed) with some more context below.\n"
"------------\n"
"{text}\n"
"------------\n"
"Given the new context, refine the original summary in Italian"
"If the context isn't useful, return the original summary."
)
refine_prompt = PromptTemplate(
input_variables=["existing_answer", "text"],
template=refine_template,
)
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="refine", return_intermediate_steps=True, question_prompt=PROMPT, refine_prompt=refine_prompt)
chain({"input_documents": docs}, return_only_outputs=True)
```
<CodeOutputBlock lang="python">
```
{'intermediate_steps': ["\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia e bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi.",
"\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia, bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale e chiudendo lo spazio aereo americano a tutti i voli russi. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi. Stiamo fornendo più di un miliardo di dollari in assistenza diretta all'Ucraina e fornendo assistenza militare,",
"\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia, bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale e chiudendo lo spazio aereo americano a tutti i voli russi. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi. Stiamo fornendo più di un miliardo di dollari in assistenza diretta all'Ucraina e fornendo assistenza militare."],
'output_text': "\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia, bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale e chiudendo lo spazio aereo americano a tutti i voli russi. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi. Stiamo fornendo più di un miliardo di dollari in assistenza diretta all'Ucraina e fornendo assistenza militare."}
```
</CodeOutputBlock>