mirror of
https://github.com/openai/openai-cookbook
synced 2024-11-15 18:13:18 +00:00
1162 lines
56 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "30995a82",
|
||
"metadata": {},
|
||
"source": [
|
||
"# How to build a tool-using agent with LangChain\n",
|
||
"\n",
|
||
"This notebook takes you through how to use LangChain to augment an OpenAI model with access to external tools. In particular, you'll be able to create LLM agents that use custom tools to answer user queries.\n",
|
||
"\n",
|
||
"\n",
"## What is LangChain?\n",
"[LangChain](https://python.langchain.com/en/latest/index.html) is a framework for developing applications powered by language models. It enables you to build layered LLM-powered applications that are context-aware and able to interact dynamically with their environment as agents, leading to simpler code for you and a more dynamic user experience for your customers.\n",
|
||
"\n",
|
||
"## Why do LLMs need to use Tools?\n",
"One of the most common challenges with LLMs is overcoming the lack of recency and specificity in their training data: answers can be out of date, and models are prone to hallucination given the huge variety in their knowledge base. Tools are a great way to let an LLM answer within a controlled context that draws on your existing knowledge bases and internal APIs. Instead of trying to prompt-engineer the LLM all the way to your intended answer, you give it access to tools that it calls dynamically for information, which it then parses and serves to the customer.\n",
|
||
"\n",
"Providing LLMs access to tools can enable them to answer questions with context directly from search engines, APIs or your own databases. Instead of answering directly, an LLM with access to tools can perform intermediate steps to gather relevant information. Tools can also be used in combination. [For example](https://python.langchain.com/en/latest/modules/agents/agents/examples/mrkl_chat.html), a language model can be made to use a search tool to look up quantitative information and a calculator to execute calculations.\n",
|
||
"\n",
|
||
"## Notebook Sections\n",
|
||
"\n",
|
||
"- **Setup:** Import packages and connect to a Pinecone vector database.\n",
|
||
"- **LLM Agent:** Build an agent that leverages a modified version of the [ReAct](https://react-lm.github.io/) framework to do chain-of-thought reasoning.\n",
|
||
"- **LLM Agent with History:** Provide the LLM with access to previous steps in the conversation.\n",
|
||
"- **Knowledge Base:** Create a knowledge base of \"Stuff You Should Know\" podcast episodes, to be accessed through a tool.\n",
|
||
"- **LLM Agent with Tools:** Extend the agent with access to multiple tools and test that it uses them to answer questions."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 32,
|
||
"id": "9c069980",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"The autoreload extension is already loaded. To reload it, use:\n",
|
||
" %reload_ext autoreload\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"%load_ext autoreload\n",
|
||
"%autoreload 2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "7a254003",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Setup\n",
|
||
"\n",
|
||
"Import libraries and set up a connection to a [Pinecone](https://www.pinecone.io) vector database.\n",
|
||
"\n",
"You can replace Pinecone with any other vector store or database. LangChain natively supports a [selection](https://python.langchain.com/en/latest/modules/indexes/vectorstores.html) of them; for others you will need to write your own connector."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "f55905f7",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"!pip install openai\n",
|
||
"!pip install pinecone-client\n",
|
||
"!pip install pandas\n",
|
||
"!pip install typing\n",
|
||
"!pip install tqdm\n",
|
||
"!pip install langchain\n",
|
||
"!pip install wget"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "18be3d9f",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import datetime\n",
|
||
"import json\n",
|
||
"import openai\n",
|
||
"import os\n",
|
||
"import pandas as pd\n",
|
||
"import pinecone\n",
|
||
"import re\n",
|
||
"from tqdm.auto import tqdm\n",
|
||
"from typing import List, Union\n",
|
||
"import zipfile\n",
|
||
"\n",
|
||
"# Langchain imports\n",
|
||
"from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser\n",
|
||
"from langchain.prompts import BaseChatPromptTemplate, ChatPromptTemplate\n",
|
||
"from langchain import SerpAPIWrapper, LLMChain\n",
|
||
"from langchain.schema import AgentAction, AgentFinish, HumanMessage, SystemMessage\n",
|
||
"# LLM wrapper\n",
|
||
"from langchain.chat_models import ChatOpenAI\n",
|
||
"from langchain import OpenAI\n",
|
||
"# Conversational memory\n",
|
||
"from langchain.memory import ConversationBufferWindowMemory\n",
|
||
"# Embeddings and vectorstore\n",
|
||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||
"from langchain.vectorstores import Pinecone\n",
|
||
"\n",
|
||
"# Vectorstore Index\n",
|
||
"index_name = 'podcasts'"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "9c85f298",
|
||
"metadata": {},
|
||
"source": [
"To get an API key for Pinecone, set up a [free account](https://app.pinecone.io/) and store the key either in the `api_key` variable below or in the `PINECONE_API_KEY` environment variable."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"id": "af825644",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"api_key = os.getenv(\"PINECONE_API_KEY\") or \"PINECONE_API_KEY\"\n",
|
||
"\n",
|
||
"# find environment next to your API key in the Pinecone console\n",
|
||
"env = os.getenv(\"PINECONE_ENVIRONMENT\") or \"PINECONE_ENVIRONMENT\"\n",
|
||
"\n",
|
||
"pinecone.init(api_key=api_key, environment=env)\n",
|
||
"pinecone.whoami()"
|
||
]
|
||
},
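The cell above falls back to a placeholder string when the environment variable is missing, so a misconfiguration only surfaces once the Pinecone call fails. A minimal sketch of failing fast instead is shown below; `require_env` and `demo_env` are hypothetical names introduced for illustration and are not part of the notebook or of the Pinecone client.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable, or raise if it is missing or a placeholder."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    # Treat an unset variable, an empty string, or the variable's own name as "not configured"
    if not value or value == name:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook")
    return value


# Example with an explicit mapping so the sketch runs without real credentials
demo_env = {"PINECONE_API_KEY": "example-key"}
print(require_env("PINECONE_API_KEY", demo_env))
```

In the notebook itself you would call `require_env("PINECONE_API_KEY")` with no second argument so that it reads from `os.environ`.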
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"id": "c0518596",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"['podcasts']"
|
||
]
|
||
},
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"pinecone.list_indexes()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "7e7644cd",
|
||
"metadata": {},
|
||
"source": [
"Run this code block if you want to clear the index, or if the index doesn't exist yet:\n",
"\n",
"```\n",
"# Check whether the index with the same name already exists - if so, delete it\n",
"if index_name in pinecone.list_indexes():\n",
"    pinecone.delete_index(index_name)\n",
"\n",
"# Create a new index\n",
"pinecone.create_index(name=index_name, dimension=1536)\n",
"index = pinecone.Index(index_name=index_name)\n",
"\n",
"# Confirm our index was created\n",
"pinecone.list_indexes()\n",
"```"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "0eb63e37",
|
||
"metadata": {},
|
||
"source": [
|
||
"## LLM Agent\n",
|
||
"\n",
"An [LLM agent](https://python.langchain.com/en/latest/modules/agents/agents/custom_llm_agent.html) in LangChain has many configurable components, which are detailed in the LangChain documentation.\n",
|
||
"\n",
|
||
"We'll employ a few of the core concepts to make an agent that talks in the way we want, can use tools to answer questions, and uses the appropriate language model to power the conversation.\n",
|
||
"- **Prompt Template:** The input template to control the LLM's behaviour and how it accepts inputs and produces outputs - this is the brain that drives your application ([docs](https://python.langchain.com/en/latest/modules/prompts/prompt_templates.html)).\n",
"- **Output Parser:** A method of parsing the output from the prompt. If the LLM produces output using certain headers, you can enable complex interactions where variables are generated by the LLM in its response and passed into the next step of the chain ([docs](https://python.langchain.com/en/latest/modules/prompts/output_parsers.html)).\n",
|
||
"- **LLM Chain:** A Chain brings together a prompt template with an LLM that will execute it - in this case we'll be using ```gpt-3.5-turbo``` but this framework can be used with OpenAI completions models, or other LLMs entirely ([docs](https://python.langchain.com/en/latest/modules/chains.html)).\n",
|
||
"- **Tool:** An external service that the LLM can use to retrieve information or execute commands should the user require it ([docs](https://python.langchain.com/en/latest/modules/agents/tools.html)).\n",
"- **Agent:** The glue that brings all of this together. An agent can call multiple LLM Chains, each with its own tools. Agents can be extended with your own logic to allow retries, error handling and any other methods you choose to add reliability to your application ([docs](https://python.langchain.com/en/latest/modules/agents.html)).\n",
|
||
"\n",
|
||
"**NB:** Before using this cookbook with the Search tool you'll need to sign up on https://serpapi.com/ and generate an API key. Once you have it, store it in an environment variable named ```SERPAPI_API_KEY```"
|
||
]
|
||
},
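To make the division of labour between these components concrete, here is a toy agent loop written with the standard library only. It is a sketch under stated assumptions, not LangChain's implementation: `run_agent`, `scripted_llm` and `toy_tools` are hypothetical names, and the "LLM" is a scripted function that returns canned text in the Thought/Action/Action Input format used later in this notebook.

```python
def run_agent(llm, tools, question, max_steps=5):
    """Toy agent loop: ask the LLM, run the tool it names, feed the result back."""
    scratchpad = ""
    for _ in range(max_steps):
        output = llm(question, scratchpad)
        # A "Final Answer:" marker ends the loop
        if "Final Answer:" in output:
            return output.split("Final Answer:")[-1].strip()
        # Otherwise expect "Action: <tool>" and "Action Input: <input>" lines
        fields = dict(
            line.split(": ", 1) for line in output.splitlines() if ": " in line
        )
        observation = tools[fields["Action"]](fields["Action Input"])
        # Record the step so the next LLM call can see what happened
        scratchpad += f"{output}\nObservation: {observation}\nThought: "
    raise RuntimeError("Agent did not finish within max_steps")


def scripted_llm(question, scratchpad):
    """Stand-in for a real model: first requests a lookup, then answers."""
    if "Observation:" not in scratchpad:
        return "Thought: I should look this up.\nAction: Lookup\nAction Input: answer"
    return "I now know the final answer\nFinal Answer: 42"


# A single hypothetical tool backed by a dictionary
toy_tools = {"Lookup": lambda query: {"answer": "42"}[query]}

result = run_agent(scripted_llm, toy_tools, "What is the answer?")
print(result)
```

In the cells that follow, the prompt template plays the role of the scratchpad formatting, the output parser replaces the line splitting, and `AgentExecutor` runs the loop.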
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 34,
|
||
"id": "3f813668",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Initiate a Search tool - note you'll need to have set SERPAPI_API_KEY as an environment variable as per the above instructions\n",
|
||
"search = SerpAPIWrapper()\n",
|
||
"\n",
|
||
"# Define a list of tools\n",
"tools = [\n",
"    Tool(\n",
"        name=\"Search\",\n",
"        func=search.run,\n",
"        description=\"useful for when you need to answer questions about current events\"\n",
"    )\n",
"]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 35,
|
||
"id": "effae642",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Set up the prompt with input variables for tools, user input and a scratchpad for the model to record its workings\n",
|
||
"template = \"\"\"Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:\n",
|
||
"\n",
|
||
"{tools}\n",
|
||
"\n",
|
||
"Use the following format:\n",
|
||
"\n",
|
||
"Question: the input question you must answer\n",
|
||
"Thought: you should always think about what to do\n",
|
||
"Action: the action to take, should be one of [{tool_names}]\n",
|
||
"Action Input: the input to the action\n",
|
||
"Observation: the result of the action\n",
|
||
"... (this Thought/Action/Action Input/Observation can repeat N times)\n",
|
||
"Thought: I now know the final answer\n",
|
||
"Final Answer: the final answer to the original input question\n",
|
||
"\n",
|
||
"Begin! Remember to speak as a pirate when giving your final answer. Use lots of \"Arg\"s\n",
|
||
"\n",
|
||
"Question: {input}\n",
|
||
"{agent_scratchpad}\"\"\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 36,
|
||
"id": "b4efd85d",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"# Set up a prompt template\n",
"class CustomPromptTemplate(BaseChatPromptTemplate):\n",
"    # The template to use\n",
"    template: str\n",
"    # The list of tools available\n",
"    tools: List[Tool]\n",
"\n",
"    def format_messages(self, **kwargs) -> List[HumanMessage]:\n",
"        # Get the intermediate steps (AgentAction, Observation tuples)\n",
"        # and format them as a running Thought/Observation log\n",
"        intermediate_steps = kwargs.pop(\"intermediate_steps\")\n",
"        thoughts = \"\"\n",
"        for action, observation in intermediate_steps:\n",
"            thoughts += action.log\n",
"            thoughts += f\"\\nObservation: {observation}\\nThought: \"\n",
"\n",
"        # Set the agent_scratchpad variable to that value\n",
"        kwargs[\"agent_scratchpad\"] = thoughts\n",
"\n",
"        # Create a tools variable from the list of tools provided\n",
"        kwargs[\"tools\"] = \"\\n\".join([f\"{tool.name}: {tool.description}\" for tool in self.tools])\n",
"\n",
"        # Create a list of tool names for the tools provided\n",
"        kwargs[\"tool_names\"] = \", \".join([tool.name for tool in self.tools])\n",
"        formatted = self.template.format(**kwargs)\n",
"        return [HumanMessage(content=formatted)]\n",
"\n",
"prompt = CustomPromptTemplate(\n",
"    template=template,\n",
"    tools=tools,\n",
"    # This omits the `agent_scratchpad`, `tools`, and `tool_names` variables because those are generated dynamically\n",
"    # This includes the `intermediate_steps` variable because that is needed\n",
"    input_variables=[\"input\", \"intermediate_steps\"]\n",
")"
|
||
]
|
||
},
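The scratchpad logic in `format_messages` can be tried in isolation. The sketch below reproduces only the loop over intermediate steps; `FakeAction` and `build_scratchpad` are hypothetical stand-ins introduced here so the example runs without LangChain installed.

```python
from collections import namedtuple

# Hypothetical stand-in for LangChain's AgentAction; only the `log` field is used here
FakeAction = namedtuple("FakeAction", ["log"])


def build_scratchpad(intermediate_steps):
    """Concatenate each action log with its observation, ending on an open 'Thought: '."""
    thoughts = ""
    for action, observation in intermediate_steps:
        thoughts += action.log
        thoughts += f"\nObservation: {observation}\nThought: "
    return thoughts


steps = [
    (
        FakeAction(log="Thought: I should search.\nAction: Search\nAction Input: Canada population 2023"),
        "39,566,248",
    ),
]
scratchpad = build_scratchpad(steps)
print(scratchpad)
```

Ending on an open `Thought: ` prompts the model to continue reasoning from the latest observation rather than restart the format from the top.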
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 37,
|
||
"id": "533657a5",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"class CustomOutputParser(AgentOutputParser):\n",
"\n",
"    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:\n",
"\n",
"        # Check if agent should finish\n",
"        if \"Final Answer:\" in llm_output:\n",
"            return AgentFinish(\n",
"                # Return values is generally always a dictionary with a single `output` key\n",
"                # It is not recommended to try anything else at the moment :)\n",
"                return_values={\"output\": llm_output.split(\"Final Answer:\")[-1].strip()},\n",
"                log=llm_output,\n",
"            )\n",
"\n",
"        # Parse out the action and action input\n",
"        regex = r\"Action: (.*?)[\\n]*Action Input:[\\s]*(.*)\"\n",
"        match = re.search(regex, llm_output, re.DOTALL)\n",
"\n",
"        # If it can't parse the output it raises an error\n",
"        # You can add your own logic here to handle errors in a different way, e.g. pass to a human or give a canned response\n",
"        if not match:\n",
"            raise ValueError(f\"Could not parse LLM output: `{llm_output}`\")\n",
"        action = match.group(1).strip()\n",
"        action_input = match.group(2)\n",
"\n",
"        # Return the action and action input\n",
"        return AgentAction(tool=action, tool_input=action_input.strip(\" \").strip('\"'), log=llm_output)\n",
"\n",
"output_parser = CustomOutputParser()"
|
||
]
|
||
},
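To see what the parser extracts, the same regular expression can be run on sample model output using only the standard library. This is a sketch: `parse_llm_output` is a hypothetical helper that returns plain tuples in place of LangChain's `AgentAction` and `AgentFinish` objects.

```python
import re

# Same pattern as in CustomOutputParser.parse
ACTION_REGEX = r"Action: (.*?)[\n]*Action Input:[\s]*(.*)"


def parse_llm_output(llm_output):
    """Return ("finish", answer) or ("action", tool, tool_input) for a block of model output."""
    if "Final Answer:" in llm_output:
        return ("finish", llm_output.split("Final Answer:")[-1].strip())
    match = re.search(ACTION_REGEX, llm_output, re.DOTALL)
    if not match:
        raise ValueError(f"Could not parse LLM output: `{llm_output}`")
    tool = match.group(1).strip()
    # Strip surrounding spaces, then surrounding double quotes, as the notebook's parser does
    tool_input = match.group(2).strip(" ").strip('"')
    return ("action", tool, tool_input)


step = parse_llm_output('Thought: I should search.\nAction: Search\nAction Input: "Canada population 2023"')
done = parse_llm_output("Thought: I now know the final answer\nFinal Answer: 38,664,637")
print(step)
print(done)
```

Because the final-answer check runs first, output that contains both an action and a `Final Answer:` marker is treated as finished.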
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 38,
|
||
"id": "4c7d93d1",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Initiate our LLM - default is 'gpt-3.5-turbo'\n",
|
||
"llm = ChatOpenAI(temperature=0)\n",
|
||
"\n",
|
||
"# LLM chain consisting of the LLM and a prompt\n",
|
||
"llm_chain = LLMChain(llm=llm, prompt=prompt)\n",
|
||
"\n",
|
||
"# Using tools, the LLM chain and output_parser to make an agent\n",
|
||
"tool_names = [tool.name for tool in tools]\n",
|
||
"\n",
"agent = LLMSingleActionAgent(\n",
"    llm_chain=llm_chain,\n",
"    output_parser=output_parser,\n",
"    # We use \"Observation\" as our stop sequence so the model stops generating at the point where the tool output belongs\n",
"    # If you change your prompt template you'll need to adjust this as well\n",
"    stop=[\"\\nObservation:\"],\n",
"    allowed_tools=tool_names\n",
")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 39,
|
||
"id": "dd893fa5",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Initiate the agent that will respond to our queries\n",
|
||
"# Set verbose=True to share the CoT reasoning the LLM goes through\n",
|
||
"agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 40,
|
||
"id": "c7ca3ee2",
|
||
"metadata": {
|
||
"scrolled": true
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"\n",
|
||
"\n",
|
||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||
"\u001b[32;1m\u001b[1;3mThought: Hmm, I be not sure of the answer to that one. Let me think.\n",
|
||
"Action: Search\n",
|
||
"Action Input: \"Canada population 2023\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[36;1m\u001b[1;3m39,566,248\u001b[0m\u001b[32;1m\u001b[1;3mAhoy, that be a lot of people! But I need to make sure this be true.\n",
|
||
"Action: Search\n",
|
||
"Action Input: \"Canada population 2023 official source\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[36;1m\u001b[1;3mThe current population of Canada is 38,664,637 as of Wednesday, April 19, 2023, based on Worldometer elaboration of the latest United Nations data.\u001b[0m\u001b[32;1m\u001b[1;3mArrr, that be the official number! I be confident in me answer now.\n",
|
||
"Final Answer: The population of Canada as of 2023 is 38,664,637. Arg!\u001b[0m\n",
|
||
"\n",
|
||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"'The population of Canada as of 2023 is 38,664,637. Arg!'"
|
||
]
|
||
},
|
||
"execution_count": 40,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"agent_executor.run(\"How many people live in canada as of 2023?\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 41,
|
||
"id": "9ff96040",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"\n",
|
||
"\n",
|
||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||
"\u001b[32;1m\u001b[1;3mThought: Hmm, I'm not sure what this question is asking about. I better use the search tool.\n",
|
||
"Action: Search\n",
|
||
"Action Input: \"2022 events\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[36;1m\u001b[1;3m8. Humanitarian Crises Deepen · 7. Latin America Moves Left. · 6. Iranians Protest. · 5. COVID Eases. · 4. Inflation Returns. · 3. Climate Change ...\u001b[0m\u001b[32;1m\u001b[1;3mAhoy, it looks like this be a question about what be happenin' in 2022. Let me search again.\n",
|
||
"Action: Search\n",
|
||
"Action Input: \"2022 calendar\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[36;1m\u001b[1;3mUnited States 2022 – Calendar with American holidays. Yearly calendar showing months for the year 2022. Calendars – online and print friendly – for any year ...\u001b[0m\u001b[32;1m\u001b[1;3mShiver me timbers, it looks like this be a question about the year 2022. Let me search one more time.\n",
|
||
"Action: Search\n",
|
||
"Action Input: \"What be happenin' in 2022?\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[36;1m\u001b[1;3m8. Humanitarian Crises Deepen · 7. Latin America Moves Left. · 6. Iranians Protest. · 5. COVID Eases. · 4. Inflation Returns. · 3. Climate Change ...\u001b[0m\u001b[32;1m\u001b[1;3mAvast ye, it looks like the same results be comin' up. I reckon there be no clear answer to this question.\n",
|
||
"Final Answer: Arg, I be sorry matey, but I can't give ye a clear answer to that question.\u001b[0m\n",
|
||
"\n",
|
||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"\"Arg, I be sorry matey, but I can't give ye a clear answer to that question.\""
|
||
]
|
||
},
|
||
"execution_count": 41,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"agent_executor.run(\"How many in 2022?\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "c3b51817",
|
||
"metadata": {},
|
||
"source": [
|
||
"## LLM Agent with History\n",
|
||
"\n",
|
||
"Extend the LLM Agent with the ability to retain a [memory](https://python.langchain.com/en/latest/modules/agents/agents/custom_llm_agent.html#adding-memory) and use it as context as it continues the conversation.\n",
|
||
"\n",
|
||
"We use a simple ```ConversationBufferWindowMemory``` for this example that keeps a rolling window of the last two conversation turns. LangChain has other [memory options](https://python.langchain.com/en/latest/modules/memory.html), with different tradeoffs suitable for different use cases."
|
||
]
|
||
},
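The rolling-window idea can be illustrated with a bounded `deque`. This is a minimal sketch, not LangChain's `ConversationBufferWindowMemory`: the `WindowMemory` class and the `Human:`/`AI:` labels are assumptions made for illustration.

```python
from collections import deque


class WindowMemory:
    """Hypothetical rolling-window memory that keeps only the last k conversation turns."""

    def __init__(self, k=2):
        # A deque with maxlen drops the oldest turn automatically once it is full
        self.turns = deque(maxlen=k)

    def save(self, human, ai):
        self.turns.append((human, ai))

    def history(self):
        # Render the retained turns as text that can be interpolated into a prompt
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


memory_sketch = WindowMemory(k=2)
memory_sketch.save("How many people live in Canada?", "About 38.7 million.")
memory_sketch.save("How about in Mexico?", "About 132.7 million.")
memory_sketch.save("And in Belize?", "I would need to search for that.")
print(memory_sketch.history())
```

With `k=2` the first question has already been dropped by the time the third turn is saved, which keeps the prompt short at the cost of forgetting older context.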
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"id": "85737eea",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Set up a prompt template which can interpolate the history\n",
|
||
"template_with_history = \"\"\"You are SearchGPT, a professional search engine who provides informative answers to users. Answer the following questions as best you can. You have access to the following tools:\n",
|
||
"\n",
|
||
"{tools}\n",
|
||
"\n",
|
||
"Use the following format:\n",
|
||
"\n",
|
||
"Question: the input question you must answer\n",
|
||
"Thought: you should always think about what to do\n",
|
||
"Action: the action to take, should be one of [{tool_names}]\n",
|
||
"Action Input: the input to the action\n",
|
||
"Observation: the result of the action\n",
|
||
"... (this Thought/Action/Action Input/Observation can repeat N times)\n",
|
||
"Thought: I now know the final answer\n",
|
||
"Final Answer: the final answer to the original input question\n",
|
||
"\n",
|
||
"Begin! Remember to give detailed, informative answers\n",
|
||
"\n",
|
||
"Previous conversation history:\n",
|
||
"{history}\n",
|
||
"\n",
|
||
"New question: {input}\n",
|
||
"{agent_scratchpad}\"\"\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 43,
|
||
"id": "4d85c309",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"prompt_with_history = CustomPromptTemplate(\n",
"    template=template_with_history,\n",
"    tools=tools,\n",
"    # The history template includes \"history\" as an input variable so we can interpolate it into the prompt\n",
"    input_variables=[\"input\", \"intermediate_steps\", \"history\"]\n",
")\n",
"\n",
"llm_chain = LLMChain(llm=llm, prompt=prompt_with_history)\n",
"tool_names = [tool.name for tool in tools]\n",
"agent = LLMSingleActionAgent(\n",
"    llm_chain=llm_chain,\n",
"    output_parser=output_parser,\n",
"    stop=[\"\\nObservation:\"],\n",
"    allowed_tools=tool_names\n",
")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 44,
|
||
"id": "4f8c8fea",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Initiate the memory with k=2 to keep the last two turns\n",
|
||
"# Provide the memory to the agent\n",
|
||
"memory = ConversationBufferWindowMemory(k=2)\n",
|
||
"agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True, memory=memory)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 45,
|
||
"id": "e1f031bb",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"\n",
|
||
"\n",
|
||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||
"\u001b[32;1m\u001b[1;3mThought: I need to find the most recent population data for Canada.\n",
|
||
"Action: Search\n",
|
||
"Action Input: \"Canada population 2023\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[36;1m\u001b[1;3m39,566,248\u001b[0m\u001b[32;1m\u001b[1;3mThis data seems reliable, but I should double-check the source.\n",
|
||
"Action: Search\n",
|
||
"Action Input: \"Source of Canada population 2023\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[36;1m\u001b[1;3mThe current population of Canada is 38,664,637 as of Wednesday, April 19, 2023, based on Worldometer elaboration of the latest United Nations data. Canada 2020 population is estimated at 37,742,154 people at mid year according to UN data. Canada population is equivalent to 0.48% of the total world population.\u001b[0m\u001b[32;1m\u001b[1;3mI now know the final answer\n",
|
||
"Final Answer: As of April 19, 2023, the population of Canada is 38,664,637.\u001b[0m\n",
|
||
"\n",
|
||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"'As of April 19, 2023, the population of Canada is 38,664,637.'"
|
||
]
|
||
},
|
||
"execution_count": 45,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"agent_executor.run(\"How many people live in canada as of 2023?\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 46,
|
||
"id": "9d28d73f",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"\n",
|
||
"\n",
|
||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||
"\u001b[32;1m\u001b[1;3mThought: I need to search for the current population of Mexico.\n",
|
||
"Action: Search\n",
|
||
"Action Input: \"current population of Mexico\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[36;1m\u001b[1;3mMexico, officially the United Mexican States, is a country in the southern portion of North America. It is bordered to the north by the United States; to the south and west by the Pacific Ocean; to the southeast by Guatemala, Belize, and the Caribbean Sea; and to the east by the Gulf of Mexico.\u001b[0m\u001b[32;1m\u001b[1;3mThat's not the answer to the question, I need to refine my search.\n",
|
||
"Action: Search\n",
|
||
"Action Input: \"population of Mexico 2023\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[36;1m\u001b[1;3m132,709,512\u001b[0m\u001b[32;1m\u001b[1;3mI now know the final answer.\n",
|
||
"Final Answer: As of 2023, the population of Mexico is 132,709,512.\u001b[0m\n",
|
||
"\n",
|
||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"'As of 2023, the population of Mexico is 132,709,512.'"
|
||
]
|
||
},
|
||
"execution_count": 46,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"agent_executor.run(\"how about in mexico?\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "62356eed",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Knowledge base\n",
|
||
"\n",
"Create a custom vectorstore for the agent to use as a tool when answering questions. We'll store the results in [Pinecone](https://docs.pinecone.io/docs/quickstart), which is supported by LangChain ([docs](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/pinecone.html), [API reference](https://python.langchain.com/en/latest/reference/modules/vectorstore.html)). For help getting started with Pinecone or other vector databases, see our [cookbook](https://github.com/openai/openai-cookbook/blob/colin/examples/vector_databases/Using_vector_databases_for_embeddings_search.ipynb).\n",
|
||
"\n",
|
||
"You can check the LangChain documentation to see what other [vectorstores](https://python.langchain.com/en/latest/modules/indexes/vectorstores.html) and [databases](https://python.langchain.com/en/latest/modules/chains/examples/sqlite.html) are available.\n",
|
||
"\n",
"For this example we'll use transcripts of the Stuff You Should Know podcast, made available via OSF (DOI [10.17605/OSF.IO/VM9NT](https://doi.org/10.17605/OSF.IO/VM9NT))."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 18,
|
||
"id": "87f86008",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"100% [......................................................................] 571275039 / 571275039"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"'sysk_podcast_transcripts_embedded.json.zip'"
|
||
]
|
||
},
|
||
"execution_count": 18,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"import wget\n",
|
||
"\n",
|
||
"# Here is a URL to a zip archive containing the transcribed podcasts\n",
|
||
"# Note that this data has already been split into chunks and embeddings from OpenAI's text-embedding-ada-002 embedding model are included\n",
|
||
"content_url = 'https://cdn.openai.com/API/examples/data/sysk_podcast_transcripts_embedded.json.zip'\n",
|
||
"\n",
"# Download the file (it is ~571 MB so this will take some time)\n",
|
||
"wget.download(content_url)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 19,
|
||
"id": "4889c292",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Load podcasts\n",
"with zipfile.ZipFile(\"sysk_podcast_transcripts_embedded.json.zip\", \"r\") as zip_ref:\n",
"    zip_ref.extractall(\"./data\")\n",
"# Use a context manager so the file handle is closed after loading\n",
"with open('./data/sysk_podcast_transcripts_embedded.json') as f:\n",
"    processed_podcasts = json.load(f)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 47,
|
||
"id": "1070ef60",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>id</th>\n",
|
||
" <th>filename</th>\n",
|
||
" <th>title</th>\n",
|
||
" <th>url</th>\n",
|
||
" <th>text_chunk</th>\n",
|
||
" <th>embedding</th>\n",
|
||
" <th>cleaned_id</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" <td>\\n\\nSYSK Selects How Crime Scene Cleanup Works</td>\n",
|
||
" <td>https://chtbl.com/track/5899E/podtrac.com/pts/...</td>\n",
|
||
" <td>Title: sysk_with_transcripts_SYSK Selects How ...</td>\n",
|
||
" <td>[0.021279960870742798, -0.005817972123622894, ...</td>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" <td>\\n\\nSYSK Selects How Crime Scene Cleanup Works</td>\n",
|
||
" <td>https://chtbl.com/track/5899E/podtrac.com/pts/...</td>\n",
|
||
" <td>Title: sysk_with_transcripts_SYSK Selects How ...</td>\n",
|
||
" <td>[0.013859338127076626, 0.00857278611510992, 0....</td>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" <td>\\n\\nSYSK Selects How Crime Scene Cleanup Works</td>\n",
|
||
" <td>https://chtbl.com/track/5899E/podtrac.com/pts/...</td>\n",
|
||
" <td>Title: sysk_with_transcripts_SYSK Selects How ...</td>\n",
|
||
" <td>[0.015242221765220165, 0.016030369326472282, 0...</td>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" <td>\\n\\nSYSK Selects How Crime Scene Cleanup Works</td>\n",
|
||
" <td>https://chtbl.com/track/5899E/podtrac.com/pts/...</td>\n",
|
||
" <td>Title: sysk_with_transcripts_SYSK Selects How ...</td>\n",
|
||
" <td>[0.004371842369437218, -0.003036574460566044, ...</td>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" <td>\\n\\nSYSK Selects How Crime Scene Cleanup Works</td>\n",
|
||
" <td>https://chtbl.com/track/5899E/podtrac.com/pts/...</td>\n",
|
||
" <td>Title: sysk_with_transcripts_SYSK Selects How ...</td>\n",
|
||
" <td>[0.017309172078967094, 0.015154214575886726, 0...</td>\n",
|
||
" <td>sysk_with_transcripts_SYSK Selects How Crime S...</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" id \\\n",
|
||
"0 sysk_with_transcripts_SYSK Selects How Crime S... \n",
|
||
"1 sysk_with_transcripts_SYSK Selects How Crime S... \n",
|
||
"2 sysk_with_transcripts_SYSK Selects How Crime S... \n",
|
||
"3 sysk_with_transcripts_SYSK Selects How Crime S... \n",
|
||
"4 sysk_with_transcripts_SYSK Selects How Crime S... \n",
|
||
"\n",
|
||
" filename \\\n",
|
||
"0 sysk_with_transcripts_SYSK Selects How Crime S... \n",
|
||
"1 sysk_with_transcripts_SYSK Selects How Crime S... \n",
|
||
"2 sysk_with_transcripts_SYSK Selects How Crime S... \n",
|
||
"3 sysk_with_transcripts_SYSK Selects How Crime S... \n",
|
||
"4 sysk_with_transcripts_SYSK Selects How Crime S... \n",
|
||
"\n",
|
||
" title \\\n",
|
||
"0 \\n\\nSYSK Selects How Crime Scene Cleanup Works \n",
|
||
"1 \\n\\nSYSK Selects How Crime Scene Cleanup Works \n",
|
||
"2 \\n\\nSYSK Selects How Crime Scene Cleanup Works \n",
|
||
"3 \\n\\nSYSK Selects How Crime Scene Cleanup Works \n",
|
||
"4 \\n\\nSYSK Selects How Crime Scene Cleanup Works \n",
|
||
"\n",
|
||
" url \\\n",
|
||
"0 https://chtbl.com/track/5899E/podtrac.com/pts/... \n",
|
||
"1 https://chtbl.com/track/5899E/podtrac.com/pts/... \n",
|
||
"2 https://chtbl.com/track/5899E/podtrac.com/pts/... \n",
|
||
"3 https://chtbl.com/track/5899E/podtrac.com/pts/... \n",
|
||
"4 https://chtbl.com/track/5899E/podtrac.com/pts/... \n",
|
||
"\n",
|
||
" text_chunk \\\n",
|
||
"0 Title: sysk_with_transcripts_SYSK Selects How ... \n",
|
||
"1 Title: sysk_with_transcripts_SYSK Selects How ... \n",
|
||
"2 Title: sysk_with_transcripts_SYSK Selects How ... \n",
|
||
"3 Title: sysk_with_transcripts_SYSK Selects How ... \n",
|
||
"4 Title: sysk_with_transcripts_SYSK Selects How ... \n",
|
||
"\n",
|
||
" embedding \\\n",
|
||
"0 [0.021279960870742798, -0.005817972123622894, ... \n",
|
||
"1 [0.013859338127076626, 0.00857278611510992, 0.... \n",
|
||
"2 [0.015242221765220165, 0.016030369326472282, 0... \n",
|
||
"3 [0.004371842369437218, -0.003036574460566044, ... \n",
|
||
"4 [0.017309172078967094, 0.015154214575886726, 0... \n",
|
||
"\n",
|
||
" cleaned_id \n",
|
||
"0 sysk_with_transcripts_SYSK Selects How Crime S... \n",
|
||
"1 sysk_with_transcripts_SYSK Selects How Crime S... \n",
|
||
"2 sysk_with_transcripts_SYSK Selects How Crime S... \n",
|
||
"3 sysk_with_transcripts_SYSK Selects How Crime S... \n",
|
||
"4 sysk_with_transcripts_SYSK Selects How Crime S... "
|
||
]
|
||
},
|
||
"execution_count": 47,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# Have a look at the contents\n",
|
||
"pd.DataFrame(processed_podcasts).head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "f9715d1f",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Add the text embeddings to Pinecone\n",
|
||
"\n",
|
||
"batch_size = 100  # number of precomputed vectors to upsert per request\n",
|
||
"\n",
|
||
"for i in tqdm(range(0, len(processed_podcasts), batch_size)):\n",
|
||
" # find end of batch\n",
|
||
" i_end = min(len(processed_podcasts), i+batch_size)\n",
|
||
" meta_batch = processed_podcasts[i:i_end]\n",
|
||
" # get ids\n",
|
||
" ids_batch = [x['cleaned_id'] for x in meta_batch]\n",
|
||
" # get texts to encode\n",
|
||
" texts = [x['text_chunk'] for x in meta_batch]\n",
|
||
" # get the precomputed embeddings\n",
|
||
" embeds = [x['embedding'] for x in meta_batch]\n",
|
||
" # cleanup metadata\n",
|
||
" meta_batch = [{\n",
|
||
" 'filename': x['filename'],\n",
|
||
" 'title': x['title'],\n",
|
||
" 'text_chunk': x['text_chunk'],\n",
|
||
" 'url': x['url']\n",
|
||
" } for x in meta_batch]\n",
|
||
" to_upsert = list(zip(ids_batch, embeds, meta_batch))\n",
|
||
" # upsert to Pinecone\n",
|
||
" index.upsert(vectors=to_upsert)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 48,
|
||
"id": "3c5bac10",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Use OpenAI embeddings in the retriever so that queries are embedded the same way as our corpus\n",
|
||
"embeddings = OpenAIEmbeddings()\n",
|
||
"\n",
|
||
"\n",
|
||
"# Loads a docsearch object from an existing Pinecone index so we can retrieve from it\n",
|
||
"docsearch = Pinecone.from_existing_index(index_name, embeddings, text_key='text_chunk')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 49,
|
||
"id": "5dd21d92",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"retriever = docsearch.as_retriever()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 50,
|
||
"id": "8abd34e5",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"query_docs = retriever.get_relevant_documents(\"can you live without a bank account\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 51,
|
||
"id": "016bd458",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Title: sysk: Can You Live Without a Bank Account?\n",
|
||
"\n",
|
||
"Title: sysk_with_transcripts_Can you live without a bank account.json; And if you had a life, you didn't necessarily rectify your bank checkbook every day. Oh, wait, what is balancing a checkbook mean? Seriously? Yeah. Thank God for my wife. So another reason you might avoid a bank is philosophically. There may be a longstanding distrust of banks in your family that you don't want to put your money in, or you may just want to be like, you know what? I don't want to take part in this modern society. I want to kind of drop out a bit. And a really good first move is to shut your bank account down. That's a big statement. Oh, yeah, it is. But a lot of people that are underbanked and don't have accounts aren't there on purpose. It's not some philosophical statement. A lot of times it's simply because they are poor and they don't have a lot of alternatives. Yeah. And the other thing about not having a bank account, not only do you not have a bank account, you also are, like, basically just avoiding banks altogether. There's plenty of other things that banks offer, like loans and mortgage, lollipops, stuff like that. Yeah. Maybe some free nasty coffee. So when you don't have a banking account, that's like the most basic unit of the banking world. Right. If you don't have that, you obviously aren't going to be exposed to all these other things that can help. Things like build your credit history through like a revolving loan or a mortgage or a car loan or something like that that you can build up your credit for and ultimately save money. So when you don't have a bank account, for whatever reason, you are effectively out of the banking system. The problem is you can live parallel to the banking system outside of it, but it can be really dangerous, especially if you're just dealing with cash, because that cash has to stay somewhere, whether it's on you or in your mattress or in a coffee can in your backyard. 
You're exposed for having that readily available to anybody who finds it or comes into your house with a gun to get it. Yes.\n",
|
||
"\n",
|
||
"\n",
|
||
"Title: sysk: Can You Live Without a Bank Account?\n",
|
||
"\n",
|
||
"Title: sysk_with_transcripts_Can you live without a bank account.json; And it doesn't have to be an everyday thing. You can host when you want. Like, let's say you're taking a week's vacation. Why not host your home? Because that money could go toward paying for your current vacation or towards your retirement fund or even towards your kids college fund. Yeah. For anything. And listen, if you're worried about your stuff, don't be. Air cover for hosts. Let hosts welcome guests into their home without having to worry. You get $1 million in damage protection anytime you're hosting. Plus pet damage protection and income loss protection, too. And are you ready for this? Air cover for host is completely free every time you host on airbnb. Free with a capital F, with air cover for Host. It makes hosting a no brainer, and the benefits really start adding up. So learn more and host with peace of mind at Airbnb comaircoverforhosts. Capital One offers commercial solutions you can bank on. Now more than ever, your business faces specific challenges and unique opportunities. That's why Capital One offers a comprehensive suite of financial services custom tailored to your short and long term goals, backed by the expertise, strategy and resources of a top ten commercial bank, a dedicated team works with you to support your success and help you achieve your goals. Explore the possibilities at CapitalOne. comCOMMERCIAL all right, so if you live in modern society today, it is pretty tough to get by without a bank. Most cases these days you have well, I don't know about most cases, but in many cases you have automatic deposits of your work checks. Sure. A lot of people pay their bills wirelessly, online, directly from their bank. You might have a student loan, you might have a car loan, you might have your house mortgage, you might pay your credit card bills. All this stuff is running through a bank, most likely. 
And you would think it's probably impossible to not have a bank account these days. And I would say pretty much all Americans have them. Not true. Well, pretty much all Americans do. Like 93% do. Yeah, but that's not all. No, it's true.\n",
|
||
"\n",
|
||
"\n",
|
||
"Title: sysk: Can You Live Without a Bank Account?\n",
|
||
"\n",
|
||
"Title: sysk_with_transcripts_Can you live without a bank account.json; Yeah. 7% of Americans do not have bank accounts. About 9 million people last year in 2015 did not have bank accounts. 9 million people is a lot of people. No, it really is. And apparently that's household sorry, not people. Yeah, right. You're that is a big distinction, too. And the FDIC said, man, that's the lowest since we've been tracking this by far. And someone said, well, how long have you been tracking this? They said, well, the last six years. Really? Yeah, which I'm like. Really? That's when they started tracking it, but apparently so 2009. So if you want another number, the 9 million American households don't have bank accounts at all, then there are 25 million households in addition to that. So that makes almost like 34 million households, which that's a substantial number at this point. Sure. The 25 million are what's called underbanked, meaning they may have a bank account, but they don't use the bank account. Yeah. They don't use it because they are probably afraid of overdraft fees. Or they have maybe a bank account that got grandfathered in so that they don't have to pay minimum amount fees. And who knows? There's all sorts of reasons for people to not use a bank account that they have, but probably cheap among them is overdressed, which you'll talk more about. Yeah. And the majority of these underbank people in the United States are poor, usually. A lot of times they're minorities, a lot of times they're less educated. And these communities, there's a few reasons why they may not want to use a bank one. Maybe they don't trust banks. And if you look in the history of the United States or certainly even we're just talking about the Wells Fargo scandal, when you see stuff like that on the news, it should be upsetting to everyone. But obviously if you're poor and you don't have a lot of money, that may scare you into not wanting to use a bank at all. Right? Yeah.\n",
|
||
"\n",
|
||
"\n",
|
||
"Title: sysk: Can You Live Without a Bank Account?\n",
|
||
"\n",
|
||
"Title: sysk_with_transcripts_Can you live without a bank account.json; Maybe at the time, I might be making it up. I seem to remember them saying that, and I was like, I don't want that. Just let the check bounce and I'll take it up with them. Yes. The way it was marketed, though, was like, hey, we value you. We want to make sure that you can pay all your bills. So if something happens and you're overdrafted we'll cover it. We're just going to charge you a fee. And it sounds good, but again, when you go from high to low and all of a sudden your overdraft fees go from one to four or five or however many, that's a huge problem. Well, and the people that are overdrafting and the people that are at least able to afford those fees. Exactly. So it's a disproportionate burden on the poor, which makes it, as a scam, one of the more evil scams around. Yes. It's just wrong, then the idea that if you open an account, you should not opt in for overdraft protection. And it's easy to say when you're talking about checks for, like you're writing a check for a Mountain Dew and some cheetos. Yeah, who cares if you're short for that? You can go without that. But when you're talking about your rent check or like an actual grocery bill or something like that, it sucks that you can't get that stuff. But it's better to have to put a couple of things back than to pay $35 for one $2 item that you went over by, right? Yeah, that's a good point. And this was in my case, too. This is also back in the day when you I mean, a lot of times it was a mystery how much you had in your account. Right. Like, you couldn't just get on your phone before you write the check and be like, oh, well, no, I don't have enough money to cover this. Yeah, because even if you balanced your checkbook, sometimes you forgot to carry the one, it wasn't always 100% accurate.\n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# Print out the title and content for the most relevant retrieved documents\n",
|
||
"print(\"\\n\".join(['Title: ' + x.metadata['title'].strip() + '\\n\\n' + x.page_content + '\\n\\n' for x in query_docs]))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "2bf10f53",
|
||
"metadata": {},
|
||
"source": [
|
||
"## LLM Agent with Tools\n",
|
||
"\n",
|
||
"Next, we extend our list of tools with a [RetrievalQA](https://python.langchain.com/en/latest/modules/chains/index_examples/vector_db_qa.html) chain that answers questions from our Pinecone knowledge base."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 52,
|
||
"id": "a6cae4cf",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from langchain.chains import RetrievalQA\n",
|
||
"\n",
|
||
"retrieval_llm = OpenAI(temperature=0)\n",
|
||
"\n",
|
||
"podcast_retriever = RetrievalQA.from_chain_type(llm=retrieval_llm, chain_type=\"stuff\", retriever=docsearch.as_retriever())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 53,
|
||
"id": "0cc282cd",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"expanded_tools = [\n",
|
||
" Tool(\n",
|
||
" name = \"Search\",\n",
|
||
" func=search.run,\n",
|
||
" description=\"useful for when you need to answer questions about current events\"\n",
|
||
" ),\n",
|
||
" Tool(\n",
|
||
" name = 'Knowledge Base',\n",
|
||
" func=podcast_retriever.run,\n",
|
||
" description=\"Useful for general questions about how to do things and for details on interesting topics. Input should be a fully formed question.\"\n",
|
||
" )\n",
|
||
"]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 54,
|
||
"id": "d1e12121",
|
||
"metadata": {
|
||
"pycharm": {
|
||
"is_executing": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Re-initialize the agent with our new list of tools\n",
|
||
"prompt_with_history = CustomPromptTemplate(\n",
|
||
" template=template_with_history,\n",
|
||
" tools=expanded_tools,\n",
|
||
" input_variables=[\"input\", \"intermediate_steps\", \"history\"]\n",
|
||
")\n",
|
||
"llm_chain = LLMChain(llm=llm, prompt=prompt_with_history)\n",
|
||
"multi_tool_names = [tool.name for tool in expanded_tools]\n",
|
||
"multi_tool_agent = LLMSingleActionAgent(\n",
|
||
" llm_chain=llm_chain, \n",
|
||
" output_parser=output_parser,\n",
|
||
" stop=[\"\\nObservation:\"], \n",
|
||
" allowed_tools=multi_tool_names\n",
|
||
")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 55,
|
||
"id": "2ac8eaa2",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"multi_tool_memory = ConversationBufferWindowMemory(k=2)\n",
|
||
"multi_tool_executor = AgentExecutor.from_agent_and_tools(agent=multi_tool_agent, tools=expanded_tools, verbose=True, memory=multi_tool_memory)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 56,
|
||
"id": "f27bb1de",
|
||
"metadata": {
|
||
"scrolled": false
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"\n",
|
||
"\n",
|
||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||
"\u001b[32;1m\u001b[1;3mThought: This is an interesting question. I'm not sure if I have the answer in my knowledge base, so I might need to search for it.\n",
|
||
"Action: Search\n",
|
||
"Action Input: \"How to live without a bank account\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[36;1m\u001b[1;3mUnderbanked households have a checking or savings account but also use alternative financial services such as money orders, check cashing, international remittances, payday loans, refund anticipation loans, rent-to-own services, pawnshop loans, or auto title loans, according to the FDIC.\u001b[0m\u001b[32;1m\u001b[1;3mIt seems like there are alternative financial services available for those who don't have a bank account. I should look into this further to provide a more comprehensive answer.\n",
|
||
"Action: Search\n",
|
||
"Action Input: \"Alternative financial services for those without a bank account\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[36;1m\u001b[1;3mInstead, people who are unbanked use alternative financial services—payday loans, money orders, check cashing services, pawnshop loans, and the like—to meet their banking needs. These financial services fill an important role for unbanked individuals, but they can also cause further financial hardship.\u001b[0m\u001b[32;1m\u001b[1;3mIt's important to note that while there are alternatives to having a bank account, relying solely on these services can lead to financial difficulties. I should provide some information on the potential drawbacks of not having a bank account.\n",
|
||
"Action: Knowledge Base\n",
|
||
"Action Input: \"What are the drawbacks of not having a bank account?\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[33;1m\u001b[1;3m Not having a bank account can be dangerous, as the cash has to be stored somewhere and can be exposed to anyone who finds it or comes into the house with a gun to get it. Additionally, not having a bank account means not being exposed to other things that can help, such as building credit history through loans or mortgages, which can ultimately save money. Finally, not having a bank account can be a disproportionate burden on the poor, as overdraft fees can be expensive.\u001b[0m\u001b[32;1m\u001b[1;3mIt's important to provide some resources for those who may be interested in learning more about alternative financial services or how to open a bank account. \n",
|
||
"Action: Knowledge Base\n",
|
||
"Action Input: \"Resources for alternative financial services or opening a bank account\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[33;1m\u001b[1;3m There are a few resources available for alternative financial services or opening a bank account. Prepaid credit cards are becoming more popular and can be found at convenience stores. Capital One offers commercial solutions and a comprehensive suite of financial services tailored to short and long term goals. Airbnb also offers Air Cover for Hosts, which provides $1 million in damage protection, pet damage protection, and income loss protection.\u001b[0m\u001b[32;1m\u001b[1;3mIt's important to note that while prepaid credit cards and alternative financial services can be helpful, they may not offer the same level of protection and benefits as a traditional bank account. It's also important to do research and compare options before making a decision. \n",
|
||
"Final Answer: While it is possible to live without a bank account by using alternative financial services, it may come with potential drawbacks and limitations. It's important to do research and compare options before making a decision, and there are resources available for those who may be interested in opening a bank account or exploring alternative financial services.\u001b[0m\n",
|
||
"\n",
|
||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"\"While it is possible to live without a bank account by using alternative financial services, it may come with potential drawbacks and limitations. It's important to do research and compare options before making a decision, and there are resources available for those who may be interested in opening a bank account or exploring alternative financial services.\""
|
||
]
|
||
},
|
||
"execution_count": 56,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"multi_tool_executor.run(\"Hi, I'd like to know how you can live without a bank account\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 57,
|
||
"id": "cba815cf",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"\n",
|
||
"\n",
|
||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||
"\u001b[32;1m\u001b[1;3mThought: This is a complex topic that requires a balanced perspective\n",
|
||
"Action: Knowledge Base\n",
|
||
"Action Input: \"What are the arguments for and against zoos?\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[33;1m\u001b[1;3m The arguments for zoos include that they have gotten a lot better in the last 30-40 years, they participate in research and conservation projects, and they can help save species from extinction. The arguments against zoos include that they are still businesses, they can be counterproductive in terms of educating the public, and they can have a negative impact on the life span of animals in captivity.\u001b[0m\u001b[32;1m\u001b[1;3mIt's important to consider both sides of the argument before coming to a conclusion\n",
|
||
"Action: Search\n",
|
||
"Action Input: \"What are some examples of successful zoo conservation projects?\"\u001b[0m\n",
|
||
"\n",
|
||
"Observation:\u001b[36;1m\u001b[1;3mThere are dedicated species survival programs which have helped species come out from the brink of extinction, good examples of that being the black-footed ferrets, the red wolves, the Przewalski's wild horse, and the California condors.\u001b[0m\u001b[32;1m\u001b[1;3mWhile there are valid arguments on both sides, it seems that zoos can have a positive impact on conservation efforts for endangered species.\n",
|
||
"Final Answer: Zoos can have both positive and negative effects on animals, but they can play a role in conservation efforts for endangered species. It's important to consider both sides of the argument and do research before forming an opinion.\u001b[0m\n",
|
||
"\n",
|
||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"\"Zoos can have both positive and negative effects on animals, but they can play a role in conservation efforts for endangered species. It's important to consider both sides of the argument and do research before forming an opinion.\""
|
||
]
|
||
},
|
||
"execution_count": 57,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"multi_tool_executor.run('Can you tell me some interesting facts about whether zoos are good or bad for animals')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "efe408a3",
|
||
"metadata": {},
|
||
"source": [
|
||
"You now have a template for deploying conversational agents with tools. To extend it with a Custom Agent that adds your own retry behaviour or handling of input/output variables, see [this article](https://python.langchain.com/en/latest/modules/agents/agents/custom_agent.html).\n",
|
||
"\n",
|
||
"We look forward to seeing what you build!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "384f72b2",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "pchain_env",
|
||
"language": "python",
|
||
"name": "pchain_env"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.10.10"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
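The batched upsert loop in the notebook slices `processed_podcasts` into groups of `batch_size` and turns each record into an `(id, embedding, metadata)` tuple. As a minimal sketch of that slicing step only, the helper below (the name `iter_upsert_batches` and the demo records are hypothetical, not part of the notebook) builds the same tuples without calling Pinecone, so the batch shapes can be inspected offline.

```python
from typing import Dict, Iterator, List, Tuple

Record = Dict[str, object]
Vector = Tuple[object, object, Dict[str, object]]

# Metadata keys kept by the notebook's upsert cell
METADATA_KEYS = ("filename", "title", "text_chunk", "url")


def iter_upsert_batches(records: List[Record], batch_size: int = 100) -> Iterator[List[Vector]]:
    """Yield lists of (id, embedding, metadata) tuples, each at most batch_size long."""
    for start in range(0, len(records), batch_size):
        # Slicing clamps at the end of the list, so no explicit min() is needed
        batch = records[start:start + batch_size]
        yield [
            (r["cleaned_id"], r["embedding"], {k: r[k] for k in METADATA_KEYS})
            for r in batch
        ]


# Made-up records, only to show the shape of each batch
demo = [
    {
        "cleaned_id": f"doc-{i}",
        "embedding": [0.0, 0.1],
        "filename": f"f{i}.json",
        "title": "t",
        "text_chunk": "c",
        "url": "u",
    }
    for i in range(5)
]
batches = list(iter_upsert_batches(demo, batch_size=2))
print([len(b) for b in batches])  # [2, 2, 1]
```

Each yielded batch has the shape the notebook passes to `index.upsert(vectors=...)`, so the loop body could reduce to a single upsert call per batch.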
|
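The agent traces in the notebook alternate between `Action:` / `Action Input:` lines and a closing `Final Answer:`. The sketch below illustrates how one such completion could be routed to a tool by name. It is a simplified, hypothetical stand-in: the notebook's real `output_parser` is defined outside this section, and `parse_step`, `dispatch`, and the lambda tools here are illustrative names, not LangChain APIs.

```python
import re
from typing import Callable, Dict, Tuple

FINAL = "__final__"  # sentinel meaning "the agent is done"
ACTION_RE = re.compile(
    r"Action\s*:\s*(.*?)\s*\n\s*Action\s+Input\s*:\s*(.*)", re.DOTALL
)


def parse_step(llm_output: str) -> Tuple[str, str]:
    """Return (FINAL, answer) or (tool_name, tool_input) for one completion."""
    if "Final Answer:" in llm_output:
        return FINAL, llm_output.split("Final Answer:")[-1].strip()
    match = ACTION_RE.search(llm_output)
    if match is None:
        raise ValueError(f"Could not parse LLM output: {llm_output!r}")
    tool_name = match.group(1).strip()
    tool_input = match.group(2).strip().strip('"')
    return tool_name, tool_input


def dispatch(llm_output: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Run the tool named in the completion, or return the final answer."""
    kind, payload = parse_step(llm_output)
    if kind == FINAL:
        return payload
    return tools[kind](payload)


# Stand-in tools; in the notebook these would be search.run and podcast_retriever.run
tools = {
    "Search": lambda q: f"[search results for: {q}]",
    "Knowledge Base": lambda q: f"[knowledge base answer for: {q}]",
}
step = (
    "Thought: This is a complex topic that requires a balanced perspective\n"
    "Action: Knowledge Base\n"
    'Action Input: "What are the arguments for and against zoos?"'
)
print(dispatch(step, tools))
```

Keying the tools by name mirrors how the tool descriptions in `expanded_tools` steer the model: the model only emits a name, and the surrounding code decides which function actually runs.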