---
sidebar_position: 4
---
# Agents
The core idea of agents is to use an LLM to choose a sequence of actions to take.
In chains, a sequence of actions is hardcoded (in code).
In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.
There are several key components here:
## Agent
This is the class responsible for deciding what step to take next.
This is powered by a language model and a prompt.
This prompt can include things like:
1. The personality of the agent (useful for having it respond in a certain way)
2. Background context for the agent (useful for giving it more context on the types of tasks it's being asked to do)
3. Prompting strategies to invoke better reasoning (the most famous/widely used being [ReAct](https://arxiv.org/abs/2210.03629))
LangChain provides a few different agent types to get started.
Even then, you will likely want to customize those agents with parts (1) and (2).
For a full list of agent types see [agent types](/docs/modules/agents/agent_types/)
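To make (1) and (2) concrete, here is a minimal sketch of folding personality and background context into the system message of an OpenAI functions agent (the wording of the message is purely illustrative, not a prescribed format):

```python
from langchain.schema import SystemMessage
from langchain.agents import OpenAIFunctionsAgent

# (1) personality and (2) background context, combined in one system message
system_message = SystemMessage(content=(
    "You are a cheerful assistant for an online bookstore. "      # personality
    "Users will ask about orders, refunds, and recommendations."  # background context
))
prompt = OpenAIFunctionsAgent.create_prompt(system_message=system_message)
```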
## Tools
Tools are functions that an agent calls.
There are two important considerations here:
1. Giving the agent access to the right tools
2. Describing the tools in a way that is most helpful to the agent
Without both, the agent you are trying to build will not work.
If you don't give the agent access to a correct set of tools, it will never be able to accomplish the objective.
If you don't describe the tools properly, the agent won't know how to properly use them.
LangChain provides a wide set of tools to get started, but also makes it easy to define your own (including custom descriptions).
For a full list of tools, see [here](/docs/modules/agents/tools/)
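As a sketch of what a helpful description looks like in practice, here is a custom tool built with the `Tool` class around a hypothetical `lookup_order` function (the name and behavior are invented for illustration):

```python
from langchain.agents import Tool

def lookup_order(order_id: str) -> str:
    # Hypothetical stand-in for a real order-database query
    return f"Order {order_id}: shipped"

# The description is what the model actually reads when deciding whether
# and how to call the tool, so it states the expected input and when
# the tool should be used.
order_status_tool = Tool(
    name="lookup_order",
    func=lookup_order,
    description=(
        "Looks up the shipping status of an order. "
        "Input should be a single order ID, e.g. 'A1234'. "
        "Use this whenever the user asks where their order is."
    ),
)
```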
## Toolkits
Often the set of tools an agent has access to is more important than a single tool.
For this, LangChain provides the concept of toolkits - groups of tools needed to accomplish specific objectives.
There are generally around 3-5 tools in a toolkit.
LangChain provides a wide set of toolkits to get started.
For a full list of toolkits, see [here](/docs/modules/agents/toolkits/)
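SQL is a good example of why toolkits exist: an agent usually needs to list tables, inspect schemas, and run queries, not just one of these. A minimal sketch using the SQL database toolkit (assuming a local SQLite file, `example.db`, which is invented here):

```python
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.chat_models import ChatOpenAI
from langchain.sql_database import SQLDatabase

# Hypothetical database; any SQLAlchemy-compatible URI works
db = SQLDatabase.from_uri("sqlite:///example.db")
toolkit = SQLDatabaseToolkit(db=db, llm=ChatOpenAI(temperature=0))

# The toolkit expands into the individual tools (run queries, list tables,
# inspect schemas, check SQL) that the agent can choose between
tools = toolkit.get_tools()
```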
## AgentExecutor
The agent executor is the runtime for an agent.
This is what actually calls the agent and executes the actions it chooses.
Pseudocode for this runtime is below:
```python
next_action = agent.get_action(...)
while next_action != AgentFinish:
    observation = run(next_action)
    next_action = agent.get_action(..., next_action, observation)
return next_action
```
While this may seem simple, there are several complexities this runtime handles for you, including:
1. Handling cases where the agent selects a non-existent tool
2. Handling cases where the tool errors
3. Handling cases where the agent produces output that cannot be parsed into a tool invocation
4. Logging and observability at all levels (agent decisions, tool calls) either to stdout or [LangSmith](https://smith.langchain.com).
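To see how the first two of those cases might slot into the loop, here is a slightly expanded sketch - illustrative only, not the actual `AgentExecutor` implementation:

```python
from langchain.schema import AgentFinish

def run_agent(agent, tools, user_input):
    """Toy runtime loop; the real AgentExecutor also handles output-parsing
    retries, callbacks/LangSmith tracing, and iteration limits."""
    tool_map = {t.name: t for t in tools}
    intermediate_steps = []
    while True:
        output = agent.plan(intermediate_steps, input=user_input)
        if isinstance(output, AgentFinish):
            return output.return_values
        if output.tool not in tool_map:
            # (1) the agent selected a non-existent tool
            observation = f"Error: no tool named {output.tool!r}"
        else:
            try:
                observation = tool_map[output.tool].run(output.tool_input)
            except Exception as e:
                # (2) the tool itself errored
                observation = f"Error while running tool: {e}"
        intermediate_steps.append((output, observation))
```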
## Other types of agent runtimes
The `AgentExecutor` class is the main agent runtime supported by LangChain.
However, there are other, more experimental runtimes we also support.
These include:
- [Plan-and-execute Agent](/docs/modules/agents/agent_types/plan_and_execute.html)
- [Baby AGI](/docs/use_cases/autonomous_agents/baby_agi.html)
- [Auto GPT](/docs/use_cases/autonomous_agents/autogpt.html)
## Get started

This will go over how to get started building an agent.
We will use a LangChain agent class, but show how to customize it to give it specific context.
We will then define custom tools, and then run it all in the standard LangChain AgentExecutor.
### Set up the agent
We will use the OpenAIFunctionsAgent.
This is the easiest and best agent to get started with.
It does, however, require the use of ChatOpenAI models.
If you want to use a different language model, we would recommend using the [ReAct](/docs/modules/agents/agent_types/react) agent.
For this guide, we will construct a custom agent that has access to a custom tool.
We are choosing this example because we think for most use cases you will NEED to customize either the agent or the tools.
The tool we will give the agent is a tool to calculate the length of a word.
This is useful because word length is actually something LLMs can get wrong due to tokenization.
We will first create it WITHOUT memory, but we will then show how to add memory in.
Memory is needed to enable conversation.
First, let's load the language model we're going to use to control the agent.
```python
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(temperature=0)
```
Next, let's define some tools to use.
Let's write a really simple Python function to calculate the length of a word that is passed in.
```python
from langchain.agents import tool

@tool
def get_word_length(word: str) -> int:
    """Returns the length of a word."""
    return len(word)

tools = [get_word_length]
```
Now let us create the prompt.
We can use the `OpenAIFunctionsAgent.create_prompt` helper function to create a prompt automatically.
This allows for a few different ways to customize, including passing in a custom SystemMessage, which we will do.
```python
from langchain.schema import SystemMessage
from langchain.agents import OpenAIFunctionsAgent

system_message = SystemMessage(content="You are a very powerful assistant, but bad at calculating lengths of words.")
prompt = OpenAIFunctionsAgent.create_prompt(system_message=system_message)
```
Putting those pieces together, we can now create the agent.
```python
from langchain.agents import OpenAIFunctionsAgent
agent = OpenAIFunctionsAgent(llm=llm, tools=tools, prompt=prompt)
```
Finally, we create the AgentExecutor - the runtime for our agent.
```python
from langchain.agents import AgentExecutor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
```
Now let's test it out!
```python
agent.run("Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?")
agent_executor.run("how many letters in the word educa?")
```
<CodeOutputBlock lang="python">
```
> Entering new AgentExecutor chain...
Invoking: `get_word_length` with `{'word': 'educa'}`
5
There are 5 letters in the word "educa".
> Finished chain.
"Camila Morrone is Leo DiCaprio's girlfriend and her current age raised to the 0.43 power is 3.991298452658078."
'There are 5 letters in the word "educa".'
```
</CodeOutputBlock>
This is great - we have an agent!
However, this agent is stateless - it doesn't remember anything about previous interactions.
This means you can't easily ask follow-up questions.
Let's fix that by adding in memory.
In order to do this, we need to do two things:
1. Add a place for memory variables to go in the prompt
2. Add memory to the AgentExecutor (note that we add it here, and NOT to the agent, as this is the outermost chain)
First, let's add a place for memory in the prompt.
We do this by adding a placeholder for messages with the key `"chat_history"`.
```python
from langchain.prompts import MessagesPlaceholder
MEMORY_KEY = "chat_history"
prompt = OpenAIFunctionsAgent.create_prompt(
    system_message=system_message,
    extra_prompt_messages=[MessagesPlaceholder(variable_name=MEMORY_KEY)]
)
```
Next, let's create a memory object.
We will do this by using `ConversationBufferMemory`.
Importantly, we set `memory_key` also equal to `"chat_history"` (to align it with the prompt) and set `return_messages` (to make it return messages rather than a string).
```python
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key=MEMORY_KEY, return_messages=True)
```
We can then put it all together!
```python
agent = OpenAIFunctionsAgent(llm=llm, tools=tools, prompt=prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, memory=memory, verbose=True)
agent_executor.run("how many letters in the word educa?")
agent_executor.run("is that a real word?")
```
