{ "cells": [ { "cell_type": "markdown", "id": "c81da886", "metadata": {}, "source": [ "# Pandas Dataframe\n", "\n", "This notebook shows how to use agents to interact with a `Pandas DataFrame`. It is mostly optimized for question answering.\n", "\n", "**NOTE: this agent calls the `Python` agent under the hood, which executes LLM generated Python code - this can be bad if the LLM generated Python code is harmful. Use cautiously.**" ] }, { "cell_type": "code", "execution_count": 1, "id": "0cdd9bf5", "metadata": {}, "outputs": [], "source": [ "from langchain.agents import create_pandas_dataframe_agent\n", "from langchain.chat_models import ChatOpenAI\n", "from langchain.agents.agent_types import AgentType" ] }, { "cell_type": "code", "execution_count": 2, "id": "051ebe84", "metadata": {}, "outputs": [], "source": [ "from langchain.llms import OpenAI\n", "import pandas as pd\n", "\n", "df = pd.read_csv(\"titanic.csv\")" ] }, { "cell_type": "markdown", "id": "a62858e2", "metadata": {}, "source": [ "## Using `ZERO_SHOT_REACT_DESCRIPTION`\n", "\n", "This shows how to initialize the agent using the `ZERO_SHOT_REACT_DESCRIPTION` agent type. Note that this is an alternative to the above." ] }, { "cell_type": "code", "execution_count": 3, "id": "4185ff46", "metadata": {}, "outputs": [], "source": [ "agent = create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)" ] }, { "cell_type": "markdown", "id": "7233ab56", "metadata": {}, "source": [ "## Using OpenAI Functions\n", "\n", "This shows how to initialize the agent using the OPENAI_FUNCTIONS agent type. Note that this is an alternative to the above." ] }, { "cell_type": "code", "execution_count": 4, "id": "a8ea710e", "metadata": {}, "outputs": [], "source": [ "agent = create_pandas_dataframe_agent(\n", " ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\"),\n", " df,\n", " verbose=True,\n", " agent_type=AgentType.OPENAI_FUNCTIONS,\n", ")" ] }, { "cell_type": "code", "execution_count": 5, "id": "a9207a2e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001b[1m> Entering new chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3m\n", "Invoking: `python_repl_ast` with `df.shape[0]`\n", "\n", "\n", "\u001b[0m\u001b[36;1m\u001b[1;3m891\u001b[0m\u001b[32;1m\u001b[1;3mThere are 891 rows in the dataframe.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "'There are 891 rows in the dataframe.'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agent.run(\"how many rows are there?\")" ] }, { "cell_type": "code", "execution_count": 5, "id": "bd43617c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I need to count the number of people with more than 3 siblings\n", "Action: python_repl_ast\n", "Action Input: df[df['SibSp'] > 3].shape[0]\u001b[0m\n", "Observation: \u001b[36;1m\u001b[1;3m30\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", "Final Answer: 30 people have more than 3 siblings.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "'30 people have more than 3 siblings.'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agent.run(\"how many people have more than 3 siblings\")" ] }, { "cell_type": "code", "execution_count": 6, "id": "94e64b58", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I need to calculate the average age first\n", "Action: python_repl_ast\n", "Action Input: df['Age'].mean()\u001b[0m\n", "Observation: \u001b[36;1m\u001b[1;3m29.69911764705882\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m I now need to calculate the square root of the average age\n", "Action: python_repl_ast\n", "Action Input: math.sqrt(df['Age'].mean())\u001b[0m\n", "Observation: \u001b[36;1m\u001b[1;3mNameError(\"name 'math' is not defined\")\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m I need to import the math library\n", "Action: python_repl_ast\n", "Action Input: import math\u001b[0m\n", "Observation: \u001b[36;1m\u001b[1;3m\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m I now need to calculate the square root of the average age\n", "Action: python_repl_ast\n", "Action Input: math.sqrt(df['Age'].mean())\u001b[0m\n", "Observation: \u001b[36;1m\u001b[1;3m5.449689683556195\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", "Final Answer: The square root of the average age is 5.449689683556195.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "'The square root of the average age is 5.449689683556195.'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agent.run(\"whats the square root of the average age?\")" ] }, { "cell_type": "markdown", "id": "c4bc0584", "metadata": {}, "source": [ "## Multi DataFrame Example\n", "\n", "This next part shows how the agent can interact with multiple dataframes passed in as a list." ] }, { "cell_type": "code", "execution_count": 7, "id": "42a15bd9", "metadata": {}, "outputs": [], "source": [ "df1 = df.copy()\n", "df1[\"Age\"] = df1[\"Age\"].fillna(df1[\"Age\"].mean())" ] }, { "cell_type": "code", "execution_count": 8, "id": "eba13b4d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I need to compare the age columns in both dataframes\n", "Action: python_repl_ast\n", "Action Input: len(df1[df1['Age'] != df2['Age']])\u001b[0m\n", "Observation: \u001b[36;1m\u001b[1;3m177\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", "Final Answer: 177 rows in the age column are different.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "'177 rows in the age column are different.'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agent = create_pandas_dataframe_agent(OpenAI(temperature=0), [df, df1], verbose=True)\n", "agent.run(\"how many rows in the age column are different?\")" ] }, { "cell_type": "code", "execution_count": null, "id": "60d08a56", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }