{ "cells": [ { "cell_type": "markdown", "id": "c81da886", "metadata": {}, "source": [ "# Pandas Dataframe Agent\n", "\n", "This notebook shows how to use agents to interact with a pandas dataframe. It is mostly optimized for question answering.\n", "\n", "**NOTE: this agent calls the Python agent under the hood, which executes LLM generated Python code - this can be bad if the LLM generated Python code is harmful. Use cautiously.**" ] }, { "cell_type": "code", "execution_count": 1, "id": "0cdd9bf5", "metadata": {}, "outputs": [], "source": [ "from langchain.agents import create_pandas_dataframe_agent" ] }, { "cell_type": "code", "execution_count": 2, "id": "051ebe84", "metadata": {}, "outputs": [], "source": [ "from langchain.llms import OpenAI\n", "import pandas as pd\n", "\n", "df = pd.read_csv('titanic.csv')" ] }, { "cell_type": "code", "execution_count": 3, "id": "4185ff46", "metadata": {}, "outputs": [], "source": [ "agent = create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)" ] }, { "cell_type": "code", "execution_count": 4, "id": "a9207a2e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I need to count the number of rows\n", "Action: python_repl_ast\n", "Action Input: df.shape[0]\u001b[0m\n", "Observation: \u001b[36;1m\u001b[1;3m891\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", "Final Answer: There are 891 rows.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "'There are 891 rows.'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agent.run(\"how many rows are there?\")" ] }, { "cell_type": "code", "execution_count": 5, "id": "bd43617c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I need to count the number of people with more than 3 siblings\n", "Action: python_repl_ast\n", "Action Input: df[df['SibSp'] > 3].shape[0]\u001b[0m\n", "Observation: \u001b[36;1m\u001b[1;3m30\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", "Final Answer: 30 people have more than 3 siblings.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "'30 people have more than 3 siblings.'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agent.run(\"how many people have more than 3 siblings\")" ] }, { "cell_type": "code", "execution_count": 6, "id": "94e64b58", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I need to calculate the average age first\n", "Action: python_repl_ast\n", "Action Input: df['Age'].mean()\u001b[0m\n", "Observation: \u001b[36;1m\u001b[1;3m29.69911764705882\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m I now need to calculate the square root of the average age\n", "Action: python_repl_ast\n", "Action Input: math.sqrt(df['Age'].mean())\u001b[0m\n", "Observation: \u001b[36;1m\u001b[1;3mNameError(\"name 'math' is not defined\")\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m I need to import the math library\n", "Action: python_repl_ast\n", "Action Input: import math\u001b[0m\n", "Observation: \u001b[36;1m\u001b[1;3m\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m I now need to calculate the square root of the average age\n", "Action: python_repl_ast\n", "Action Input: math.sqrt(df['Age'].mean())\u001b[0m\n", "Observation: \u001b[36;1m\u001b[1;3m5.449689683556195\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", "Final Answer: The square root of the average age is 5.449689683556195.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "'The square root of the average age is 5.449689683556195.'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agent.run(\"whats the square root of the average age?\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c4bc0584", "metadata": {}, "source": [ "### Multi DataFrame Example\n", "\n", "This next part shows how the agent can interact with multiple dataframes passed in as a list." ] }, { "cell_type": "code", "execution_count": 7, "id": "42a15bd9", "metadata": {}, "outputs": [], "source": [ "df1 = df.copy()\n", "df1[\"Age\"] = df1[\"Age\"].fillna(df1[\"Age\"].mean())" ] }, { "cell_type": "code", "execution_count": 8, "id": "eba13b4d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I need to compare the age columns in both dataframes\n", "Action: python_repl_ast\n", "Action Input: len(df1[df1['Age'] != df2['Age']])\u001b[0m\n", "Observation: \u001b[36;1m\u001b[1;3m177\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", "Final Answer: 177 rows in the age column are different.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "'177 rows in the age column are different.'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agent = create_pandas_dataframe_agent(OpenAI(temperature=0), [df, df1], verbose=True)\n", "agent.run(\"how many rows in the age column are different?\")" ] }, { "cell_type": "code", "execution_count": null, "id": "60d08a56", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 5 }