{ "cells": [ { "cell_type": "markdown", "id": "7094e328", "metadata": {}, "source": [ "# CSV\n", "\n", "This notebook shows how to use agents to interact with data in `CSV` format. The agent is mostly optimized for question answering.\n", "\n", "**NOTE: this agent calls the Pandas DataFrame agent under the hood, which in turn calls the Python agent, which executes LLM-generated Python code. This can be dangerous if the generated code is harmful. Use cautiously.**\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "caae0bec", "metadata": {}, "outputs": [], "source": [ "from langchain.llms import OpenAI\n", "from langchain.chat_models import ChatOpenAI\n", "from langchain.agents.agent_types import AgentType\n", "\n", "from langchain.agents import create_csv_agent" ] }, { "cell_type": "markdown", "id": "bd806175", "metadata": {}, "source": [ "## Using `ZERO_SHOT_REACT_DESCRIPTION`\n", "\n", "This shows how to initialize the agent using the `ZERO_SHOT_REACT_DESCRIPTION` agent type." ] }, { "cell_type": "code", "execution_count": 9, "id": "a1717204", "metadata": {}, "outputs": [], "source": [ "agent = create_csv_agent(\n", " OpenAI(temperature=0),\n", " \"titanic.csv\",\n", " verbose=True,\n", " agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,\n", ")" ] }, { "cell_type": "markdown", "id": "c31bb8a6", "metadata": {}, "source": [ "## Using OpenAI Functions\n", "\n", "This shows how to initialize the agent using the `OPENAI_FUNCTIONS` agent type. It is an alternative to the `ZERO_SHOT_REACT_DESCRIPTION` agent above."
] }, { "cell_type": "code", "execution_count": 3, "id": "16c4dc59", "metadata": {}, "outputs": [], "source": [ "agent = create_csv_agent(\n", " ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\"),\n", " \"titanic.csv\",\n", " verbose=True,\n", " agent_type=AgentType.OPENAI_FUNCTIONS,\n", ")" ] }, { "cell_type": "code", "execution_count": 4, "id": "46b9489d", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Error in on_chain_start callback: 'name'\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32;1m\u001b[1;3m\n", "Invoking: `python_repl_ast` with `df.shape[0]`\n", "\n", "\n", "\u001b[0m\u001b[36;1m\u001b[1;3m891\u001b[0m\u001b[32;1m\u001b[1;3mThere are 891 rows in the dataframe.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "'There are 891 rows in the dataframe.'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agent.run(\"how many rows are there?\")" ] }, { "cell_type": "code", "execution_count": 5, "id": "a96309be", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Error in on_chain_start callback: 'name'\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32;1m\u001b[1;3m\n", "Invoking: `python_repl_ast` with `df[df['SibSp'] > 3]['PassengerId'].count()`\n", "\n", "\n", "\u001b[0m\u001b[36;1m\u001b[1;3m30\u001b[0m\u001b[32;1m\u001b[1;3mThere are 30 people in the dataframe who have more than 3 siblings.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "'There are 30 people in the dataframe who have more than 3 siblings.'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agent.run(\"how many people have more than 3 siblings\")" ] }, { "cell_type": "code", "execution_count": 6, "id": "964a09f7", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": 
[ "Error in on_chain_start callback: 'name'\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32;1m\u001b[1;3m\n", "Invoking: `python_repl_ast` with `import pandas as pd\n", "import math\n", "\n", "# Create a dataframe\n", "data = {'Age': [22, 38, 26, 35, 35]}\n", "df = pd.DataFrame(data)\n", "\n", "# Calculate the average age\n", "average_age = df['Age'].mean()\n", "\n", "# Calculate the square root of the average age\n", "square_root = math.sqrt(average_age)\n", "\n", "square_root`\n", "\n", "\n", "\u001b[0m\u001b[36;1m\u001b[1;3m5.585696017507576\u001b[0m\u001b[32;1m\u001b[1;3mThe square root of the average age is approximately 5.59.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "'The square root of the average age is approximately 5.59.'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agent.run(\"whats the square root of the average age?\")" ] }, { "cell_type": "markdown", "id": "09539c18", "metadata": {}, "source": [ "### Multi CSV Example\n", "\n", "This section shows how the agent can interact with multiple CSV files passed in as a list."
] }, { "cell_type": "code", "execution_count": 8, "id": "15f11fbd", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Error in on_chain_start callback: 'name'\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32;1m\u001b[1;3m\n", "Invoking: `python_repl_ast` with `df1['Age'].nunique() - df2['Age'].nunique()`\n", "\n", "\n", "\u001b[0m\u001b[36;1m\u001b[1;3m-1\u001b[0m\u001b[32;1m\u001b[1;3mThere is 1 row in the age column that is different between the two dataframes.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "'There is 1 row in the age column that is different between the two dataframes.'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agent = create_csv_agent(\n", " ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\"),\n", " [\"titanic.csv\", \"titanic_age_fillna.csv\"],\n", " verbose=True,\n", " agent_type=AgentType.OPENAI_FUNCTIONS,\n", ")\n", "agent.run(\"how many rows in the age column are different between the two dfs?\")" ] }, { "cell_type": "code", "execution_count": null, "id": "f2909808", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }