langchain/docs/extras/integrations/toolkits/airbyte_structured_qa.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Airbyte Question Answering\n",
    "This notebook shows how to do question answering over structured data, in this case using the `AirbyteStripeLoader`.\n",
    "\n",
    "Vectorstores often have a hard time answering questions that requires computing, grouping and filtering structured data so the high level idea is to use a `pandas` dataframe to help with these types of questions. "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Load data from Stripe using Airbyte. user the `record_handler` paramater to return a JSON from the data loader."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "\n",
    "from langchain.document_loaders.airbyte import AirbyteStripeLoader\n",
    "from langchain.chat_models.openai import ChatOpenAI\n",
    "from langchain.agents import AgentType, create_pandas_dataframe_agent\n",
    "\n",
    "stream_name = \"customers\"\n",
    "config = {\n",
    "    \"client_secret\": os.getenv(\"STRIPE_CLIENT_SECRET\"),\n",
    "    \"account_id\": os.getenv(\"STRIPE_ACCOUNT_D\"),\n",
    "    \"start_date\": \"2023-01-20T00:00:00Z\",\n",
    "}\n",
    "\n",
    "def handle_record(record: dict, _id: str):\n",
    "    return record.data\n",
    "\n",
    "loader = AirbyteStripeLoader(\n",
    "    config=config,\n",
    "    record_handler=handle_record,\n",
    "    stream_name=stream_name,\n",
    ")\n",
    "data = loader.load()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. Pass the data to `pandas` dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(data)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3. Pass the dataframe `df` to the `create_pandas_dataframe_agent` and invoke\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "agent = create_pandas_dataframe_agent(\n",
    "    ChatOpenAI(temperature=0, model=\"gpt-4\"),\n",
    "    df,\n",
    "    verbose=True,\n",
    "    agent_type=AgentType.OPENAI_FUNCTIONS,\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4. Run the agent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output = agent.run(\"How many rows are there?\")"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}