Airbyte-based loaders (#8586)

This PR adds 8 new loaders:
* `AirbyteCDKLoader`: a generic loader that can wrap and run any Python-based
Airbyte source connector.
* Separate loaders for the most commonly used APIs (see the sketch after this list):
  * `AirbyteGongLoader`
  * `AirbyteHubspotLoader`
  * `AirbyteSalesforceLoader`
  * `AirbyteShopifyLoader`
  * `AirbyteStripeLoader`
  * `AirbyteTypeformLoader`
  * `AirbyteZendeskSupportLoader`
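
For illustration, here is a minimal sketch of how one of the dedicated loaders is used (based on the Stripe notebook included below; the config values are placeholders you need to fill in):

```python
from langchain.document_loaders.airbyte import AirbyteStripeLoader

config = {
    "client_secret": "<secret key>",
    "account_id": "<account id>",
    "start_date": "2020-10-20T00:00:00Z",
}

# Check the Airbyte documentation for the list of available streams.
loader = AirbyteStripeLoader(config=config, stream_name="invoices")
docs = loader.load()
```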

## Documentation and getting started
I added the basic shape of the config to the notebooks. This increases the
maintenance effort a bit, but I think it's worth it to make sure people can
get started quickly with these important connectors. This is also why I
linked the spec and the documentation page in the readme, as these two
contain all the information needed to configure a source correctly (e.g. the
documentation won't suggest using OAuth if it's avoidable, even if the
connector supports it).

## Document generation
The "documents" produced by these loaders won't have a text part
(instead, all the record fields are put into the metadata). If a text is
required by the use case, the caller needs to do custom transformation
suitable for their use case.
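
For example, a `record_handler` can be passed in to build a text part from the record fields. This is a minimal sketch taken from the Airbyte CDK notebook below; the `title` and `body` fields assume the Github `issues` stream, and the config values are placeholders:

```python
from langchain.docstore.document import Document
from langchain.document_loaders.airbyte import AirbyteCDKLoader
from source_github.source import SourceGithub  # Github source, as in the CDK notebook


def handle_record(record, id):
    # All record fields stay in the metadata; the text part is built from two of them.
    return Document(
        page_content=record.data["title"] + "\n" + (record.data["body"] or ""),
        metadata=record.data,
    )


config = {
    "credentials": {"api_url": "api.github.com", "personal_access_token": "<token>"},
    "repository": "<repo>",
    "start_date": "2020-10-20T00:00:00Z",
}

issues_loader = AirbyteCDKLoader(
    source_class=SourceGithub,
    config=config,
    stream_name="issues",
    record_handler=handle_record,
)
```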

## Incremental sync
All loaders support incremental syncs if the underlying streams support it.
Store the `last_state` from the loader instance and pass it in when creating
the loader again; it will then only load updated records.
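
A minimal sketch of the incremental flow, reusing the `issues_loader` from the example above:

```python
# After a full load, persist the state somewhere safe...
last_state = issues_loader.last_state

# ...and pass it back in when constructing the next loader; only records
# updated since then are loaded.
incremental_loader = AirbyteCDKLoader(
    source_class=SourceGithub,
    config=config,
    stream_name="issues",
    state=last_state,
)
new_docs = incremental_loader.load()
```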

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>

@ -0,0 +1,226 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1f3a5ebf",
"metadata": {},
"source": [
"# Airbyte CDK"
]
},
{
"cell_type": "markdown",
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
"metadata": {},
"source": [
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
"\n",
"A lot of source connectors are implemented using the [Airbyte CDK](https://docs.airbyte.com/connector-development/cdk-python/). This loader allows to run any of these connectors and return the data as documents."
]
},
{
"cell_type": "markdown",
"id": "3b06fbde",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "markdown",
"id": "e3e9dc79",
"metadata": {},
"source": [
"First, you need to install the `airbyte-cdk` python package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d35e4e0",
"metadata": {},
"outputs": [],
"source": [
"#!pip install airbyte-cdk"
]
},
{
"cell_type": "markdown",
"id": "085aa658",
"metadata": {},
"source": [
"Then, either install an existing connector from the [Airbyte Github repository](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors) or create your own connector using the [Airbyte CDK](https://docs.airbyte.io/connector-development/connector-development).\n",
"\n",
"For example, to install the Github connector, run"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f6d04ef4",
"metadata": {},
"outputs": [],
"source": [
"#!pip install \"source_github@git+https://github.com/airbytehq/airbyte.git@master#subdirectory=airbyte-integrations/connectors/source-github\""
]
},
{
"cell_type": "markdown",
"id": "36069b74",
"metadata": {},
"source": [
"Some sources are also published as regular packages on PyPI"
]
},
{
"cell_type": "markdown",
"id": "ae855210",
"metadata": {},
"source": [
"## Example"
]
},
{
"cell_type": "markdown",
"id": "02208f52",
"metadata": {},
"source": [
"Now you can create an `AirbyteCDKLoader` based on the imported source. It takes a `config` object that's passed to the connector. You also have to pick the stream you want to retrieve records from by name (`stream_name`). Check the connectors documentation page and spec definition for more information on the config object and available streams. For the Github connectors these are:\n",
"* [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-github/source_github/spec.json](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-github/source_github/spec.json).\n",
"* [https://docs.airbyte.com/integrations/sources/github/](https://docs.airbyte.com/integrations/sources/github/)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89a99e58",
"metadata": {},
"outputs": [],
"source": [
"\n",
"from langchain.document_loaders.airbyte import AirbyteCDKLoader\n",
"from source_github.source import SourceGithub # plug in your own source here\n",
"\n",
"config = {\n",
" # your github configuration\n",
" \"credentials\": {\n",
" \"api_url\": \"api.github.com\",\n",
" \"personal_access_token\": \"<token>\"\n",
" },\n",
" \"repository\": \"<repo>\",\n",
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\"\n",
"}\n",
"\n",
"issues_loader = AirbyteCDKLoader(source_class=SourceGithub, config=config, stream_name=\"issues\")"
]
},
{
"cell_type": "markdown",
"id": "2cea23fc",
"metadata": {},
"source": [
"Now you can load documents the usual way"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dae75cdb",
"metadata": {},
"outputs": [],
"source": [
"docs = issues_loader.load()"
]
},
{
"cell_type": "markdown",
"id": "4a93dc2a",
"metadata": {},
"source": [
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1782db09",
"metadata": {},
"outputs": [],
"source": [
"docs_iterator = issues_loader.lazy_load()"
]
},
{
"cell_type": "markdown",
"id": "3a124086",
"metadata": {},
"source": [
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To create documents in a different, pass in a record_handler function when creating the loader:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5671395d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore.document import Document\n",
"\n",
"def handle_record(record, id):\n",
" return Document(page_content=record.data[\"title\"] + \"\\n\" + (record.data[\"body\"] or \"\"), metadata=record.data)\n",
"\n",
"issues_loader = AirbyteCDKLoader(source_class=SourceGithub, config=config, stream_name=\"issues\", record_handler=handle_record)\n",
"\n",
"docs = issues_loader.load()"
]
},
{
"cell_type": "markdown",
"id": "223eb8bc",
"metadata": {},
"source": [
"## Incremental loads\n",
"\n",
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
"\n",
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7061e735",
"metadata": {},
"outputs": [],
"source": [
"last_state = issues_loader.last_state # store safely\n",
"\n",
"incremental_issue_loader = AirbyteCDKLoader(source_class=SourceGithub, config=config, stream_name=\"issues\", state=last_state)\n",
"\n",
"new_docs = incremental_issue_loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,206 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1f3a5ebf",
"metadata": {},
"source": [
"# Airbyte Gong"
]
},
{
"cell_type": "markdown",
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
"metadata": {},
"source": [
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
"\n",
"This loader exposes the Gong connector as a document loader, allowing you to load various Gong objects as documents."
]
},
{
"cell_type": "markdown",
"id": "6847a40c",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"id": "3b06fbde",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "markdown",
"id": "e3e9dc79",
"metadata": {},
"source": [
"First, you need to install the `airbyte-source-gong` python package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d35e4e0",
"metadata": {},
"outputs": [],
"source": [
"#!pip install airbyte-source-gong"
]
},
{
"cell_type": "markdown",
"id": "ae855210",
"metadata": {},
"source": [
"## Example"
]
},
{
"cell_type": "markdown",
"id": "02208f52",
"metadata": {},
"source": [
"Check out the [Airbyte documentation page](https://docs.airbyte.com/integrations/sources/gong/) for details about how to configure the reader.\n",
"The JSON schema the config object should adhere to can be found on Github: [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-gong/source_gong/spec.yaml](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-gong/source_gong/spec.yaml).\n",
"\n",
"The general shape looks like this:\n",
"```python\n",
"{\n",
" \"access_key\": \"<access key name>\",\n",
" \"access_key_secret\": \"<access key secret>\",\n",
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\",\n",
"}\n",
"```\n",
"\n",
"By default all fields are stored as metadata in the documents and the text is set to an empty string. Construct the text of the document by transforming the documents returned by the reader."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89a99e58",
"metadata": {},
"outputs": [],
"source": [
"\n",
"from langchain.document_loaders.airbyte import AirbyteGongLoader\n",
"\n",
"config = {\n",
" # your gong configuration\n",
"}\n",
"\n",
"loader = AirbyteGongLoader(config=config, stream_name=\"calls\") # check the documentation linked above for a list of all streams"
]
},
{
"cell_type": "markdown",
"id": "2cea23fc",
"metadata": {},
"source": [
"Now you can load documents the usual way"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dae75cdb",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "4a93dc2a",
"metadata": {},
"source": [
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1782db09",
"metadata": {},
"outputs": [],
"source": [
"docs_iterator = loader.lazy_load()"
]
},
{
"cell_type": "markdown",
"id": "3a124086",
"metadata": {},
"source": [
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To process documents, create a class inheriting from the base loader and implement the `_handle_records` method yourself:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5671395d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore.document import Document\n",
"\n",
"def handle_record(record, id):\n",
" return Document(page_content=record.data[\"title\"], metadata=record.data)\n",
"\n",
"loader = AirbyteGongLoader(config=config, record_handler=handle_record, stream_name=\"calls\")\n",
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "223eb8bc",
"metadata": {},
"source": [
"## Incremental loads\n",
"\n",
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
"\n",
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7061e735",
"metadata": {},
"outputs": [],
"source": [
"last_state = loader.last_state # store safely\n",
"\n",
"incremental_loader = AirbyteGongLoader(config=config, stream_name=\"calls\", state=last_state)\n",
"\n",
"new_docs = incremental_loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,208 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1f3a5ebf",
"metadata": {},
"source": [
"# Airbyte Hubspot"
]
},
{
"cell_type": "markdown",
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
"metadata": {},
"source": [
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
"\n",
"This loader exposes the Hubspot connector as a document loader, allowing you to load various Hubspot objects as documents."
]
},
{
"cell_type": "markdown",
"id": "6847a40c",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"id": "3b06fbde",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "markdown",
"id": "e3e9dc79",
"metadata": {},
"source": [
"First, you need to install the `airbyte-source-hubspot` python package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d35e4e0",
"metadata": {},
"outputs": [],
"source": [
"#!pip install airbyte-source-hubspot"
]
},
{
"cell_type": "markdown",
"id": "ae855210",
"metadata": {},
"source": [
"## Example"
]
},
{
"cell_type": "markdown",
"id": "02208f52",
"metadata": {},
"source": [
"Check out the [Airbyte documentation page](https://docs.airbyte.com/integrations/sources/hubspot/) for details about how to configure the reader.\n",
"The JSON schema the config object should adhere to can be found on Github: [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-hubspot/source_hubspot/spec.yaml](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-hubspot/source_hubspot/spec.yaml).\n",
"\n",
"The general shape looks like this:\n",
"```python\n",
"{\n",
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\",\n",
" \"credentials\": {\n",
" \"credentials_title\": \"Private App Credentials\",\n",
" \"access_token\": \"<access token of your private app>\"\n",
" }\n",
"}\n",
"```\n",
"\n",
"By default all fields are stored as metadata in the documents and the text is set to an empty string. Construct the text of the document by transforming the documents returned by the reader."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89a99e58",
"metadata": {},
"outputs": [],
"source": [
"\n",
"from langchain.document_loaders.airbyte import AirbyteHubspotLoader\n",
"\n",
"config = {\n",
" # your hubspot configuration\n",
"}\n",
"\n",
"loader = AirbyteHubspotLoader(config=config, stream_name=\"products\") # check the documentation linked above for a list of all streams"
]
},
{
"cell_type": "markdown",
"id": "2cea23fc",
"metadata": {},
"source": [
"Now you can load documents the usual way"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dae75cdb",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "4a93dc2a",
"metadata": {},
"source": [
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1782db09",
"metadata": {},
"outputs": [],
"source": [
"docs_iterator = loader.lazy_load()"
]
},
{
"cell_type": "markdown",
"id": "3a124086",
"metadata": {},
"source": [
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To process documents, create a class inheriting from the base loader and implement the `_handle_records` method yourself:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5671395d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore.document import Document\n",
"\n",
"def handle_record(record, id):\n",
" return Document(page_content=record.data[\"title\"], metadata=record.data)\n",
"\n",
"loader = AirbyteHubspotLoader(config=config, record_handler=handle_record, stream_name=\"products\")\n",
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "223eb8bc",
"metadata": {},
"source": [
"## Incremental loads\n",
"\n",
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
"\n",
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7061e735",
"metadata": {},
"outputs": [],
"source": [
"last_state = loader.last_state # store safely\n",
"\n",
"incremental_loader = AirbyteHubspotLoader(config=config, stream_name=\"products\", state=last_state)\n",
"\n",
"new_docs = incremental_loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,213 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1f3a5ebf",
"metadata": {},
"source": [
"# Airbyte Salesforce"
]
},
{
"cell_type": "markdown",
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
"metadata": {},
"source": [
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
"\n",
"This loader exposes the Salesforce connector as a document loader, allowing you to load various Salesforce objects as documents."
]
},
{
"cell_type": "markdown",
"id": "6847a40c",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"id": "3b06fbde",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "markdown",
"id": "e3e9dc79",
"metadata": {},
"source": [
"First, you need to install the `airbyte-source-salesforce` python package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d35e4e0",
"metadata": {},
"outputs": [],
"source": [
"#!pip install airbyte-source-salesforce"
]
},
{
"cell_type": "markdown",
"id": "ae855210",
"metadata": {},
"source": [
"## Example"
]
},
{
"cell_type": "markdown",
"id": "02208f52",
"metadata": {},
"source": [
"Check out the [Airbyte documentation page](https://docs.airbyte.com/integrations/sources/salesforce/) for details about how to configure the reader.\n",
"The JSON schema the config object should adhere to can be found on Github: [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-salesforce/source_salesforce/spec.yaml](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-salesforce/source_salesforce/spec.yaml).\n",
"\n",
"The general shape looks like this:\n",
"```python\n",
"{\n",
" \"client_id\": \"<oauth client id>\",\n",
" \"client_secret\": \"<oauth client secret>\",\n",
" \"refresh_token\": \"<oauth refresh token>\",\n",
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\",\n",
" \"is_sandbox\": False, # set to True if you're using a sandbox environment\n",
" \"streams_criteria\": [ # Array of filters for salesforce objects that should be loadable\n",
" {\"criteria\": \"exacts\", \"value\": \"Account\"}, # Exact name of salesforce object\n",
" {\"criteria\": \"starts with\", \"value\": \"Asset\"}, # Prefix of the name\n",
" # Other allowed criteria: ends with, contains, starts not with, ends not with, not contains, not exacts\n",
" ],\n",
"}\n",
"```\n",
"\n",
"By default all fields are stored as metadata in the documents and the text is set to an empty string. Construct the text of the document by transforming the documents returned by the reader."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89a99e58",
"metadata": {},
"outputs": [],
"source": [
"\n",
"from langchain.document_loaders.airbyte import AirbyteSalesforceLoader\n",
"\n",
"config = {\n",
" # your salesforce configuration\n",
"}\n",
"\n",
"loader = AirbyteSalesforceLoader(config=config, stream_name=\"asset\") # check the documentation linked above for a list of all streams"
]
},
{
"cell_type": "markdown",
"id": "2cea23fc",
"metadata": {},
"source": [
"Now you can load documents the usual way"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dae75cdb",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "4a93dc2a",
"metadata": {},
"source": [
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1782db09",
"metadata": {},
"outputs": [],
"source": [
"docs_iterator = loader.lazy_load()"
]
},
{
"cell_type": "markdown",
"id": "3a124086",
"metadata": {},
"source": [
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To create documents in a different, pass in a record_handler function when creating the loader:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5671395d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore.document import Document\n",
"\n",
"def handle_record(record, id):\n",
" return Document(page_content=record.data[\"title\"], metadata=record.data)\n",
"\n",
"loader = AirbyteSalesforceLoader(config=config, record_handler=handle_record, stream_name=\"asset\")\n",
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "223eb8bc",
"metadata": {},
"source": [
"## Incremental loads\n",
"\n",
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
"\n",
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7061e735",
"metadata": {},
"outputs": [],
"source": [
"last_state = loader.last_state # store safely\n",
"\n",
"incremental_loader = AirbyteSalesforceLoader(config=config, stream_name=\"asset\", state=last_state)\n",
"\n",
"new_docs = incremental_loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,209 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1f3a5ebf",
"metadata": {},
"source": [
"# Airbyte Shopify"
]
},
{
"cell_type": "markdown",
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
"metadata": {},
"source": [
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
"\n",
"This loader exposes the Shopify connector as a document loader, allowing you to load various Shopify objects as documents."
]
},
{
"cell_type": "markdown",
"id": "6847a40c",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"id": "3b06fbde",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "markdown",
"id": "e3e9dc79",
"metadata": {},
"source": [
"First, you need to install the `airbyte-source-shopify` python package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d35e4e0",
"metadata": {},
"outputs": [],
"source": [
"#!pip install airbyte-source-shopify"
]
},
{
"cell_type": "markdown",
"id": "ae855210",
"metadata": {},
"source": [
"## Example"
]
},
{
"cell_type": "markdown",
"id": "02208f52",
"metadata": {},
"source": [
"Check out the [Airbyte documentation page](https://docs.airbyte.com/integrations/sources/shopify/) for details about how to configure the reader.\n",
"The JSON schema the config object should adhere to can be found on Github: [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-shopify/source_shopify/spec.json](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-shopify/source_shopify/spec.json).\n",
"\n",
"The general shape looks like this:\n",
"```python\n",
"{\n",
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\",\n",
" \"shop\": \"<name of the shop you want to retrieve documents from>\",\n",
" \"credentials\": {\n",
" \"auth_method\": \"api_password\",\n",
" \"api_password\": \"<your api password>\"\n",
" }\n",
"}\n",
"```\n",
"\n",
"By default all fields are stored as metadata in the documents and the text is set to an empty string. Construct the text of the document by transforming the documents returned by the reader."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89a99e58",
"metadata": {},
"outputs": [],
"source": [
"\n",
"from langchain.document_loaders.airbyte import AirbyteShopifyLoader\n",
"\n",
"config = {\n",
" # your shopify configuration\n",
"}\n",
"\n",
"loader = AirbyteShopifyLoader(config=config, stream_name=\"orders\") # check the documentation linked above for a list of all streams"
]
},
{
"cell_type": "markdown",
"id": "2cea23fc",
"metadata": {},
"source": [
"Now you can load documents the usual way"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dae75cdb",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "4a93dc2a",
"metadata": {},
"source": [
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1782db09",
"metadata": {},
"outputs": [],
"source": [
"docs_iterator = loader.lazy_load()"
]
},
{
"cell_type": "markdown",
"id": "3a124086",
"metadata": {},
"source": [
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To create documents in a different, pass in a record_handler function when creating the loader:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5671395d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore.document import Document\n",
"\n",
"def handle_record(record, id):\n",
" return Document(page_content=record.data[\"title\"], metadata=record.data)\n",
"\n",
"loader = AirbyteShopifyLoader(config=config, record_handler=handle_record, stream_name=\"orders\")\n",
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "223eb8bc",
"metadata": {},
"source": [
"## Incremental loads\n",
"\n",
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
"\n",
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7061e735",
"metadata": {},
"outputs": [],
"source": [
"last_state = loader.last_state # store safely\n",
"\n",
"incremental_loader = AirbyteShopifyLoader(config=config, stream_name=\"orders\", state=last_state)\n",
"\n",
"new_docs = incremental_loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,206 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1f3a5ebf",
"metadata": {},
"source": [
"# Airbyte Stripe"
]
},
{
"cell_type": "markdown",
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
"metadata": {},
"source": [
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
"\n",
"This loader exposes the Stripe connector as a document loader, allowing you to load various Stripe objects as documents."
]
},
{
"cell_type": "markdown",
"id": "6847a40c",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"id": "3b06fbde",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "markdown",
"id": "e3e9dc79",
"metadata": {},
"source": [
"First, you need to install the `airbyte-source-stripe` python package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d35e4e0",
"metadata": {},
"outputs": [],
"source": [
"#!pip install airbyte-source-stripe"
]
},
{
"cell_type": "markdown",
"id": "ae855210",
"metadata": {},
"source": [
"## Example"
]
},
{
"cell_type": "markdown",
"id": "02208f52",
"metadata": {},
"source": [
"Check out the [Airbyte documentation page](https://docs.airbyte.com/integrations/sources/stripe/) for details about how to configure the reader.\n",
"The JSON schema the config object should adhere to can be found on Github: [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/source_stripe/spec.yaml](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/source_stripe/spec.yaml).\n",
"\n",
"The general shape looks like this:\n",
"```python\n",
"{\n",
" \"client_secret\": \"<secret key>\",\n",
" \"account_id\": \"<account id>\",\n",
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\",\n",
"}\n",
"```\n",
"\n",
"By default all fields are stored as metadata in the documents and the text is set to an empty string. Construct the text of the document by transforming the documents returned by the reader."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89a99e58",
"metadata": {},
"outputs": [],
"source": [
"\n",
"from langchain.document_loaders.airbyte import AirbyteStripeLoader\n",
"\n",
"config = {\n",
" # your stripe configuration\n",
"}\n",
"\n",
"loader = AirbyteStripeLoader(config=config, stream_name=\"invoices\") # check the documentation linked above for a list of all streams"
]
},
{
"cell_type": "markdown",
"id": "2cea23fc",
"metadata": {},
"source": [
"Now you can load documents the usual way"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dae75cdb",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "4a93dc2a",
"metadata": {},
"source": [
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1782db09",
"metadata": {},
"outputs": [],
"source": [
"docs_iterator = loader.lazy_load()"
]
},
{
"cell_type": "markdown",
"id": "3a124086",
"metadata": {},
"source": [
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To create documents in a different, pass in a record_handler function when creating the loader:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5671395d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore.document import Document\n",
"\n",
"def handle_record(record, id):\n",
" return Document(page_content=record.data[\"title\"], metadata=record.data)\n",
"\n",
"loader = AirbyteStripeLoader(config=config, record_handler=handle_record, stream_name=\"invoices\")\n",
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "223eb8bc",
"metadata": {},
"source": [
"## Incremental loads\n",
"\n",
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
"\n",
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7061e735",
"metadata": {},
"outputs": [],
"source": [
"last_state = loader.last_state # store safely\n",
"\n",
"incremental_loader = AirbyteStripeLoader(config=config, record_handler=handle_record, stream_name=\"invoices\", state=last_state)\n",
"\n",
"new_docs = incremental_loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,209 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1f3a5ebf",
"metadata": {},
"source": [
"# Airbyte Typeform"
]
},
{
"cell_type": "markdown",
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
"metadata": {},
"source": [
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
"\n",
"This loader exposes the Typeform connector as a document loader, allowing you to load various Typeform objects as documents."
]
},
{
"cell_type": "markdown",
"id": "6847a40c",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"id": "3b06fbde",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "markdown",
"id": "e3e9dc79",
"metadata": {},
"source": [
"First, you need to install the `airbyte-source-typeform` python package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d35e4e0",
"metadata": {},
"outputs": [],
"source": [
"#!pip install airbyte-source-typeform"
]
},
{
"cell_type": "markdown",
"id": "ae855210",
"metadata": {},
"source": [
"## Example"
]
},
{
"cell_type": "markdown",
"id": "02208f52",
"metadata": {},
"source": [
"Check out the [Airbyte documentation page](https://docs.airbyte.com/integrations/sources/typeform/) for details about how to configure the reader.\n",
"The JSON schema the config object should adhere to can be found on Github: [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-typeform/source_typeform/spec.json](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-typeform/source_typeform/spec.json).\n",
"\n",
"The general shape looks like this:\n",
"```python\n",
"{\n",
" \"credentials\": {\n",
" \"auth_type\": \"Private Token\",\n",
" \"access_token\": \"<your auth token>\"\n",
" },\n",
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\",\n",
" \"form_ids\": [\"<id of form to load records for>\"] # if omitted, records from all forms will be loaded\n",
"}\n",
"```\n",
"\n",
"By default all fields are stored as metadata in the documents and the text is set to an empty string. Construct the text of the document by transforming the documents returned by the reader."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89a99e58",
"metadata": {},
"outputs": [],
"source": [
"\n",
"from langchain.document_loaders.airbyte import AirbyteTypeformLoader\n",
"\n",
"config = {\n",
" # your typeform configuration\n",
"}\n",
"\n",
"loader = AirbyteTypeformLoader(config=config, stream_name=\"forms\") # check the documentation linked above for a list of all streams"
]
},
{
"cell_type": "markdown",
"id": "2cea23fc",
"metadata": {},
"source": [
"Now you can load documents the usual way"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dae75cdb",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "4a93dc2a",
"metadata": {},
"source": [
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1782db09",
"metadata": {},
"outputs": [],
"source": [
"docs_iterator = loader.lazy_load()"
]
},
{
"cell_type": "markdown",
"id": "3a124086",
"metadata": {},
"source": [
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To create documents in a different, pass in a record_handler function when creating the loader:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5671395d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore.document import Document\n",
"\n",
"def handle_record(record, id):\n",
" return Document(page_content=record.data[\"title\"], metadata=record.data)\n",
"\n",
"loader = AirbyteTypeformLoader(config=config, record_handler=handle_record, stream_name=\"forms\")\n",
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "223eb8bc",
"metadata": {},
"source": [
"## Incremental loads\n",
"\n",
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
"\n",
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7061e735",
"metadata": {},
"outputs": [],
"source": [
"last_state = loader.last_state # store safely\n",
"\n",
"incremental_loader = AirbyteTypeformLoader(config=config, record_handler=handle_record, stream_name=\"forms\", state=last_state)\n",
"\n",
"new_docs = incremental_loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,210 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1f3a5ebf",
"metadata": {},
"source": [
"# Airbyte Zendesk Support"
]
},
{
"cell_type": "markdown",
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
"metadata": {},
"source": [
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
"\n",
"This loader exposes the Zendesk Support connector as a document loader, allowing you to load various objects as documents."
]
},
{
"cell_type": "markdown",
"id": "6847a40c",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"id": "3b06fbde",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "markdown",
"id": "e3e9dc79",
"metadata": {},
"source": [
"First, you need to install the `airbyte-source-zendesk-support` python package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d35e4e0",
"metadata": {},
"outputs": [],
"source": [
"#!pip install airbyte-source-zendesk-support"
]
},
{
"cell_type": "markdown",
"id": "ae855210",
"metadata": {},
"source": [
"## Example"
]
},
{
"cell_type": "markdown",
"id": "02208f52",
"metadata": {},
"source": [
"Check out the [Airbyte documentation page](https://docs.airbyte.com/integrations/sources/zendesk-support/) for details about how to configure the reader.\n",
"The JSON schema the config object should adhere to can be found on Github: [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-zendesk-support/source_zendesk_support/spec.json](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-zendesk-support/source_zendesk_support/spec.json).\n",
"\n",
"The general shape looks like this:\n",
"```python\n",
"{\n",
" \"subdomain\": \"<your zendesk subdomain>\",\n",
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\",\n",
" \"credentials\": {\n",
" \"credentials\": \"api_token\",\n",
" \"email\": \"<your email>\",\n",
" \"api_token\": \"<your api token>\"\n",
" }\n",
"}\n",
"```\n",
"\n",
"By default all fields are stored as metadata in the documents and the text is set to an empty string. Construct the text of the document by transforming the documents returned by the reader."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89a99e58",
"metadata": {},
"outputs": [],
"source": [
"\n",
"from langchain.document_loaders.airbyte import AirbyteZendeskSupportLoader\n",
"\n",
"config = {\n",
" # your zendesk-support configuration\n",
"}\n",
"\n",
"loader = AirbyteZendeskSupportLoader(config=config, stream_name=\"tickets\") # check the documentation linked above for a list of all streams"
]
},
{
"cell_type": "markdown",
"id": "2cea23fc",
"metadata": {},
"source": [
"Now you can load documents the usual way"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dae75cdb",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "4a93dc2a",
"metadata": {},
"source": [
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1782db09",
"metadata": {},
"outputs": [],
"source": [
"docs_iterator = loader.lazy_load()"
]
},
{
"cell_type": "markdown",
"id": "3a124086",
"metadata": {},
"source": [
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To create documents in a different, pass in a record_handler function when creating the loader:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5671395d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore.document import Document\n",
"\n",
"def handle_record(record, id):\n",
" return Document(page_content=record.data[\"title\"], metadata=record.data)\n",
"\n",
"loader = AirbyteZendeskSupportLoader(config=config, record_handler=handle_record, stream_name=\"tickets\")\n",
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "223eb8bc",
"metadata": {},
"source": [
"## Incremental loads\n",
"\n",
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
"\n",
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7061e735",
"metadata": {},
"outputs": [],
"source": [
"last_state = loader.last_state # store safely\n",
"\n",
"incremental_loader = AirbyteZendeskSupportLoader(config=config, stream_name=\"tickets\", state=last_state)\n",
"\n",
"new_docs = incremental_loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,190 @@
"""Loads local airbyte json files."""
from typing import Any, Callable, Iterator, List, Mapping, Optional
from libs.langchain.langchain.utils.utils import guard_import
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
RecordHandler = Callable[[Any, Optional[str]], Document]
class AirbyteCDKLoader(BaseLoader):
"""Loads records using an Airbyte source connector implemented using the CDK."""
def __init__(
self,
config: Mapping[str, Any],
source_class: Any,
stream_name: str,
record_handler: Optional[RecordHandler] = None,
state: Optional[Any] = None,
) -> None:
from airbyte_cdk.models.airbyte_protocol import AirbyteRecordMessage
from airbyte_cdk.sources.embedded.base_integration import (
BaseEmbeddedIntegration,
)
from airbyte_cdk.sources.embedded.runner import CDKRunner
class CDKIntegration(BaseEmbeddedIntegration):
def _handle_record(
self, record: AirbyteRecordMessage, id: Optional[str]
) -> Document:
if record_handler:
return record_handler(record, id)
return Document(page_content="", metadata=record.data)
self._integration = CDKIntegration(
config=config,
runner=CDKRunner(source=source_class(), name=source_class.__name__),
)
self._stream_name = stream_name
self._state = state
def load(self) -> List[Document]:
return list(self.lazy_load())
def lazy_load(self) -> Iterator[Document]:
return self._integration._load_data(
stream_name=self._stream_name, state=self._state
)
class AirbyteHubspotLoader(AirbyteCDKLoader):
def __init__(
self,
config: Mapping[str, Any],
stream_name: str,
record_handler: Optional[RecordHandler] = None,
state: Optional[Any] = None,
) -> None:
source_class = guard_import(
"source_hubspot", pip_name="airbyte-source-hubspot"
).SourceHubspot
super().__init__(
config=config,
source_class=source_class,
stream_name=stream_name,
record_handler=record_handler,
state=state,
)
class AirbyteStripeLoader(AirbyteCDKLoader):
def __init__(
self,
config: Mapping[str, Any],
stream_name: str,
record_handler: Optional[RecordHandler] = None,
state: Optional[Any] = None,
) -> None:
source_class = guard_import(
"source_stripe", pip_name="airbyte-source-stripe"
).SourceStripe
super().__init__(
config=config,
source_class=source_class,
stream_name=stream_name,
record_handler=record_handler,
state=state,
)
class AirbyteTypeformLoader(AirbyteCDKLoader):
def __init__(
self,
config: Mapping[str, Any],
stream_name: str,
record_handler: Optional[RecordHandler] = None,
state: Optional[Any] = None,
) -> None:
source_class = guard_import(
"source_typeform", pip_name="airbyte-source-typeform"
).SourceTypeform
super().__init__(
config=config,
source_class=source_class,
stream_name=stream_name,
record_handler=record_handler,
state=state,
)
class AirbyteZendeskSupportLoader(AirbyteCDKLoader):
def __init__(
self,
config: Mapping[str, Any],
stream_name: str,
record_handler: Optional[RecordHandler] = None,
state: Optional[Any] = None,
) -> None:
source_class = guard_import(
"source_zendesk_support", pip_name="airbyte-source-zendesk-support"
).SourceZendeskSupport
super().__init__(
config=config,
source_class=source_class,
stream_name=stream_name,
record_handler=record_handler,
state=state,
)
class AirbyteShopifyLoader(AirbyteCDKLoader):
def __init__(
self,
config: Mapping[str, Any],
stream_name: str,
record_handler: Optional[RecordHandler] = None,
state: Optional[Any] = None,
) -> None:
source_class = guard_import(
"source_shopify", pip_name="airbyte-source-shopify"
).SourceShopify
super().__init__(
config=config,
source_class=source_class,
stream_name=stream_name,
record_handler=record_handler,
state=state,
)
class AirbyteSalesforceLoader(AirbyteCDKLoader):
def __init__(
self,
config: Mapping[str, Any],
stream_name: str,
record_handler: Optional[RecordHandler] = None,
state: Optional[Any] = None,
) -> None:
source_class = guard_import(
"source_salesforce", pip_name="airbyte-source-salesforce"
).SourceSalesforce
super().__init__(
config=config,
source_class=source_class,
stream_name=stream_name,
record_handler=record_handler,
state=state,
)
class AirbyteGongLoader(AirbyteCDKLoader):
def __init__(
self,
config: Mapping[str, Any],
stream_name: str,
record_handler: Optional[RecordHandler] = None,
state: Optional[Any] = None,
) -> None:
source_class = guard_import(
"source_gong", pip_name="airbyte-source-gong"
).SourceGong
super().__init__(
config=config,
source_class=source_class,
stream_name=stream_name,
record_handler=record_handler,
state=state,
)