langchain/docs/modules/document_loaders/examples/airbyte_json.ipynb
Francisco Ingham 0b6aa6a024
Added initial capital letter to bullet points that had it missing (#1000)
Co-authored-by: Francisco Ingham <>
2023-02-11 20:31:34 -08:00

{
"cells": [
{
"cell_type": "markdown",
"id": "1f3a5ebf",
"metadata": {},
"source": [
"# Airbyte JSON\n",
"This notebook covers how to load any source from Airbyte into a local JSON file that can be read in as a document.\n",
"\n",
"Prerequisites:\n",
"Have Docker Desktop installed.\n",
"\n",
"Steps:\n",
"\n",
"1) Clone Airbyte from GitHub - `git clone https://github.com/airbytehq/airbyte.git`\n",
"\n",
"2) Switch into Airbyte directory - `cd airbyte`\n",
"\n",
"3) Start Airbyte - `docker compose up`\n",
"\n",
"4) In your browser, visit http://localhost:8000. You will be asked for a username and password; by default, the username is `airbyte` and the password is `password`.\n",
"\n",
"5) Set up any source you wish.\n",
"\n",
"6) Set the destination to Local JSON with a destination path of, say, `/json_data`, and set up a manual sync.\n",
"\n",
"7) Run the connection!\n",
"\n",
"8) To see which files were created, navigate to `file:///tmp/airbyte_local`.\n",
"\n",
"9) Find your data and copy its path. That path is what you pass to `AirbyteJSONLoader` below; it should start with `/tmp/airbyte_local`.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "180c8b74",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import AirbyteJSONLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4af10665",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"_airbyte_raw_pokemon.jsonl\r\n"
]
}
],
"source": [
"!ls /tmp/airbyte_local/json_data/"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "721d9316",
"metadata": {},
"outputs": [],
"source": [
"# Point the loader at the .jsonl file Airbyte wrote for your stream\n",
"loader = AirbyteJSONLoader('/tmp/airbyte_local/json_data/_airbyte_raw_pokemon.jsonl')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "9858b946",
"metadata": {},
"outputs": [],
"source": [
"# load() returns a list of Document objects\n",
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "fca024cb",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"abilities: \n",
"ability: \n",
"name: blaze\n",
"url: https://pokeapi.co/api/v2/ability/66/\n",
"\n",
"is_hidden: False\n",
"slot: 1\n",
"\n",
"\n",
"ability: \n",
"name: solar-power\n",
"url: https://pokeapi.co/api/v2/ability/94/\n",
"\n",
"is_hidden: True\n",
"slot: 3\n",
"\n",
"base_experience: 267\n",
"forms: \n",
"name: charizard\n",
"url: https://pokeapi.co/api/v2/pokemon-form/6/\n",
"\n",
"game_indices: \n",
"game_index: 180\n",
"version: \n",
"name: red\n",
"url: https://pokeapi.co/api/v2/version/1/\n",
"\n",
"\n",
"\n",
"game_index: 180\n",
"version: \n",
"name: blue\n",
"url: https://pokeapi.co/api/v2/version/2/\n",
"\n",
"\n",
"\n",
"game_index: 180\n",
"version: \n",
"n\n"
]
}
],
"source": [
"print(data[0].page_content[:500])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
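The printed `page_content` in the notebook shows that the loaded text flattens each nested Airbyte record into plain `key: value` lines, with nested objects and list items starting on new lines. The following is a minimal standard-library sketch of that flattening, written only to illustrate the output format. It is not the `AirbyteJSONLoader` source. It assumes that every line of the `.jsonl` file is one Airbyte raw record whose payload sits under the `_airbyte_data` key. The helper names (`stringify_value`, `stringify_dict`, `load_airbyte_jsonl`) and the sample record are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def stringify_value(value: Any) -> str:
    """Render one JSON value as plain text (hypothetical helper)."""
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        # A nested object starts on a new line, one "key: value" pair per line.
        return "\n" + stringify_dict(value)
    if isinstance(value, list):
        # List items are rendered one after another, joined by newlines.
        return "\n".join(stringify_value(item) for item in value)
    # Numbers, booleans and null fall back to str(), e.g. False -> "False".
    return str(value)


def stringify_dict(data: dict) -> str:
    """Render an object as "key: value" lines (hypothetical helper)."""
    return "".join(f"{key}: {stringify_value(val)}\n" for key, val in data.items())


def load_airbyte_jsonl(path: str) -> str:
    """Concatenate the flattened `_airbyte_data` payload of every record."""
    text = ""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                text += stringify_dict(json.loads(line)["_airbyte_data"])
    return text


# Usage with a made-up record in the assumed Airbyte raw format.
record = {
    "_airbyte_ab_id": "example-id",
    "_airbyte_emitted_at": 0,
    "_airbyte_data": {
        "name": "charizard",
        "base_experience": 267,
        "forms": [{"name": "charizard"}],
    },
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "_airbyte_raw_example.jsonl"
    path.write_text(json.dumps(record) + "\n", encoding="utf-8")
    text = load_airbyte_jsonl(str(path))

print(text)
```

Because nested values are written without indentation, a key that holds an object (such as `forms:` here, or `ability:` in the notebook output) appears on its own line with its fields following directly beneath it. This sketch returns a single string for the whole file. It makes no claim about how the real loader splits records into documents.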