diff --git a/docs/docs_skeleton/vercel.json b/docs/docs_skeleton/vercel.json index 65da50e854..b0b4154773 100644 --- a/docs/docs_skeleton/vercel.json +++ b/docs/docs_skeleton/vercel.json @@ -1972,6 +1972,18 @@ "source": "/docs/modules/data_connection/document_loaders/integrations/youtube_transcript", "destination": "/docs/integrations/document_loaders/youtube_transcript" }, + { + "source": "/docs/integrations/document_loaders/Etherscan", + "destination": "/docs/integrations/document_loaders/etherscan" + }, + { + "source": "/docs/integrations/document_loaders/merge_doc_loader", + "destination": "/docs/integrations/document_loaders/merge_doc" + }, + { + "source": "/docs/integrations/document_loaders/recursive_url_loader", + "destination": "/docs/integrations/document_loaders/recursive_url" + }, { "source": "/en/latest/modules/indexes/text_splitters/examples/markdown_header_metadata.html", "destination": "/docs/modules/data_connection/document_transformers/text_splitters/markdown_header_metadata" diff --git a/docs/extras/integrations/document_loaders/async_html.ipynb b/docs/extras/integrations/document_loaders/async_html.ipynb index 64cced79ad..8a9786a08f 100644 --- a/docs/extras/integrations/document_loaders/async_html.ipynb +++ b/docs/extras/integrations/document_loaders/async_html.ipynb @@ -5,9 +5,9 @@ "id": "e229e34c", "metadata": {}, "source": [ - "# AsyncHtmlLoader\n", + "# AsyncHtml\n", "\n", - "AsyncHtmlLoader loads raw HTML from a list of urls concurrently." + "`AsyncHtmlLoader` loads raw HTML from a list of URLs concurrently." 
] }, { @@ -99,7 +99,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.16" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/extras/integrations/document_loaders/Etherscan.ipynb b/docs/extras/integrations/document_loaders/etherscan.ipynb similarity index 97% rename from docs/extras/integrations/document_loaders/Etherscan.ipynb rename to docs/extras/integrations/document_loaders/etherscan.ipynb index 120d1db9ed..5ccffce0aa 100644 --- a/docs/extras/integrations/document_loaders/Etherscan.ipynb +++ b/docs/extras/integrations/document_loaders/etherscan.ipynb @@ -5,12 +5,17 @@ "id": "1ab83660", "metadata": {}, "source": [ - "# Etherscan Loader\n", + "# Etherscan\n", + "\n", + ">[Etherscan](https://docs.etherscan.io/) is the leading blockchain explorer, search, API and analytics platform for Ethereum, \n", + "a decentralized smart contracts platform.\n", + "\n", + "\n", "## Overview\n", "\n", - "The Etherscan loader use etherscan api to load transaction histories under specific account on Ethereum Mainnet.\n", + "The `Etherscan` loader uses the `Etherscan API` to load transaction histories under a specific account on `Ethereum Mainnet`.\n", "\n", - "You will need a Etherscan api key to proceed. The free api key has 5 calls per second quota.\n", + "You will need an `Etherscan` API key to proceed. 
The free API key has a quota of 5 calls per second.\n", "\n", "The loader supports the following six functinalities:\n", "* Retrieve normal transactions under specific account on Ethereum Mainet\n", @@ -47,7 +52,7 @@ "id": "d72d4e22", "metadata": {}, "source": [ - "# Setup" + "## Setup" ] }, { @@ -86,7 +91,7 @@ "id": "3bcbb63e", "metadata": {}, "source": [ - "# Create a ERC20 transaction loader" + "## Create an ERC20 transaction loader" ] }, { @@ -136,7 +141,7 @@ "id": "2a1ecce0", "metadata": {}, "source": [ - "# Create a normal transaction loader with customized parameters" + "## Create a normal transaction loader with customized parameters" ] }, { @@ -212,7 +217,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.2" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/extras/integrations/document_loaders/mediawikidump.ipynb b/docs/extras/integrations/document_loaders/mediawikidump.ipynb index 8b2b5d00fd..db13a4e811 100644 --- a/docs/extras/integrations/document_loaders/mediawikidump.ipynb +++ b/docs/extras/integrations/document_loaders/mediawikidump.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# MediaWikiDump\n", + "# MediaWiki Dump\n", "\n", ">[MediaWiki XML Dumps](https://www.mediawiki.org/wiki/Manual:Importing_XML_dumps) contain the content of a wiki (wiki pages with all their revisions), without the site-related data. 
A XML dump does not create a full backup of the wiki database, the dump does not contain user accounts, images, edit logs, etc.\n", "\n", @@ -122,7 +122,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.6" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/extras/integrations/document_loaders/merge_doc_loader.ipynb b/docs/extras/integrations/document_loaders/merge_doc.ipynb similarity index 97% rename from docs/extras/integrations/document_loaders/merge_doc_loader.ipynb rename to docs/extras/integrations/document_loaders/merge_doc.ipynb index 5270400ef4..2cf0d55d72 100644 --- a/docs/extras/integrations/document_loaders/merge_doc_loader.ipynb +++ b/docs/extras/integrations/document_loaders/merge_doc.ipynb @@ -5,7 +5,7 @@ "id": "dd7c3503", "metadata": {}, "source": [ - "# MergeDocLoader\n", + "# Merge Documents Loader\n", "\n", "Merge the documents returned from a set of specified data loaders." ] @@ -96,7 +96,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.16" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/extras/integrations/document_loaders/nuclia.ipynb b/docs/extras/integrations/document_loaders/nuclia.ipynb index 42daa75990..b1c3c818da 100644 --- a/docs/extras/integrations/document_loaders/nuclia.ipynb +++ b/docs/extras/integrations/document_loaders/nuclia.ipynb @@ -1,17 +1,28 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "# Nuclia Understanding API document loader\n", + "# Nuclia\n", "\n", - "[Nuclia](https://nuclia.com) automatically indexes your unstructured data from any internal and external source, providing optimized search results and generative answers. 
It can handle video and audio transcription, image content extraction, and document parsing.\n", + ">[Nuclia](https://nuclia.com) automatically indexes your unstructured data from any internal and external source, providing optimized search results and generative answers. It can handle video and audio transcription, image content extraction, and document parsing.\n", "\n", - "The Nuclia Understanding API supports the processing of unstructured data, including text, web pages, documents, and audio/video contents. It extracts all texts wherever they are (using speech-to-text or OCR when needed), it also extracts metadata, embedded files (like images in a PDF), and web links. If machine learning is enabled, it identifies entities, provides a summary of the content and generates embeddings for all the sentences.\n", "\n", - "To use the Nuclia Understanding API, you need to have a Nuclia account. You can create one for free at [https://nuclia.cloud](https://nuclia.cloud), and then [create a NUA key](https://docs.nuclia.dev/docs/docs/using/understanding/intro)." + ">The `Nuclia Understanding API` supports the processing of unstructured data, including text, web pages, documents, and audio/video content. It extracts all text wherever it is (using speech-to-text or OCR when needed); it also extracts metadata, embedded files (like images in a PDF), and web links. If machine learning is enabled, it identifies entities, provides a summary of the content, and generates embeddings for all the sentences.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use the `Nuclia Understanding API`, you need to have a Nuclia account. You can create one for free at [https://nuclia.cloud](https://nuclia.cloud), and then [create a NUA key](https://docs.nuclia.dev/docs/docs/using/understanding/intro)." 
] }, { @@ -37,10 +48,11 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ + "## Example\n", + "\n", "To use the Nuclia document loader, you need to instantiate a `NucliaUnderstandingAPI` tool:" ] }, @@ -67,7 +79,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -95,7 +106,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -121,7 +131,7 @@ ], "metadata": { "kernelspec": { - "display_name": "langchain", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -135,10 +145,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.5" - }, - "orig_nbformat": 4 + "version": "3.10.12" + } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/docs/extras/integrations/document_loaders/pyspark_dataframe.ipynb b/docs/extras/integrations/document_loaders/pyspark_dataframe.ipynb index 7f3b6fb303..46b3c60692 100644 --- a/docs/extras/integrations/document_loaders/pyspark_dataframe.ipynb +++ b/docs/extras/integrations/document_loaders/pyspark_dataframe.ipynb @@ -1,11 +1,10 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "# PySpark DataFrame Loader\n", + "# PySpark\n", "\n", "This notebook goes over how to load data from a [PySpark](https://spark.apache.org/docs/latest/api/python/) DataFrame." 
] @@ -147,9 +146,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.9" + "version": "3.10.12" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/docs/extras/integrations/document_loaders/recursive_url_loader.ipynb b/docs/extras/integrations/document_loaders/recursive_url.ipynb similarity index 98% rename from docs/extras/integrations/document_loaders/recursive_url_loader.ipynb rename to docs/extras/integrations/document_loaders/recursive_url.ipynb index b76d27a75a..f8ac5f523e 100644 --- a/docs/extras/integrations/document_loaders/recursive_url_loader.ipynb +++ b/docs/extras/integrations/document_loaders/recursive_url.ipynb @@ -5,7 +5,7 @@ "id": "5a7cc773", "metadata": {}, "source": [ - "# Recursive URL Loader\n", + "# Recursive URL\n", "\n", "We may want to process load all URLs under a root directory.\n", "\n", @@ -170,7 +170,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.16" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/extras/integrations/document_loaders/youtube_audio.ipynb b/docs/extras/integrations/document_loaders/youtube_audio.ipynb index 42b0d063f5..4b96a32c75 100644 --- a/docs/extras/integrations/document_loaders/youtube_audio.ipynb +++ b/docs/extras/integrations/document_loaders/youtube_audio.ipynb @@ -1,16 +1,15 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "id": "e48afb8d", "metadata": {}, "source": [ - "# Loading documents from a YouTube url\n", + "# YouTube audio\n", "\n", "Building chat or QA applications on YouTube videos is a topic of high interest.\n", "\n", - "Below we show how to easily go from a YouTube url to text to chat!\n", + "Below we show how to easily go from a YouTube URL, to the audio of the video, to text, to chat!\n", "\n", "We wil use the `OpenAIWhisperParser`, which will use the OpenAI Whisper API to transcribe audio to text, \n", "and the 
`OpenAIWhisperParserLocal` for local support and running on private clouds or on premise.\n", @@ -82,9 +81,7 @@ "cell_type": "code", "execution_count": 2, "id": "23e1e134", - "metadata": { - "scrolled": false - }, + "metadata": {}, "outputs": [ { "name": "stdout", @@ -128,9 +125,7 @@ "cell_type": "code", "execution_count": 3, "id": "72a94fd8", - "metadata": { - "scrolled": false - }, + "metadata": {}, "outputs": [ { "data": { @@ -293,7 +288,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -307,7 +302,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.11" + "version": "3.10.12" }, "vscode": { "interpreter": {