📖 docs: fixed `integrations/document loaders` toc (#9281)

Fixed navbar:
- renamed several files, so ToC is sorted correctly
- made ToC items consistent: formatted several Titles
- added several links
- reformatted several docs to a consistent format
- renamed several files (removed `_example` suffix)
- added renamed files to the `docs/docs_skeleton/vercel.json`
pull/11077/head
Leonid Ganeline 1 year ago committed by GitHub
parent 0ea384d575
commit 21199cc7b4
GPG Key ID: 4AEE18F83AFDEB23

@ -1972,6 +1972,18 @@
"source": "/docs/modules/data_connection/document_loaders/integrations/youtube_transcript",
"destination": "/docs/integrations/document_loaders/youtube_transcript"
},
{
"source": "/docs/integrations/document_loaders/Etherscan",
"destination": "/docs/integrations/document_loaders/etherscan"
},
{
"source": "/docs/integrations/document_loaders/merge_doc_loader",
"destination": "/docs/integrations/document_loaders/merge_doc"
},
{
"source": "/docs/integrations/document_loaders/recursive_url_loader",
"destination": "/docs/integrations/document_loaders/recursive_url"
},
{
"source": "/en/latest/modules/indexes/text_splitters/examples/markdown_header_metadata.html",
"destination": "/docs/modules/data_connection/document_transformers/text_splitters/markdown_header_metadata"

@ -5,9 +5,9 @@
"id": "e229e34c",
"metadata": {},
"source": [
"# AsyncHtmlLoader\n",
"# AsyncHtml\n",
"\n",
"AsyncHtmlLoader loads raw HTML from a list of urls concurrently."
"`AsyncHtmlLoader` loads raw HTML from a list of URLs concurrently."
]
},
{
@ -99,7 +99,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -5,12 +5,17 @@
"id": "1ab83660",
"metadata": {},
"source": [
"# Etherscan Loader\n",
"# Etherscan\n",
"\n",
">[Etherscan](https://docs.etherscan.io/) is the leading blockchain explorer, search, API and analytics platform for Ethereum, \n",
"a decentralized smart contracts platform.\n",
"\n",
"\n",
"## Overview\n",
"\n",
"The Etherscan loader use etherscan api to load transaction histories under specific account on Ethereum Mainnet.\n",
"The `Etherscan` loader use `Etherscan API` to load transacactions histories under specific account on `Ethereum Mainnet`.\n",
"\n",
"You will need a Etherscan api key to proceed. The free api key has 5 calls per second quota.\n",
"You will need a `Etherscan api key` to proceed. The free api key has 5 calls per seconds quota.\n",
"\n",
"The loader supports the following six functinalities:\n",
"* Retrieve normal transactions under specific account on Ethereum Mainet\n",
@ -47,7 +52,7 @@
"id": "d72d4e22",
"metadata": {},
"source": [
"# Setup"
"## Setup"
]
},
{
@ -86,7 +91,7 @@
"id": "3bcbb63e",
"metadata": {},
"source": [
"# Create a ERC20 transaction loader"
"## Create a ERC20 transaction loader"
]
},
{
@ -136,7 +141,7 @@
"id": "2a1ecce0",
"metadata": {},
"source": [
"# Create a normal transaction loader with customized parameters"
"## Create a normal transaction loader with customized parameters"
]
},
{
@ -212,7 +217,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.2"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# MediaWikiDump\n",
"# MediaWiki Dump\n",
"\n",
">[MediaWiki XML Dumps](https://www.mediawiki.org/wiki/Manual:Importing_XML_dumps) contain the content of a wiki (wiki pages with all their revisions), without the site-related data. A XML dump does not create a full backup of the wiki database, the dump does not contain user accounts, images, edit logs, etc.\n",
"\n",
@ -122,7 +122,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -5,7 +5,7 @@
"id": "dd7c3503",
"metadata": {},
"source": [
"# MergeDocLoader\n",
"# Merge Documents Loader\n",
"\n",
"Merge the documents returned from a set of specified data loaders."
]
@ -96,7 +96,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -1,17 +1,28 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Nuclia Understanding API document loader\n",
"# Nuclia\n",
"\n",
"[Nuclia](https://nuclia.com) automatically indexes your unstructured data from any internal and external source, providing optimized search results and generative answers. It can handle video and audio transcription, image content extraction, and document parsing.\n",
">[Nuclia](https://nuclia.com) automatically indexes your unstructured data from any internal and external source, providing optimized search results and generative answers. It can handle video and audio transcription, image content extraction, and document parsing.\n",
"\n",
"The Nuclia Understanding API supports the processing of unstructured data, including text, web pages, documents, and audio/video contents. It extracts all texts wherever they are (using speech-to-text or OCR when needed), it also extracts metadata, embedded files (like images in a PDF), and web links. If machine learning is enabled, it identifies entities, provides a summary of the content and generates embeddings for all the sentences.\n",
"\n",
"To use the Nuclia Understanding API, you need to have a Nuclia account. You can create one for free at [https://nuclia.cloud](https://nuclia.cloud), and then [create a NUA key](https://docs.nuclia.dev/docs/docs/using/understanding/intro)."
">The `Nuclia Understanding API` supports the processing of unstructured data, including text, web pages, documents, and audio/video contents. It extracts all texts wherever they are (using speech-to-text or OCR when needed), it also extracts metadata, embedded files (like images in a PDF), and web links. If machine learning is enabled, it identifies entities, provides a summary of the content and generates embeddings for all the sentences.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To use the `Nuclia Understanding API`, you need to have a Nuclia account. You can create one for free at [https://nuclia.cloud](https://nuclia.cloud), and then [create a NUA key](https://docs.nuclia.dev/docs/docs/using/understanding/intro)."
]
},
{
@ -37,10 +48,11 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example\n",
"\n",
"To use the Nuclia document loader, you need to instantiate a `NucliaUnderstandingAPI` tool:"
]
},
@ -67,7 +79,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -95,7 +106,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -121,7 +131,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "langchain",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -135,10 +145,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
},
"orig_nbformat": 4
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

@ -1,11 +1,10 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# PySpark DataFrame Loader\n",
"# PySpark\n",
"\n",
"This notebook goes over how to load data from a [PySpark](https://spark.apache.org/docs/latest/api/python/) DataFrame."
]
@ -147,9 +146,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

@ -5,7 +5,7 @@
"id": "5a7cc773",
"metadata": {},
"source": [
"# Recursive URL Loader\n",
"# Recursive URL\n",
"\n",
"We may want to process load all URLs under a root directory.\n",
"\n",
@ -170,7 +170,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -1,16 +1,15 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "e48afb8d",
"metadata": {},
"source": [
"# Loading documents from a YouTube url\n",
"# YouTube audio\n",
"\n",
"Building chat or QA applications on YouTube videos is a topic of high interest.\n",
"\n",
"Below we show how to easily go from a YouTube url to text to chat!\n",
"Below we show how to easily go from a `YouTube url` to `audio of the video` to `text` to `chat`!\n",
"\n",
"We wil use the `OpenAIWhisperParser`, which will use the OpenAI Whisper API to transcribe audio to text, \n",
"and the `OpenAIWhisperParserLocal` for local support and running on private clouds or on premise.\n",
@ -82,9 +81,7 @@
"cell_type": "code",
"execution_count": 2,
"id": "23e1e134",
"metadata": {
"scrolled": false
},
"metadata": {},
"outputs": [
{
"name": "stdout",
@ -128,9 +125,7 @@
"cell_type": "code",
"execution_count": 3,
"id": "72a94fd8",
"metadata": {
"scrolled": false
},
"metadata": {},
"outputs": [
{
"data": {
@ -293,7 +288,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -307,7 +302,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.10.12"
},
"vscode": {
"interpreter": {
