docs: `document_loaders` improvements (#4200)

- made notebooks consistent: titles, service/format descriptions.
- corrected short names to full names, for example, `Word` -> `Microsoft
Word`
- added missed descriptions
- renamed notebook files to make ToC correctly sorted
parallel_dir_loader
Leonid Ganeline 1 year ago committed by GitHub
parent eeb7c96e0c
commit 59204a5033
GPG Key ID: 4AEE18F83AFDEB23

@ -7,7 +7,9 @@
"source": [
"# Bilibili\n",
"\n",
"This loader utilizes the [bilibili-api](https://github.com/MoyuScript/bilibili-api) to fetch the text transcript from [Bilibili](https://www.bilibili.tv/), one of the most beloved long-form video sites in China.\n",
">[Bilibili](https://www.bilibili.tv/) is one of the most beloved long-form video sites in China.\n",
"\n",
"This loader utilizes the [bilibili-api](https://github.com/MoyuScript/bilibili-api) to fetch the text transcript from `Bilibili`.\n",
"\n",
"With this BiliBiliLoader, users can easily obtain the transcript of their desired video content on the platform."
]

@ -6,6 +6,8 @@
"source": [
"# Blackboard\n",
"\n",
">[Blackboard Learn](https://en.wikipedia.org/wiki/Blackboard_Learn) (previously the Blackboard Learning Management System) is a web-based virtual learning environment and learning management system developed by Blackboard Inc. The software features course management, customizable open architecture, and scalable design that allows integration with student information systems and authentication protocols. It may be installed on local servers, hosted by `Blackboard ASP Solutions`, or provided as Software as a Service hosted on Amazon Web Services. Its main purposes are stated to include the addition of online elements to courses traditionally delivered face-to-face and development of completely online courses with few or no face-to-face meetings\n",
"\n",
"This covers how to load data from a [Blackboard Learn](https://www.anthology.com/products/teaching-and-learning/learning-effectiveness/blackboard-learn) instance.\n",
"\n",
"This loader is not compatible with all `Blackboard` courses. It is only\n",

@ -4,7 +4,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### ChatGPT Data Loader\n",
"### ChatGPT Data\n",
"\n",
">[ChatGPT](https://chat.openai.com) is an artificial intelligence (AI) chatbot developed by OpenAI.\n",
"\n",
"\n",
"This notebook covers how to load `conversations.json` from your `ChatGPT` data export folder.\n",
"\n",

@ -6,7 +6,9 @@
"source": [
"# Confluence\n",
"\n",
"A loader for [Confluence](https://www.atlassian.com/software/confluence) pages.\n",
">[Confluence](https://www.atlassian.com/software/confluence) is a wiki collaboration platform that saves and organizes all of the project-related material. `Confluence` is a knowledge base that primarily handles content management activities. \n",
"\n",
"A loader for `Confluence` pages.\n",
"\n",
"\n",
"This currently supports both `username/api_key` and `Oauth2 login`.\n",

@ -6,6 +6,12 @@
"metadata": {},
"source": [
"# CoNLL-U\n",
"\n",
">[CoNLL-U](https://universaldependencies.org/format.html) is revised version of the CoNLL-X format. Annotations are encoded in plain text files (UTF-8, normalized to NFC, using only the LF character as line break, including an LF character at the end of file) with three types of lines:\n",
">- Word lines containing the annotation of a word/token in 10 fields separated by single tab characters; see below.\n",
">- Blank lines marking sentence boundaries.\n",
">- Comment lines starting with hash (#).\n",
"\n",
"This is an example of how to load a file in [CoNLL-U](https://universaldependencies.org/format.html) format. The whole file is treated as one document. The example data (`conllu.conllu`) is based on one of the standard UD/CoNLL-U examples."
]
},
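
As a rough illustration of the three line types described in the CoNLL-U hunk above, the following standard-library sketch labels each line of a CoNLL-U string. The helper name `classify_conllu_lines` and the sample text are hypothetical; this is not part of the commit and is not how the LangChain loader is implemented.

```python
# Hypothetical sample: one comment line, two word lines, one blank line.
SAMPLE = (
    "# text = Hi there\n"
    "1\tHi\thi\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\tthere\tthere\tADV\t_\t_\t1\tadvmod\t_\t_\n"
    "\n"
)


def classify_conllu_lines(text: str) -> list[tuple[str, str]]:
    """Label every line as 'comment', 'blank', or 'word' (illustrative helper)."""
    labeled = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Comment lines start with a hash.
            labeled.append(("comment", line))
        elif line.strip() == "":
            # Blank lines mark sentence boundaries.
            labeled.append(("blank", line))
        else:
            # Word lines carry exactly 10 tab-separated fields.
            fields = line.split("\t")
            if len(fields) != 10:
                raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
            labeled.append(("word", line))
    return labeled


if __name__ == "__main__":
    for kind, line in classify_conllu_lines(SAMPLE):
        print(kind, repr(line))
```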

@ -4,7 +4,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# CSV Files\n",
"# CSV\n",
"\n",
">A [comma-separated values (CSV)](https://en.wikipedia.org/wiki/Comma-separated_values) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas.\n",
"\n",
"Load [csv](https://en.wikipedia.org/wiki/Comma-separated_values) data with a single row per document."
]
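
To make "a single row per document" concrete, here is a minimal standard-library sketch that turns each CSV record into one document-like dictionary. The function name `rows_to_documents`, the `key: value` text layout, and the `row` metadata field are assumptions chosen for illustration; they are not taken from this commit and should not be read as the loader's actual behavior.

```python
import csv
import io

# Hypothetical sample data with a header line and two records.
SAMPLE_CSV = "team,wins\nNationals,98\nReds,97\n"


def rows_to_documents(csv_text: str) -> list[dict]:
    """Return one document-like dict per CSV record (illustrative helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader):
        # Render each field as "column: value", one field per line.
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append({"page_content": content, "metadata": {"row": row_number}})
    return documents


if __name__ == "__main__":
    for document in rows_to_documents(SAMPLE_CSV):
        print(document)
```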

@ -6,7 +6,9 @@
"source": [
"# Discord\n",
"\n",
"You can follow the below steps to download your Discord data:\n",
">[Discord](https://discord.com/) is a VoIP and instant messaging social platform. Users have the ability to communicate with voice calls, video calls, text messaging, media and files in private chats or as part of communities called \"servers\". A server is a collection of persistent chat rooms and voice channels which can be accessed via invite links.\n",
"\n",
"Follow these steps to download your `Discord` data:\n",
"\n",
"1. Go to your **User Settings**\n",
"2. Then go to **Privacy and Safety**\n",
@ -79,9 +81,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

@ -7,9 +7,21 @@
"source": [
"# EPub \n",
"\n",
">[EPUB](https://en.wikipedia.org/wiki/EPUB) is an e-book file format that uses the \".epub\" file extension. The term is short for electronic publication and is sometimes styled ePub. `EPUB` is supported by many e-readers, and compatible software is available for most smartphones, tablets, and computers.\n",
"\n",
"This covers how to load `.epub` documents into the Document format that we can use downstream. You'll need to install the [`pandocs`](https://pandoc.org/installing.html) package for this loader to work."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cd1affad-8ba6-43b1-b8cd-f61f44025077",
"metadata": {},
"outputs": [],
"source": [
"#!pip install pandocs"
]
},
{
"cell_type": "code",
"execution_count": 1,

@ -6,6 +6,8 @@
"source": [
"### Facebook Chat\n",
"\n",
">[Messenger](https://en.wikipedia.org/wiki/Messenger_(software)) is an American proprietary instant messaging app and platform developed by `Meta Platforms`. Originally developed as `Facebook Chat` in 2008, the company revamped its messaging service in 2010.\n",
"\n",
"This notebook covers how to load data from the [Facebook Chats](https://www.facebook.com/business/help/1646890868956360) into a format that can be ingested into LangChain."
]
},

@ -5,8 +5,9 @@
"id": "79f24a6b",
"metadata": {},
"source": [
"# Directory Loader\n",
"This covers how to use the DirectoryLoader to load all documents in a directory. Under the hood, by default this uses the [UnstructuredLoader](./unstructured_file.ipynb)"
"# File Directory\n",
"\n",
"This covers how to use the `DirectoryLoader` to load all documents in a directory. Under the hood, by default this uses the [UnstructuredLoader](./unstructured_file.ipynb)"
]
},
{
@ -255,7 +256,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -4,9 +4,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# BigQuery\n",
"# Google BigQuery\n",
"\n",
">[BigQuery](https://cloud.google.com/bigquery) is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data.\n",
">[Google BigQuery](https://cloud.google.com/bigquery) is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data.\n",
"`BigQuery` is a part of the `Google Cloud Platform`.\n",
"\n",
"Load a `BigQuery` query with one document per row."

@ -5,7 +5,7 @@
"id": "0ef41fd4",
"metadata": {},
"source": [
"# GCS Directory\n",
"# Google Cloud Storage Directory\n",
"\n",
">[Google Cloud Storage](https://en.wikipedia.org/wiki/Google_Cloud_Storage) is a managed service for storing unstructured data.\n",
"\n",

@ -5,7 +5,7 @@
"id": "0ef41fd4",
"metadata": {},
"source": [
"# GCS File Storage\n",
"# Google Cloud Storage File\n",
"\n",
">[Google Cloud Storage](https://en.wikipedia.org/wiki/Google_Cloud_Storage) is a managed service for storing unstructured data.\n",
"\n",

@ -6,6 +6,9 @@
"metadata": {},
"source": [
"# Google Drive\n",
"\n",
">[Google Drive](https://en.wikipedia.org/wiki/Google_Drive) is a file storage and synchronization service developed by Google.\n",
"\n",
"This notebook covers how to load documents from `Google Drive`. Currently, only `Google Docs` are supported.\n",
"\n",
"## Prerequisites\n",

@ -7,7 +7,7 @@
"source": [
"# Hacker News\n",
"\n",
">[Hacker News](https://en.wikipedia.org/wiki/Hacker_News) (sometimes abbreviated as HN) is a social news website focusing on computer science and entrepreneurship. It is run by the investment fund and startup incubator Y Combinator. In general, content that can be submitted is defined as \"anything that gratifies one's intellectual curiosity.\"\n",
">[Hacker News](https://en.wikipedia.org/wiki/Hacker_News) (sometimes abbreviated as `HN`) is a social news website focusing on computer science and entrepreneurship. It is run by the investment fund and startup incubator `Y Combinator`. In general, content that can be submitted is defined as \"anything that gratifies one's intellectual curiosity.\"\n",
"\n",
"This notebook covers how to pull page data and comments from [Hacker News](https://news.ycombinator.com/)"
]

@ -7,6 +7,8 @@
"source": [
"# HTML\n",
"\n",
">[The HyperText Markup Language or HTML](https://en.wikipedia.org/wiki/HTML) is the standard markup language for documents designed to be displayed in a web browser.\n",
"\n",
"This covers how to load `HTML` documents into a document format that we can use downstream."
]
},

@ -5,12 +5,11 @@
"id": "04c9fdc5",
"metadata": {},
"source": [
"# HuggingFace dataset \n",
"# HuggingFace dataset\n",
"\n",
"The [Hugging Face Hub](https://huggingface.co/docs/hub/index) hosts a large number of community-curated datasets for a diverse range of tasks such as translation,\n",
">The [Hugging Face Hub](https://huggingface.co/docs/hub/index) is home to over 5,000 [datasets](https://huggingface.co/docs/hub/index#datasets) in more than 100 languages that can be used for a broad range of tasks across NLP, Computer Vision, and Audio. They used for a diverse range of tasks such as translation,\n",
"automatic speech recognition, and image classification.\n",
"\n",
">The `Hugging Face Hub` is home to over 5,000 [datasets](https://huggingface.co/docs/hub/index#datasets) in more than 100 languages that can be used for a broad range of tasks across NLP, Computer Vision, and Audio.\n",
"\n",
"This notebook shows how to load `Hugging Face Hub` datasets to LangChain."
]

@ -6,7 +6,7 @@
"source": [
"# iFixit\n",
"\n",
"[iFixit](https://www.ifixit.com) is the largest, open repair community on the web. The site contains nearly 100k repair manuals, 200k Questions & Answers on 42k devices, and all the data is licensed under CC-BY-NC-SA 3.0.\n",
">[iFixit](https://www.ifixit.com) is the largest, open repair community on the web. The site contains nearly 100k repair manuals, 200k Questions & Answers on 42k devices, and all the data is licensed under CC-BY-NC-SA 3.0.\n",
"\n",
"This loader will allow you to download the text of a repair guide, text of Q&A's and wikis from devices on `iFixit` using their open APIs. It's incredibly useful for context related to technical documents and answers to questions about devices in the corpus of data on `iFixit`."
]

@ -7,7 +7,7 @@
"source": [
"# Images\n",
"\n",
"This covers how to load images such as JPGs PNGs into a document format that we can use downstream."
"This covers how to load images such as `JPG` or `PNG` into a document format that we can use downstream."
]
},
{

@ -10,7 +10,7 @@
"By default, the loader utilizes the pre-trained [Salesforce BLIP image captioning model](https://huggingface.co/Salesforce/blip-image-captioning-base).\n",
"\n",
"\n",
"This notebook shows how to use the ImageCaptionLoader tutorial to generate a query-able index of image captions"
"This notebook shows how to use the `ImageCaptionLoader` to generate a query-able index of image captions"
]
},
{

@ -7,7 +7,7 @@
"source": [
"# IMSDb\n",
"\n",
"[IMSDb](https://imsdb.com/) is the `Internet Movie Script Database`.\n",
">[IMSDb](https://imsdb.com/) is the `Internet Movie Script Database`.\n",
"\n",
"This covers how to load `IMSDb` webpages into a document format that we can use downstream."
]

@ -4,7 +4,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Notebook\n",
"# Jupyter Notebook\n",
"\n",
">[Jupyter Notebook](https://en.wikipedia.org/wiki/Project_Jupyter#Applications) (formerly `IPython Notebook`) is a web-based interactive computational environment for creating notebook documents.\n",
"\n",
"This notebook covers how to load data from a `Jupyter notebook (.ipynb)` into a format suitable by LangChain."
]

@ -6,9 +6,11 @@
"source": [
"# MediaWikiDump\n",
"\n",
">[MediaWiki XML Dumps](https://www.mediawiki.org/wiki/Manual:Importing_XML_dumps) contain the content of a wiki (wiki pages with all their revisions), without the site-related data. A XML dump does not create a full backup of the wiki database, the dump does not contain user accounts, images, edit logs, etc.\n",
"\n",
"This covers how to load a MediaWiki XML dump file into a document format that we can use downstream.\n",
"\n",
"It uses mwxml from mediawiki-utilities to dump and mwparserfromhell from earwig to parse MediaWiki wikicode.\n",
"It uses `mwxml` from `mediawiki-utilities` to dump and `mwparserfromhell` from `earwig` to parse MediaWiki wikicode.\n",
"\n",
"Dump files can be obtained with dumpBackup.php or on the Special:Statistics page of the Wiki."
]
@ -114,9 +116,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
"nbformat_minor": 4
}

@ -1,11 +1,13 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# OneDrive\n",
"# Microsoft OneDrive\n",
"\n",
">[Microsoft OneDrive](https://en.wikipedia.org/wiki/OneDrive) (formerly `SkyDrive`) is a file hosting service operated by Microsoft.\n",
"\n",
"This notebook covers how to load documents from `OneDrive`. Currently, only docx, doc, and pdf files are supported.\n",
"\n",
"## Prerequisites\n",
@ -77,14 +79,34 @@
"documents = loader.load()\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"language_info": {
"name": "python"
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"orig_nbformat": 4
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

@ -1,12 +1,13 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "39af9ecd",
"metadata": {},
"source": [
"# PowerPoint\n",
"# Microsoft PowerPoint\n",
"\n",
">[Microsoft PowerPoint](https://en.wikipedia.org/wiki/Microsoft_PowerPoint) is a presentation program by Microsoft.\n",
"\n",
"This covers how to load `Microsoft PowerPoint` documents into a document format that we can use downstream."
]

@ -5,9 +5,11 @@
"id": "39af9ecd",
"metadata": {},
"source": [
"# Word Documents\n",
"# Microsoft Word\n",
"\n",
"This covers how to load Word documents into a document format that we can use downstream."
">[Microsoft Word](https://www.microsoft.com/en-us/microsoft-365/word) is a word processor developed by Microsoft.\n",
"\n",
"This covers how to load `Word` documents into a document format that we can use downstream."
]
},
{
@ -198,7 +200,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -6,8 +6,7 @@
"source": [
"# Modern Treasury\n",
"\n",
">[Modern Treasury](https://www.moderntreasury.com/) simplifies complex payment operations\n",
"A unified platform to power products and processes that move money.\n",
">[Modern Treasury](https://www.moderntreasury.com/) simplifies complex payment operations. It is a unified platform to power products and processes that move money.\n",
">- Connect to banks and payment systems\n",
">- Track transactions and balances in real-time\n",
">- Automate payment operations for scale\n",

@ -7,7 +7,9 @@
"source": [
"# PDF\n",
"\n",
"This covers how to load PDF documents into the Document format that we use downstream."
">[Portable Document Format (PDF)](https://en.wikipedia.org/wiki/PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.\n",
"\n",
"This covers how to load `PDF` documents into the Document format that we use downstream."
]
},
{

@ -6,7 +6,7 @@
"source": [
"# Reddit\n",
"\n",
">[Reddit (reddit)](\twww.reddit.com) is an American social news aggregation, content rating, and discussion website.\n",
">[Reddit (reddit)](www.reddit.com) is an American social news aggregation, content rating, and discussion website.\n",
"\n",
"\n",
"This loader fetches the text from the Posts of Subreddits or Reddit users, using the `praw` Python package.\n",

@ -6,9 +6,9 @@
"source": [
"# Sitemap\n",
"\n",
"Extends from the `WebBaseLoader`, this will load a sitemap from a given URL, and then scrape and load all pages in the sitemap, returning each page as a Document.\n",
"Extends from the `WebBaseLoader`, `SitemapLoader` loads a sitemap from a given URL, and then scrape and load all pages in the sitemap, returning each page as a Document.\n",
"\n",
"The scraping is done concurrently, using `WebBaseLoader`. There are reasonable limits to concurrent requests, defaulting to 2 per second. If you aren't concerned about being a good citizen, or you control the server you are scraping and don't care about load, you can change the `requests_per_second` parameter to increase the max concurrent requests. Note, while this will speed up the scraping process, but may cause the server to block you. Be careful!"
"The scraping is done concurrently. There are reasonable limits to concurrent requests, defaulting to 2 per second. If you aren't concerned about being a good citizen, or you control the scrapped server, or don't care about load, you can change the `requests_per_second` parameter to increase the max concurrent requests. Note, while this will speed up the scraping process, but it may cause the server to block you. Be careful!"
]
},
{
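
The idea of capping concurrent scraping at a fixed number of requests per second can be sketched with plain `asyncio`. The snippet below staggers request start times so that no more than `requests_per_second` requests begin in any one-second window. The names `fetch_all` and `fake_fetch` are hypothetical, and this is only one simple throttling approach; it is not claimed to be how `WebBaseLoader` or `SitemapLoader` enforce their limit.

```python
import asyncio


async def fetch_all(urls, fetch, requests_per_second=2):
    """Run `fetch(url)` for every URL, starting requests at a capped rate."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    interval = 1.0 / requests_per_second

    async def throttled(index, url):
        # Delay the i-th request by i * interval so starts are evenly spaced.
        await asyncio.sleep(index * interval)
        return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(throttled(i, u) for i, u in enumerate(urls)))


if __name__ == "__main__":
    async def fake_fetch(url):
        # Stand-in for a real HTTP request (assumption for the demo).
        return f"<html>{url}</html>"

    pages = asyncio.run(fetch_all(["page-1", "page-2", "page-3"], fake_fetch))
    print(pages)
```

Raising `requests_per_second` shortens the spacing between requests, which is the trade-off the notebook text warns about: faster scraping, but a higher chance of being blocked.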

@ -5,9 +5,9 @@
"id": "1dc7df1d",
"metadata": {},
"source": [
"# Slack (Local Exported Zipfile)\n",
"# Slack\n",
"\n",
">[Slack](slack.com) is an instant messaging program.\n",
">[Slack](https://slack.com/) is an instant messaging program.\n",
"\n",
"This notebook covers how to load documents from a Zipfile generated from a `Slack` export.\n",
"\n",

@ -6,7 +6,9 @@
"source": [
"# Stripe\n",
"\n",
"This notebook covers how to load data from the Stripe REST API into a format that can be ingested into LangChain, along with example usage for vectorization."
">[Stripe](https://stripe.com/en-ca) is an Irish-American financial services and software as a service (SaaS) company. It offers payment-processing software and application programming interfaces for e-commerce websites and mobile applications.\n",
"\n",
"This notebook covers how to load data from the `Stripe REST API` into a format that can be ingested into LangChain, along with example usage for vectorization."
]
},
{
@ -84,9 +86,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

@ -5,7 +5,7 @@
"id": "4bdaea79",
"metadata": {},
"source": [
"# Subtitle Files\n",
"# Subtitle\n",
"\n",
">[The SubRip file format](https://en.wikipedia.org/wiki/SubRip#SubRip_file_format) is described on the `Matroska` multimedia container format website as \"perhaps the most basic of all subtitle formats.\" `SubRip (SubRip Text)` files are named with the extension `.srt`, and contain formatted lines of plain text in groups separated by a blank line. Subtitles are numbered sequentially, starting at 1. The timecode format used is hours:minutes:seconds,milliseconds with time units fixed to two zero-padded digits and fractions fixed to three zero-padded digits (00:00:00,000). The fractional separator used is the comma, since the program was written in France.\n",
"\n",

@ -7,7 +7,9 @@
"source": [
"# Telegram\n",
"\n",
"This notebook covers how to load data from Telegram into a format that can be ingested into LangChain."
">[Telegram Messenger](https://web.telegram.org/a/) is a globally accessible freemium, cross-platform, encrypted, cloud-based and centralized instant messaging service. The application also provides optional end-to-end encrypted chats and video calling, VoIP, file sharing and several other features.\n",
"\n",
"This notebook covers how to load data from `Telegram` into a format that can be ingested into LangChain."
]
},
{
@ -76,7 +78,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -5,9 +5,11 @@
"id": "4284970b",
"metadata": {},
"source": [
"# TOML Loader\n",
"# TOML\n",
"\n",
"If you need to load Toml files, use the `TomlLoader`."
">[TOML](https://en.wikipedia.org/wiki/TOML) is a file format for configuration files. It is intended to be easy to read and write, and is designed to map unambiguously to a dictionary. Its specification is open-source. `TOML` is implemented in many programming languages. The name `TOML` is an acronym for \"Tom's Obvious, Minimal Language\" referring to its creator, Tom Preston-Werner.\n",
"\n",
"If you need to load `Toml` files, use the `TomlLoader`."
]
},
{
@ -86,7 +88,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -7,8 +7,10 @@
"source": [
"# Twitter\n",
"\n",
"This loader fetches the text from the Tweets of a list of Twitter users, using the `tweepy` Python package.\n",
"You must initialize the loader with your Twitter API token, and you need to pass in the Twitter username you want to extract."
">[Twitter](https://twitter.com/) is an online social media and social networking service.\n",
"\n",
"This loader fetches the text from the Tweets of a list of `Twitter` users, using the `tweepy` Python package.\n",
"You must initialize the loader with your `Twitter API` token, and you need to pass in the Twitter username you want to extract."
]
},
{
@ -106,7 +108,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -5,8 +5,9 @@
"id": "20deed05",
"metadata": {},
"source": [
"# Unstructured File Loader\n",
"This notebook covers how to use Unstructured to load files of many types. Unstructured currently supports loading of text files, powerpoints, html, pdfs, images, and more."
"# Unstructured File\n",
"\n",
"This notebook covers how to use `Unstructured` package to load files of many types. `Unstructured` currently supports loading of text files, powerpoints, html, pdfs, images, and more."
]
},
{
@ -311,7 +312,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -5,9 +5,9 @@
"id": "bf920da0",
"metadata": {},
"source": [
"# Web Base\n",
"# WebBaseLoader\n",
"\n",
"This covers how to load all text from webpages into a document format that we can use downstream. For more custom logic for loading webpages look at some child class examples such as IMSDbLoader, AZLyricsLoader, and CollegeConfidentialLoader"
"This covers how to use `WebBaseLoader` to load all text from `HTML` webpages into a document format that we can use downstream. For more custom logic for loading webpages look at some child class examples such as `IMSDbLoader`, `AZLyricsLoader`, and `CollegeConfidentialLoader`"
]
},
{
@ -140,7 +140,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: nest_asyncio in /Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages (1.5.6)\r\n"
"Requirement already satisfied: nest_asyncio in /Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages (1.5.6)\n"
]
}
],
@ -237,7 +237,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -1,13 +1,14 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### WhatsApp Chat\n",
"\n",
"This notebook covers how to load data from the WhatsApp Chats into a format that can be ingested into LangChain."
">[WhatsApp](https://www.whatsapp.com/) (also called `WhatsApp Messenger`) is a freeware, cross-platform, centralized instant messaging (IM) and voice-over-IP (VoIP) service. It allows users to send text and voice messages, make voice and video calls, and share images, documents, user locations, and other content.\n",
"\n",
"This notebook covers how to load data from the `WhatsApp Chats` into a format that can be ingested into LangChain."
]
},
{
@ -54,7 +55,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
"version": "3.10.6"
},
"vscode": {
"interpreter": {
@ -63,5 +64,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

@ -5,10 +5,11 @@
"id": "df770c72",
"metadata": {},
"source": [
"# YouTube\n",
"# YouTube transcripts\n",
"\n",
"How to load documents from YouTube transcripts.\n",
"\n"
">[YouTube](https://www.youtube.com/) is an online video sharing and social media platform created by Google.\n",
"\n",
"This notebook covers how to load documents from `YouTube transcripts`."
]
},
{
@ -156,7 +157,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
},
"vscode": {
"interpreter": {