diff --git a/docs/modules/indexes/document_loaders/examples/s3_directory.ipynb b/docs/modules/indexes/document_loaders/examples/aws_s3_directory.ipynb similarity index 100% rename from docs/modules/indexes/document_loaders/examples/s3_directory.ipynb rename to docs/modules/indexes/document_loaders/examples/aws_s3_directory.ipynb diff --git a/docs/modules/indexes/document_loaders/examples/s3_file.ipynb b/docs/modules/indexes/document_loaders/examples/aws_s3_file.ipynb similarity index 100% rename from docs/modules/indexes/document_loaders/examples/s3_file.ipynb rename to docs/modules/indexes/document_loaders/examples/aws_s3_file.ipynb diff --git a/docs/modules/indexes/document_loaders/examples/bilibili.ipynb b/docs/modules/indexes/document_loaders/examples/bilibili.ipynb index 3622a6df..d9a1d07f 100644 --- a/docs/modules/indexes/document_loaders/examples/bilibili.ipynb +++ b/docs/modules/indexes/document_loaders/examples/bilibili.ipynb @@ -7,7 +7,9 @@ "source": [ "# Bilibili\n", "\n", - "This loader utilizes the [bilibili-api](https://github.com/MoyuScript/bilibili-api) to fetch the text transcript from [Bilibili](https://www.bilibili.tv/), one of the most beloved long-form video sites in China.\n", + ">[Bilibili](https://www.bilibili.tv/) is one of the most beloved long-form video sites in China.\n", + "\n", + "This loader utilizes the [bilibili-api](https://github.com/MoyuScript/bilibili-api) to fetch the text transcript from `Bilibili`.\n", "\n", "With this BiliBiliLoader, users can easily obtain the transcript of their desired video content on the platform." 
] diff --git a/docs/modules/indexes/document_loaders/examples/blackboard.ipynb b/docs/modules/indexes/document_loaders/examples/blackboard.ipynb index 5834021b..c6580cc7 100644 --- a/docs/modules/indexes/document_loaders/examples/blackboard.ipynb +++ b/docs/modules/indexes/document_loaders/examples/blackboard.ipynb @@ -6,6 +6,8 @@ "source": [ "# Blackboard\n", "\n", + ">[Blackboard Learn](https://en.wikipedia.org/wiki/Blackboard_Learn) (previously the Blackboard Learning Management System) is a web-based virtual learning environment and learning management system developed by Blackboard Inc. The software features course management, customizable open architecture, and scalable design that allows integration with student information systems and authentication protocols. It may be installed on local servers, hosted by `Blackboard ASP Solutions`, or provided as Software as a Service hosted on Amazon Web Services. Its main purposes are stated to include the addition of online elements to courses traditionally delivered face-to-face and development of completely online courses with few or no face-to-face meetings\n", + "\n", "This covers how to load data from a [Blackboard Learn](https://www.anthology.com/products/teaching-and-learning/learning-effectiveness/blackboard-learn) instance.\n", "\n", "This loader is not compatible with all `Blackboard` courses. 
It is only\n", diff --git a/docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb b/docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb index 1e4a3d58..9ba1820e 100644 --- a/docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb +++ b/docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb @@ -4,7 +4,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### ChatGPT Data Loader\n", + "### ChatGPT Data\n", + "\n", + ">[ChatGPT](https://chat.openai.com) is an artificial intelligence (AI) chatbot developed by OpenAI.\n", + "\n", "\n", "This notebook covers how to load `conversations.json` from your `ChatGPT` data export folder.\n", "\n", diff --git a/docs/modules/indexes/document_loaders/examples/confluence.ipynb b/docs/modules/indexes/document_loaders/examples/confluence.ipynb index 48d8cb40..b4ccfc80 100644 --- a/docs/modules/indexes/document_loaders/examples/confluence.ipynb +++ b/docs/modules/indexes/document_loaders/examples/confluence.ipynb @@ -6,7 +6,9 @@ "source": [ "# Confluence\n", "\n", - "A loader for [Confluence](https://www.atlassian.com/software/confluence) pages.\n", + ">[Confluence](https://www.atlassian.com/software/confluence) is a wiki collaboration platform that saves and organizes all of the project-related material. `Confluence` is a knowledge base that primarily handles content management activities. 
\n", + "\n", + "A loader for `Confluence` pages.\n", "\n", "\n", "This currently supports both `username/api_key` and `Oauth2 login`.\n", diff --git a/docs/modules/indexes/document_loaders/examples/CoNLL-U.ipynb b/docs/modules/indexes/document_loaders/examples/conll-u.ipynb similarity index 83% rename from docs/modules/indexes/document_loaders/examples/CoNLL-U.ipynb rename to docs/modules/indexes/document_loaders/examples/conll-u.ipynb index c0263aca..e3f495ab 100644 --- a/docs/modules/indexes/document_loaders/examples/CoNLL-U.ipynb +++ b/docs/modules/indexes/document_loaders/examples/conll-u.ipynb @@ -6,6 +6,12 @@ "metadata": {}, "source": [ "# CoNLL-U\n", + "\n", + ">[CoNLL-U](https://universaldependencies.org/format.html) is a revised version of the CoNLL-X format. Annotations are encoded in plain text files (UTF-8, normalized to NFC, using only the LF character as line break, including an LF character at the end of file) with three types of lines:\n", + ">- Word lines containing the annotation of a word/token in 10 fields separated by single tab characters; see below.\n", + ">- Blank lines marking sentence boundaries.\n", + ">- Comment lines starting with hash (#).\n", + "\n", "This is an example of how to load a file in [CoNLL-U](https://universaldependencies.org/format.html) format. The whole file is treated as one document. The example data (`conllu.conllu`) is based on one of the standard UD/CoNLL-U examples." ] }, diff --git a/docs/modules/indexes/document_loaders/examples/csv.ipynb b/docs/modules/indexes/document_loaders/examples/csv.ipynb index 7080320c..6b62950b 100644 --- a/docs/modules/indexes/document_loaders/examples/csv.ipynb +++ b/docs/modules/indexes/document_loaders/examples/csv.ipynb @@ -4,7 +4,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# CSV Files\n", + "# CSV\n", + "\n", + ">A [comma-separated values (CSV)](https://en.wikipedia.org/wiki/Comma-separated_values) file is a delimited text file that uses a comma to separate values. 
Each line of the file is a data record. Each record consists of one or more fields, separated by commas.\n", "\n", "Load [csv](https://en.wikipedia.org/wiki/Comma-separated_values) data with a single row per document." ] diff --git a/docs/modules/indexes/document_loaders/examples/discord_loader.ipynb b/docs/modules/indexes/document_loaders/examples/discord_loader.ipynb index cd24804d..5d9a2031 100644 --- a/docs/modules/indexes/document_loaders/examples/discord_loader.ipynb +++ b/docs/modules/indexes/document_loaders/examples/discord_loader.ipynb @@ -6,7 +6,9 @@ "source": [ "# Discord\n", "\n", - "You can follow the below steps to download your Discord data:\n", + ">[Discord](https://discord.com/) is a VoIP and instant messaging social platform. Users have the ability to communicate with voice calls, video calls, text messaging, media and files in private chats or as part of communities called \"servers\". A server is a collection of persistent chat rooms and voice channels which can be accessed via invite links.\n", + "\n", + "Follow these steps to download your `Discord` data:\n", "\n", "1. Go to your **User Settings**\n", "2. Then go to **Privacy and Safety**\n", @@ -79,9 +81,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.1" + "version": "3.10.6" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/docs/modules/indexes/document_loaders/examples/epub.ipynb b/docs/modules/indexes/document_loaders/examples/epub.ipynb index d1472e48..b81d6c59 100644 --- a/docs/modules/indexes/document_loaders/examples/epub.ipynb +++ b/docs/modules/indexes/document_loaders/examples/epub.ipynb @@ -7,9 +7,21 @@ "source": [ "# EPub \n", "\n", + ">[EPUB](https://en.wikipedia.org/wiki/EPUB) is an e-book file format that uses the \".epub\" file extension. The term is short for electronic publication and is sometimes styled ePub. 
`EPUB` is supported by many e-readers, and compatible software is available for most smartphones, tablets, and computers.\n", + "\n", "This covers how to load `.epub` documents into the Document format that we can use downstream. You'll need to install the [`pandocs`](https://pandoc.org/installing.html) package for this loader to work." ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "cd1affad-8ba6-43b1-b8cd-f61f44025077", + "metadata": {}, + "outputs": [], + "source": [ + "#!pip install pandocs" + ] + }, { "cell_type": "code", "execution_count": 1, diff --git a/docs/modules/indexes/document_loaders/examples/facebook_chat.ipynb b/docs/modules/indexes/document_loaders/examples/facebook_chat.ipynb index 466c63f1..c61b3fad 100644 --- a/docs/modules/indexes/document_loaders/examples/facebook_chat.ipynb +++ b/docs/modules/indexes/document_loaders/examples/facebook_chat.ipynb @@ -6,6 +6,8 @@ "source": [ "### Facebook Chat\n", "\n", + ">[Messenger](https://en.wikipedia.org/wiki/Messenger_(software)) is an American proprietary instant messaging app and platform developed by `Meta Platforms`. Originally developed as `Facebook Chat` in 2008, the company revamped its messaging service in 2010.\n", + "\n", "This notebook covers how to load data from the [Facebook Chats](https://www.facebook.com/business/help/1646890868956360) into a format that can be ingested into LangChain." 
] }, diff --git a/docs/modules/indexes/document_loaders/examples/directory_loader.ipynb b/docs/modules/indexes/document_loaders/examples/file_directory.ipynb similarity index 95% rename from docs/modules/indexes/document_loaders/examples/directory_loader.ipynb rename to docs/modules/indexes/document_loaders/examples/file_directory.ipynb index 6d6afacb..117284ca 100644 --- a/docs/modules/indexes/document_loaders/examples/directory_loader.ipynb +++ b/docs/modules/indexes/document_loaders/examples/file_directory.ipynb @@ -5,8 +5,9 @@ "id": "79f24a6b", "metadata": {}, "source": [ - "# Directory Loader\n", - "This covers how to use the DirectoryLoader to load all documents in a directory. Under the hood, by default this uses the [UnstructuredLoader](./unstructured_file.ipynb)" + "# File Directory\n", + "\n", + "This covers how to use the `DirectoryLoader` to load all documents in a directory. Under the hood, by default this uses the [UnstructuredLoader](./unstructured_file.ipynb)" ] }, { @@ -255,7 +256,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.1" + "version": "3.10.6" } }, "nbformat": 4, diff --git a/docs/modules/indexes/document_loaders/examples/bigquery.ipynb b/docs/modules/indexes/document_loaders/examples/google_bigquery.ipynb similarity index 96% rename from docs/modules/indexes/document_loaders/examples/bigquery.ipynb rename to docs/modules/indexes/document_loaders/examples/google_bigquery.ipynb index 2932f1fc..75afc996 100644 --- a/docs/modules/indexes/document_loaders/examples/bigquery.ipynb +++ b/docs/modules/indexes/document_loaders/examples/google_bigquery.ipynb @@ -4,9 +4,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# BigQuery\n", + "# Google BigQuery\n", "\n", - ">[BigQuery](https://cloud.google.com/bigquery) is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data.\n", + ">[Google BigQuery](https://cloud.google.com/bigquery) 
is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data.\n", "`BigQuery` is a part of the `Google Cloud Platform`.\n", "\n", "Load a `BigQuery` query with one document per row." diff --git a/docs/modules/indexes/document_loaders/examples/gcs_directory.ipynb b/docs/modules/indexes/document_loaders/examples/google_cloud_storage_directory.ipynb similarity index 99% rename from docs/modules/indexes/document_loaders/examples/gcs_directory.ipynb rename to docs/modules/indexes/document_loaders/examples/google_cloud_storage_directory.ipynb index e9e695f1..9bcc1469 100644 --- a/docs/modules/indexes/document_loaders/examples/gcs_directory.ipynb +++ b/docs/modules/indexes/document_loaders/examples/google_cloud_storage_directory.ipynb @@ -5,7 +5,7 @@ "id": "0ef41fd4", "metadata": {}, "source": [ - "# GCS Directory\n", + "# Google Cloud Storage Directory\n", "\n", ">[Google Cloud Storage](https://en.wikipedia.org/wiki/Google_Cloud_Storage) is a managed service for storing unstructured data.\n", "\n", diff --git a/docs/modules/indexes/document_loaders/examples/gcs_file.ipynb b/docs/modules/indexes/document_loaders/examples/google_cloud_storage_file.ipynb similarity index 98% rename from docs/modules/indexes/document_loaders/examples/gcs_file.ipynb rename to docs/modules/indexes/document_loaders/examples/google_cloud_storage_file.ipynb index 7dfd8bf2..4d2ed265 100644 --- a/docs/modules/indexes/document_loaders/examples/gcs_file.ipynb +++ b/docs/modules/indexes/document_loaders/examples/google_cloud_storage_file.ipynb @@ -5,7 +5,7 @@ "id": "0ef41fd4", "metadata": {}, "source": [ - "# GCS File Storage\n", + "# Google Cloud Storage File\n", "\n", ">[Google Cloud Storage](https://en.wikipedia.org/wiki/Google_Cloud_Storage) is a managed service for storing unstructured data.\n", "\n", diff --git a/docs/modules/indexes/document_loaders/examples/googledrive.ipynb b/docs/modules/indexes/document_loaders/examples/google_drive.ipynb 
similarity index 95% rename from docs/modules/indexes/document_loaders/examples/googledrive.ipynb rename to docs/modules/indexes/document_loaders/examples/google_drive.ipynb index 81aa7dc4..7a2c7222 100644 --- a/docs/modules/indexes/document_loaders/examples/googledrive.ipynb +++ b/docs/modules/indexes/document_loaders/examples/google_drive.ipynb @@ -6,6 +6,9 @@ "metadata": {}, "source": [ "# Google Drive\n", + "\n", + ">[Google Drive](https://en.wikipedia.org/wiki/Google_Drive) is a file storage and synchronization service developed by Google.\n", + "\n", "This notebook covers how to load documents from `Google Drive`. Currently, only `Google Docs` are supported.\n", "\n", "## Prerequisites\n", diff --git a/docs/modules/indexes/document_loaders/examples/hn.ipynb b/docs/modules/indexes/document_loaders/examples/hacker_news.ipynb similarity index 89% rename from docs/modules/indexes/document_loaders/examples/hn.ipynb rename to docs/modules/indexes/document_loaders/examples/hacker_news.ipynb index 4922be17..578d2ae5 100644 --- a/docs/modules/indexes/document_loaders/examples/hn.ipynb +++ b/docs/modules/indexes/document_loaders/examples/hacker_news.ipynb @@ -7,7 +7,7 @@ "source": [ "# Hacker News\n", "\n", - ">[Hacker News](https://en.wikipedia.org/wiki/Hacker_News) (sometimes abbreviated as HN) is a social news website focusing on computer science and entrepreneurship. It is run by the investment fund and startup incubator Y Combinator. In general, content that can be submitted is defined as \"anything that gratifies one's intellectual curiosity.\"\n", + ">[Hacker News](https://en.wikipedia.org/wiki/Hacker_News) (sometimes abbreviated as `HN`) is a social news website focusing on computer science and entrepreneurship. It is run by the investment fund and startup incubator `Y Combinator`. 
In general, content that can be submitted is defined as \"anything that gratifies one's intellectual curiosity.\"\n", "\n", "This notebook covers how to pull page data and comments from [Hacker News](https://news.ycombinator.com/)" ] diff --git a/docs/modules/indexes/document_loaders/examples/html.ipynb b/docs/modules/indexes/document_loaders/examples/html.ipynb index 84225238..445ec597 100644 --- a/docs/modules/indexes/document_loaders/examples/html.ipynb +++ b/docs/modules/indexes/document_loaders/examples/html.ipynb @@ -7,6 +7,8 @@ "source": [ "# HTML\n", "\n", + ">[The HyperText Markup Language or HTML](https://en.wikipedia.org/wiki/HTML) is the standard markup language for documents designed to be displayed in a web browser.\n", + "\n", "This covers how to load `HTML` documents into a document format that we can use downstream." ] }, diff --git a/docs/modules/indexes/document_loaders/examples/hugging_face_dataset.ipynb b/docs/modules/indexes/document_loaders/examples/hugging_face_dataset.ipynb index b9197a8b..7490524e 100644 --- a/docs/modules/indexes/document_loaders/examples/hugging_face_dataset.ipynb +++ b/docs/modules/indexes/document_loaders/examples/hugging_face_dataset.ipynb @@ -5,12 +5,11 @@ "id": "04c9fdc5", "metadata": {}, "source": [ - "# HuggingFace dataset \n", + "# HuggingFace dataset\n", "\n", - "The [Hugging Face Hub](https://huggingface.co/docs/hub/index) hosts a large number of community-curated datasets for a diverse range of tasks such as translation,\n", + ">The [Hugging Face Hub](https://huggingface.co/docs/hub/index) is home to over 5,000 [datasets](https://huggingface.co/docs/hub/index#datasets) in more than 100 languages that can be used for a broad range of tasks across NLP, Computer Vision, and Audio. 
They are used for a diverse range of tasks such as translation,\n", "automatic speech recognition, and image classification.\n", "\n", - ">The `Hugging Face Hub` is home to over 5,000 [datasets](https://huggingface.co/docs/hub/index#datasets) in more than 100 languages that can be used for a broad range of tasks across NLP, Computer Vision, and Audio.\n", "\n", "This notebook shows how to load `Hugging Face Hub` datasets to LangChain." ] diff --git a/docs/modules/indexes/document_loaders/examples/ifixit.ipynb b/docs/modules/indexes/document_loaders/examples/ifixit.ipynb index 3588ab0e..3791ca91 100644 --- a/docs/modules/indexes/document_loaders/examples/ifixit.ipynb +++ b/docs/modules/indexes/document_loaders/examples/ifixit.ipynb @@ -6,7 +6,7 @@ "source": [ "# iFixit\n", "\n", - "[iFixit](https://www.ifixit.com) is the largest, open repair community on the web. The site contains nearly 100k repair manuals, 200k Questions & Answers on 42k devices, and all the data is licensed under CC-BY-NC-SA 3.0.\n", + ">[iFixit](https://www.ifixit.com) is the largest, open repair community on the web. The site contains nearly 100k repair manuals, 200k Questions & Answers on 42k devices, and all the data is licensed under CC-BY-NC-SA 3.0.\n", "\n", "This loader will allow you to download the text of a repair guide, text of Q&A's and wikis from devices on `iFixit` using their open APIs. It's incredibly useful for context related to technical documents and answers to questions about devices in the corpus of data on `iFixit`." ] diff --git a/docs/modules/indexes/document_loaders/examples/image.ipynb b/docs/modules/indexes/document_loaders/examples/image.ipynb index 8af1faab..e09f2fe7 100644 --- a/docs/modules/indexes/document_loaders/examples/image.ipynb +++ b/docs/modules/indexes/document_loaders/examples/image.ipynb @@ -7,7 +7,7 @@ "source": [ "# Images\n", "\n", - "This covers how to load images such as JPGs PNGs into a document format that we can use downstream." 
+ "This covers how to load images such as `JPG` or `PNG` into a document format that we can use downstream." ] }, { diff --git a/docs/modules/indexes/document_loaders/examples/image_captions.ipynb b/docs/modules/indexes/document_loaders/examples/image_captions.ipynb index 5d354e54..9869afa3 100644 --- a/docs/modules/indexes/document_loaders/examples/image_captions.ipynb +++ b/docs/modules/indexes/document_loaders/examples/image_captions.ipynb @@ -10,7 +10,7 @@ "By default, the loader utilizes the pre-trained [Salesforce BLIP image captioning model](https://huggingface.co/Salesforce/blip-image-captioning-base).\n", "\n", "\n", - "This notebook shows how to use the ImageCaptionLoader tutorial to generate a query-able index of image captions" + "This notebook shows how to use the `ImageCaptionLoader` to generate a query-able index of image captions" ] }, { diff --git a/docs/modules/indexes/document_loaders/examples/imsdb.ipynb b/docs/modules/indexes/document_loaders/examples/imsdb.ipynb index b6c916f0..de686668 100644 --- a/docs/modules/indexes/document_loaders/examples/imsdb.ipynb +++ b/docs/modules/indexes/document_loaders/examples/imsdb.ipynb @@ -7,7 +7,7 @@ "source": [ "# IMSDb\n", "\n", - "[IMSDb](https://imsdb.com/) is the `Internet Movie Script Database`.\n", + ">[IMSDb](https://imsdb.com/) is the `Internet Movie Script Database`.\n", "\n", "This covers how to load `IMSDb` webpages into a document format that we can use downstream." 
] diff --git a/docs/modules/indexes/document_loaders/examples/notebook.ipynb b/docs/modules/indexes/document_loaders/examples/jupyter_notebook.ipynb similarity index 93% rename from docs/modules/indexes/document_loaders/examples/notebook.ipynb rename to docs/modules/indexes/document_loaders/examples/jupyter_notebook.ipynb index aabc81dc..208ba198 100644 --- a/docs/modules/indexes/document_loaders/examples/notebook.ipynb +++ b/docs/modules/indexes/document_loaders/examples/jupyter_notebook.ipynb @@ -4,7 +4,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Notebook\n", + "# Jupyter Notebook\n", + "\n", + ">[Jupyter Notebook](https://en.wikipedia.org/wiki/Project_Jupyter#Applications) (formerly `IPython Notebook`) is a web-based interactive computational environment for creating notebook documents.\n", "\n", "This notebook covers how to load data from a `Jupyter notebook (.ipynb)` into a format suitable by LangChain." ] diff --git a/docs/modules/indexes/document_loaders/examples/mediawikidump.ipynb b/docs/modules/indexes/document_loaders/examples/mediawikidump.ipynb index 5c37abbe..e233b96c 100644 --- a/docs/modules/indexes/document_loaders/examples/mediawikidump.ipynb +++ b/docs/modules/indexes/document_loaders/examples/mediawikidump.ipynb @@ -6,9 +6,11 @@ "source": [ "# MediaWikiDump\n", "\n", + ">[MediaWiki XML Dumps](https://www.mediawiki.org/wiki/Manual:Importing_XML_dumps) contain the content of a wiki (wiki pages with all their revisions), without the site-related data. 
An XML dump does not create a full backup of the wiki database; the dump does not contain user accounts, images, edit logs, etc.\n", + "\n", "This covers how to load a MediaWiki XML dump file into a document format that we can use downstream.\n", "\n", - "It uses mwxml from mediawiki-utilities to dump and mwparserfromhell from earwig to parse MediaWiki wikicode.\n", + "It uses `mwxml` from `mediawiki-utilities` to dump and `mwparserfromhell` from `earwig` to parse MediaWiki wikicode.\n", "\n", "Dump files can be obtained with dumpBackup.php or on the Special:Statistics page of the Wiki." ] @@ -114,9 +116,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.1" + "version": "3.10.6" } }, "nbformat": 4, - "nbformat_minor": 1 + "nbformat_minor": 4 } diff --git a/docs/modules/indexes/document_loaders/examples/onedrive.ipynb b/docs/modules/indexes/document_loaders/examples/microsoft_onedrive.ipynb similarity index 89% rename from docs/modules/indexes/document_loaders/examples/onedrive.ipynb rename to docs/modules/indexes/document_loaders/examples/microsoft_onedrive.ipynb index f10fef72..a7d8fb46 100644 --- a/docs/modules/indexes/document_loaders/examples/onedrive.ipynb +++ b/docs/modules/indexes/document_loaders/examples/microsoft_onedrive.ipynb @@ -1,11 +1,13 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "# OneDrive\n", + "# Microsoft OneDrive\n", + "\n", + ">[Microsoft OneDrive](https://en.wikipedia.org/wiki/OneDrive) (formerly `SkyDrive`) is a file hosting service operated by Microsoft.\n", + "\n", "This notebook covers how to load documents from `OneDrive`. 
Currently, only docx, doc, and pdf files are supported.\n", "\n", "## Prerequisites\n", @@ -77,14 +79,34 @@ "documents = loader.load()\n", "```\n" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { - "language_info": { - "name": "python" + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" }, - "orig_nbformat": 4 + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/docs/modules/indexes/document_loaders/examples/powerpoint.ipynb b/docs/modules/indexes/document_loaders/examples/microsoft_powerpoint.ipynb similarity index 95% rename from docs/modules/indexes/document_loaders/examples/powerpoint.ipynb rename to docs/modules/indexes/document_loaders/examples/microsoft_powerpoint.ipynb index f43815d3..e34aebe0 100644 --- a/docs/modules/indexes/document_loaders/examples/powerpoint.ipynb +++ b/docs/modules/indexes/document_loaders/examples/microsoft_powerpoint.ipynb @@ -1,12 +1,13 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "id": "39af9ecd", "metadata": {}, "source": [ - "# PowerPoint\n", + "# Microsoft PowerPoint\n", + "\n", + ">[Microsoft PowerPoint](https://en.wikipedia.org/wiki/Microsoft_PowerPoint) is a presentation program by Microsoft.\n", "\n", "This covers how to load `Microsoft PowerPoint` documents into a document format that we can use downstream." 
] diff --git a/docs/modules/indexes/document_loaders/examples/word_document.ipynb b/docs/modules/indexes/document_loaders/examples/microsoft_word.ipynb similarity index 93% rename from docs/modules/indexes/document_loaders/examples/word_document.ipynb rename to docs/modules/indexes/document_loaders/examples/microsoft_word.ipynb index 38621f06..40b5534b 100644 --- a/docs/modules/indexes/document_loaders/examples/word_document.ipynb +++ b/docs/modules/indexes/document_loaders/examples/microsoft_word.ipynb @@ -5,9 +5,11 @@ "id": "39af9ecd", "metadata": {}, "source": [ - "# Word Documents\n", + "# Microsoft Word\n", "\n", - "This covers how to load Word documents into a document format that we can use downstream." + ">[Microsoft Word](https://www.microsoft.com/en-us/microsoft-365/word) is a word processor developed by Microsoft.\n", + "\n", + "This covers how to load `Word` documents into a document format that we can use downstream." ] }, { @@ -198,7 +200,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.1" + "version": "3.10.6" } }, "nbformat": 4, diff --git a/docs/modules/indexes/document_loaders/examples/modern_treasury.ipynb b/docs/modules/indexes/document_loaders/examples/modern_treasury.ipynb index 51d1e2fd..5a02fb40 100644 --- a/docs/modules/indexes/document_loaders/examples/modern_treasury.ipynb +++ b/docs/modules/indexes/document_loaders/examples/modern_treasury.ipynb @@ -6,8 +6,7 @@ "source": [ "# Modern Treasury\n", "\n", - ">[Modern Treasury](https://www.moderntreasury.com/) simplifies complex payment operations\n", - "A unified platform to power products and processes that move money.\n", + ">[Modern Treasury](https://www.moderntreasury.com/) simplifies complex payment operations. 
It is a unified platform to power products and processes that move money.\n", ">- Connect to banks and payment systems\n", ">- Track transactions and balances in real-time\n", ">- Automate payment operations for scale\n", diff --git a/docs/modules/indexes/document_loaders/examples/dataframe.ipynb b/docs/modules/indexes/document_loaders/examples/pandas_dataframe.ipynb similarity index 100% rename from docs/modules/indexes/document_loaders/examples/dataframe.ipynb rename to docs/modules/indexes/document_loaders/examples/pandas_dataframe.ipynb diff --git a/docs/modules/indexes/document_loaders/examples/pdf.ipynb b/docs/modules/indexes/document_loaders/examples/pdf.ipynb index 2418262c..b8a222e9 100644 --- a/docs/modules/indexes/document_loaders/examples/pdf.ipynb +++ b/docs/modules/indexes/document_loaders/examples/pdf.ipynb @@ -7,7 +7,9 @@ "source": [ "# PDF\n", "\n", - "This covers how to load PDF documents into the Document format that we use downstream." + ">[Portable Document Format (PDF)](https://en.wikipedia.org/wiki/PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.\n", + "\n", + "This covers how to load `PDF` documents into the Document format that we use downstream." 
] }, { diff --git a/docs/modules/indexes/document_loaders/examples/reddit.ipynb b/docs/modules/indexes/document_loaders/examples/reddit.ipynb index e227a953..385a9177 100644 --- a/docs/modules/indexes/document_loaders/examples/reddit.ipynb +++ b/docs/modules/indexes/document_loaders/examples/reddit.ipynb @@ -6,7 +6,7 @@ "source": [ "# Reddit\n", "\n", - ">[Reddit (reddit)](\twww.reddit.com) is an American social news aggregation, content rating, and discussion website.\n", + ">[Reddit (reddit)](www.reddit.com) is an American social news aggregation, content rating, and discussion website.\n", "\n", "\n", "This loader fetches the text from the Posts of Subreddits or Reddit users, using the `praw` Python package.\n", diff --git a/docs/modules/indexes/document_loaders/examples/sitemap.ipynb b/docs/modules/indexes/document_loaders/examples/sitemap.ipynb index 65311525..46a4d0bd 100644 --- a/docs/modules/indexes/document_loaders/examples/sitemap.ipynb +++ b/docs/modules/indexes/document_loaders/examples/sitemap.ipynb @@ -6,9 +6,9 @@ "source": [ "# Sitemap\n", "\n", - "Extends from the `WebBaseLoader`, this will load a sitemap from a given URL, and then scrape and load all pages in the sitemap, returning each page as a Document.\n", + "Extending `WebBaseLoader`, `SitemapLoader` loads a sitemap from a given URL, and then scrapes and loads all pages in the sitemap, returning each page as a Document.\n", "\n", - "The scraping is done concurrently, using `WebBaseLoader`. There are reasonable limits to concurrent requests, defaulting to 2 per second. If you aren't concerned about being a good citizen, or you control the server you are scraping and don't care about load, you can change the `requests_per_second` parameter to increase the max concurrent requests. Note, while this will speed up the scraping process, but may cause the server to block you. Be careful!" + "The scraping is done concurrently. 
There are reasonable limits to concurrent requests, defaulting to 2 per second. If you aren't concerned about being a good citizen, or you control the scraped server, or don't care about load, you can change the `requests_per_second` parameter to increase the max concurrent requests. Note that while this will speed up the scraping process, it may cause the server to block you. Be careful!" ] }, { diff --git a/docs/modules/indexes/document_loaders/examples/slack_directory.ipynb b/docs/modules/indexes/document_loaders/examples/slack.ipynb similarity index 95% rename from docs/modules/indexes/document_loaders/examples/slack_directory.ipynb rename to docs/modules/indexes/document_loaders/examples/slack.ipynb index b8f94b7f..645c74e5 100644 --- a/docs/modules/indexes/document_loaders/examples/slack_directory.ipynb +++ b/docs/modules/indexes/document_loaders/examples/slack.ipynb @@ -5,9 +5,9 @@ "id": "1dc7df1d", "metadata": {}, "source": [ - "# Slack (Local Exported Zipfile)\n", + "# Slack\n", "\n", - ">[Slack](slack.com) is an instant messaging program.\n", + ">[Slack](https://slack.com/) is an instant messaging program.\n", "\n", "This notebook covers how to load documents from a Zipfile generated from a `Slack` export.\n", "\n", diff --git a/docs/modules/indexes/document_loaders/examples/stripe.ipynb b/docs/modules/indexes/document_loaders/examples/stripe.ipynb index a0b0c0a4..691be7ca 100644 --- a/docs/modules/indexes/document_loaders/examples/stripe.ipynb +++ b/docs/modules/indexes/document_loaders/examples/stripe.ipynb @@ -6,7 +6,9 @@ "source": [ "# Stripe\n", "\n", - "This notebook covers how to load data from the Stripe REST API into a format that can be ingested into LangChain, along with example usage for vectorization." + ">[Stripe](https://stripe.com/en-ca) is an Irish-American financial services and software as a service (SaaS) company. 
It offers payment-processing software and application programming interfaces for e-commerce websites and mobile applications.\n", + "\n", + "This notebook covers how to load data from the `Stripe REST API` into a format that can be ingested into LangChain, along with example usage for vectorization." ] }, { @@ -84,9 +86,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.3" + "version": "3.10.6" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/docs/modules/indexes/document_loaders/examples/srt.ipynb b/docs/modules/indexes/document_loaders/examples/subtitle.ipynb similarity index 99% rename from docs/modules/indexes/document_loaders/examples/srt.ipynb rename to docs/modules/indexes/document_loaders/examples/subtitle.ipynb index 35f3151e..39993a2d 100644 --- a/docs/modules/indexes/document_loaders/examples/srt.ipynb +++ b/docs/modules/indexes/document_loaders/examples/subtitle.ipynb @@ -5,7 +5,7 @@ "id": "4bdaea79", "metadata": {}, "source": [ - "# Subtitle Files\n", + "# Subtitle\n", "\n", ">[The SubRip file format](https://en.wikipedia.org/wiki/SubRip#SubRip_file_format) is described on the `Matroska` multimedia container format website as \"perhaps the most basic of all subtitle formats.\" `SubRip (SubRip Text)` files are named with the extension `.srt`, and contain formatted lines of plain text in groups separated by a blank line. Subtitles are numbered sequentially, starting at 1. The timecode format used is hours:minutes:seconds,milliseconds with time units fixed to two zero-padded digits and fractions fixed to three zero-padded digits (00:00:00,000). 
The fractional separator used is the comma, since the program was written in France.\n", "\n", diff --git a/docs/modules/indexes/document_loaders/examples/telegram.ipynb b/docs/modules/indexes/document_loaders/examples/telegram.ipynb index ca561645..20f7d46b 100644 --- a/docs/modules/indexes/document_loaders/examples/telegram.ipynb +++ b/docs/modules/indexes/document_loaders/examples/telegram.ipynb @@ -7,7 +7,9 @@ "source": [ "# Telegram\n", "\n", - "This notebook covers how to load data from Telegram into a format that can be ingested into LangChain." + ">[Telegram Messenger](https://web.telegram.org/a/) is a globally accessible freemium, cross-platform, encrypted, cloud-based and centralized instant messaging service. The application also provides optional end-to-end encrypted chats and video calling, VoIP, file sharing and several other features.\n", + "\n", + "This notebook covers how to load data from `Telegram` into a format that can be ingested into LangChain." ] }, { @@ -76,7 +78,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.1" + "version": "3.10.6" } }, "nbformat": 4, diff --git a/docs/modules/indexes/document_loaders/examples/toml.ipynb b/docs/modules/indexes/document_loaders/examples/toml.ipynb index e5931042..b57d0ccf 100644 --- a/docs/modules/indexes/document_loaders/examples/toml.ipynb +++ b/docs/modules/indexes/document_loaders/examples/toml.ipynb @@ -5,9 +5,11 @@ "id": "4284970b", "metadata": {}, "source": [ - "# TOML Loader\n", + "# TOML\n", "\n", - "If you need to load Toml files, use the `TomlLoader`." + ">[TOML](https://en.wikipedia.org/wiki/TOML) is a file format for configuration files. It is intended to be easy to read and write, and is designed to map unambiguously to a dictionary. Its specification is open-source. `TOML` is implemented in many programming languages. 
The name `TOML` is an acronym for \"Tom's Obvious, Minimal Language\" referring to its creator, Tom Preston-Werner.\n", + "\n", + "If you need to load `Toml` files, use the `TomlLoader`." ] }, { @@ -86,7 +88,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.1" + "version": "3.10.6" } }, "nbformat": 4, diff --git a/docs/modules/indexes/document_loaders/examples/twitter.ipynb b/docs/modules/indexes/document_loaders/examples/twitter.ipynb index 9df5a784..f62292e7 100644 --- a/docs/modules/indexes/document_loaders/examples/twitter.ipynb +++ b/docs/modules/indexes/document_loaders/examples/twitter.ipynb @@ -7,8 +7,10 @@ "source": [ "# Twitter\n", "\n", - "This loader fetches the text from the Tweets of a list of Twitter users, using the `tweepy` Python package.\n", - "You must initialize the loader with your Twitter API token, and you need to pass in the Twitter username you want to extract." + ">[Twitter](https://twitter.com/) is an online social media and social networking service.\n", + "\n", + "This loader fetches the text from the Tweets of a list of `Twitter` users, using the `tweepy` Python package.\n", + "You must initialize the loader with your `Twitter API` token, and you need to pass in the Twitter username you want to extract." ] }, { @@ -106,7 +108,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.1" + "version": "3.10.6" } }, "nbformat": 4, diff --git a/docs/modules/indexes/document_loaders/examples/unstructured_file.ipynb b/docs/modules/indexes/document_loaders/examples/unstructured_file.ipynb index b9ea3b0e..c79868ec 100644 --- a/docs/modules/indexes/document_loaders/examples/unstructured_file.ipynb +++ b/docs/modules/indexes/document_loaders/examples/unstructured_file.ipynb @@ -5,8 +5,9 @@ "id": "20deed05", "metadata": {}, "source": [ - "# Unstructured File Loader\n", - "This notebook covers how to use Unstructured to load files of many types. 
Unstructured currently supports loading of text files, powerpoints, html, pdfs, images, and more." + "# Unstructured File\n", + "\n", + "This notebook covers how to use the `Unstructured` package to load files of many types. `Unstructured` currently supports loading of text files, powerpoints, html, pdfs, images, and more." ] }, { @@ -311,7 +312,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.13" + "version": "3.10.6" } }, "nbformat": 4, diff --git a/docs/modules/indexes/document_loaders/examples/web_base.ipynb b/docs/modules/indexes/document_loaders/examples/web_base.ipynb index d500e778..b2a078d0 100644 --- a/docs/modules/indexes/document_loaders/examples/web_base.ipynb +++ b/docs/modules/indexes/document_loaders/examples/web_base.ipynb @@ -5,9 +5,9 @@ "id": "bf920da0", "metadata": {}, "source": [ - "# Web Base\n", + "# WebBaseLoader\n", "\n", - "This covers how to load all text from webpages into a document format that we can use downstream. For more custom logic for loading webpages look at some child class examples such as IMSDbLoader, AZLyricsLoader, and CollegeConfidentialLoader" + "This covers how to use `WebBaseLoader` to load all text from `HTML` webpages into a document format that we can use downstream. 
For more custom logic for loading webpages look at some child class examples such as `IMSDbLoader`, `AZLyricsLoader`, and `CollegeConfidentialLoader`" ] }, { @@ -140,7 +140,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "Requirement already satisfied: nest_asyncio in /Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages (1.5.6)\r\n" + "Requirement already satisfied: nest_asyncio in /Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages (1.5.6)\n" ] } ], @@ -237,7 +237,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.1" + "version": "3.10.6" } }, "nbformat": 4, diff --git a/docs/modules/indexes/document_loaders/examples/whatsapp_chat.ipynb b/docs/modules/indexes/document_loaders/examples/whatsapp_chat.ipynb index 0744773e..691c4fde 100644 --- a/docs/modules/indexes/document_loaders/examples/whatsapp_chat.ipynb +++ b/docs/modules/indexes/document_loaders/examples/whatsapp_chat.ipynb @@ -1,13 +1,14 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### WhatsApp Chat\n", "\n", - "This notebook covers how to load data from the WhatsApp Chats into a format that can be ingested into LangChain." + ">[WhatsApp](https://www.whatsapp.com/) (also called `WhatsApp Messenger`) is a freeware, cross-platform, centralized instant messaging (IM) and voice-over-IP (VoIP) service. It allows users to send text and voice messages, make voice and video calls, and share images, documents, user locations, and other content.\n", + "\n", + "This notebook covers how to load data from the `WhatsApp Chats` into a format that can be ingested into LangChain." 
] }, { @@ -54,7 +55,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.1" + "version": "3.10.6" }, "vscode": { "interpreter": { @@ -63,5 +64,5 @@ } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/docs/modules/indexes/document_loaders/examples/youtube.ipynb b/docs/modules/indexes/document_loaders/examples/youtube_transcript.ipynb similarity index 94% rename from docs/modules/indexes/document_loaders/examples/youtube.ipynb rename to docs/modules/indexes/document_loaders/examples/youtube_transcript.ipynb index 7f1f0179..70d5be06 100644 --- a/docs/modules/indexes/document_loaders/examples/youtube.ipynb +++ b/docs/modules/indexes/document_loaders/examples/youtube_transcript.ipynb @@ -5,10 +5,11 @@ "id": "df770c72", "metadata": {}, "source": [ - "# YouTube\n", + "# YouTube transcripts\n", "\n", - "How to load documents from YouTube transcripts.\n", - "\n" + ">[YouTube](https://www.youtube.com/) is an online video sharing and social media platform created by Google.\n", + "\n", + "This notebook covers how to load documents from `YouTube transcripts`." ] }, { @@ -156,7 +157,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.1" + "version": "3.10.6" }, "vscode": { "interpreter": {