diff --git a/docs/integrations/aws_s3.md b/docs/integrations/aws_s3.md new file mode 100644 index 00000000..707fe8ff --- /dev/null +++ b/docs/integrations/aws_s3.md @@ -0,0 +1,25 @@ +# AWS S3 Directory + +>[Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html) is an object storage service. + +>[AWS S3 Directory](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html) + +>[AWS S3 Buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingBucket.html) + + +## Installation and Setup + +```bash +pip install boto3 +``` + + +## Document Loader + +See a [usage example for S3DirectoryLoader](../modules/indexes/document_loaders/examples/aws_s3_directory.ipynb). + +See a [usage example for S3FileLoader](../modules/indexes/document_loaders/examples/aws_s3_file.ipynb). + +```python +from langchain.document_loaders import S3DirectoryLoader, S3FileLoader +``` diff --git a/docs/integrations/azlyrics.md b/docs/integrations/azlyrics.md new file mode 100644 index 00000000..f275717e --- /dev/null +++ b/docs/integrations/azlyrics.md @@ -0,0 +1,16 @@ +# AZLyrics + +>[AZLyrics](https://www.azlyrics.com/) is a large, legal, every day growing collection of lyrics. + +## Installation and Setup + +There isn't any special setup for it. + + +## Document Loader + +See a [usage example](../modules/indexes/document_loaders/examples/azlyrics.ipynb). + +```python +from langchain.document_loaders import AZLyricsLoader +``` diff --git a/docs/integrations/azure_blob_storage.md b/docs/integrations/azure_blob_storage.md new file mode 100644 index 00000000..832abd21 --- /dev/null +++ b/docs/integrations/azure_blob_storage.md @@ -0,0 +1,36 @@ +# Azure Blob Storage + +>[Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) is Microsoft's object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data. 
Unstructured data is data that doesn't adhere to a particular data model or definition, such as text or binary data. + +>[Azure Files](https://learn.microsoft.com/en-us/azure/storage/files/storage-files-introduction) offers fully managed +> file shares in the cloud that are accessible via the industry-standard Server Message Block (`SMB`) protocol, +> Network File System (`NFS`) protocol, and `Azure Files REST API`. `Azure Files` is based on `Azure Blob Storage`. + +`Azure Blob Storage` is designed for: +- Serving images or documents directly to a browser. +- Storing files for distributed access. +- Streaming video and audio. +- Writing to log files. +- Storing data for backup and restore, disaster recovery, and archiving. +- Storing data for analysis by an on-premises or Azure-hosted service. + +## Installation and Setup + +```bash +pip install azure-storage-blob +``` + + +## Document Loader + +See a [usage example for Azure Blob Storage](../modules/indexes/document_loaders/examples/azure_blob_storage_container.ipynb). + +```python +from langchain.document_loaders import AzureBlobStorageContainerLoader +``` + +See a [usage example for Azure Files](../modules/indexes/document_loaders/examples/azure_blob_storage_file.ipynb). + +```python +from langchain.document_loaders import AzureBlobStorageFileLoader +``` diff --git a/docs/integrations/bilibili.md b/docs/integrations/bilibili.md new file mode 100644 index 00000000..d992821c --- /dev/null +++ b/docs/integrations/bilibili.md @@ -0,0 +1,17 @@ +# BiliBili + +>[Bilibili](https://www.bilibili.tv/) is one of the most beloved long-form video sites in China. + +## Installation and Setup + +```bash +pip install bilibili-api-python +``` + +## Document Loader + +See a [usage example](../modules/indexes/document_loaders/examples/bilibili.ipynb). 
+ +```python +from langchain.document_loaders import BiliBiliLoader +``` diff --git a/docs/integrations/blackboard.md b/docs/integrations/blackboard.md new file mode 100644 index 00000000..130764a8 --- /dev/null +++ b/docs/integrations/blackboard.md @@ -0,0 +1,22 @@ +# Blackboard + +>[Blackboard Learn](https://en.wikipedia.org/wiki/Blackboard_Learn) (previously the `Blackboard Learning Management System`) +> is a web-based virtual learning environment and learning management system developed by Blackboard Inc. +> The software features course management, customizable open architecture, and scalable design that allows +> integration with student information systems and authentication protocols. It may be installed on local servers, +> hosted by `Blackboard ASP Solutions`, or provided as Software as a Service hosted on Amazon Web Services. +> Its main purposes are stated to include the addition of online elements to courses traditionally delivered +> face-to-face and development of completely online courses with few or no face-to-face meetings. + +## Installation and Setup + +There isn't any special setup for it. + +## Document Loader + +See a [usage example](../modules/indexes/document_loaders/examples/blackboard.ipynb). + +```python +from langchain.document_loaders import BlackboardLoader + +``` diff --git a/docs/integrations/college_confidential.md b/docs/integrations/college_confidential.md new file mode 100644 index 00000000..b23923f2 --- /dev/null +++ b/docs/integrations/college_confidential.md @@ -0,0 +1,16 @@ +# College Confidential + +>[College Confidential](https://www.collegeconfidential.com/) gives information on 3,800+ colleges and universities. + +## Installation and Setup + +There isn't any special setup for it. + + +## Document Loader + +See a [usage example](../modules/indexes/document_loaders/examples/college_confidential.ipynb). 
+ +```python +from langchain.document_loaders import CollegeConfidentialLoader +``` diff --git a/docs/integrations/confluence.md b/docs/integrations/confluence.md new file mode 100644 index 00000000..bab15eb6 --- /dev/null +++ b/docs/integrations/confluence.md @@ -0,0 +1,22 @@ +# Confluence + +>[Confluence](https://www.atlassian.com/software/confluence) is a wiki collaboration platform that saves and organizes all of the project-related material. `Confluence` is a knowledge base that primarily handles content management activities. + + +## Installation and Setup + +```bash +pip install atlassian-python-api +``` + +We need to set up `username/api_key` or `Oauth2 login`. +See [instructions](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/). + + +## Document Loader + +See a [usage example](../modules/indexes/document_loaders/examples/confluence.ipynb). + +```python +from langchain.document_loaders import ConfluenceLoader +``` diff --git a/docs/integrations/diffbot.md b/docs/integrations/diffbot.md new file mode 100644 index 00000000..e1a81846 --- /dev/null +++ b/docs/integrations/diffbot.md @@ -0,0 +1,18 @@ +# Diffbot + +>[Diffbot](https://docs.diffbot.com/docs) is a service to read web pages. Unlike traditional web scraping tools, +> `Diffbot` doesn't require any rules to read the content on a page. +>It starts with computer vision, which classifies a page into one of 20 possible types. Content is then interpreted by a machine learning model trained to identify the key attributes on a page based on its type. +>The result is a website transformed into clean structured data (like JSON or CSV), ready for your application. + +## Installation and Setup + +Read the [instructions](https://docs.diffbot.com/reference/authentication) on how to get a Diffbot API token. + +## Document Loader + +See a [usage example](../modules/indexes/document_loaders/examples/diffbot.ipynb). 
+ +```python +from langchain.document_loaders import DiffbotLoader +``` diff --git a/docs/integrations/openai.md b/docs/integrations/openai.md index 2e26b58b..958192dc 100644 --- a/docs/integrations/openai.md +++ b/docs/integrations/openai.md @@ -1,40 +1,50 @@ # OpenAI -This page covers how to use the OpenAI ecosystem within LangChain. -It is broken into two parts: installation and setup, and then references to specific OpenAI wrappers. +>[OpenAI](https://en.wikipedia.org/wiki/OpenAI) is an American artificial intelligence (AI) research laboratory +> consisting of the non-profit `OpenAI Incorporated` +> and its for-profit subsidiary corporation `OpenAI Limited Partnership`. +> `OpenAI` conducts AI research with the declared intention of promoting and developing a friendly AI. +> `OpenAI` systems run on an `Azure`-based supercomputing platform from `Microsoft`. + +>The [OpenAI API](https://platform.openai.com/docs/models) is powered by a diverse set of models with different capabilities and price points. +> +>[ChatGPT](https://chat.openai.com) is an artificial intelligence (AI) chatbot developed by `OpenAI`. 
## Installation and Setup -- Install the Python SDK with `pip install openai` +- Install the Python SDK with +```bash +pip install openai +``` - Get an OpenAI api key and set it as an environment variable (`OPENAI_API_KEY`) -- If you want to use OpenAI's tokenizer (only available for Python 3.9+), install it with `pip install tiktoken` +- If you want to use OpenAI's tokenizer (only available for Python 3.9+), install it with +```bash +pip install tiktoken +``` -## Wrappers -### LLM +## LLM -There exists an OpenAI LLM wrapper, which you can access with ```python from langchain.llms import OpenAI ``` -If you are using a model hosted on Azure, you should use different wrapper for that: +If you are using a model hosted on `Azure`, you should use a different wrapper for that: ```python from langchain.llms import AzureOpenAI ``` -For a more detailed walkthrough of the Azure wrapper, see [this notebook](../modules/models/llms/integrations/azure_openai_example.ipynb) +For a more detailed walkthrough of the `Azure` wrapper, see [this notebook](../modules/models/llms/integrations/azure_openai_example.ipynb) -### Embeddings +## Text Embedding Model -There exists an OpenAI Embeddings wrapper, which you can access with ```python from langchain.embeddings import OpenAIEmbeddings ``` For a more detailed walkthrough of this, see [this notebook](../modules/models/text_embedding/examples/openai.ipynb) -### Tokenizer +## Tokenizer There are several places you can use the `tiktoken` tokenizer. By default, it is used to count tokens for OpenAI LLMs. @@ -46,10 +56,18 @@ CharacterTextSplitter.from_tiktoken_encoder(...) ``` For a more detailed walkthrough of this, see [this notebook](../modules/indexes/text_splitters/examples/tiktoken.ipynb) -### Moderation -You can also access the OpenAI content moderation endpoint with +## Chain + +See a [usage example](../modules/chains/examples/moderation.ipynb). 
```python from langchain.chains import OpenAIModerationChain ``` -For a more detailed walkthrough of this, see [this notebook](../modules/chains/examples/moderation.ipynb) + +## Document Loader + +See a [usage example](../modules/indexes/document_loaders/examples/chatgpt_loader.ipynb). + +```python +from langchain.document_loaders.chatgpt import ChatGPTLoader +``` diff --git a/docs/modules/indexes/document_loaders/examples/confluence.ipynb b/docs/modules/indexes/document_loaders/examples/confluence.ipynb index b4ccfc80..27864464 100644 --- a/docs/modules/indexes/document_loaders/examples/confluence.ipynb +++ b/docs/modules/indexes/document_loaders/examples/confluence.ipynb @@ -8,13 +8,11 @@ "\n", ">[Confluence](https://www.atlassian.com/software/confluence) is a wiki collaboration platform that saves and organizes all of the project-related material. `Confluence` is a knowledge base that primarily handles content management activities. \n", "\n", - "A loader for `Confluence` pages.\n", + "This loader for `Confluence` pages currently supports both `username/api_key` and `Oauth2 login`.\n", + "See [instructions](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/).\n", "\n", "\n", - "This currently supports both `username/api_key` and `Oauth2 login`.\n", - "\n", - "\n", - "Specify a list page_ids and/or space_key to load in the corresponding pages into Document objects, if both are specified the union of both sets will be returned.\n", + "Specify a list of `page_ids` and/or a `space_key` to load the corresponding pages into Document objects; if both are specified, the union of both sets is returned.\n", "\n", "\n", "You can also specify a boolean `include_attachments` to include attachments, this is set to False by default, if set to True all attachments will be downloaded and ConfluenceReader will extract the text from the attachments and add it to the Document object. 
Currently supported attachment types are: `PDF`, `PNG`, `JPEG/JPG`, `SVG`, `Word` and `Excel`.\n", diff --git a/docs/modules/indexes/document_loaders/examples/diffbot.ipynb b/docs/modules/indexes/document_loaders/examples/diffbot.ipynb index 46184b5c..571b4bf6 100644 --- a/docs/modules/indexes/document_loaders/examples/diffbot.ipynb +++ b/docs/modules/indexes/document_loaders/examples/diffbot.ipynb @@ -11,7 +11,7 @@ ">It starts with computer vision, which classifies a page into one of 20 possible types. Content is then interpreted by a machine learning model trained to identify the key attributes on a page based on its type.\n", ">The result is a website transformed into clean structured data (like JSON or CSV), ready for your application.\n", "\n", - "This covers how to extract HTML documents from a list of URLs using the [Diffbot extract API](https://www.diffbot.com/products/extract/), into a document format that we can use downstream." + "This covers how to extract HTML documents from a list of URLs using the [Diffbot extract API](https://www.diffbot.com/products/extract/), into a document format that we can use downstream.\n" ] }, { @@ -31,7 +31,9 @@ "id": "6fffec88", "metadata": {}, "source": [ - "The Diffbot Extract API Requires an API token. Once you have it, you can extract the data from the previous URLs\n" + "The Diffbot Extract API requires an API token. Once you have it, you can extract the data.\n", + "\n", + "Read the [instructions](https://docs.diffbot.com/reference/authentication) on how to get a Diffbot API token." ] }, {