langchain/docs/extras/integrations/document_loaders/huawei_obs_directory.ipynb
mpb159753 7df2dfc4c2
Add Support for Loading Documents from Huawei OBS (#8573)
Description:
This PR adds support for loading documents from Huawei OBS (Object
Storage Service) in Langchain. OBS is a cloud-based object storage
service provided by Huawei Cloud. With this enhancement, Langchain users
can now easily access and load documents stored in Huawei OBS directly
into the system.

Key Changes:
- Added a new document loader module specifically for Huawei OBS
integration.
- Implemented the necessary logic to authenticate and connect to Huawei
OBS using access credentials.
- Enabled the loading of individual documents from a specified bucket
and object key in Huawei OBS.
- Provided the option to specify custom authentication information or
obtain security tokens from Huawei Cloud ECS for easy access.

How to Test:
1. Ensure the required package "esdk-obs-python" is installed.
2. Configure the endpoint, access key, secret key, and bucket details
for Huawei OBS in the Langchain settings.
3. Load documents from Huawei OBS using the updated document loader
module.
4. Verify that documents are successfully retrieved and loaded into
Langchain for further processing.

Please review this PR and let us know if any further improvements are
needed. Your feedback is highly appreciated!

@rlancemartin, @eyurtsev

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-01 09:30:30 -07:00

179 lines
3.9 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "c83b6a4c",
"metadata": {},
"source": [
"# Huawei OBS Directory\n",
"The following code demonstrates how to load objects from the Huawei OBS (Object Storage Service) as documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c2191935",
"metadata": {},
"outputs": [],
"source": [
"# Install the required package\n",
"# pip install esdk-obs-python"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "55fca3b4",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import OBSDirectoryLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c3ed419f",
"metadata": {},
"outputs": [],
"source": [
"endpoint = \"your-endpoint\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "3428fd4e",
"metadata": {},
"outputs": [],
"source": [
"# Configure your access credentials\\n\n",
"config = {\n",
" \"ak\": \"your-access-key\",\n",
" \"sk\": \"your-secret-key\"\n",
"}\n",
"loader = OBSDirectoryLoader(\"your-bucket-name\", endpoint=endpoint, config=config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9beede9f",
"metadata": {},
"outputs": [],
"source": [
"loader.load()"
]
},
{
"cell_type": "markdown",
"id": "1e20a839",
"metadata": {},
"source": [
"## Specify a Prefix for Loading\n",
"If you want to load objects with a specific prefix from the bucket, you can use the following code:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "125f311d",
"metadata": {},
"outputs": [],
"source": [
"loader = OBSDirectoryLoader(\"your-bucket-name\", endpoint=endpoint, config=config, prefix=\"test_prefix\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3488037",
"metadata": {},
"outputs": [],
"source": [
"loader.load()"
]
},
{
"cell_type": "markdown",
"id": "84c82c0a",
"metadata": {},
"source": [
"## Get Authentication Information from ECS\n",
"If your langchain is deployed on Huawei Cloud ECS and [Agency is set up](https://support.huaweicloud.com/intl/en-us/usermanual-ecs/ecs_03_0166.html#section7), the loader can directly get the security token from ECS without needing access key and secret key. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1db99969",
"metadata": {},
"outputs": [],
"source": [
"config = {\"get_token_from_ecs\": True}\n",
"loader = OBSDirectoryLoader(\"your-bucket-name\", endpoint=endpoint, config=config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "57dd9f35",
"metadata": {},
"outputs": [],
"source": [
"loader.load()"
]
},
{
"cell_type": "markdown",
"id": "30205d25",
"metadata": {},
"source": [
"## Use a Public Bucket\n",
"If your bucket's bucket policy allows anonymous access (anonymous users have `listBucket` and `GetObject` permissions), you can directly load the objects without configuring the `config` parameter."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "4dfa2ef0",
"metadata": {},
"outputs": [],
"source": [
"loader = OBSDirectoryLoader(\"your-bucket-name\", endpoint=endpoint)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67d4c1d0",
"metadata": {},
"outputs": [],
"source": [
"loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}