mirror of
https://github.com/hwchase17/langchain
synced 2024-10-31 15:20:26 +00:00
7df2dfc4c2
Description: This PR adds support for loading documents from Huawei OBS (Object Storage Service) in Langchain. OBS is a cloud-based object storage service provided by Huawei Cloud. With this enhancement, Langchain users can now easily access and load documents stored in Huawei OBS directly into the system. Key Changes: - Added a new document loader module specifically for Huawei OBS integration. - Implemented the necessary logic to authenticate and connect to Huawei OBS using access credentials. - Enabled the loading of individual documents from a specified bucket and object key in Huawei OBS. - Provided the option to specify custom authentication information or obtain security tokens from Huawei Cloud ECS for easy access. How to Test: 1. Ensure the required package "esdk-obs-python" is installed. 2. Configure the endpoint, access key, secret key, and bucket details for Huawei OBS in the Langchain settings. 3. Load documents from Huawei OBS using the updated document loader module. 4. Verify that documents are successfully retrieved and loaded into Langchain for further processing. Please review this PR and let us know if any further improvements are needed. Your feedback is highly appreciated! @rlancemartin, @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com>
181 lines
4.2 KiB
Plaintext
181 lines
4.2 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "4394a872",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Huawei OBS File\n",
|
|
"The following code demonstrates how to load an object from the Huawei OBS (Object Storage Service) as document."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "c43d811b",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Install the required package\n",
|
|
"# pip install esdk-obs-python"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"id": "5e16bae6",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain.document_loaders.obs_file import OBSFileLoader"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"id": "75cc7e7c",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"endpoint = \"your-endpoint\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"id": "f9816984",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from obs import ObsClient\n",
|
|
"obs_client = ObsClient(access_key_id=\"your-access-key\", secret_access_key=\"your-secret-key\", server=endpoint)\n",
|
|
"loader = OBSFileLoader(\"your-bucket-name\", \"your-object-key\", client=obs_client)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "6143b39b",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "633e05ca",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Each Loader with Separate Authentication Information\n",
|
|
"If you don't need to reuse OBS connections between different loaders, you can directly configure the `config`. The loader will use the config information to initialize its own OBS client."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"id": "a5dd6a5d",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Configure your access credentials\\n\n",
|
|
"config = {\n",
|
|
" \"ak\": \"your-access-key\",\n",
|
|
" \"sk\": \"your-secret-key\"\n",
|
|
"}\n",
|
|
"loader = OBSFileLoader(\"your-bucket-name\", \"your-object-key\",endpoint=endpoint, config=config)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "9a741f1c",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "1e2e611c",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Get Authentication Information from ECS\n",
|
|
"If your langchain is deployed on Huawei Cloud ECS and [Agency is set up](https://support.huaweicloud.com/intl/en-us/usermanual-ecs/ecs_03_0166.html#section7), the loader can directly get the security token from ECS without needing access key and secret key. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"id": "338fafef",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"config = {\"get_token_from_ecs\": True}\n",
|
|
"loader = OBSFileLoader(\"your-bucket-name\", \"your-object-key\", endpoint=endpoint, config=config)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "73976c55",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "b77aa18c",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Access a Publicly Accessible Object\n",
|
|
"If the object you want to access allows anonymous user access (anonymous users have `GetObject` permission), you can directly load the object without configuring the `config` parameter."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"id": "df83d121",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"loader = OBSFileLoader(\"your-bucket-name\", \"your-object-key\", endpoint=endpoint)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "82a844ba",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"loader.load()"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.7"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|