langchain/libs/community/langchain_community/document_loaders/tencent_cos_file.py
Ran c3f8733aef
fix: correct spelling mistakes of "seperate, intialise, pre-defined" (#14647)
fix spellings

**seperate -> separate**: found more occurrences, see
https://github.com/langchain-ai/langchain/pull/14602
**initialise -> intialize**: the latter is more common in the repo
**pre-defined > predefined**: adding a comma after a prefix is a
delicate matter, but this is a generally accepted word

also, another word that appears in the repo is "fs" (stands for
filesystem), e.g., in `libs/core/langchain_core/prompts/loading.py`
` """Unified method for loading a prompt from LangChainHub or local
fs."""`
Isn't "filesystem" better?
2023-12-22 11:49:35 -08:00

49 lines
1.7 KiB
Python

import os
import tempfile
from typing import Any, Iterator, List
from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseLoader
from langchain_community.document_loaders.unstructured import UnstructuredFileLoader
class TencentCOSFileLoader(BaseLoader):
"""Load from `Tencent Cloud COS` file."""
def __init__(self, conf: Any, bucket: str, key: str):
"""Initialize with COS config, bucket and key name.
:param conf(CosConfig): COS config.
:param bucket(str): COS bucket.
:param key(str): COS file key.
"""
self.conf = conf
self.bucket = bucket
self.key = key
def load(self) -> List[Document]:
return list(self.lazy_load())
def lazy_load(self) -> Iterator[Document]:
"""Load documents."""
try:
from qcloud_cos import CosS3Client
except ImportError:
raise ImportError(
"Could not import cos-python-sdk-v5 python package. "
"Please install it with `pip install cos-python-sdk-v5`."
)
# initialize a client
client = CosS3Client(self.conf)
with tempfile.TemporaryDirectory() as temp_dir:
file_path = f"{temp_dir}/{self.bucket}/{self.key}"
os.makedirs(os.path.dirname(file_path), exist_ok=True)
# Download the file to a destination
client.download_file(
Bucket=self.bucket, Key=self.key, DestFilePath=file_path
)
loader = UnstructuredFileLoader(file_path)
# UnstructuredFileLoader not implement lazy_load yet
return iter(loader.load())