langchain/libs/community/langchain_community/document_loaders/onedrive_file.py
Erick Friis c2a3021bb0
multiple: pydantic 2 compatibility, v0.3 (#26443)
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com>
Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com>
Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com>
Co-authored-by: ZhangShenao <15201440436@163.com>
Co-authored-by: Friso H. Kingma <fhkingma@gmail.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Morgante Pell <morgantep@google.com>
2024-09-13 14:38:45 -07:00

from __future__ import annotations

import tempfile
from typing import TYPE_CHECKING, List

from langchain_core.documents import Document
from pydantic import BaseModel, ConfigDict, Field

from langchain_community.document_loaders.base import BaseLoader
from langchain_community.document_loaders.unstructured import UnstructuredFileLoader

if TYPE_CHECKING:
    from O365.drive import File

CHUNK_SIZE = 1024 * 1024 * 5
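

class OneDriveFileLoader(BaseLoader, BaseModel):
    """Load a file from `Microsoft OneDrive`."""

    file: File = Field(...)
    """The file to load."""

    # `File` is not a pydantic type, so arbitrary types must be allowed.
    model_config = ConfigDict(
        arbitrary_types_allowed=True,
    )

    def load(self) -> List[Document]:
        """Download the file to a temporary directory and load it as Documents."""
        with tempfile.TemporaryDirectory() as temp_dir:
            file_path = f"{temp_dir}/{self.file.name}"
            # Download in CHUNK_SIZE (5 MiB) pieces into the temporary directory.
            self.file.download(to_path=temp_dir, chunk_size=CHUNK_SIZE)
            # Parse while the temporary directory still exists; it is
            # deleted when the `with` block exits.
            loader = UnstructuredFileLoader(file_path)
            return loader.load()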
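`OneDriveFileLoader.load` follows a download-then-parse pattern: fetch the remote file into a `tempfile.TemporaryDirectory`, hand the local path to a parser, and return the parsed result before the directory is cleaned up. Below is a minimal, stdlib-only sketch of that pattern. It assumes a hypothetical `FakeRemoteFile` in place of `O365.drive.File` (modeling only the `name` attribute and a `download(to_path=..., chunk_size=...)` method) and a hypothetical line-splitting `parse_text_file` in place of `UnstructuredFileLoader`; neither name is part of LangChain or O365.

```python
import os
import tempfile
from typing import List


class FakeRemoteFile:
    """Hypothetical stand-in for an O365 File; not the real O365 class."""

    def __init__(self, name: str, payload: bytes) -> None:
        self.name = name
        self._payload = payload

    def download(self, to_path: str, chunk_size: int) -> bool:
        # Write the payload to <to_path>/<name> in chunk_size pieces,
        # imitating a chunked download.
        target = os.path.join(to_path, self.name)
        with open(target, "wb") as fh:
            for start in range(0, len(self._payload), chunk_size):
                fh.write(self._payload[start:start + chunk_size])
        return True


def parse_text_file(path: str) -> List[str]:
    # Hypothetical parser standing in for UnstructuredFileLoader:
    # one "document" per non-empty line.
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def load_remote_file(remote: FakeRemoteFile, chunk_size: int = 5 * 1024 * 1024) -> List[str]:
    # Same shape as OneDriveFileLoader.load: download into a temporary
    # directory, parse the local copy, and return before the directory
    # is deleted on leaving the with-block.
    with tempfile.TemporaryDirectory() as temp_dir:
        file_path = os.path.join(temp_dir, remote.name)
        remote.download(to_path=temp_dir, chunk_size=chunk_size)
        return parse_text_file(file_path)


docs = load_remote_file(FakeRemoteFile("notes.txt", b"alpha\n\nbeta\n"), chunk_size=4)
print(docs)  # ['alpha', 'beta']
```

Parsing must finish inside the `with` block, because `TemporaryDirectory` removes the downloaded file on exit. The real loader satisfies this because `loader.load()` builds the full list of Documents before `return` leaves the block.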