Change Default GoogleDriveLoader Behavior to not Load Trashed Files (issue #5104) (#5220)

# Change Default GoogleDriveLoader Behavior to not Load Trashed Files (issue #5104)

Fixes #5104

If you want the previous behavior of loading files that used to live in the
folder but are now trashed, you can use the `load_trashed_files`
parameter:

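```python
from langchain.document_loaders import GoogleDriveLoader

loader = GoogleDriveLoader(
    folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5",
    recursive=False,
    load_trashed_files=True,  # opt back in to loading trashed files
)
```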
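The rule the loader now applies can be tried out without Google Drive credentials. The following is a minimal sketch, not the loader's own code: it assumes each file is a metadata dict carrying the `trashed` boolean (Drive API v3 returns that key only when `trashed` is requested in `fields`), and the helper `visible_files` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def visible_files(
    files: List[Dict[str, Any]], load_trashed_files: bool = False
) -> List[Dict[str, Any]]:
    """Hypothetical helper mirroring the `trashed and not load_trashed_files` check.

    A file is dropped only when it is trashed AND the caller has not opted in.
    Assumption: a missing "trashed" key counts as not trashed (the loader
    itself indexes file["trashed"] directly).
    """
    return [f for f in files if load_trashed_files or not f.get("trashed", False)]


# Hypothetical metadata shaped like Drive API v3 files.list results.
sample = [
    {"id": "a", "name": "notes", "trashed": False},
    {"id": "b", "name": "old draft", "trashed": True},
]

print([f["name"] for f in visible_files(sample)])  # ['notes']
print([f["name"] for f in visible_files(sample, load_trashed_files=True)])  # ['notes', 'old draft']
```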

As not loading trashed files should be the expected behavior, should we:
1. provide the `load_trashed_files` parameter at all?
2. add documentation for it? It feels like most users will stick with the default behavior.

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

DataLoaders
- @eyurtsev

Twitter: [@nicholasliu77](https://twitter.com/nicholasliu77)
Nicholas Liu 12 months ago committed by GitHub
parent eff31a3361
commit f0ea093de8

```diff
@@ -31,6 +31,7 @@ class GoogleDriveLoader(BaseLoader, BaseModel):
     file_ids: Optional[List[str]] = None
     recursive: bool = False
     file_types: Optional[Sequence[str]] = None
+    load_trashed_files: bool = False
 
     @root_validator
     def validate_inputs(cls, values: Dict[str, Any]) -> Dict[str, Any]:
@@ -215,8 +216,10 @@ class GoogleDriveLoader(BaseLoader, BaseModel):
             _files = files
 
         returns = []
-        for file in _files:
-            if file["mimeType"] == "application/vnd.google-apps.document":
+        for file in files:
+            if file["trashed"] and not self.load_trashed_files:
+                continue
+            elif file["mimeType"] == "application/vnd.google-apps.document":
                 returns.append(self._load_document_from_id(file["id"]))  # type: ignore
             elif file["mimeType"] == "application/vnd.google-apps.spreadsheet":
                 returns.extend(self._load_sheet_from_id(file["id"]))  # type: ignore
@@ -224,7 +227,6 @@ class GoogleDriveLoader(BaseLoader, BaseModel):
                 returns.extend(self._load_file_from_id(file["id"]))  # type: ignore
             else:
                 pass
-
         return returns
 
     def _fetch_files_recursive(
@@ -238,7 +240,7 @@ class GoogleDriveLoader(BaseLoader, BaseModel):
                 pageSize=1000,
                 includeItemsFromAllDrives=True,
                 supportsAllDrives=True,
-                fields="nextPageToken, files(id, name, mimeType, parents)",
+                fields="nextPageToken, files(id, name, mimeType, parents, trashed)",
             )
             .execute()
         )
```
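The patched loop can also be sketched as a standalone function. This is an illustration under assumptions, not the loader's implementation: the hypothetical `plan_loads` records which loader each file would be routed to instead of calling the Drive API, and the branch guarding `_load_file_from_id` is assumed to match `application/pdf`, because its condition falls outside the diff context. Note also that the committed loop reads `for file in files:` where the old one read `for file in _files:`, so as written the `file_types` filter appears to be skipped; the sketch iterates over the filtered list instead.

```python
from typing import Any, Dict, List, Optional, Sequence

DOC = "application/vnd.google-apps.document"
SHEET = "application/vnd.google-apps.spreadsheet"
PDF = "application/pdf"  # assumption: the MIME type routed to _load_file_from_id


def plan_loads(
    files: List[Dict[str, Any]],
    file_types: Optional[Sequence[str]] = None,
    load_trashed_files: bool = False,
) -> List[str]:
    """Hypothetical stand-in for the loader loop.

    Returns "<kind>:<id>" labels for the files that would be loaded,
    instead of calling the real per-type loaders.
    """
    # Apply the optional MIME-type filter first (the role of `_files`).
    if file_types:
        _files = [f for f in files if f["mimeType"] in file_types]
    else:
        _files = files

    planned = []
    for file in _files:
        # The new check: skip trashed files unless explicitly requested.
        if file["trashed"] and not load_trashed_files:
            continue
        elif file["mimeType"] == DOC:
            planned.append(f"document:{file['id']}")
        elif file["mimeType"] == SHEET:
            planned.append(f"sheet:{file['id']}")
        elif file["mimeType"] == PDF:
            planned.append(f"file:{file['id']}")
        # Any other MIME type is ignored, like the original `else: pass`.
    return planned


files = [
    {"id": "1", "mimeType": DOC, "trashed": False},
    {"id": "2", "mimeType": SHEET, "trashed": True},
    {"id": "3", "mimeType": PDF, "trashed": False},
]
print(plan_loads(files))  # ['document:1', 'file:3']
print(plan_loads(files, load_trashed_files=True))  # ['document:1', 'sheet:2', 'file:3']
print(plan_loads(files, file_types=[DOC]))  # ['document:1']
```

Filtering after the fetch keeps `load_trashed_files` a purely client-side switch, which is why the diff only needs to add `trashed` to the requested `fields`. An alternative would be to add `trashed = false` to the Drive `q` query so trashed items are never returned at all; Drive API v3 supports that search term, but this change does not take that route.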
