community: fix `DirectoryLoader` progress bar (#19821)

**Description:** Currently, the maximum value of the `DirectoryLoader` progress bar is computed from an incorrect count of files to process.

In `langchain_community/document_loaders/directory.py:127`:

```python
        paths = p.rglob(self.glob) if self.recursive else p.glob(self.glob)
        items = [
            path
            for path in paths
            if not (self.exclude and any(path.match(glob) for glob in self.exclude))
        ]
```

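`paths` yields both files and directories. `items` is later used to set the maximum value of the progress bar, so the matched directories inflate the total and the bar gives an incorrect progress indication. This change adds a `path.is_file()` check so that `items` contains only files.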
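The effect of the extra check can be reproduced outside the loader. The following is a minimal sketch, not the loader's actual code: `collect_items` and its `files_only` flag are hypothetical names introduced here to compare the filter with and without `path.is_file()`, and the temporary directory layout is made up for illustration.

```python
import tempfile
from pathlib import Path


def collect_items(root, glob="**/*", exclude=(), files_only=True):
    """Hypothetical helper mirroring the list comprehension shown above."""
    paths = Path(root).glob(glob)
    return [
        path
        for path in paths
        if not (exclude and any(path.match(pattern) for pattern in exclude))
        # With files_only=False this reproduces the old behaviour,
        # where directories matched by the glob are kept as well.
        and (path.is_file() or not files_only)
    ]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("a")
    (root / "sub" / "b.txt").write_text("b")

    before = collect_items(root, files_only=False)  # counts "sub" too
    after = collect_items(root, files_only=True)    # files only

    print(len(before), len(after))
```

In this layout the glob matches one directory and two files, so a progress bar sized from `before` would expect one more item than the loader can ever process.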
Tomer Cagan 2 months ago committed by GitHub
parent 984e7e36c2
commit 463160c3f6
GPG Key ID: B5690EEEBB952194

@@ -129,6 +129,7 @@ class DirectoryLoader(BaseLoader):
             path
             for path in paths
             if not (self.exclude and any(path.match(glob) for glob in self.exclude))
+            and path.is_file()
         ]

         if self.sample_size > 0:
