langchain/langchain/document_loaders
Matt Robinson 2f15c11b87
feat: document loader for MS Word documents (#1282)
### Summary

Adds a document loader for MS Word documents. Works with both `.docx`
and `.doc` files as long as the user has installed
`unstructured>=0.4.11`.
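Because the loader depends on `unstructured>=0.4.11`, a quick pre-flight check can report whether the installed version is new enough. The sketch below uses only the standard library. `parse_version`, `meets_min_version` and `unstructured_is_supported` are hypothetical helpers, not part of LangChain, and the parser assumes plain numeric `X.Y.Z` versions. Pre-release suffixes such as `.dev0` are not handled.

```python
from importlib import metadata


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain numeric version like '0.4.11' into (0, 4, 11)."""
    return tuple(int(part) for part in version.split("."))


def meets_min_version(installed: str, minimum: str) -> bool:
    """Hypothetical helper: True if `installed` >= `minimum`.

    Compares numerically (so 0.4.9 < 0.4.11) and pads the shorter
    version with zeros so that '0.4' and '0.4.0' compare equal.
    """
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def unstructured_is_supported(minimum: str = "0.4.11") -> bool:
    """Return False if `unstructured` is missing or older than `minimum`."""
    try:
        installed = metadata.version("unstructured")
    except metadata.PackageNotFoundError:
        return False
    return meets_min_version(installed, minimum)
```

Calling `unstructured_is_supported()` before constructing the loader gives a clearer failure than an import error deep inside `load()`. The result depends on the local environment.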

### Testing

The following workflow tests the loader on both `.doc` and `.docx` files
using example docs from the `unstructured` repo.

#### `.docx`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.docx"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```

#### `.doc`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.doc"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```
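To show the shape of what `loader.load()` hands back without installing `unstructured`, the sketch below uses hypothetical stand-ins (`FakeDocument`, `FakeWordLoader`) in place of LangChain's `Document` and the real loader. It assumes the default single-document behaviour of the `unstructured`-based loaders. In that mode, the parsed element texts are joined with blank lines into one document, and the file path is stored under a `"source"` metadata key. The element texts are passed in directly rather than parsed from a `.docx`/`.doc` file.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class FakeWordLoader:
    """Hypothetical stand-in mimicking the loader's single-document output.

    The real loader parses `file_path` with `unstructured`. Here the element
    texts are supplied directly so the sketch runs without that dependency.
    """

    def __init__(self, file_path: str, elements: list[str]):
        self.file_path = file_path
        self.elements = elements

    def load(self) -> list[FakeDocument]:
        # Assumption: element texts are joined with blank lines into one
        # document, and the path is recorded under the "source" key.
        text = "\n\n".join(self.elements)
        return [FakeDocument(page_content=text, metadata={"source": self.file_path})]


loader = FakeWordLoader(
    "../unstructured/example-docs/fake.docx",
    ["First paragraph", "Second paragraph"],
)
docs = loader.load()
print(docs[0].metadata["source"])  # prints ../unstructured/example-docs/fake.docx
```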
1 year ago
| File | Last commit | Updated |
| --- | --- | --- |
| `__init__.py` | feat: document loader for MS Word documents (#1282) | 1 year ago |
| `airbyte_json.py` | Harrison/airbyte (#989) | 1 year ago |
| `azlyrics.py` | clean up loaders (#1178) | 1 year ago |
| `base.py` | Harrison/unstructured support (#903) | 1 year ago |
| `college_confidential.py` | clean up loaders (#1178) | 1 year ago |
| `directory.py` | directory loader improvements (#1162) | 1 year ago |
| `docx.py` | Harrison/unstructured structured (#1004) | 1 year ago |
| `email.py` | Harrison/unstructured structured (#1004) | 1 year ago |
| `evernote.py` | Update and rename everynote.py to evernote.py (#1060) | 1 year ago |
| `facebook_chat.py` | Harrison/fb loader (#1277) | 1 year ago |
| `gcs_directory.py` | Harrison/add roam loader (#939) | 1 year ago |
| `gcs_file.py` | Harrison/add roam loader (#939) | 1 year ago |
| `gitbook.py` | Harrison/updating docs (#1196) | 1 year ago |
| `googledrive.py` | Refactor some loops into list comprehensions (#1185) | 1 year ago |
| `gutenberg.py` | gutenberg books (#946) | 1 year ago |
| `hn.py` | Refactor some loops into list comprehensions (#1185) | 1 year ago |
| `html.py` | Harrison/unstructured structured (#1004) | 1 year ago |
| `imsdb.py` | clean up loaders (#1178) | 1 year ago |
| `notebook.py` | cleanup (#1274) | 1 year ago |
| `notion.py` | Harrison/unstructured support (#903) | 1 year ago |
| `obsidian.py` | Harrison/add roam loader (#939) | 1 year ago |
| `online_pdf.py` | fix path (#1168) | 1 year ago |
| `paged_pdf.py` | Refactor some loops into list comprehensions (#1185) | 1 year ago |
| `pdf.py` | Harrison/unstructured structured (#1004) | 1 year ago |
| `powerpoint.py` | clean up loaders (#1178) | 1 year ago |
| `readthedocs.py` | Update readthedocs.py (#943) | 1 year ago |
| `roam.py` | Harrison/add roam loader (#939) | 1 year ago |
| `s3_directory.py` | Harrison/add roam loader (#939) | 1 year ago |
| `s3_file.py` | Harrison/add roam loader (#939) | 1 year ago |
| `srt.py` | add srt loader (#1140) | 1 year ago |
| `telegram.py` | fix telegram imports (#1110) | 1 year ago |
| `text.py` | Harrison/0083 (#996) | 1 year ago |
| `unstructured.py` | Harrison/unstructured io (#1200) | 1 year ago |
| `url.py` | feat: adds `UnstructuredURLLoader` for loading data from urls (#979) | 1 year ago |
| `web_base.py` | Harrison/updating docs (#1196) | 1 year ago |
| `word_document.py` | feat: document loader for MS Word documents (#1282) | 1 year ago |
| `youtube.py` | fix to specific language transcript (#1231) | 1 year ago |