Commit Graph

2 Commits (897cad58dc09374667f95b31383f7dd18d8eddfe)

Author SHA1 Message Date
Klein Tahiraj d3d4503ce2
Remove redundant .docx loader (closes #1716) + update how_to_guides.rst (#1891)
In https://github.com/hwchase17/langchain/issues/1716 , it was
identified that there were two .py files performing similar tasks. As a
resolution, one of the files has been removed, as its purpose had
already been fulfilled by the other file. Additionally, the init has
been updated accordingly.

Furthermore, the how_to_guides.rst file has been updated to include
links to documentation that was previously missing. This was deemed
necessary as the existing list on
https://langchain.readthedocs.io/en/latest/modules/document_loaders/how_to_guides.html
was incomplete, causing confusion for users who rely on the full list of
documentation on the left sidebar of the website.
1 year ago
Matt Robinson 2f15c11b87
feat: document loader for MS Word documents (#1282)
### Summary

Adds a document loader for MS Word Documents. Works with both `.docx`
and `.doc` files as longer as the user has installed
`unstructured>=0.4.11`.

### Testing

The follow workflow test the loader for both `.doc` and `.docx` files
using example docs from the `unstructured` repo.

#### `.docx`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.docx"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```

#### `.doc`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.doc"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```
1 year ago