forked from Archives/langchain
2f15c11b87
### Summary

Adds a document loader for MS Word documents. Works with both `.docx` and `.doc` files as long as the user has installed `unstructured>=0.4.11`.

### Testing

The following workflow tests the loader for both `.doc` and `.docx` files using example docs from the `unstructured` repo.

#### `.docx`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.docx"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```

#### `.doc`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.doc"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```

Last commit: 1 year ago

File | Last updated | |
---|---|---|
__init__.py | 1 year ago | |
airbyte_json.py | 1 year ago | |
azlyrics.py | 1 year ago | |
base.py | 1 year ago | |
college_confidential.py | 1 year ago | |
directory.py | 1 year ago | |
docx.py | 1 year ago | |
email.py | 1 year ago | |
evernote.py | 1 year ago | |
facebook_chat.py | 1 year ago | |
gcs_directory.py | 1 year ago | |
gcs_file.py | 1 year ago | |
gitbook.py | 1 year ago | |
googledrive.py | 1 year ago | |
gutenberg.py | 1 year ago | |
hn.py | 1 year ago | |
html.py | 1 year ago | |
imsdb.py | 1 year ago | |
notebook.py | 1 year ago | |
notion.py | 1 year ago | |
obsidian.py | 1 year ago | |
online_pdf.py | 1 year ago | |
paged_pdf.py | 1 year ago | |
pdf.py | 1 year ago | |
powerpoint.py | 1 year ago | |
readthedocs.py | 1 year ago | |
roam.py | 1 year ago | |
s3_directory.py | 1 year ago | |
s3_file.py | 1 year ago | |
srt.py | 1 year ago | |
telegram.py | 1 year ago | |
text.py | 1 year ago | |
unstructured.py | 1 year ago | |
url.py | 1 year ago | |
web_base.py | 1 year ago | |
word_document.py | 1 year ago | |
youtube.py | 1 year ago | |
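
The summary states that the Word loader needs `unstructured>=0.4.11`. A minimal sketch of how a caller might check that requirement before constructing the loader is shown below. The helper names `parse_version` and `unstructured_is_new_enough` are hypothetical and are not part of `langchain` or `unstructured`; the version parsing is deliberately simplistic and only compares the first three numeric components.

```python
from importlib import metadata
from typing import Optional, Tuple

# Minimum version taken from the commit summary above.
MIN_UNSTRUCTURED = (0, 4, 11)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a string like '0.4.11' into (0, 4, 11).

    Hypothetical helper: keeps only the leading digits of the first three
    dot-separated components and pads missing components with zeros.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def unstructured_is_new_enough(installed: Optional[str] = None) -> bool:
    """Return True if `unstructured` meets the minimum version.

    If `installed` is None, the version is looked up from the installed
    package metadata; a missing package counts as not new enough.
    """
    if installed is None:
        try:
            installed = metadata.version("unstructured")
        except metadata.PackageNotFoundError:
            return False
    return parse_version(installed) >= MIN_UNSTRUCTURED
```

Calling `unstructured_is_new_enough()` with no argument inspects the current environment; passing a version string explicitly makes the check easy to exercise without the package installed.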