Matt Robinson 0498dad562
feat: enable `UnstructuredEmailLoader` to process attachments (#6977)
### Summary

Updates `UnstructuredEmailLoader` so that it can process attachments in
addition to the e-mail content. The loader will process attachments if
the `process_attachments` kwarg is passed when the loader is
instantiated.

### Testing

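```python
from langchain.document_loaders import UnstructuredEmailLoader

file_path = "fake-email-attachment.eml"
# process_attachments=True also partitions any files attached to the e-mail
loader = UnstructuredEmailLoader(
    file_path, mode="elements", process_attachments=True
)
docs = loader.load()
# Inspect the last document returned by the loader
docs[-1]
```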

### Reviewers

- @rlancemartin
- @eyurtsev
- @hwchase17
1 year ago
README.org feat: Add `UnstructuredOrgModeLoader` (#6842) 1 year ago
README.rst feat: Add `UnstructuredRSTLoader` (#6594) 1 year ago
brandfetch-brandfetch-2.0.0-resolved.json Add support for passing headers and search params to openai openapi chain (#6782) 1 year ago
default-encoding.py Add PythonLoader which auto-detects encoding of Python files (#3311) 1 year ago
example-utf8.html Add ability to pass kwargs to loader classes in `DirectoryLoader`, add ability to modify encoding and BeautifulSoup behaviour in `BSHTMLLoader` (#2275) 1 year ago
example.html Add HTML document_loader that includes page title metadata (#1720) 2 years ago
example.json JSON loader (#4067) 1 year ago
example.mht Added a MHTML document loader (#6311) 1 year ago
facebook_chat.json Refactor TelegramChatLoader and FacebookChatLoader classes and add tests (#3863) 1 year ago
factbook.xml feat: Add `UnstructuredXMLLoader` for `.xml` files (#5955) 1 year ago
fake-email-attachment.eml feat: enable `UnstructuredEmailLoader` to process attachments (#6977) 1 year ago
fake.odt feat: add loader for open office odt files (#4405) 1 year ago
hello.msg Harrison/msg files (#2375) 1 year ago
hello.pdf Harrison/format agent instructions (#973) 2 years ago
hello_world.js feat (documents): add a source code loader based on AST manipulation (#6486) 1 year ago
hello_world.py feat (documents): add a source code loader based on AST manipulation (#6486) 1 year ago
layout-parser-paper.pdf Harrison/remote paths pdf (#1544) 2 years ago
non-utf8-encoding.py Add PythonLoader which auto-detects encoding of Python files (#3311) 1 year ago
sitemap.xml Harrison/sitemap local (#4704) 1 year ago
slack_export.zip Add Slack Directory Loader (#2841) 1 year ago
stanley-cups.csv feat: Add `UnstructuredCSVLoader` for CSV files (#5844) 1 year ago
stanley-cups.xlsx feat: add `UnstructuredExcelLoader` for `.xlsx` and `.xls` files (#5617) 1 year ago
whatsapp_chat.txt Enhancement : Ignore deleted messages and media in WhatsAppChatLoader (#6839) 1 year ago


Example Docs
------------

The sample docs directory contains the following files:

-  ``example-10k.html`` - A 10-K SEC filing in HTML format
-  ``layout-parser-paper.pdf`` - A PDF copy of the layout parser paper
-  ``factbook.xml``/``factbook.xsl`` - Example XML/XSL files that you
   can use to test stylesheets

Use these documents to test the parsers in the library. The
instructions below explain how to pull in some sample docs that are too
big to store in the repo.

XBRL 10-K
^^^^^^^^^

You can download an example 10-K in inline XBRL format with the
following ``curl`` command. Note that you must set a user agent in the
request header, or the SEC site will reject your request.

.. code:: bash

   # Set the organization and email shell variables to your own values first.
   curl -O \
     -A "${organization} ${email}" \
     https://www.sec.gov/Archives/edgar/data/311094/000117184321001344/0001171843-21-001344.txt

You can parse this document using the HTML parser.
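
The download step above can also be sketched in Python using only the
standard library. This is a minimal sketch, not part of the original
instructions: the helper names ``build_sec_request`` and ``download``
and the example organization and email are hypothetical. It shows only
how the user agent is attached to the request.

```python
import urllib.request

# URL of the example 10-K filing referenced in the curl command above.
SEC_URL = (
    "https://www.sec.gov/Archives/edgar/data/311094/"
    "000117184321001344/0001171843-21-001344.txt"
)


def build_sec_request(organization: str, email: str, url: str = SEC_URL) -> urllib.request.Request:
    """Build a request whose User-Agent is '<organization> <email>' (hypothetical helper)."""
    user_agent = f"{organization} {email}"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(request: urllib.request.Request, dest: str) -> None:
    """Fetch the request and write the response body to dest (hypothetical helper)."""
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        out.write(response.read())


# Example usage (performs a network call, so it is left commented out):
# req = build_sec_request("Example Org", "admin@example.com")
# download(req, "0001171843-21-001344.txt")
```

Building the request separately from sending it makes it easy to check the header before any network traffic occurs.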