langchain/tests/integration_tests/examples
Matt Robinson a97e4252e3
feat: add UnstructuredExcelLoader for .xlsx and .xls files (#5617)
# Unstructured Excel Loader

Adds an `UnstructuredExcelLoader` class for `.xlsx` and `.xls` files.
Works with `unstructured>=0.6.7`. A plain-text representation of the
Excel file is available under the `page_content` attribute of each
returned document. If you use the loader in `"elements"` mode, an HTML
representation of the Excel file is also available under the
`text_as_html` metadata key. Each sheet in the Excel file is loaded as
its own document.

## Testing

```python
from langchain.document_loaders import UnstructuredExcelLoader

loader = UnstructuredExcelLoader(
    "example_data/stanley-cups.xlsx",
    mode="elements"
)
docs = loader.load()
```
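
To check the result, it can help to look at `page_content` and the `text_as_html` metadata key on each returned document. The following is a minimal sketch, not part of this PR: `summarize_docs` is a hypothetical helper, and the `SimpleNamespace` objects are placeholder stand-ins that only mimic the `page_content` and `metadata` attributes, so the snippet runs without `langchain` or `unstructured` installed.

```python
from types import SimpleNamespace


def summarize_docs(docs):
    """Hypothetical helper (not part of langchain): one summary dict per document."""
    summaries = []
    for doc in docs:
        summaries.append(
            {
                # Plain-text representation, truncated for display.
                "text_preview": doc.page_content[:40],
                # Expected only in "elements" mode; None when the key is absent.
                "html": doc.metadata.get("text_as_html"),
            }
        )
    return summaries


# Placeholder stand-ins for loaded documents (assumed shape, invented content).
fake_docs = [
    SimpleNamespace(
        page_content="placeholder sheet text",
        metadata={"text_as_html": "<table><tr><td>placeholder</td></tr></table>"},
    ),
    SimpleNamespace(page_content="plain text only", metadata={}),
]

summaries = summarize_docs(fake_docs)
for summary in summaries:
    print(summary)
```

With the real loader, the same helper could be called as `summarize_docs(loader.load())`. Using `metadata.get` instead of indexing keeps the sketch from raising a `KeyError` on documents that carry no `text_as_html` entry.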

## Who can review?

@hwchase17
@eyurtsev
2023-06-03 12:44:12 -07:00
default-encoding.py Add PythonLoader which auto-detects encoding of Python files (#3311) 2023-04-21 10:47:57 -07:00
example-utf8.html Add ability to pass kwargs to loader classes in DirectoryLoader, add ability to modify encoding and BeautifulSoup behaviour in BSHTMLLoader (#2275) 2023-04-01 12:48:27 -07:00
example.html
example.json JSON loader (#4067) 2023-05-05 14:48:13 -07:00
facebook_chat.json Refactor TelegramChatLoader and FacebookChatLoader classes and add tests (#3863) 2023-05-03 15:59:19 -07:00
fake.odt feat: add loader for open office odt files (#4405) 2023-05-10 01:37:17 -07:00
hello.msg Harrison/msg files (#2375) 2023-04-04 06:48:34 -07:00
hello.pdf
layout-parser-paper.pdf
non-utf8-encoding.py Add PythonLoader which auto-detects encoding of Python files (#3311) 2023-04-21 10:47:57 -07:00
sitemap.xml Harrison/sitemap local (#4704) 2023-05-14 22:04:38 -07:00
slack_export.zip Add Slack Directory Loader (#2841) 2023-04-13 21:31:59 -07:00
stanley-cups.xlsx feat: add UnstructuredExcelLoader for .xlsx and .xls files (#5617) 2023-06-03 12:44:12 -07:00
whatsapp_chat.txt Update WhatsAppChatLoader to include the character ~ in the sender name (#4420) 2023-05-09 15:00:04 -07:00