langchain/tests/integration_tests/document_loaders
Matt Robinson a97e4252e3
feat: add UnstructuredExcelLoader for .xlsx and .xls files (#5617)
# Unstructured Excel Loader

Adds an `UnstructuredExcelLoader` class for `.xlsx` and `.xls` files.
Works with `unstructured>=0.6.7`. A plain-text representation of the
Excel file is available under the `page_content` attribute of each
document. In `"elements"` mode, an HTML representation of the Excel
file is also available under the `text_as_html` metadata key. Each
sheet in the Excel workbook is loaded as its own document.

## Testing

```python
from langchain.document_loaders import UnstructuredExcelLoader

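# "elements" mode keeps per-element metadata, including `text_as_html`.
loader = UnstructuredExcelLoader(
    "example_data/stanley-cups.xlsx",
    mode="elements",
)
# Each sheet in the workbook comes back as its own document.
docs = loader.load()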
```
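
Once documents are loaded, both representations described above can be read from each one. The following is a minimal sketch, not part of the PR: the helper name `split_representations` and the sample data are hypothetical, and the stand-in objects only mimic the `page_content` and `metadata` attributes of a LangChain document so the snippet runs without `unstructured` or an Excel file.

```python
from types import SimpleNamespace


def split_representations(docs):
    """Pair each document's plain text with its HTML rendering, if any.

    `docs` is any iterable of objects exposing `page_content` and a
    `metadata` dict (hypothetical helper, for illustration only).
    """
    pairs = []
    for doc in docs:
        # `text_as_html` is only expected in "elements" mode, so fall back to None.
        html = doc.metadata.get("text_as_html")
        pairs.append((doc.page_content, html))
    return pairs


# Stand-in documents with made-up content, used in place of loader output.
sample_docs = [
    SimpleNamespace(
        page_content="Team Location Stanley Cups",
        metadata={"text_as_html": "<table><tr><td>Team</td></tr></table>"},
    ),
    SimpleNamespace(page_content="plain text only", metadata={}),
]

for text, html in split_representations(sample_docs):
    print(text, "| has HTML:", html is not None)
```

Passing the list returned by `loader.load()` in place of `sample_docs` should work the same way, assuming the loader returns documents with these two attributes.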

## Who can review?

@hwchase17
@eyurtsev
2023-06-03 12:44:12 -07:00
parsers Add html parsers (#4874) 2023-05-17 22:39:11 -04:00
__init__.py Add new iFixit document loader (#1333) 2023-02-27 20:40:20 -08:00
test_arxiv.py Arxiv document loader (#3627) 2023-04-26 21:04:56 -07:00
test_bigquery.py Harrison/big query (#2100) 2023-03-28 08:17:22 -07:00
test_bilibili.py Remove unnecessary spaces from document object’s page_content of BiliBiliLoader (#4619) 2023-05-16 13:13:57 -04:00
test_blockchain.py Enhancement: option to Get All Tokens with a single Blockchain Document Loader call (#3797) 2023-05-03 15:46:44 -07:00
test_confluence.py Several confluence loader improvements (#3300) 2023-04-23 15:06:10 -07:00
test_dataframe.py rm pandas dependency (#2102) 2023-03-28 08:38:19 -07:00
test_duckdb.py Harrison/duckdb (#2064) 2023-03-27 19:51:34 -07:00
test_email.py Harrison/msg files (#2375) 2023-04-04 06:48:34 -07:00
test_excel.py feat: add UnstructuredExcelLoader for .xlsx and .xls files (#5617) 2023-06-03 12:44:12 -07:00
test_facebook_chat.py Refactor TelegramChatLoader and FacebookChatLoader classes and add tests (#3863) 2023-05-03 15:59:19 -07:00
test_figma.py Harrison/figma doc loader (#1908) 2023-03-22 19:57:46 -07:00
test_gitbook.py Harrison/gitbook (#2044) 2023-03-28 15:28:33 -07:00
test_github.py DocumentLoader for GitHub (#5408) 2023-05-29 20:11:21 -07:00
test_ifixit.py Add new iFixit document loader (#1333) 2023-02-27 20:40:20 -08:00
test_joplin.py Add Joplin document loader (#5153) 2023-05-24 12:31:55 -07:00
test_json_loader.py JSON loader (#4067) 2023-05-05 14:48:13 -07:00
test_mastodon.py Add Mastodon toots loader (#5036) 2023-05-22 16:43:07 -07:00
test_max_compute.py add maxcompute (#5533) 2023-06-01 00:54:42 -07:00
test_modern_treasury.py Dev2049/add modern treasury (#3924) 2023-05-01 20:28:02 -07:00
test_odt.py feat: add loader for open office odt files (#4405) 2023-05-10 01:37:17 -07:00
test_pdf.py Dev2049/pypdfium2 (#4209) 2023-05-05 17:55:31 -07:00
test_pyspark_dataframe_loader.py Harrison/spark reader (#5405) 2023-05-29 20:23:17 -07:00
test_python.py Add PythonLoader which auto-detects encoding of Python files (#3311) 2023-04-21 10:47:57 -07:00
test_sitemap.py Harrison/sitemap local (#4704) 2023-05-14 22:04:38 -07:00
test_slack.py Add Slack Directory Loader (#2841) 2023-04-13 21:31:59 -07:00
test_spreedly.py Harrison/spreedly (#3937) 2023-05-01 20:56:56 -07:00
test_stripe.py Dev2049/add modern treasury (#3924) 2023-05-01 20:28:02 -07:00
test_unstructured.py feat: batch multiple files in a single Unstructured API request (#4525) 2023-05-21 20:48:20 -07:00
test_url_playwright.py Harrison/playwright selector (#3185) 2023-04-19 16:54:15 -07:00
test_url.py add continue to fix 'continue_on_failure' parameter for URL doc loader (#2735) 2023-04-11 21:12:39 -07:00
test_whatsapp_chat.py Update WhatsAppChatLoader to include the character ~ in the sender name (#4420) 2023-05-09 15:00:04 -07:00