langchain/tests/integration_tests
Matt Robinson a97e4252e3
feat: add UnstructuredExcelLoader for .xlsx and .xls files (#5617)
# Unstructured Excel Loader

Adds an `UnstructuredExcelLoader` class for `.xlsx` and `.xls` files.
Works with `unstructured>=0.6.7`. A plain-text representation of the
Excel file is available in each document's `page_content` attribute.
If you use the loader in `"elements"` mode, an HTML representation
of the Excel file is also available under the `text_as_html` metadata
key. Each sheet in the Excel file is returned as its own document.
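Keeping `text_as_html` alongside `page_content` is mainly useful when you need the cell structure back. The sketch below is a minimal, standard-library-only way to parse a document's `text_as_html` into rows. It assumes the HTML is a plain `<table>` built from `<tr>`, `<th>` and `<td>` tags. `StandInDocument`, `sheet_rows` and the sample table are hypothetical illustrations, not part of LangChain or `unstructured`; with the real loader you would pass each document returned by `loader.load()` instead of the stand-in.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional


@dataclass
class StandInDocument:
    """Hypothetical stand-in for a LangChain Document.

    Only the two attributes described above are modelled:
    page_content (plain text) and metadata (a dict).
    """
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class _TableRowParser(HTMLParser):
    """Collect the text of <th>/<td> cells, grouped by <tr> row.

    Assumption: text_as_html is a flat HTML table; nested tables are not handled.
    """

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so "&amp;" arrives as "&"
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def sheet_rows(doc) -> List[List[str]]:
    """Return one document's table rows, or [] if it has no text_as_html
    (for example, when the loader was not run in "elements" mode)."""
    html = doc.metadata.get("text_as_html")
    if not html:
        return []
    parser = _TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Illustrative input only; real page_content/HTML produced by the loader may differ.
doc = StandInDocument(
    page_content="Team Year\nTeam A 2000",
    metadata={
        "text_as_html": "<table><tr><th>Team</th><th>Year</th></tr>"
        "<tr><td>Team A</td><td>2000</td></tr></table>"
    },
)
print(sheet_rows(doc))  # [['Team', 'Year'], ['Team A', '2000']]
```

Parsing with `html.parser` avoids extra dependencies; if `pandas` is already installed, `pandas.read_html` is a heavier alternative that returns DataFrames directly.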

### Testing

```python
from langchain.document_loaders import UnstructuredExcelLoader

loader = UnstructuredExcelLoader(
    "example_data/stanley-cups.xlsx",
    mode="elements"
)
docs = loader.load()
```

## Who can review?

@hwchase17
@eyurtsev
2023-06-03 12:44:12 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| `agent` | Add Multi-CSV/DF support in CSV and DataFrame Toolkits (#5009) | 2023-05-25 14:23:11 -07:00 |
| `cache` | feat: add Momento as a standard cache and chat message history provider (#5221) | 2023-05-25 19:13:21 -07:00 |
| `callbacks` | Update Tracer Auth / Reduce Num Calls (#5517) | 2023-06-02 12:13:56 -07:00 |
| `chains` | Harrison/neo4j (#5078) | 2023-05-22 07:31:48 -07:00 |
| `chat_models` | Harrison/vertex (#5049) | 2023-05-24 15:51:12 -07:00 |
| `client` | Add Feedback Methods + Evaluation examples (#5166) | 2023-05-31 11:14:27 -07:00 |
| `document_loaders` | feat: add UnstructuredExcelLoader for .xlsx and .xls files (#5617) | 2023-06-03 12:44:12 -07:00 |
| `embeddings` | encoding_kwargs for InstructEmbeddings (#5450) | 2023-05-30 11:57:04 -07:00 |
| `examples` | feat: add UnstructuredExcelLoader for .xlsx and .xls files (#5617) | 2023-06-03 12:44:12 -07:00 |
| `llms` | Harrison/prediction guard update (#5404) | 2023-05-29 07:14:59 -07:00 |
| `memory` | feat: add Momento as a standard cache and chat message history provider (#5221) | 2023-05-25 19:13:21 -07:00 |
| `prompts` | Cleanup integration test dir (#3308) | 2023-04-21 09:44:09 -07:00 |
| `retrievers` | tfidf retriever (#5114) | 2023-05-24 10:02:09 -07:00 |
| `utilities` | Tedma4/twilio tool (#5136) | 2023-05-25 19:19:22 -07:00 |
| `vectorstores` | fix chroma update_document to embed entire documents, fixes a character-wise embedding bug (#5584) | 2023-06-02 11:12:48 -07:00 |
| `__init__.py` | initial commit | 2022-10-24 14:51:15 -07:00 |
| `.env.example` | adding MongoDBAtlasVectorSearch (#5338) | 2023-05-30 07:59:01 -07:00 |
| `conftest.py` | feat: improve pinecone tests (#2806) | 2023-04-13 21:49:31 -07:00 |
| `test_document_transformers.py` | Contextual compression retriever (#2915) | 2023-04-20 17:01:14 -07:00 |
| `test_nlp_text_splitters.py` | OptimizedPrompt -- k-shot example choice backed by semantic search (#91) | 2022-11-09 21:15:42 -08:00 |
| `test_pdf_pagesplitter.py` | cleanup: unify 3 different pdf loaders, rename PagedPDFSplitter (#1615) | 2023-03-13 23:06:50 -07:00 |
| `test_schema.py` | Add 'get_token_ids' method (#4784) | 2023-05-22 13:17:26 +00:00 |
| `test_text_splitter.py` | Fix TextSplitter.from_tiktoken(#4361) | 2023-05-08 16:36:38 -07:00 |