langchain/tests/integration_tests
Matt Robinson a97e4252e3
feat: add `UnstructuredExcelLoader` for `.xlsx` and `.xls` files (#5617)
# Unstructured Excel Loader

Adds an `UnstructuredExcelLoader` class for `.xlsx` and `.xls` files.
Works with `unstructured>=0.6.7`. A plain-text representation of the
Excel file is available under the `page_content` attribute of each
document. In `"elements"` mode, an HTML representation of the Excel file
is also available under the `text_as_html` metadata key. Each sheet in
the Excel workbook is loaded as its own document.

## Testing

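```python
from langchain.document_loaders import UnstructuredExcelLoader

# "elements" mode also exposes an HTML rendering of the file
# under the `text_as_html` metadata key.
loader = UnstructuredExcelLoader(
    "example_data/stanley-cups.xlsx",
    mode="elements",
)
docs = loader.load()
```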
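
Once the documents are loaded, the HTML representation can be read back from each document's metadata. The following is a minimal sketch, not part of the PR: `html_by_document` is a hypothetical helper, and the `SimpleNamespace` objects with their sample text are invented stand-ins for loaded documents so that the snippet runs without `langchain` or `unstructured` installed. It assumes only what the description above states, namely that each document has a `page_content` attribute and, in `"elements"` mode, a `text_as_html` metadata key.

```python
from types import SimpleNamespace


def html_by_document(docs):
    """Return (page_content, html) pairs for docs that carry `text_as_html`."""
    pairs = []
    for doc in docs:
        # The key is only expected in "elements" mode, so look it up defensively.
        html = doc.metadata.get("text_as_html")
        if html is not None:
            pairs.append((doc.page_content, html))
    return pairs


# Hypothetical stand-ins for loaded documents (placeholder content);
# in real use, pass the list returned by `loader.load()` instead.
sample_docs = [
    SimpleNamespace(
        page_content="Team Location Stanley Cups",
        metadata={"text_as_html": "<table><tr><td>Team</td></tr></table>"},
    ),
    SimpleNamespace(page_content="no table here", metadata={}),
]

for text, html in html_by_document(sample_docs):
    print(text, "->", html)
```

Using `dict.get` rather than direct indexing keeps the helper usable on documents loaded in other modes, where the key may be absent.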

## Who can review?

@hwchase17
@eyurtsev
1 year ago

| Name | Last commit | Updated |
| --- | --- | --- |
| `agent` | Add Multi-CSV/DF support in CSV and DataFrame Toolkits (#5009) | 1 year ago |
| `cache` | feat: add Momento as a standard cache and chat message history provider (#5221) | 1 year ago |
| `callbacks` | Update Tracer Auth / Reduce Num Calls (#5517) | 1 year ago |
| `chains` | Harrison/neo4j (#5078) | 1 year ago |
| `chat_models` | Harrison/vertex (#5049) | 1 year ago |
| `client` | Add Feedback Methods + Evaluation examples (#5166) | 1 year ago |
| `document_loaders` | feat: add `UnstructuredExcelLoader` for `.xlsx` and `.xls` files (#5617) | 1 year ago |
| `embeddings` | `encoding_kwargs` for InstructEmbeddings (#5450) | 1 year ago |
| `examples` | feat: add `UnstructuredExcelLoader` for `.xlsx` and `.xls` files (#5617) | 1 year ago |
| `llms` | Harrison/prediction guard update (#5404) | 1 year ago |
| `memory` | feat: add Momento as a standard cache and chat message history provider (#5221) | 1 year ago |
| `prompts` | Cleanup integration test dir (#3308) | 1 year ago |
| `retrievers` | tfidf retriever (#5114) | 1 year ago |
| `utilities` | Tedma4/twilio tool (#5136) | 1 year ago |
| `vectorstores` | fix chroma update_document to embed entire documents, fixes a characer-wise embedding bug (#5584) | 1 year ago |
| `.env.example` | adding MongoDBAtlasVectorSearch (#5338) | 1 year ago |
| `__init__.py` | initial commit | 2 years ago |
| `conftest.py` | feat: improve pinecone tests (#2806) | 1 year ago |
| `test_document_transformers.py` | Contextual compression retriever (#2915) | 1 year ago |
| `test_nlp_text_splitters.py` | OptimizedPrompt -- k-shot example choice backed by semantic search (#91) | 2 years ago |
| `test_pdf_pagesplitter.py` | cleanup: unify 3 different pdf loaders, rename PagedPDFSplitter (#1615) | 2 years ago |
| `test_schema.py` | Add 'get_token_ids' method (#4784) | 1 year ago |
| `test_text_splitter.py` | Fix TextSplitter.from_tiktoken(#4361) | 1 year ago |