mirror of https://github.com/hwchase17/langchain
synced 2024-10-29 17:07:25 +00:00
3c489be773
### Summary

Adds a post-processing step to the Unstructured loaders. Users can optionally pass a list of functions through `post_processors` to modify or clean the extracted elements.

### Testing

```python
from langchain.document_loaders import UnstructuredFileLoader
from unstructured.cleaners.core import clean_extra_whitespace

# Load the PDF as individual elements, then run each extracted element
# through the clean_extra_whitespace cleaner.
loader = UnstructuredFileLoader(
    "./example_data/layout-parser-paper.pdf",
    mode="elements",
    post_processors=[clean_extra_whitespace],
)
docs = loader.load()
print(docs[:5])  # inspect the first five Documents
```

### Reviewers

- @rlancemartin
- @eyurtsev
- @hwchase17
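The `post_processors` argument in the testing snippet takes a list of cleaning functions. The following is a minimal, self-contained sketch of that idea. It assumes that each post-processor is a callable that takes a `str` and returns a `str`, and that the processors run in list order over every element's text. The `Element` dataclass and the `collapse_whitespace` helper are hypothetical stand-ins and are not part of the LangChain or unstructured APIs.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical stand-in for an extracted element; the real loader works with
# unstructured's element objects, which this sketch does not model.
@dataclass
class Element:
    text: str


def collapse_whitespace(text: str) -> str:
    """Hypothetical cleaner: collapse runs of whitespace (including NBSP) to one space."""
    return " ".join(text.split())


def post_process(
    elements: List[Element], post_processors: List[Callable[[str], str]]
) -> List[Element]:
    """Apply each post-processor, in list order, to every element's text (in place)."""
    for element in elements:
        for processor in post_processors:
            element.text = processor(element.text)
    return elements


elements = [Element("  Layout   Parser\n paper "), Element("Deep\u00a0learning")]
cleaned = post_process(elements, [collapse_whitespace, str.upper])
print([e.text for e in cleaned])  # ['LAYOUT PARSER PAPER', 'DEEP LEARNING']
```

Order matters in this sketch because each processor receives the output of the previous one. Any plain `str -> str` function, such as `str.upper`, can be chained alongside a dedicated cleaner.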
document_loaders/integrations
document_transformers
retrievers
text_embedding/integrations
vectorstores/integrations