Mirror of https://github.com/hwchase17/langchain (synced 2024-10-29 17:07:25 +00:00)
3c489be773
### Summary

Adds a post-processing method for Unstructured loaders that allows users to optionally modify or clean extracted elements.

### Testing

```python
from langchain.document_loaders import UnstructuredFileLoader
from unstructured.cleaners.core import clean_extra_whitespace

# Load the PDF as individual elements and run each post-processor on them
loader = UnstructuredFileLoader(
    "./example_data/layout-parser-paper.pdf",
    mode="elements",
    post_processors=[clean_extra_whitespace],
)
docs = loader.load()
docs[:5]
```

### Reviewers

- @rlancemartin
- @eyurtsev
- @hwchase17
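The commit message above describes chaining cleaners over extracted elements. The following is a minimal plain-Python sketch of that idea, not the loader's actual implementation. It assumes each post-processor is a `str -> str` callable applied in list order to every element's text. `collapse_whitespace` and `apply_post_processors` are hypothetical names; `collapse_whitespace` only approximates what `clean_extra_whitespace` does.

```python
import re
from typing import Callable, List


def collapse_whitespace(text: str) -> str:
    # Hypothetical stand-in for a whitespace cleaner (assumption, not the
    # unstructured implementation): collapse runs of whitespace, then trim.
    return re.sub(r"\s+", " ", text).strip()


def apply_post_processors(
    texts: List[str], post_processors: List[Callable[[str], str]]
) -> List[str]:
    # Hypothetical helper: feed each element's text through every
    # post-processor in order, so later cleaners see earlier output.
    cleaned = []
    for text in texts:
        for post_process in post_processors:
            text = post_process(text)
        cleaned.append(text)
    return cleaned


elements = ["  Layout   Parser\n paper "]
print(apply_post_processors(elements, [collapse_whitespace, str.upper]))
# prints ['LAYOUT PARSER PAPER']
```

Because the cleaners run in sequence, their order matters whenever one cleaner's output changes what the next one matches.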
- `api_reference`
- `docs_skeleton`
- `extras`
- `snippets`
- `.local_build.sh`
- `package-lock.json`
- `requirements.txt`