langchain/docs
Matt Robinson 3c489be773
feat: optional post-processing for Unstructured loaders (#7850)
### Summary

Adds a post-processing method for Unstructured loaders that allows users
to optionally modify or clean extracted elements.
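
As a rough illustration of the idea, the sketch below applies a list of post-processors, in order, to each extracted text. It is not the loader's actual implementation: `apply_post_processors` and `collapse_whitespace` are hypothetical names invented here, and it assumes each post-processor is a plain `str -> str` callable, so it runs without `unstructured` installed.

```python
import re
from typing import Callable, List, Sequence


def collapse_whitespace(text: str) -> str:
    """Hypothetical stand-in for a cleaner such as clean_extra_whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def apply_post_processors(
    texts: Sequence[str],
    post_processors: Sequence[Callable[[str], str]],
) -> List[str]:
    """Run every post-processor, in list order, over each extracted text."""
    cleaned = []
    for text in texts:
        for post_processor in post_processors:
            text = post_processor(text)
        cleaned.append(text)
    return cleaned


# Illustrative input only; real elements come from the loader.
raw_elements = ["  Layout   Parser \n", "A  unified toolkit"]
result = apply_post_processors(raw_elements, [collapse_whitespace, str.lower])
print(result)
```

Because the processors run in list order, a cleaner that normalizes whitespace can safely come before one that depends on normalized input. An empty list leaves the texts unchanged, which matches the "optional" behavior described above.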

### Testing

```python
from langchain.document_loaders import UnstructuredFileLoader
from unstructured.cleaners.core import clean_extra_whitespace

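# Load the PDF as individual elements and clean each one after extraction.
loader = UnstructuredFileLoader(
    "./example_data/layout-parser-paper.pdf",
    mode="elements",
    post_processors=[clean_extra_whitespace],  # applied to every extracted element
)

docs = loader.load()
docs[:5]  # inspect the first five documents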
```


### Reviewers
  - @rlancemartin
  - @eyurtsev
  - @hwchase17
2023-07-17 12:13:05 -07:00
api_reference [Breaking] Update Evaluation Functionality (#7388) 2023-07-13 02:13:06 -07:00
docs_skeleton docs: Mendable Search Improvements (#7744) 2023-07-15 10:19:21 -04:00
extras feat: optional post-processing for Unstructured loaders (#7850) 2023-07-17 12:13:05 -07:00
snippets Implement async API for Qdrant vector store (#7704) 2023-07-15 09:33:26 -04:00
.local_build.sh Fix make docs_build and related scripts (#7276) 2023-07-11 22:05:14 -04:00
package-lock.json docs: New experimental UI for Mendable Search (#6558) 2023-07-03 20:52:13 +01:00
requirements.txt Page per class-style api reference (#6560) 2023-06-30 09:23:32 -07:00