langchain/docs/extras/modules/data_connection
Matt Robinson 3c489be773
feat: optional post-processing for Unstructured loaders (#7850)
### Summary

Adds a post-processing method for Unstructured loaders that allows users
to optionally modify or clean extracted elements.

### Testing

```python
from langchain.document_loaders import UnstructuredFileLoader
from unstructured.cleaners.core import clean_extra_whitespace

# Load the PDF as individual elements and run each post-processor
# over the extracted elements.
loader = UnstructuredFileLoader(
    "./example_data/layout-parser-paper.pdf",
    mode="elements",
    post_processors=[clean_extra_whitespace],
)

docs = loader.load()
docs[:5]  # inspect the first five documents
```
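
To illustrate the idea without installing `unstructured`, here is a minimal sketch that treats a post-processor as a plain `str -> str` callable applied in order to each element's text. This is an assumption made for illustration, not the loader's actual implementation; `collapse_whitespace` and `apply_post_processors` are hypothetical names that do not exist in LangChain or Unstructured.

```python
import re
from typing import Callable, List


def collapse_whitespace(text: str) -> str:
    """Hypothetical cleaner: collapse whitespace runs into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def apply_post_processors(
    texts: List[str],
    post_processors: List[Callable[[str], str]],
) -> List[str]:
    """Hypothetical helper: run every post-processor, in order, on each text."""
    cleaned = []
    for text in texts:
        for processor in post_processors:
            text = processor(text)
        cleaned.append(text)
    return cleaned


# Stand-in strings for the text of extracted elements.
elements = ["  Layout   Parser \n paper ", "Deep  learning"]
result = apply_post_processors(elements, [collapse_whitespace, str.lower])
print(result)
```

Under this assumption, any function that takes a string and returns a string could be passed in the `post_processors` list alongside cleaners such as `clean_extra_whitespace`.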


### Reviewers
  - @rlancemartin
  - @eyurtsev
  - @hwchase17
2023-07-17 12:13:05 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| document_loaders/integrations | feat: optional post-processing for Unstructured loaders (#7850) | 2023-07-17 12:13:05 -07:00 |
| document_transformers | add tagger nb (#7637) | 2023-07-13 01:48:23 -04:00 |
| retrievers | add bm25 module (#7779) | 2023-07-17 07:30:17 -07:00 |
| text_embedding/integrations | Add GPT4All embeddings (#7743) | 2023-07-15 10:04:29 -04:00 |
| vectorstores/integrations | fix nb (#7843) | 2023-07-17 09:39:11 -07:00 |