langchain/tests/unit_tests/document_loader
Eugene Yurtsev 3c490b5ba3
Docugami DataLoader (#4727)
### Adds a document loader for Docugami

Specifically:

1. Adds a data loader that talks to the [Docugami](http://docugami.com)
API to download processed documents as semantic XML
2. Parses the semantic XML into chunks, with additional metadata
capturing chunk semantics
3. Adds a detailed notebook showing how you can use additional metadata
returned by Docugami for techniques like the [self-querying
retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query_retriever.html)
4. Adds an integration test, and related documentation
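
As a rough illustration of step 2 (not the loader's actual implementation), the sketch below splits a small XML document into leaf-level chunks and records each chunk's tag and path as metadata. The `Chunk` class, the `chunk_semantic_xml` helper, and the sample `<Lease>` document are all hypothetical and are not part of Docugami's API or output format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    """Hypothetical container: chunk text plus metadata about its origin."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def chunk_semantic_xml(xml_text: str, source: str) -> List[Chunk]:
    """Emit one chunk per non-empty leaf element, tagged with its XML path.

    Assumption: text that sits directly inside an element which also has
    child elements (mixed content) is ignored to keep the sketch short.
    """
    root = ET.fromstring(xml_text)
    chunks: List[Chunk] = []

    def walk(node: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{node.tag}"
        children = list(node)
        if not children:
            text = (node.text or "").strip()
            if text:
                chunks.append(
                    Chunk(
                        text=text,
                        metadata={"source": source, "tag": node.tag, "xpath": path},
                    )
                )
            return
        for child in children:
            walk(child, path)

    walk(root, "")
    return chunks


# Made-up sample document; real Docugami XML will look different.
SAMPLE = """
<Lease>
  <LandlordName>Acme Properties</LandlordName>
  <Term>
    <StartDate>2023-01-01</StartDate>
    <Rent>1200 USD per month</Rent>
  </Term>
</Lease>
"""

for chunk in chunk_semantic_xml(SAMPLE, source="lease.xml"):
    print(chunk.metadata["xpath"], "->", chunk.text)
```

Metadata of this kind (for example the `tag` field) is what a metadata-aware retriever could filter on, which is the idea behind the notebook mentioned in step 3.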

Here is an example of a result that is not possible without the
capabilities added by Docugami (from the notebook):

<img width="1585" alt="image"
src="https://github.com/hwchase17/langchain/assets/749277/bb6c1ce3-13dc-4349-a53b-de16681fdd5b">

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
Co-authored-by: Taqi Jaffri <tjaffri@gmail.com>
1 year ago
| Name | Last commit | Updated |
| --- | --- | --- |
| blob_loaders | Add progress bar to filesystemblob loader, update pytest config for unit tests (#4212) | 1 year ago |
| loaders | Docugami DataLoader (#4727) | 1 year ago |
| parsers | Feature: pdfplumber PDF loader with BaseBlobParser (#4552) | 1 year ago |
| test_docs/csv | Fix #4087 by setting the correct csv dialect (#4103) | 1 year ago |
| \_\_init\_\_.py | Harrison/youtube loader (#1545) | 2 years ago |
| test_base.py | Add BlobParser abstraction (#3979) | 1 year ago |
| test_csv_loader.py | Fix #4087 by setting the correct csv dialect (#4103) | 1 year ago |
| test_json_loader.py | Harrison/json loader fix (#4686) | 1 year ago |
| test_web_base.py | Respect User-Specified User-Agent in WebBaseLoader (#4579) | 1 year ago |
| test_youtube.py | Improve video_id extraction in YoutubeLoader (#4452) | 1 year ago |