langchain/docs/modules/indexes
Eugene Yurtsev 3c490b5ba3
Docugami DataLoader (#4727)
### Adds a document loader for Docugami

Specifically:

1. Adds a data loader that talks to the [Docugami](http://docugami.com)
API to download processed documents as semantic XML
2. Parses the semantic XML into chunks, with additional metadata
capturing chunk semantics
3. Adds a detailed notebook showing how you can use additional metadata
returned by Docugami for techniques like the [self-querying
retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query_retriever.html)
4. Adds an integration test, and related documentation
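
As a rough illustration of step 2, the sketch below walks a small semantic XML document and emits one chunk per leaf element, attaching the tag name and element path as metadata. It is not the loader added in this PR: the sample XML, the `Chunk` class, the `parse_chunks` helper, and the `tag` / `xpath` metadata keys are all hypothetical, and real Docugami output uses its own schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical sample document; real Docugami XML uses its own tags and schema.
SAMPLE_XML = """
<Lease>
  <Parties>
    <Landlord>Acme Properties LLC</Landlord>
    <Tenant>Jane Doe</Tenant>
  </Parties>
  <Term>
    <StartDate>2023-01-01</StartDate>
  </Term>
</Lease>
"""


@dataclass
class Chunk:
    """A piece of text plus metadata describing where it came from."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def parse_chunks(xml_text: str) -> List[Chunk]:
    """Return one chunk per non-empty leaf element, in document order."""
    root = ET.fromstring(xml_text.strip())
    chunks: List[Chunk] = []

    def walk(node: ET.Element, path: List[str]) -> None:
        current = path + [node.tag]
        children = list(node)
        if not children:
            text = (node.text or "").strip()
            if text:
                # The semantic tag and its path become chunk metadata.
                chunks.append(
                    Chunk(text, {"tag": node.tag, "xpath": "/" + "/".join(current)})
                )
        for child in children:
            walk(child, current)

    walk(root, [])
    return chunks


if __name__ == "__main__":
    for chunk in parse_chunks(SAMPLE_XML):
        print(chunk.metadata["xpath"], "->", chunk.text)
```

Metadata of this kind is what lets a downstream retriever filter on chunk semantics (for example, only `Landlord` chunks) rather than on text similarity alone.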

Here is an example of a result that is not possible without the
capabilities added by Docugami (from the notebook):

<img width="1585" alt="image"
src="https://github.com/hwchase17/langchain/assets/749277/bb6c1ce3-13dc-4349-a53b-de16681fdd5b">

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
Co-authored-by: Taqi Jaffri <tjaffri@gmail.com>
2023-05-15 10:53:00 -04:00
| Name | Last commit | Date |
| --- | --- | --- |
| document_loaders/examples | Docugami DataLoader (#4727) | 2023-05-15 10:53:00 -04:00 |
| retrievers/examples | Harrison/virtual time (#4658) | 2023-05-14 10:29:17 -07:00 |
| text_splitters | Fix typo (#3728) | 2023-04-28 13:01:09 -07:00 |
| vectorstores | added documentation on retrieving a PG vectorstore (#4578) | 2023-05-12 13:04:06 -04:00 |
| document_loaders.rst | docs: document_loaders classification (#4069) | 2023-05-13 19:17:32 -07:00 |
| getting_started.ipynb | add encoding to avoid UnicodeDecodeError (#2908) | 2023-04-14 16:36:03 -07:00 |
| retrievers.rst | big docs refactor (#1978) | 2023-03-26 19:49:46 -07:00 |
| text_splitters.rst | Fix grammar in Text Splitters docs (#4373) | 2023-05-08 22:38:40 -04:00 |
| vectorstores.rst | big docs refactor (#1978) | 2023-03-26 19:49:46 -07:00 |