Commit Graph

1 Commits

Author SHA1 Message Date
Eugene Yurtsev
8e41143bf5
Add a generic document loader (#4875)
# Add generic document loader

* This PR adds a generic document loader which can assemble a loader
from a blob loader and a parser
* Adds a registry for parsers
* Populate registry with a default mimetype based parser


## Expected changes

- Parsing involves loading content via IO so can be sped up via:
  * Threading in sync
  * Async  
- The actual parsing logic may be computatinoally involved: may need to
figure out to add multi-processing support
- May want to add suffix based parser since suffixes are easier to
specify in comparison to mime types

## Before submitting

No notebooks yet, we first need to get a few of the basic parsers up
(prior to advertising the interface)
2023-05-17 22:38:55 -04:00