langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-31 15:20:26 +00:00


Eugene Yurtsev b88dfcb42a Add indexing support (#9614)	2023-08-23 21:41:38 -04:00

This PR introduces a persistence layer to help with indexing workflows into vectorstores.

The indexing code helps users to:
1. Avoid writing duplicated content into the vectorstore
2. Avoid overwriting content if it's unchanged

Importantly, this keeps working even if the content being written is derived via a set of transformations from some source content (e.g., indexing child documents that were derived from parent documents by chunking).

The two main components are:
1. A persistence layer that keeps track of which keys were updated and when. Tracking the timestamp of each update makes it possible to clean up old content safely and with minimal complexity.
2. HashedDocument, which is used to hash the contents (including metadata) of the documents. We rely on the hashes to identify duplicates.

The indexing code works with ANY document loader. To add transformations to the documents, users can for now write a custom document loader that composes an existing loader with document transformers.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
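
The two components described in the commit message can be illustrated with a small, standard-library-only sketch. This is not the LangChain API: `Doc`, `hash_document`, `InMemoryRecordStore`, and `index_documents` are hypothetical names chosen for illustration, the "vectorstore" is a plain dict, and metadata is assumed to be JSON-serializable. It only shows the idea of hashing content plus metadata to detect duplicates, and of using per-key timestamps to clean up content that was not seen in the latest run.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus metadata."""
    content: str
    metadata: dict = field(default_factory=dict)


def hash_document(doc):
    """Hash content and metadata together so that any change yields a new key."""
    payload = json.dumps(
        {"content": doc.content, "metadata": doc.metadata}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class InMemoryRecordStore:
    """Hypothetical persistence layer: remembers when each key was last written."""

    def __init__(self):
        self._updated_at = {}

    def exists(self, key):
        return key in self._updated_at

    def touch(self, key, timestamp):
        self._updated_at[key] = timestamp

    def keys_older_than(self, timestamp):
        return [k for k, t in self._updated_at.items() if t < timestamp]

    def delete(self, keys):
        for key in keys:
            self._updated_at.pop(key, None)


def index_documents(docs, records, vector_store, cleanup=True, now=None):
    """Write only new documents; optionally drop ones not seen in this run."""
    run_started = time.time() if now is None else now
    added = skipped = deleted = 0
    seen = set()
    for doc in docs:
        key = hash_document(doc)
        if key in seen:
            skipped += 1  # duplicate within this batch
            continue
        seen.add(key)
        if records.exists(key):
            skipped += 1  # unchanged since an earlier run, do not rewrite
        else:
            vector_store[key] = doc
            added += 1
        records.touch(key, run_started)
    if cleanup:
        # Anything last touched before this run started was not seen again.
        stale = records.keys_older_than(run_started)
        for key in stale:
            vector_store.pop(key, None)
        records.delete(stale)
        deleted = len(stale)
    return {"added": added, "skipped": skipped, "deleted": deleted}
```

In this sketch, indexing the same batch twice writes nothing the second time, and a document that disappears from the source is removed on the next run because its timestamp is older than the start of that run. The `now` parameter exists only to make runs deterministic in tests.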
document_transformers	Fix typo in long_context_reorder.ipynb (#8811)	2023-08-06 15:31:38 -07:00
retrievers	Add missing param to parent document retriever notebook (#9569)	2023-08-21 15:02:12 -07:00
text_embedding	mv	2023-08-23 11:30:44 -07:00
indexing.ipynb	Add indexing support (#9614)	2023-08-23 21:41:38 -04:00