langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-06 03:20:49 +00:00

Eugene Yurtsev b88dfcb42a Add indexing support (#9614)	2023-08-23 21:41:38 -04:00

This PR introduces a persistence layer to help with indexing workflows into vectorstores. The indexing code helps users to:

1. Avoid writing duplicated content into the vectorstore
2. Avoid overwriting content if it's unchanged

Importantly, this keeps working even if the content being written is derived via a set of transformations from some source content (e.g., indexing child documents that were derived from parent documents by chunking).

The two main components are:

1. Persistence layer that keeps track of which keys were updated and when. Tracking the timestamp of each update allows old content to be cleaned up safely and with minimal complexity.
2. HashedDocument, which is used to hash the contents (including metadata) of the documents. The hashes are used to identify duplicates.

The indexing code works with ANY document loader. To add transformations to the documents, users can for now write a custom document loader that composes an existing loader with document transformers.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
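The two components described in the commit message (content hashing and a timestamped record of written keys) can be illustrated with a minimal sketch. This is not the code from the PR: `Doc`, `doc_key`, and `index_docs` are hypothetical names, plain dicts stand in for both the persistence layer and the vectorstore, and the cleanup rule (delete every key not touched in the current run) is an assumption about how the timestamps might be used.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus metadata."""
    content: str
    metadata: dict = field(default_factory=dict)


def doc_key(doc: Doc) -> str:
    """Hash content and metadata together (metadata must be JSON-serializable)."""
    payload = json.dumps(
        {"content": doc.content, "metadata": doc.metadata}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def index_docs(
    docs: Iterable[Doc],
    records: Dict[str, float],   # key -> time of last update (stand-in persistence layer)
    store: Dict[str, Doc],       # key -> document (stand-in vectorstore)
    now: Optional[float] = None,
    cleanup: bool = False,
) -> Dict[str, int]:
    """Write only unseen documents; optionally delete keys not seen in this run."""
    now = time.time() if now is None else now
    added = skipped = deleted = 0
    seen = set()
    for doc in docs:
        key = doc_key(doc)
        if key in seen:          # duplicate within this batch
            skipped += 1
            continue
        seen.add(key)
        if key in records:       # unchanged content from an earlier run
            skipped += 1
        else:
            store[key] = doc
            added += 1
        records[key] = now       # refresh the timestamp either way
    if cleanup:
        # Anything with an older timestamp was not part of this run.
        stale = [k for k, t in records.items() if t < now]
        for k in stale:
            del records[k]
            store.pop(k, None)
            deleted += 1
    return {"added": added, "skipped": skipped, "deleted": deleted}
```

Because the key is derived from the document itself, the same logic applies to chunks derived from a parent document: an unchanged chunk hashes to the same key and is skipped, while a chunk that no longer appears keeps its old timestamp and is removed when `cleanup=True`.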
_templates	Update Integrations links (#8206)	2023-07-24 21:20:32 -07:00
additional_resources	Added In-Depth Langchain Agent Execution Guide (#9507)	2023-08-20 15:59:01 -07:00
ecosystem	👀 docs: updated `dependents` (#9426)	2023-08-18 10:15:39 -04:00
guides	Document AzureML Deployment Example (#9571)	2023-08-22 07:36:47 -07:00
integrations	Updates to Nomic Atlas and GPT4All documentation (#9414)	2023-08-23 17:49:44 -07:00
modules	Add indexing support (#9614)	2023-08-23 21:41:38 -04:00
use_cases	docs: Add memgraph notebook (#9448)	2023-08-21 13:45:04 -07:00