langchain/docs
Eugene Yurtsev b88dfcb42a
Add indexing support (#9614)
This PR introduces a persistence layer to help with indexing workflows
into vector stores.

The indexing code helps users to:

1. Avoid writing duplicated content into the vector store
2. Avoid overwriting content that is unchanged
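The following is a minimal, self-contained sketch of how content hashing serves both goals. It assumes each document is a `(page_content, metadata)` pair and that the vector store can be modelled as a dict keyed by hash; `content_hash` and `write_if_new` are hypothetical helpers for illustration, not LangChain functions.

```python
import hashlib
import json


def content_hash(page_content: str, metadata: dict) -> str:
    """Hypothetical helper: hash text and metadata together.

    sort_keys=True makes the hash independent of metadata key order.
    """
    payload = json.dumps({"content": page_content, "metadata": metadata}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def write_if_new(store: dict, docs: list) -> tuple:
    """Hypothetical helper: write each (content, metadata) pair unless its hash is stored.

    Returns (num_written, num_skipped). Duplicates within one batch and
    unchanged documents from an earlier run are both skipped.
    """
    written = skipped = 0
    for page_content, metadata in docs:
        key = content_hash(page_content, metadata)
        if key in store:
            skipped += 1  # identical content already indexed: no write
            continue
        store[key] = (page_content, metadata)
        written += 1
    return written, skipped


store = {}
docs = [("hello", {"source": "a.txt"}), ("hello", {"source": "a.txt"})]
print(write_if_new(store, docs))  # (1, 1): the in-batch duplicate is skipped
print(write_if_new(store, docs))  # (0, 2): re-running writes nothing
```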

Importantly, this continues to work even when the content being written is
derived from some source content via a set of transformations (e.g., indexing
child documents that were derived from parent documents by chunking).
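To illustrate why hashing still works on derived content, here is a sketch that chunks a parent document into fixed-size children and hashes each child. The fixed-size splitter, the `source` metadata key, and all helper names are assumptions made for illustration; they are not taken from this PR.

```python
import hashlib
import json


def content_hash(page_content: str, metadata: dict) -> str:
    """Hypothetical helper: hash text and metadata together."""
    payload = json.dumps({"content": page_content, "metadata": metadata}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def chunk(parent_id: str, text: str, size: int) -> list:
    """Hypothetical splitter: cut a parent document into fixed-size children.

    Each child records its parent in metadata["source"] (an assumed
    convention) so all children of one parent can be found later.
    """
    return [
        (text[i : i + size], {"source": parent_id})
        for i in range(0, len(text), size)
    ]


def child_hashes(parent_id: str, text: str, size: int = 4) -> list:
    return [content_hash(c, m) for c, m in chunk(parent_id, text, size)]


old = child_hashes("doc-1", "aaaabbbbcccc")
same = child_hashes("doc-1", "aaaabbbbcccc")
edited = child_hashes("doc-1", "aaaabbbbXXXX")

print(old == same)  # True: re-chunking unchanged input gives the same hashes
print(sum(h not in old for h in edited))  # 1: only the edited chunk is new
```

Because an unchanged parent always yields the same children, re-indexing it produces no new writes. In this fixed-size example an in-place edit only surfaces the chunk it touched; edits that shift text would change later chunks too.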

The two main components are:

1. A persistence layer that keeps track of which keys were updated and when.
   Tracking the timestamp of each update allows old content to be cleaned up
   safely and with minimal complexity.
2. `HashedDocument`, which is used to hash the contents (including metadata) of
   the documents. We rely on these hashes to identify duplicates.
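One way to picture how update timestamps enable cleanup: every key produced in a run is stamped with that run's start time, and any key from the same source with an older stamp is stale and can be deleted. The sketch below uses a hypothetical in-memory record manager with a logical clock; its class and method names are illustrative and are not claimed to match the persistence layer added in this PR.

```python
import hashlib
import itertools
import json


def content_hash(page_content: str, metadata: dict) -> str:
    """Hypothetical helper: hash text and metadata together."""
    payload = json.dumps({"content": page_content, "metadata": metadata}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class InMemoryRecordManager:
    """Hypothetical record manager: key -> (group_id, last_updated)."""

    def __init__(self):
        self._records = {}
        self._clock = itertools.count(1)  # logical clock keeps the sketch deterministic

    def get_time(self) -> int:
        return next(self._clock)

    def update(self, keys, group_id: str, at: int) -> None:
        for key in keys:
            self._records[key] = (group_id, at)

    def list_keys(self, group_id: str, before: int) -> list:
        return [k for k, (g, t) in self._records.items() if g == group_id and t < before]

    def delete_keys(self, keys) -> None:
        for key in keys:
            self._records.pop(key, None)


def index_source(rm, store: dict, source: str, chunks: list) -> None:
    """Index one source's chunks, then delete what that source no longer produces."""
    run_started = rm.get_time()
    keys = []
    for text in chunks:
        key = content_hash(text, {"source": source})
        store.setdefault(key, text)  # unchanged content is not rewritten
        keys.append(key)
    rm.update(keys, group_id=source, at=run_started)  # stamp every key seen this run
    stale = rm.list_keys(group_id=source, before=run_started)
    for key in stale:
        store.pop(key, None)
    rm.delete_keys(stale)


rm, store = InMemoryRecordManager(), {}
index_source(rm, store, "a.txt", ["one", "two"])
index_source(rm, store, "a.txt", ["one", "three"])  # "two" was removed upstream
print(sorted(store.values()))  # ['one', 'three']
```

Scoping cleanup to one `group_id` at a time means indexing one source never deletes content that belongs to another.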


The indexing code works with **ANY** document loader. For now, users who want
to apply transformations to the documents can write a custom document loader
that composes an existing loader with document transformers.
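A sketch of such a composite loader, written in plain Python so it runs without LangChain installed. `Document`, `ListLoader`, `TransformingLoader`, and `split_lines` are hypothetical stand-ins; in LangChain itself one would presumably build on the library's own document loader and transformer interfaces instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


@dataclass
class Document:
    """Hypothetical stand-in for a document: text plus metadata."""

    page_content: str
    metadata: dict = field(default_factory=dict)


Transformer = Callable[[List[Document]], List[Document]]


class ListLoader:
    """Hypothetical stand-in for an existing loader: yields documents from a list."""

    def __init__(self, docs: Iterable[Document]):
        self._docs = list(docs)

    def lazy_load(self) -> Iterator[Document]:
        yield from self._docs


class TransformingLoader:
    """Hypothetical loader that wraps another loader and applies transformers in order."""

    def __init__(self, base_loader, transformers: List[Transformer]):
        self.base_loader = base_loader
        self.transformers = transformers

    def load(self) -> List[Document]:
        docs = list(self.base_loader.lazy_load())
        for transform in self.transformers:
            docs = transform(docs)
        return docs


def split_lines(docs: List[Document]) -> List[Document]:
    """Example transformer: one child per non-empty line, copying the parent's metadata."""
    return [
        Document(line, dict(d.metadata))
        for d in docs
        for line in d.page_content.splitlines()
        if line.strip()
    ]


loader = TransformingLoader(
    ListLoader([Document("first\nsecond", {"source": "a.txt"})]),
    [split_lines],
)
for doc in loader.load():
    print(doc)
```

Since the wrapper still returns ordinary documents, it can be handed to the indexing code like any other loader.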

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago

| Name | Last commit | Updated |
|---|---|---|
| api_reference | Fixing deeplake.mdx file as it uses outdates links (#9602) | 1 year ago |
| docs_skeleton | mv | 1 year ago |
| extras | Add indexing support (#9614) | 1 year ago |
| snippets | Fix typo (#9565) | 1 year ago |
| .local_build.sh | Update local script for docs build (#8377) | 1 year ago |
| package-lock.json | docs: New experimental UI for Mendable Search (#6558) | 1 year ago |
| vercel_requirements.txt | Add api cross ref linking (#8275) | 1 year ago |