langchain/docs/extras/modules
German Martin 3ce4e46c8c
The Fellowship of the Vectors: New Embeddings Filter using clustering. (#7015)
Continuing with the Tolkien-inspired series of langchain tools. I bring to
you:
**The Fellowship of the Vectors**, AKA EmbeddingsClusteringFilter.
This document filter uses embeddings to group document vectors into
clusters, then lets you pick an arbitrary number of document vectors
based on proximity to each cluster center. The result is a
representative sample of each cluster.
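
A minimal sketch of the idea, assuming toy 2-D vectors in place of real embeddings and a tiny deterministic k-means written for illustration. The helper names `kmeans` and `closest_to_centers` are hypothetical and are not the code added in this PR:

```python
import math

def kmeans(vectors, k, iterations=10):
    """Tiny deterministic k-means: the first k vectors seed the centers."""
    centers = [tuple(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            # Assign each vector to its nearest center.
            nearest = min(range(k), key=lambda c: math.dist(v, centers[c]))
            groups[nearest].append(v)
        for c, members in enumerate(groups):
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers

def closest_to_centers(vectors, centers, num_closest):
    """For each center, return the indices of the num_closest nearest vectors."""
    picks = []
    for center in centers:
        ranked = sorted(range(len(vectors)),
                        key=lambda i: math.dist(vectors[i], center))
        picks.append(ranked[:num_closest])
    return picks

# Toy "embeddings": two well-separated groups of three points each.
embeddings = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
              (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centers = kmeans(embeddings, k=2)
representatives = closest_to_centers(embeddings, centers, num_closest=1)
print(representatives)
```

Each inner list holds the indices of the documents that would represent one cluster. With real embeddings, a library k-means would likely replace the toy one here.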

The original idea comes from [Greg Kamradt](https://github.com/gkamradt),
in this video (Level4):
https://www.youtube.com/watch?v=qaPMdcCqtWk&t=365s

I added a few tricks to make it a bit more versatile: you can
parametrize what to do with duplicate documents in case of cluster
overlap, either replacing each duplicate with the next closest document
or removing it. This allows you to use it as a special kind of redundant
filter too.
Additionally, you can choose between 2 different orders: grouped by
cluster or respecting the original retriever scores.
In my use case, I was using the docs grouped by cluster to run refine
chains per cluster to generate summaries over a large corpus of
documents.
Let me know if you want to change anything!
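
The duplicate handling and the two orderings described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the function name `select_indices`, the `remove_duplicates` flag, and the hand-picked centers are hypothetical, and the input vectors are assumed to already be in the retriever's original order.

```python
import math

def select_indices(vectors, centers, num_closest, remove_duplicates):
    """Pick up to num_closest vector indices per center, never repeating one."""
    selected = []
    for center in centers:
        ranked = sorted(range(len(vectors)),
                        key=lambda i: math.dist(vectors[i], center))
        if remove_duplicates:
            # Take the top num_closest, then drop any that were already chosen.
            picks = [i for i in ranked[:num_closest] if i not in selected]
        else:
            # Skip already chosen indices and take the next closest instead.
            picks = [i for i in ranked if i not in selected][:num_closest]
        selected.extend(picks)
    return selected

# Toy vectors, assumed to be listed in original retriever order.
vectors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
# Hypothetical centers placed close together so their neighbourhoods overlap.
centers = [(0.9, 0.0), (1.4, 0.0)]

replaced = select_indices(vectors, centers, 2, remove_duplicates=False)
removed = select_indices(vectors, centers, 2, remove_duplicates=True)

grouped_by_cluster = replaced       # indices stay grouped cluster by cluster
original_order = sorted(replaced)   # indices restored to retriever order

print(replaced, removed, original_order)
```

In the replace mode every cluster still contributes `num_closest` documents, while the remove mode returns fewer documents whenever clusters overlap, which is what makes it usable as a redundancy filter.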

@rlancemartin, @eyurtsev, @hwchase17

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-07-07 10:28:17 -07:00
..
agents Fix sql_database.ipynb link (#6525) 2023-07-06 13:07:37 -04:00
callbacks Remove Promptlayer Notebook (#6996) 2023-06-30 14:30:24 -07:00
chains openai fn update nb (#7352) 2023-07-07 11:52:21 -04:00
data_connection The Fellowship of the Vectors: New Embeddings Filter using clustering. (#7015) 2023-07-07 10:28:17 -07:00
memory Align cassio versions between examples for Cassandra integration (#7099) 2023-07-04 04:21:48 -06:00
model_io Bagatur/clarifai update (#7324) 2023-07-07 02:23:20 -04:00
paul_graham_essay.txt Doc refactor (#6300) 2023-06-16 11:52:56 -07:00
state_of_the_union.txt Doc refactor (#6300) 2023-06-16 11:52:56 -07:00