mirror of
https://github.com/hwchase17/langchain
synced 2024-10-29 17:07:25 +00:00
736a1819aa
"One Retriever to merge them all, One Retriever to expose them, One Retriever to bring them all in and process them with Document formatters."

Hi @dev2049! Here I am bothering people again!

I'm using this simple idea to merge the output of several retrievers into one. I'm aware of DocumentCompressorPipeline and ContextualCompressionRetriever, but I don't think they allow us to do something like this. I was also having trouble getting the pipeline to work. Please correct me if I'm wrong.

This allows some sort of "retrieval" preprocessing, after which the curated results can be used anywhere you could use a retriever. My use case is to build different indexes with different embeddings and sources for more colorful results, then filter them with one or many document formatters.

I saw some people looking for something like this here: https://github.com/hwchase17/langchain/issues/3991 and something similar here: https://github.com/hwchase17/langchain/issues/5555

This is just a proposal; I know I'm missing tests, etc. If you think this is a worthwhile idea, I can work on tests and anything else you want to change. Let me know!

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
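The merging idea described above can be illustrated without any LangChain dependency. This is only a minimal sketch under stated assumptions: a "retriever" is modeled as a plain function from a query to a list of strings, `merge_results` is a hypothetical helper name, and the round-robin interleaving order is an assumption for illustration, not a statement of how MergerRetriever orders its output.

```python
from itertools import zip_longest
from typing import Callable, List

# Hypothetical stand-in: a "retriever" is any function mapping a query
# to a ranked list of document strings.
Retriever = Callable[[str], List[str]]

_MISSING = object()  # sentinel marking exhausted result lists


def merge_results(query: str, retrievers: List[Retriever]) -> List[str]:
    """Query every retriever and interleave the results round-robin."""
    per_retriever = [retrieve(query) for retrieve in retrievers]
    merged: List[str] = []
    # Take the 1st hit of each retriever, then the 2nd of each, and so on.
    for rank_group in zip_longest(*per_retriever, fillvalue=_MISSING):
        merged.extend(doc for doc in rank_group if doc is not _MISSING)
    return merged


def retriever_a(query: str) -> List[str]:
    # Stub retriever that always returns the same three documents.
    return ["a1", "a2", "a3"]


def retriever_b(query: str) -> List[str]:
    # Stub retriever that always returns a single document.
    return ["b1"]


merged = merge_results("Tell me about the Celtics", [retriever_a, retriever_b])
```

Because the merged output is just another ranked list, it could in principle be handed to any later filtering or formatting step, which is the use case the proposal describes.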
33 lines
1.2 KiB
Python
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.merger_retriever import MergerRetriever
from langchain.vectorstores import Chroma


def test_merger_retriever_get_relevant_docs() -> None:
    """Test get_relevant_docs."""
    texts_group_a = [
        "This is a document about the Boston Celtics",
        "Fly me to the moon is one of my favourite songs.",
        "I simply love going to the movies",
    ]
    texts_group_b = [
        "This is a document about the Phoenix Suns",
        "The Boston Celtics won the game by 20 points",
        "Real stupidity beats artificial intelligence every time. TP",
    ]
    embeddings = OpenAIEmbeddings()
    # Each retriever returns only its single best match (k=1).
    retriever_a = Chroma.from_texts(texts_group_a, embedding=embeddings).as_retriever(
        search_kwargs={"k": 1}
    )
    retriever_b = Chroma.from_texts(texts_group_b, embedding=embeddings).as_retriever(
        search_kwargs={"k": 1}
    )

    # The Lord of the Retrievers.
    lotr = MergerRetriever([retriever_a, retriever_b])

    actual = lotr.get_relevant_documents("Tell me about the Celtics")
    # One document per retriever, so the merged result holds two.
    assert len(actual) == 2
    assert texts_group_a[0] in [d.page_content for d in actual]
    assert texts_group_b[1] in [d.page_content for d in actual]