langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

German Martin f1eaa9b626 Lost in the middle: We have been ordering documents the WRONG way. (for long context) (#7520 ) Motivation: when dealing with a long context and a large number of relevant documents, it seems we must avoid the out-of-the-box score ordering from vector stores. See: https://arxiv.org/pdf/2306.01150.pdf So I added an additional parameter that lets you reorder the retrieved documents to work around this performance degradation. The reordering respects the original search scores but places the least relevant documents in the middle of the context. Extract from the paper (one image speaks 1000 tokens): ![image](https://github.com/hwchase17/langchain/assets/1821407/fafe4843-6e18-4fa6-9416-50cc1d32e811) This seems to be common to all the different architectures, so I think we need a good generic way to implement this reordering and run some tests on our existing retrievers. My approach may not be the best one from an architecture point of view, and I am happy to discuss that. For me this was the best place to introduce the change and start retesting the different implementations. @rlancemartin, @eyurtsev --------- Co-authored-by: Lance Martin <lance@langchain.dev>		2023-07-18 07:45:15 -07:00
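The reordering described in the commit message can be illustrated with a minimal sketch. This is not the code from the PR: the function name `reorder_for_long_context` is hypothetical, and the only assumption is that the input list is already sorted from most to least relevant, as a vector store would return it. Items are dealt alternately to the front and the back of the result, so the most relevant ones end up at the edges and the least relevant ones in the middle.

```python
from typing import List, TypeVar

T = TypeVar("T")


def reorder_for_long_context(ranked: List[T]) -> List[T]:
    """Place the least relevant items in the middle of the list.

    `ranked` is assumed to be sorted from most to least relevant.
    This is a hypothetical helper for illustration only.
    """
    front: List[T] = []  # items at even ranks (0, 2, 4, ...), kept in rank order
    back: List[T] = []   # items at odd ranks (1, 3, 5, ...), reversed below

    for rank, item in enumerate(ranked):
        if rank % 2 == 0:
            front.append(item)
        else:
            back.append(item)

    # Reversing `back` puts the second most relevant item last, so relevance
    # falls toward the middle and rises again toward the end.
    return front + back[::-1]


docs = ["doc1", "doc2", "doc3", "doc4", "doc5"]  # doc1 is the most relevant
reordered = reorder_for_long_context(docs)
print(reordered)  # ['doc1', 'doc3', 'doc5', 'doc4', 'doc2']
```

The original search ranking is still respected in the sense that each half remains ordered by score; only the placement within the context window changes.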
..
amazon_kendra_retriever.ipynb	Fix `make docs_build` and related scripts (#7276 )	2023-07-11 22:05:14 -04:00
arxiv.ipynb	Doc refactor (#6300 )	2023-06-16 11:52:56 -07:00
azure_cognitive_search.ipynb	[Small upgrade] Allow document limit in AzureCognitiveSearchRetriever (#7690 )	2023-07-13 23:04:40 -04:00
bm25.ipynb	add bm25 module (#7779 )	2023-07-17 07:30:17 -07:00
chaindesk.ipynb	Rename Databerry to Chaindesk (#7022 )	2023-07-07 17:28:04 -04:00
chatgpt-plugin.ipynb	Doc refactor (#6300 )	2023-06-16 11:52:56 -07:00
cohere-reranker.ipynb	docs/fix links (#6498 )	2023-06-20 14:06:50 -07:00
docarray_retriever.ipynb	Fix `make docs_build` and related scripts (#7276 )	2023-07-11 22:05:14 -04:00
elastic_search_bm25.ipynb	Doc refactor (#6300 )	2023-06-16 11:52:56 -07:00
knn.ipynb	Doc refactor (#6300 )	2023-06-16 11:52:56 -07:00
merger_retriever.ipynb	Lost in the middle: We have been ordering documents the WRONG way. (for long context) (#7520 )	2023-07-18 07:45:15 -07:00
metal.ipynb	Doc refactor (#6300 )	2023-06-16 11:52:56 -07:00
pinecone_hybrid_search.ipynb	Fixed a typo in pinecone_hybrid_search.ipynb (#7627 )	2023-07-12 23:46:41 -04:00
pubmed.ipynb	docs `retrievers` fixes (#6299 )	2023-06-19 22:04:35 -07:00
svm.ipynb	Doc refactor (#6300 )	2023-06-16 11:52:56 -07:00
tf_idf.ipynb	Doc refactor (#6300 )	2023-06-16 11:52:56 -07:00
vespa.ipynb	Doc refactor (#6300 )	2023-06-16 11:52:56 -07:00
weaviate-hybrid.ipynb	Doc refactor (#6300 )	2023-06-16 11:52:56 -07:00
wikipedia.ipynb	Doc refactor (#6300 )	2023-06-16 11:52:56 -07:00
zep_memorystore.ipynb	Fix `make docs_build` and related scripts (#7276 )	2023-07-11 22:05:14 -04:00