langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-20 03:25:56 +00:00

Saba Sturua 427551eabf DocArray as a Retriever (#6031)

## DocArray as a Retriever

[DocArray](https://github.com/docarray/docarray) is an open-source tool for managing your multi-modal data. It offers flexibility to store and search through your data using various document index backends. This PR introduces `DocArrayRetriever` - which works with any available backend and serves as a retriever for Langchain apps.

Also, I added 2 notebooks:

- DocArray Backends - intro to all 5 currently supported backends, how to initialize, index, and use them as a retriever
- DocArray Usage - showcasing what additional search parameters you can pass to create versatile retrievers

Example:

```python
from docarray.index import InMemoryExactNNIndex
from docarray import BaseDoc, DocList
from docarray.typing import NdArray
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers import DocArrayRetriever


# define document schema
class MyDoc(BaseDoc):
    description: str
    description_embedding: NdArray[1536]


embeddings = OpenAIEmbeddings()
# create documents
descriptions = ["description 1", "description 2"]
desc_embeddings = embeddings.embed_documents(texts=descriptions)
docs = DocList[MyDoc](
    [
        MyDoc(description=desc, description_embedding=embedding)
        for desc, embedding in zip(descriptions, desc_embeddings)
    ]
)

# initialize document index with data
db = InMemoryExactNNIndex[MyDoc](docs)

# create a retriever
retriever = DocArrayRetriever(
    index=db,
    embeddings=embeddings,
    search_field="description_embedding",
    content_field="description",
)

# find the relevant document
doc = retriever.get_relevant_documents("action movies")
print(doc)
```

#### Who can review?

@dev2049

---------

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>

2023-06-17 09:09:33 -07:00
docarray	DocArray as a Retriever (#6031)	2023-06-17 09:09:33 -07:00
document_compressors	Update Cohere Reranker (#4180)	2023-05-05 09:11:37 -07:00
__init__.py
test_arxiv.py	Add `arxiv` retriever (#4538)	2023-05-11 22:48:38 -07:00
test_azure_cognitive_search.py	Add azure cognitive search retriever (#4467)	2023-05-10 15:27:27 -07:00
test_contextual_compression.py
test_merger_retriever.py	LOTR: Lord of the Retrievers. A retriever that merge several retrievers together applying document_formatters to them. (#5798)	2023-06-10 08:41:02 -07:00
test_pupmed.py	Harrison/pubmed integration (#5664)	2023-06-03 16:25:28 -07:00
test_weaviate_hybrid_search.py	Remove unnecessary comment (#4845)	2023-05-17 11:53:03 -04:00
test_wikipedia.py	added `Wikipedia` retriever (#4302)	2023-05-09 10:08:39 -07:00