langchain/tests/integration_tests/retrievers
Saba Sturua 427551eabf
DocArray as a Retriever (#6031)
## DocArray as a Retriever

[DocArray](https://github.com/docarray/docarray) is an open-source tool
for managing multi-modal data. It lets you store and search your data
using various document index backends. This PR introduces
`DocArrayRetriever`, which works with any available backend and serves
as a retriever for LangChain apps.

I also added two notebooks:
- DocArray Backends: an intro to all 5 currently supported backends, covering how to
initialize them, index data, and use them as a retriever
- DocArray Usage: a showcase of the additional search parameters you can
pass to create versatile retrievers

Example:
```python
from docarray.index import InMemoryExactNNIndex
from docarray import BaseDoc, DocList
from docarray.typing import NdArray
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers import DocArrayRetriever


# define document schema
class MyDoc(BaseDoc):
    description: str
    description_embedding: NdArray[1536]


embeddings = OpenAIEmbeddings()
# create documents
descriptions = ["description 1", "description 2"]
desc_embeddings = embeddings.embed_documents(texts=descriptions)
docs = DocList[MyDoc](
    [
        MyDoc(description=desc, description_embedding=embedding)
        for desc, embedding in zip(descriptions, desc_embeddings)
    ]
)

# initialize document index with data
db = InMemoryExactNNIndex[MyDoc](docs)

# create a retriever
retriever = DocArrayRetriever(
    index=db,
    embeddings=embeddings,
    search_field="description_embedding",
    content_field="description",
)

# find the relevant documents (the call returns a list)
relevant_docs = retriever.get_relevant_documents("action movies")
print(relevant_docs)
```
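
To make the retrieval step concrete without an OpenAI key or a DocArray install, here is a minimal, dependency-free sketch of what an exact in-memory nearest-neighbour lookup does conceptually: embed the query, score it against each stored embedding, and return the content field of the best matches. This is not the `DocArrayRetriever` implementation. `ToyDoc`, `toy_embed`, `cosine`, and `retrieve` are hypothetical names introduced only for illustration, and the bag-of-characters embedding is an assumption standing in for a real embedding model.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a document with a text field and a vector field."""

    description: str
    description_embedding: List[float]


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Hypothetical bag-of-characters embedding; a real app would call a model."""
    vec = [0.0] * dim
    for ch in text.lower():
        if ch.isalnum():
            vec[ord(ch) % dim] += 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query: str,
    docs: Sequence[ToyDoc],
    embed: Callable[[str], List[float]],
    top_k: int = 1,
) -> List[str]:
    """Score every document against the query and return the top_k descriptions."""
    query_vec = embed(query)
    ranked = sorted(
        docs,
        key=lambda d: cosine(query_vec, d.description_embedding),
        reverse=True,
    )
    return [d.description for d in ranked[:top_k]]


descriptions = ["description 1", "description 2"]
toy_docs = [ToyDoc(d, toy_embed(d)) for d in descriptions]

hits = retrieve("description 2", toy_docs, toy_embed, top_k=1)
print(hits)
```

The search here is exhaustive: every stored vector is compared with the query, which keeps the sketch short but scales linearly with the number of documents.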

#### Who can review?

@dev2049

---------

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
1 year ago

| Name | Last commit | Updated |
| --- | --- | --- |
| docarray | DocArray as a Retriever (#6031) | 1 year ago |
| document_compressors | Update Cohere Reranker (#4180) | 1 year ago |
| __init__.py | Contextual compression retriever (#2915) | 1 year ago |
| test_arxiv.py | Add `arxiv` retriever (#4538) | 1 year ago |
| test_azure_cognitive_search.py | Add azure cognitive search retriever (#4467) | 1 year ago |
| test_contextual_compression.py | Contextual compression retriever (#2915) | 1 year ago |
| test_merger_retriever.py | LOTR: Lord of the Retrievers. A retriever that merge several retrievers together applying document_formatters to them. (#5798) | 1 year ago |
| test_pupmed.py | Harrison/pubmed integration (#5664) | 1 year ago |
| test_weaviate_hybrid_search.py | Remove unnecessary comment (#4845) | 1 year ago |
| test_wikipedia.py | added `Wikipedia` retriever (#4302) | 1 year ago |