langchain/tests/integration_tests/retrievers
Saba Sturua 427551eabf
DocArray as a Retriever (#6031)
## DocArray as a Retriever

[DocArray](https://github.com/docarray/docarray) is an open-source tool
for managing multi-modal data. It lets you store and search your data
using various document index backends. This PR introduces
`DocArrayRetriever`, which works with any available backend and serves
as a retriever for LangChain apps.

I also added 2 notebooks:
- DocArray Backends: an intro to all 5 currently supported backends,
  covering how to initialize, index, and use them as a retriever
- DocArray Usage: showcases the additional search parameters you can
  pass to create versatile retrievers
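
As a rough picture of what "additional search parameters" can mean, here is a small standard-library sketch. It is not `DocArrayRetriever` code: the function `search` and its `top_k` and `where` arguments are hypothetical names chosen for illustration, and the 2-dimensional vectors stand in for real embeddings. It ranks toy documents by dot product, optionally filtering them first.

```python
from typing import Any, Callable, Dict, List, Optional

Doc = Dict[str, Any]


def search(
    docs: List[Doc],
    query_vector: List[float],
    top_k: int = 1,
    where: Optional[Callable[[Doc], bool]] = None,
) -> List[Doc]:
    """Hypothetical helper: filter first, then rank by dot product."""
    # Keep only the documents that pass the optional filter.
    candidates = [d for d in docs if where is None or where(d)]
    # Higher dot product means a closer match for these toy vectors.
    ranked = sorted(
        candidates,
        key=lambda d: sum(x * y for x, y in zip(d["embedding"], query_vector)),
        reverse=True,
    )
    return ranked[:top_k]


movies = [
    {"title": "Movie A", "year": 1999, "embedding": [0.9, 0.1]},
    {"title": "Movie B", "year": 2015, "embedding": [0.8, 0.2]},
    {"title": "Movie C", "year": 2020, "embedding": [0.1, 0.9]},
]
query = [1.0, 0.0]

# Two best matches overall.
top_two = [d["title"] for d in search(movies, query, top_k=2)]
# Two best matches among movies released in 2010 or later.
recent_only = [
    d["title"]
    for d in search(movies, query, top_k=2, where=lambda d: d["year"] >= 2010)
]
print(top_two, recent_only)
```

Filtering before ranking keeps the result count predictable: `top_k` always applies to documents that already satisfy the filter.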

Example:
```python
from docarray.index import InMemoryExactNNIndex
from docarray import BaseDoc, DocList
from docarray.typing import NdArray
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers import DocArrayRetriever


# define document schema
class MyDoc(BaseDoc):
    description: str
    description_embedding: NdArray[1536]


embeddings = OpenAIEmbeddings()
# create documents
descriptions = ["description 1", "description 2"]
desc_embeddings = embeddings.embed_documents(texts=descriptions)
docs = DocList[MyDoc](
    [
        MyDoc(description=desc, description_embedding=embedding)
        for desc, embedding in zip(descriptions, desc_embeddings)
    ]
)

# initialize document index with data
db = InMemoryExactNNIndex[MyDoc](docs)

# create a retriever
retriever = DocArrayRetriever(
    index=db,
    embeddings=embeddings,
    search_field="description_embedding",
    content_field="description",
)

# find the documents most relevant to the query (returns a list)
relevant_docs = retriever.get_relevant_documents("action movies")
print(relevant_docs)
```
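
To make the roles of `search_field` and `content_field` concrete, the following standard-library sketch performs an exact (brute-force) nearest-neighbour lookup. It is an illustration under stated assumptions, not DocArray's or LangChain's implementation: `ToyDoc`, `cosine`, and `retrieve` are hypothetical names, and the hand-made 3-dimensional vectors stand in for real 1536-dimensional embeddings.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ToyDoc:
    # Hypothetical stand-in for a document schema such as MyDoc.
    description: str
    description_embedding: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(
    docs: List[ToyDoc],
    query_vector: List[float],
    search_field: str,
    content_field: str,
) -> str:
    """Compare the query with every document and return the content
    of the closest one."""
    # search_field names the attribute holding the vector to compare.
    best = max(docs, key=lambda d: cosine(getattr(d, search_field), query_vector))
    # content_field names the attribute returned to the caller.
    return getattr(best, content_field)


toy_docs = [
    ToyDoc("a quiet drama", [1.0, 0.0, 0.0]),
    ToyDoc("an action movie", [0.0, 1.0, 0.0]),
]

# Hand-made query vector that points mostly along the second axis.
best_match = retrieve(
    toy_docs,
    query_vector=[0.1, 0.9, 0.0],
    search_field="description_embedding",
    content_field="description",
)
print(best_match)
```

In the real example the query vector would come from the embedding model rather than being written by hand.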

#### Who can review?

@dev2049

---------

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
2023-06-17 09:09:33 -07:00
| Name | Last commit | Date |
|---|---|---|
| docarray | DocArray as a Retriever (#6031) | 2023-06-17 09:09:33 -07:00 |
| document_compressors | Update Cohere Reranker (#4180) | 2023-05-05 09:11:37 -07:00 |
| __init__.py | | |
| test_arxiv.py | Add arxiv retriever (#4538) | 2023-05-11 22:48:38 -07:00 |
| test_azure_cognitive_search.py | Add azure cognitive search retriever (#4467) | 2023-05-10 15:27:27 -07:00 |
| test_contextual_compression.py | | |
| test_merger_retriever.py | LOTR: Lord of the Retrievers. A retriever that merge several retrievers together applying document_formatters to them. (#5798) | 2023-06-10 08:41:02 -07:00 |
| test_pupmed.py | Harrison/pubmed integration (#5664) | 2023-06-03 16:25:28 -07:00 |
| test_weaviate_hybrid_search.py | Remove unnecessary comment (#4845) | 2023-05-17 11:53:03 -04:00 |
| test_wikipedia.py | added Wikipedia retriever (#4302) | 2023-05-09 10:08:39 -07:00 |