Fixed several inconsistencies:
- file names and notebook titles should be similar otherwise ToC on the
[retrievers
page](https://python.langchain.com/en/latest/modules/indexes/retrievers.html)
and on the left ToC tab are different. For example, now, `Self-querying
with Chroma` is not correctly alphabetically sorted because its file
named `chroma_self_query.ipynb`
- `Stringing compressors and document transformers...` demoted from `#`
to `##`. Otherwise, it appears in Toc.
- several formatting problems
#### Who can review?
@hwchase17
@dev2049
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
## DocArray as a Retriever
[DocArray](https://github.com/docarray/docarray) is an open-source tool
for managing your multi-modal data. It offers flexibility to store and
search through your data using various document index backends. This PR
introduces `DocArrayRetriever` - which works with any available backend
and serves as a retriever for Langchain apps.
Also, I added 2 notebooks:
DocArray Backends - intro to all 5 currently supported backends, how to
initialize, index, and use them as a retriever
DocArray Usage - showcasing what additional search parameters you can
pass to create versatile retrievers
Example:
```python
from docarray.index import InMemoryExactNNIndex
from docarray import BaseDoc, DocList
from docarray.typing import NdArray
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers import DocArrayRetriever
# define document schema
class MyDoc(BaseDoc):
description: str
description_embedding: NdArray[1536]
embeddings = OpenAIEmbeddings()
# create documents
descriptions = ["description 1", "description 2"]
desc_embeddings = embeddings.embed_documents(texts=descriptions)
docs = DocList[MyDoc](
[
MyDoc(description=desc, description_embedding=embedding)
for desc, embedding in zip(descriptions, desc_embeddings)
]
)
# initialize document index with data
db = InMemoryExactNNIndex[MyDoc](docs)
# create a retriever
retriever = DocArrayRetriever(
index=db,
embeddings=embeddings,
search_field="description_embedding",
content_field="description",
)
# find the relevant document
doc = retriever.get_relevant_documents("action movies")
print(doc)
```
#### Who can review?
@dev2049
---------
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>