**Description: a description of the change**
Fixed `make docs_build` and related scripts which caused errors. There
are several changes.
First, I made the build of the documentation and the API Reference into
two separate commands. This is because it takes less time to build. The
commands for documents are `make docs_build`, `make docs_clean`, and
`make docs_linkcheck`. The commands for API Reference are `make
api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`.
It looked like `docs/.local_build.sh` could be used to build the
documentation, so I used that. Since `.local_build.sh` was also building
API Rerefence internally, I removed that process. `.local_build.sh` also
added some Bash options to stop in error or so. Futher more added `cd
"${SCRIPT_DIR}"` at the beginning so that the script will work no matter
which directory it is executed in.
`docs/api_reference/api_reference.rst` is removed, because which is
generated by `docs/api_reference/create_api_rst.py`, and added it to
.gitignore.
Finally, the description of CONTRIBUTING.md was modified.
**Issue: the issue # it fixes (if applicable)**
https://github.com/hwchase17/langchain/issues/6413
**Dependencies: any dependencies required for this change**
`nbdoc` was missing in group docs so it was added. I installed it with
the `poetry add --group docs nbdoc` command. I am concerned if any
modifications are needed to poetry.lock. I would greatly appreciate it
if you could pay close attention to this file during the review.
**Tag maintainer**
- General / Misc / if you don't know who to tag: @baskaryan
If this PR needs any additional changes, I'll be happy to make them!
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
## DocArray as a Retriever
[DocArray](https://github.com/docarray/docarray) is an open-source tool
for managing your multi-modal data. It offers flexibility to store and
search through your data using various document index backends. This PR
introduces `DocArrayRetriever` - which works with any available backend
and serves as a retriever for Langchain apps.
Also, I added 2 notebooks:
DocArray Backends - intro to all 5 currently supported backends, how to
initialize, index, and use them as a retriever
DocArray Usage - showcasing what additional search parameters you can
pass to create versatile retrievers
Example:
```python
from docarray.index import InMemoryExactNNIndex
from docarray import BaseDoc, DocList
from docarray.typing import NdArray
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers import DocArrayRetriever
# define document schema
class MyDoc(BaseDoc):
description: str
description_embedding: NdArray[1536]
embeddings = OpenAIEmbeddings()
# create documents
descriptions = ["description 1", "description 2"]
desc_embeddings = embeddings.embed_documents(texts=descriptions)
docs = DocList[MyDoc](
[
MyDoc(description=desc, description_embedding=embedding)
for desc, embedding in zip(descriptions, desc_embeddings)
]
)
# initialize document index with data
db = InMemoryExactNNIndex[MyDoc](docs)
# create a retriever
retriever = DocArrayRetriever(
index=db,
embeddings=embeddings,
search_field="description_embedding",
content_field="description",
)
# find the relevant document
doc = retriever.get_relevant_documents("action movies")
print(doc)
```
#### Who can review?
@dev2049
---------
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>