Open a Deeplake dataset in read-only mode (#2240)

I'm using Deeplake as a vector store for a Q&A application. When several
questions are processed at the same time against the same dataset, the
second one triggers the following error:

> LockedException: This dataset cannot be open for writing as it is
locked by another machine. Try loading the dataset with
`read_only=True`.

Answering questions doesn't require writing new embeddings, so it's fine
to open the dataset in read-only mode in that case.

This pull request therefore adds a `read_only` option to the `DeepLake`
constructor, which is forwarded to its underlying `deeplake.load()` call.

The related Deeplake documentation is
[here](https://docs.deeplake.ai/en/latest/deeplake.html#deeplake.load).
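For reference, the forwarding behaviour this change introduces can be sketched as follows. This is a minimal illustration only: a stub stands in for the real `deeplake` module (the actual call needs a live dataset and credentials), and the `open_dataset` helper and the `hub://example-org/qa-embeddings` path are hypothetical names used for the example.

```python
from typing import Optional


class _DeeplakeStub:
    """Illustration-only stand-in for the real `deeplake` module."""

    def __init__(self) -> None:
        self.last_load_kwargs: Optional[dict] = None

    def exists(self, dataset_path: str, token: Optional[str] = None) -> bool:
        return True  # pretend the dataset already exists

    def load(self, dataset_path: str, token: Optional[str] = None,
             read_only: Optional[bool] = None) -> object:
        # Record how we were called, as the patched constructor would call us.
        self.last_load_kwargs = {
            "dataset_path": dataset_path,
            "token": token,
            "read_only": read_only,
        }
        return object()  # placeholder for the loaded dataset


deeplake = _DeeplakeStub()


def open_dataset(dataset_path: str,
                 token: Optional[str] = None,
                 read_only: Optional[bool] = None) -> object:
    """Mirrors the patched constructor: forward `read_only` to load()."""
    if deeplake.exists(dataset_path, token=token):
        return deeplake.load(dataset_path, token=token, read_only=read_only)
    raise ValueError("dataset not found")


ds = open_dataset("hub://example-org/qa-embeddings", read_only=True)
print(deeplake.last_load_kwargs["read_only"])  # True
```

With `read_only=True` forwarded through to `deeplake.load()`, a second concurrent reader no longer attempts to acquire the write lock, which is what raises the `LockedException` above.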

I've tested this update in my local dev environment. However, I don't know
whether an integration test and/or additional documentation are expected.
If so, let me know, ideally with some guidance, as I'm not particularly
experienced in Python.
JC Touzalin 2023-04-01 17:58:53 +02:00 committed by GitHub
parent e49284acde
commit 5a0844bae1


@@ -57,6 +57,7 @@ class DeepLake(VectorStore):
         dataset_path: str = _LANGCHAIN_DEFAULT_DEEPLAKE_PATH,
         token: Optional[str] = None,
         embedding_function: Optional[Embeddings] = None,
+        read_only: Optional[bool] = None,
     ) -> None:
         """Initialize with Deep Lake client."""
@@ -70,7 +71,7 @@ class DeepLake(VectorStore):
         self._deeplake = deeplake
         if deeplake.exists(dataset_path, token=token):
-            self.ds = deeplake.load(dataset_path, token=token)
+            self.ds = deeplake.load(dataset_path, token=token, read_only=read_only)
             logger.warning(
                 f"Deep Lake Dataset in {dataset_path} already exists, "
                 f"loading from the storage"