langchain/docs/ecosystem/deeplake.md
Davit Buniatyan b4914888a7
Deep Lake upgrade to include attribute search, distance metrics, returning scores and MMR (#2455)
### Features include

- Metadata based embedding search
- Choice of distance metric function (`L2` for Euclidean, `L1` for
Manhattan, `max` for L-infinity distance, `cos` for cosine similarity, `dot`
for dot product). Defaults to `L2`
- Returning scores
- Max Marginal Relevance Search
- Deleting samples from the dataset
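
As a rough illustration of the distance metric options named in the list above, here is a minimal pure-Python sketch. It is not Deep Lake's implementation: the `distance` helper is hypothetical and only assumes the conventional definitions, with `L1` taken as the sum of absolute differences and `max` as the largest absolute difference.

```python
import math


def distance(a, b, metric="L2"):
    """Toy versions of the listed metrics (hypothetical helper, not Deep Lake code)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    if metric == "L2":  # Euclidean distance
        return math.sqrt(sum(d * d for d in diffs))
    if metric == "L1":  # sum of absolute differences
        return sum(abs(d) for d in diffs)
    if metric == "max":  # L-infinity distance
        return max(abs(d) for d in diffs)
    dot = sum(x * y for x, y in zip(a, b))
    if metric == "dot":  # dot product (a similarity: larger means closer)
        return dot
    if metric == "cos":  # cosine similarity (larger means closer)
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)
    raise ValueError(f"unknown metric: {metric}")
```

Note that `L2`, `L1` and `max` are distances (smaller is closer), while `cos` and `dot` are similarities (larger is closer), so the ranking direction depends on the metric chosen.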

### Notes
- Added numerous tests, let me know if you would like to shorten them or
make them smarter

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
2023-04-06 12:47:33 -07:00

Deep Lake

This page covers how to use the Deep Lake ecosystem within LangChain.

Why Deep Lake?

  • More than just a (multi-modal) vector store: you can later use the dataset to fine-tune your own LLMs.
  • Stores not only embeddings but also the original data, with automatic version control.
  • Truly serverless: it doesn't require another service and can be used with major cloud providers (AWS S3, GCS, etc.).

More Resources

  1. Ultimate Guide to LangChain & Deep Lake: Build ChatGPT to Answer Questions on Your Financial Data
  2. Here are the whitepaper and academic paper for Deep Lake
  3. Here is a set of additional resources available for review: Deep Lake, Getting Started and Tutorials

Installation and Setup

  • Install the Python package with pip install deeplake

Wrappers

VectorStore

LangChain provides a wrapper around Deep Lake, a data lake for deep learning applications, that lets you use it as a vector store (for now), whether for semantic search or example selection.

To import this vectorstore:

from langchain.vectorstores import DeepLake
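
To make concrete what using a vector store for semantic search involves, the following is a conceptual pure-Python sketch. It is not the DeepLake wrapper's API: the `search` function, the `store` layout, and the two-dimensional embeddings are all hypothetical, chosen only to show ranking by cosine similarity, an optional metadata filter, and returned scores.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(store, query_vec, k=2, metadata_filter=None):
    """Return up to k (text, score) pairs, most similar first.

    Hypothetical helper: `store` is a list of dicts with "text",
    "embedding" and "metadata" keys, not a Deep Lake dataset.
    """
    candidates = [
        item
        for item in store
        if metadata_filter is None
        or all(item["metadata"].get(key) == value
               for key, value in metadata_filter.items())
    ]
    scored = [(item["text"], cosine(item["embedding"], query_vec))
              for item in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up documents with made-up embeddings, for illustration only.
store = [
    {"text": "cats purr", "embedding": [1.0, 0.0], "metadata": {"topic": "pets"}},
    {"text": "dogs bark", "embedding": [0.8, 0.6], "metadata": {"topic": "pets"}},
    {"text": "stocks fell", "embedding": [0.0, 1.0], "metadata": {"topic": "finance"}},
]

results = search(store, [1.0, 0.0], k=2)
finance_only = search(store, [1.0, 0.0], metadata_filter={"topic": "finance"})
```

In a real setup the embeddings would come from an embedding model and the storage and search would be handled by the vector store itself; the sketch only shows the shape of the computation.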

For a more detailed walkthrough of the Deep Lake wrapper, see this notebook