mirror of
https://github.com/hwchase17/langchain
synced 2024-10-29 17:07:25 +00:00
b4914888a7
### Features include
- Metadata-based embedding search
- Choice of distance metric function: `L2` for Euclidean, `L1` for Manhattan, `max` for L-infinity distance, `cos` for cosine similarity, `dot` for dot product. Defaults to `L2`.
- Returning scores
- Max Marginal Relevance search
- Deleting samples from the dataset
### Notes
- Added numerous tests; let me know if you would like them shortened or made smarter.
---------
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
Deep Lake
This page covers how to use the Deep Lake ecosystem within LangChain.
Why Deep Lake?
- More than just a (multi-modal) vector store. You can later use the dataset to fine-tune your own LLMs.
- Stores not only the embeddings but also the original data, with automatic version control.
- Truly serverless. It doesn't require another service and can be used with major cloud providers (AWS S3, GCS, etc.).
More Resources
- Ultimate Guide to LangChain & Deep Lake: Build ChatGPT to Answer Questions on Your Financial Data
- Whitepaper and academic paper for Deep Lake
- Additional resources: Deep Lake, Getting Started, and Tutorials
Installation and Setup
- Install the Python package with:
pip install deeplake
Wrappers
VectorStore
LangChain provides a wrapper around Deep Lake, a data lake for deep learning applications. For now, the wrapper lets you use Deep Lake as a vector store, for either semantic search or example selection.
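To make the distance metrics named in the commit notes above concrete, here is a minimal, dependency-free sketch of what each option computes for a pair of embedding vectors. It does not use the wrapper's own code. The `METRICS` table and the `score` helper are hypothetical names introduced only for illustration, and `L1` is taken here to mean the Manhattan (sum of absolute differences) distance.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1(a: Vector, b: Vector) -> float:
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def linf(a: Vector, b: Vector) -> float:
    """L-infinity distance: largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def cos(a: Vector, b: Vector) -> float:
    """Cosine similarity: 1.0 for vectors pointing the same way."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical lookup table keyed by the metric names from the notes above.
METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "L2": l2,
    "L1": l1,
    "max": linf,
    "cos": cos,
    "dot": dot,
}


def score(a: Vector, b: Vector, metric: str = "L2") -> float:
    """Score two vectors with the named metric (defaults to L2)."""
    return METRICS[metric](a, b)
```

Note that `L2`, `L1`, and `max` are distances (lower is a better match), while `cos` and `dot` are similarities (higher is a better match), so a search has to sort in the matching direction.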
To import this vectorstore:
from langchain.vectorstores import DeepLake
For a more detailed walkthrough of the Deep Lake wrapper, see this notebook
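Max Marginal Relevance search, one of the features listed in the commit notes above, trades relevance to the query against redundancy among the results already chosen. The following is a rough, standalone sketch of the general greedy MMR idea, not the wrapper's implementation. The function name `mmr`, the parameter `lambda_mult`, and the use of cosine similarity are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def mmr(query: Vector, candidates: Sequence[Vector],
        k: int = 2, lambda_mult: float = 0.5) -> List[int]:
    """Greedily pick up to k candidate indices.

    lambda_mult = 1.0 ranks by relevance only; lower values
    penalise candidates similar to ones already selected.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = remaining[0], float("-inf")
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already picked (0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            s = lambda_mult * relevance - (1.0 - lambda_mult) * redundancy
            if s > best_score:
                best_idx, best_score = i, s
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected
```

For example, with the query `[1.0, 0.0]` and the candidates `[[0.98, 0.17], [0.98, 0.21], [0.77, -0.64]]`, the first two candidates are near-duplicates. With `lambda_mult=0.5` the sketch selects the first and third, while `lambda_mult=1.0` falls back to plain relevance ranking and selects the first two.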