langchain-aws InMemoryVectorStore documentation updates (#24347)

Thank you for contributing to LangChain!

- [x] **PR title**: "Add documentation on InMemoryVectorStore driver for
MemoryDB to langchain-aws"
  - Langchain-aws repo: Add MemoryDB documentation
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Added documentation on the InMemoryVectorStore driver to
aws.mdx and a usage example on a MemoryDB cluster
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
Add memorydb notebook to docs/docs/integrations/ folder


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Lakshmi Peri 2024-07-28 15:09:51 -04:00 committed by GitHub
parent 56c2a7f6d4
commit 821196c4ee
2 changed files with 555 additions and 0 deletions

docs/docs/integrations/platforms/aws.mdx Normal file → Executable file

@ -197,6 +197,24 @@ See a [usage example](/docs/integrations/vectorstores/documentdb).
```python
from langchain.vectorstores import DocumentDBVectorSearch
```
### Amazon MemoryDB
[Amazon MemoryDB](https://aws.amazon.com/memorydb/) is a durable, in-memory database service that delivers ultra-fast performance. MemoryDB is compatible with Redis OSS, a popular open source data store,
enabling you to quickly build applications using the same flexible and friendly Redis OSS APIs and commands that you already use today.
The `InMemoryVectorStore` class provides a vector store that connects to Amazon MemoryDB.
```python
from langchain_aws.vectorstores.inmemorydb import InMemoryVectorStore

# `chunks`, `embeddings`, `vector_schema`, and `INDEX_NAME` are assumed to be defined earlier.
vds = InMemoryVectorStore.from_documents(
    chunks,
    embeddings,
    redis_url="rediss://cluster_endpoint:6379/ssl=True ssl_cert_reqs=none",
    vector_schema=vector_schema,
    index_name=INDEX_NAME,
)
```
See a [usage example](/docs/integrations/vectorstores/memorydb).
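
The `redis_url` in the snippet above starts with `rediss://`, which asks the client for a TLS connection; `redis://` would connect unencrypted. As a minimal sketch, a small helper can make that choice explicit. The helper name `memorydb_url` and its parameters are hypothetical and not part of `langchain-aws`, and `cluster_endpoint` is the same placeholder used above.

```python
from urllib.parse import urlsplit


def memorydb_url(endpoint: str, port: int = 6379, tls: bool = True) -> str:
    """Build a connection URL for a MemoryDB cluster endpoint (hypothetical helper)."""
    # `rediss://` requests a TLS connection; `redis://` is unencrypted.
    scheme = "rediss" if tls else "redis"
    return f"{scheme}://{endpoint}:{port}"


# Build the URL for the placeholder endpoint and inspect its parts.
url = memorydb_url("cluster_endpoint")
parts = urlsplit(url)
print(parts.scheme, parts.hostname, parts.port)
```

The resulting string can be passed as `redis_url`; replace the placeholder with the endpoint of your own cluster.
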
## Retrievers


@ -0,0 +1,537 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Amazon MemoryDB\n",
"\n",
[Vector Search]">
">[Vector Search](https://docs.aws.amazon.com/memorydb/latest/devguide/vector-search.html/) introduction and LangChain integration guide.\n",
"\n",
"## What is Amazon MemoryDB?\n",
"\n",
"MemoryDB is compatible with Redis OSS, a popular open source data store, enabling you to quickly build applications using the same flexible and friendly Redis OSS data structures, APIs, and commands that they already use today. With MemoryDB, all of your data is stored in memory, which enables you to achieve microsecond read and single-digit millisecond write latency and high throughput. MemoryDB also stores data durably across multiple Availability Zones (AZs) using a Multi-AZ transactional log to enable fast failover, database recovery, and node restarts.\n",
"\n",
"\n",
"## Vector search for MemoryDB \n",
"\n",
"Vector search for MemoryDB extends the functionality of MemoryDB. Vector search can be used in conjunction with existing MemoryDB functionality. Applications that do not use vector search are unaffected by its presence. Vector search is available in all Regions that MemoryDB is available. You can use your existing MemoryDB data or Redis OSS API to build machine learning and generative AI use cases, such as retrieval-augmented generation, anomaly detection, document retrieval, and real-time recommendations.\n",
"\n",
"* Indexing of multiple fields in Redis hashes and `JSON`\n",
"* Vector similarity search (with `HNSW` (ANN) or `FLAT` (KNN))\n",
"* Vector Range Search (e.g. find all vectors within a radius of a query vector)\n",
"* Incremental indexing without performance loss\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setting up\n",
"\n",
"\n",
"### Install Redis Python client\n",
"\n",
"`Redis-py` is a python client that can be used to connect to MemoryDB"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet redis langchain-aws"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from langchain_aws.embeddings import BedrockEmbeddings\n",
"\n",
"embeddings = BedrockEmbeddings()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### MemoryDB Connection\n",
"\n",
"Valid Redis Url schemas are:\n",
"1. `redis://` - Connection to Redis cluster, unencrypted\n",
"2. `rediss://` - Connection to Redis cluster, with TLS encryption\n",
"\n",
"More information about additional connection parameters can be found in the [redis-py documentation](https://redis-py.readthedocs.io/en/stable/connections.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample data\n",
"\n",
"First we will describe some sample data so that the various attributes of the Redis vector store can be demonstrated."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"metadata = [\n",
" {\n",
" \"user\": \"john\",\n",
" \"age\": 18,\n",
" \"job\": \"engineer\",\n",
" \"credit_score\": \"high\",\n",
" },\n",
" {\n",
" \"user\": \"derrick\",\n",
" \"age\": 45,\n",
" \"job\": \"doctor\",\n",
" \"credit_score\": \"low\",\n",
" },\n",
" {\n",
" \"user\": \"nancy\",\n",
" \"age\": 94,\n",
" \"job\": \"doctor\",\n",
" \"credit_score\": \"high\",\n",
" },\n",
" {\n",
" \"user\": \"tyler\",\n",
" \"age\": 100,\n",
" \"job\": \"engineer\",\n",
" \"credit_score\": \"high\",\n",
" },\n",
" {\n",
" \"user\": \"joe\",\n",
" \"age\": 35,\n",
" \"job\": \"dentist\",\n",
" \"credit_score\": \"medium\",\n",
" },\n",
"]\n",
"texts = [\"foo\", \"foo\", \"foo\", \"bar\", \"bar\"]\n",
"index_name = \"users\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create MemoryDB vector store\n",
"\n",
"The InMemoryVectorStore instance can be initialized using the below methods \n",
"- ``InMemoryVectorStore.__init__`` - Initialize directly\n",
"- ``InMemoryVectorStore.from_documents`` - Initialize from a list of ``Langchain.docstore.Document`` objects\n",
"- ``InMemoryVectorStore.from_texts`` - Initialize from a list of texts (optionally with metadata)\n",
"- ``InMemoryVectorStore.from_existing_index`` - Initialize from an existing MemoryDB index\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_aws.vectorstores.inmemorydb import InMemoryVectorStore\n",
"\n",
"vds = InMemoryVectorStore.from_texts(\n",
" embeddings,\n",
" redis_url=\"rediss://cluster_endpoint:6379/ssl=True ssl_cert_reqs=none\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'users'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vds.index_name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Querying\n",
"\n",
"There are multiple ways to query the ``InMemoryVectorStore`` implementation based on what use case you have:\n",
"\n",
"- ``similarity_search``: Find the most similar vectors to a given vector.\n",
"- ``similarity_search_with_score``: Find the most similar vectors to a given vector and return the vector distance\n",
"- ``similarity_search_limit_score``: Find the most similar vectors to a given vector and limit the number of results to the ``score_threshold``\n",
"- ``similarity_search_with_relevance_scores``: Find the most similar vectors to a given vector and return the vector similarities\n",
"- ``max_marginal_relevance_search``: Find the most similar vectors to a given vector while also optimizing for diversity"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"foo\n"
]
}
],
"source": [
"results = vds.similarity_search(\"foo\")\n",
"print(results[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Content: foo --- Score: 0.0\n",
"Content: foo --- Score: 0.0\n",
"Content: foo --- Score: 0.0\n",
"Content: bar --- Score: 0.1566\n",
"Content: bar --- Score: 0.1566\n"
]
}
],
"source": [
"# with scores (distances)\n",
"results = vds.similarity_search_with_score(\"foo\", k=5)\n",
"for result in results:\n",
" print(f\"Content: {result[0].page_content} --- Score: {result[1]}\")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Content: foo --- Score: 0.0\n",
"Content: foo --- Score: 0.0\n",
"Content: foo --- Score: 0.0\n"
]
}
],
"source": [
"# limit the vector distance that can be returned\n",
"results = vds.similarity_search_with_score(\"foo\", k=5, distance_threshold=0.1)\n",
"for result in results:\n",
" print(f\"Content: {result[0].page_content} --- Score: {result[1]}\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Content: foo --- Similiarity: 1.0\n",
"Content: foo --- Similiarity: 1.0\n",
"Content: foo --- Similiarity: 1.0\n",
"Content: bar --- Similiarity: 0.8434\n",
"Content: bar --- Similiarity: 0.8434\n"
]
}
],
"source": [
"# with scores\n",
"results = vds.similarity_search_with_relevance_scores(\"foo\", k=5)\n",
"for result in results:\n",
" print(f\"Content: {result[0].page_content} --- Similiarity: {result[1]}\")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['doc:users:b9c71d62a0a34241a37950b448dafd38']"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# you can also add new documents as follows\n",
"new_document = [\"baz\"]\n",
"new_metadata = [{\"user\": \"sam\", \"age\": 50, \"job\": \"janitor\", \"credit_score\": \"high\"}]\n",
"# both the document and metadata must be lists\n",
"vds.add_texts(new_document, new_metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## MemoryDB as Retriever\n",
"\n",
"Here we go over different options for using the vector store as a retriever.\n",
"\n",
"There are three different search methods we can use to do retrieval. By default, it will use semantic similarity."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Content: foo --- Score: 0.0\n",
"Content: foo --- Score: 0.0\n",
"Content: foo --- Score: 0.0\n"
]
}
],
"source": [
"query = \"foo\"\n",
"results = vds.similarity_search_with_score(query, k=3, return_metadata=True)\n",
"\n",
"for result in results:\n",
" print(\"Content:\", result[0].page_content, \" --- Score: \", result[1])"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"retriever = vds.as_retriever(search_type=\"similarity\", search_kwargs={\"k\": 4})"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),\n",
" Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),\n",
" Document(page_content='foo', metadata={'id': 'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'}),\n",
" Document(page_content='bar', metadata={'id': 'doc:users_modified:01ef6caac12b42c28ad870aefe574253', 'user': 'tyler', 'job': 'engineer', 'credit_score': 'high', 'age': '100'})]"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = retriever.invoke(query)\n",
"docs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is also the `similarity_distance_threshold` retriever which allows the user to specify the vector distance"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"retriever = vds.as_retriever(\n",
" search_type=\"similarity_distance_threshold\",\n",
" search_kwargs={\"k\": 4, \"distance_threshold\": 0.1},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),\n",
" Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),\n",
" Document(page_content='foo', metadata={'id': 'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'})]"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = retriever.invoke(query)\n",
"docs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly, the ``similarity_score_threshold`` allows the user to define the minimum score for similar documents"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"retriever = vds.as_retriever(\n",
" search_type=\"similarity_score_threshold\",\n",
" search_kwargs={\"score_threshold\": 0.9, \"k\": 10},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),\n",
" Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),\n",
" Document(page_content='foo', metadata={'id': 'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'})]"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.invoke(\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='foo', metadata={'id': 'doc:users:8f6b673b390647809d510112cde01a27', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),\n",
" Document(page_content='bar', metadata={'id': 'doc:users:93521560735d42328b48c9c6f6418d6a', 'user': 'tyler', 'job': 'engineer', 'credit_score': 'high', 'age': '100'}),\n",
" Document(page_content='foo', metadata={'id': 'doc:users:125ecd39d07845eabf1a699d44134a5b', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'}),\n",
" Document(page_content='foo', metadata={'id': 'doc:users:d6200ab3764c466082fde3eaab972a2a', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'})]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.invoke(\"foo\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Delete index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To delete your entries you have to address them by their keys."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# delete the indices too\n",
"InMemoryVectorStore.drop_index(\n",
" index_name=\"users\", delete_documents=True, redis_url=\"redis://localhost:6379\"\n",
")\n",
"InMemoryVectorStore.drop_index(\n",
" index_name=\"users_modified\",\n",
" delete_documents=True,\n",
" redis_url=\"redis://localhost:6379\",\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}