langchain/docs/docs/integrations/vectorstores/google_memorystore_redis.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Google Memorystore for Redis\n",
    "\n",
    "> [Google Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) is a fully-managed service that is powered by the Redis in-memory data store to build application caches that provide sub-millisecond data access. Extend your database application to build AI-powered experiences leveraging Memorystore for Redis's Langchain integrations.\n",
    "\n",
    "This notebook goes over how to use [Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) to store vector embeddings with the `MemorystoreVectorStore` class.\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-memorystore-redis-python/blob/main/docs/vector_store.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pre-reqs"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Before You Begin\n",
    "\n",
    "To run this notebook, you will need to do the following:\n",
    "\n",
    "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
    "* [Create a Memorystore for Redis instance](https://cloud.google.com/memorystore/docs/redis/create-instance-console). Ensure that the version is greater than or equal to 7.2."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 🦜🔗 Library Installation\n",
    "\n",
    "The integration lives in its own `langchain-google-memorystore-redis` package, so we need to install it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%pip install -upgrade --quiet langchain-google-memorystore-redis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# # Automatically restart kernel after installs so that your environment can access the new packages\n",
    "# import IPython\n",
    "\n",
    "# app = IPython.Application.instance()\n",
    "# app.kernel.do_shutdown(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ☁ Set Your Google Cloud Project\n",
    "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
    "\n",
    "If you don't know your project ID, try the following:\n",
    "\n",
    "* Run `gcloud config list`.\n",
    "* Run `gcloud projects list`.\n",
    "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
    "\n",
    "PROJECT_ID = \"my-project-id\"  # @param {type:\"string\"}\n",
    "\n",
    "# Set the project id\n",
    "!gcloud config set project {PROJECT_ID}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 🔐 Authentication\n",
    "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
    "\n",
    "* If you are using Colab to run this notebook, use the cell below and continue.\n",
    "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.colab import auth\n",
    "\n",
    "auth.authenticate_user()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic Usage"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Initialize a Vector Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "import redis\n",
    "from langchain_google_memorystore_redis import (\n",
    "    DistanceStrategy,\n",
    "    HNSWConfig,\n",
    "    RedisVectorStore,\n",
    ")\n",
    "\n",
    "# Connect to a Memorystore for Redis instance\n",
    "redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n",
    "\n",
    "# Configure HNSW index with descriptive parameters\n",
    "index_config = HNSWConfig(\n",
    "    name=\"my_vector_index\", distance_strategy=DistanceStrategy.COSINE, vector_size=128\n",
    ")\n",
    "\n",
    "# Initialize/create the vector store index\n",
    "RedisVectorStore.init_index(client=redis_client, index_config=index_config)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare Documents\n",
    "\n",
    "Text needs processing and numerical representation before interacting with a vector store. This involves:\n",
    "\n",
    "* Loading Text: The TextLoader obtains text data from a file (e.g., \"state_of_the_union.txt\").\n",
    "* Text Splitting: The CharacterTextSplitter breaks the text into smaller chunks for embedding models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import TextLoader\n",
    "from langchain_text_splitters import CharacterTextSplitter\n",
    "\n",
    "loader = TextLoader(\"./state_of_the_union.txt\")\n",
    "documents = loader.load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "docs = text_splitter.split_documents(documents)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Add Documents to the Vector Store\n",
    "\n",
    "After text preparation and embedding generation, the following methods insert them into the Redis vector store."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Method 1: Classmethod for Direct Insertion\n",
    "\n",
    "This approach combines embedding creation and insertion into a single step using the from_documents classmethod:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.embeddings.fake import FakeEmbeddings\n",
    "\n",
    "embeddings = FakeEmbeddings(size=128)\n",
    "redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n",
    "rvs = RedisVectorStore.from_documents(\n",
    "    docs, embedding=embeddings, client=redis_client, index_name=\"my_vector_index\"\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Method 2: Instance-Based Insertion\n",
    "This approach offers flexibility when working with a new or existing RedisVectorStore:\n",
    "\n",
    "* [Optional] Create a RedisVectorStore Instance: Instantiate a RedisVectorStore object for customization. If you already have an instance, proceed to the next step.\n",
    "* Add Text with Metadata: Provide raw text and metadata to the instance. Embedding generation and insertion into the vector store are handled automatically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rvs = RedisVectorStore(\n",
    "    client=redis_client, index_name=\"my_vector_index\", embeddings=embeddings\n",
    ")\n",
    "ids = rvs.add_texts(\n",
    "    texts=[d.page_content for d in docs], metadatas=[d.metadata for d in docs]\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Perform a Similarity Search (KNN)\n",
    "\n",
    "With the vector store populated, it's possible to search for text semantically similar to a query.  Here's how to use KNN (K-Nearest Neighbors) with default settings:\n",
    "\n",
    "* Formulate the Query: A natural language question expresses the search intent (e.g., \"What did the president say about Ketanji Brown Jackson\").\n",
    "* Retrieve Similar Results: The `similarity_search` method finds items in the vector store closest to the query in meaning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pprint\n",
    "\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "knn_results = rvs.similarity_search(query=query)\n",
    "pprint.pprint(knn_results)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Perform a Range-Based Similarity Search\n",
    "\n",
    "Range queries provide more control by specifying a desired similarity threshold along with the query text:\n",
    "\n",
    "* Formulate the Query: A natural language question defines the search intent.\n",
    "* Set Similarity Threshold: The distance_threshold parameter determines how close a match must be considered relevant.\n",
    "* Retrieve Results: The `similarity_search_with_score` method finds items from the vector store that fall within the specified similarity threshold."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rq_results = rvs.similarity_search_with_score(query=query, distance_threshold=0.8)\n",
    "pprint.pprint(rq_results)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Perform a Maximal Marginal Relevance (MMR) Search\n",
    "\n",
    "MMR queries aim to find results that are both relevant to the query and diverse from each other, reducing redundancy in search results.\n",
    "\n",
    "* Formulate the Query: A natural language question defines the search intent.\n",
    "* Balance Relevance and Diversity: The lambda_mult parameter controls the trade-off between strict relevance and promoting variety in the results.\n",
    "* Retrieve MMR Results: The `max_marginal_relevance_search` method returns items that optimize the combination of relevance and diversity based on the lambda setting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mmr_results = rvs.max_marginal_relevance_search(query=query, lambda_mult=0.90)\n",
    "pprint.pprint(mmr_results)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use the Vector Store as a Retriever\n",
    "\n",
    "For seamless integration with other LangChain components, a vector store can be converted into a Retriever. This offers several advantages:\n",
    "\n",
    "* LangChain Compatibility: Many LangChain tools and methods are designed to directly interact with retrievers.\n",
    "* Ease of Use: The `as_retriever()` method converts the vector store into a format that simplifies querying."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = rvs.as_retriever()\n",
    "results = retriever.invoke(query)\n",
    "pprint.pprint(results)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clean up"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete Documents from the Vector Store\n",
    "\n",
    "Occasionally, it's necessary to remove documents (and their associated vectors) from the vector store.  The `delete` method provides this functionality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rvs.delete(ids)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete a Vector Index\n",
    "\n",
    "There might be circumstances where the deletion of an existing vector index is necessary. Common reasons include:\n",
    "\n",
    "* Index Configuration Changes: If index parameters need modification, it's often required to delete and recreate the index.\n",
    "* Storage Management: Removing unused indices can help free up space within the Redis instance.\n",
    "\n",
    "Caution: Vector index deletion is an irreversible operation. Be certain that the stored vectors and search functionality are no longer required before proceeding."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Delete the vector index\n",
    "RedisVectorStore.drop_index(client=redis_client, index_name=\"my_vector_index\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}