{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Memorystore for Redis\n",
"\n",
"> [Google Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) is a fully managed service, powered by the Redis in-memory data store, for building application caches that provide sub-millisecond data access. Extend your database application to build AI-powered experiences using Memorystore for Redis's LangChain integrations.\n",
"\n",
"This notebook shows how to use [Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) to store vector embeddings with the `RedisVectorStore` class.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-memorystore-redis-python/blob/main/docs/vector_store.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pre-reqs"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Create a Memorystore for Redis instance](https://cloud.google.com/memorystore/docs/redis/create-instance-console). Ensure that the instance runs Redis version 7.2 or later."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"\n",
"The integration lives in its own `langchain-google-memorystore-redis` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-google-memorystore-redis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel, or use the button to restart it. For Vertex AI Workbench, you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\"  # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialize a Vector Index"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"import redis\n",
"from langchain_google_memorystore_redis import (\n",
"    DistanceStrategy,\n",
"    HNSWConfig,\n",
"    RedisVectorStore,\n",
")\n",
"\n",
"# Connect to a Memorystore for Redis instance (replace the URL with your instance's endpoint)\n",
"redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n",
"\n",
"# Configure an HNSW index; vector_size must match the embedding dimension (128 below)\n",
"index_config = HNSWConfig(\n",
"    name=\"my_vector_index\", distance_strategy=DistanceStrategy.COSINE, vector_size=128\n",
")\n",
"\n",
"# Initialize/create the vector store index\n",
"RedisVectorStore.init_index(client=redis_client, index_config=index_config)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare Documents\n",
"\n",
"Text must be processed and converted to a numerical representation before it can be stored in a vector store. This involves:\n",
"\n",
"* Loading Text: The `TextLoader` obtains text data from a file (e.g., `state_of_the_union.txt`).\n",
"* Text Splitting: The `CharacterTextSplitter` breaks the text into smaller chunks for embedding models."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import TextLoader\n",
"from langchain_text_splitters import CharacterTextSplitter\n",
"\n",
"loader = TextLoader(\"./state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add Documents to the Vector Store\n",
"\n",
"Once the text is prepared, either of the following methods generates embeddings and inserts the documents into the Redis vector store."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Method 1: Classmethod for Direct Insertion\n",
"\n",
"This approach combines embedding creation and insertion into a single step using the `from_documents` classmethod:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings.fake import FakeEmbeddings\n",
"\n",
"# FakeEmbeddings returns random vectors for demonstration; use a real embedding model in practice\n",
"embeddings = FakeEmbeddings(size=128)\n",
"redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n",
"rvs = RedisVectorStore.from_documents(\n",
"    docs, embedding=embeddings, client=redis_client, index_name=\"my_vector_index\"\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Method 2: Instance-Based Insertion\n",
"This approach offers flexibility when working with a new or existing `RedisVectorStore`:\n",
"\n",
"* [Optional] Create a `RedisVectorStore` Instance: Instantiate a `RedisVectorStore` object for customization. If you already have an instance, proceed to the next step.\n",
"* Add Text with Metadata: Provide raw text and metadata to the instance. Embedding generation and insertion into the vector store are handled automatically."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rvs = RedisVectorStore(\n",
"    client=redis_client, index_name=\"my_vector_index\", embeddings=embeddings\n",
")\n",
"ids = rvs.add_texts(\n",
"    texts=[d.page_content for d in docs], metadatas=[d.metadata for d in docs]\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Perform a Similarity Search (KNN)\n",
"\n",
"With the vector store populated, it's possible to search for text semantically similar to a query. Here's how to use KNN (K-Nearest Neighbors) with default settings:\n",
"\n",
"* Formulate the Query: A natural language question expresses the search intent (e.g., \"What did the president say about Ketanji Brown Jackson\").\n",
"* Retrieve Similar Results: The `similarity_search` method finds items in the vector store closest to the query in meaning."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pprint\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"knn_results = rvs.similarity_search(query=query)\n",
"pprint.pprint(knn_results)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Perform a Range-Based Similarity Search\n",
"\n",
"Range queries provide more control by specifying a desired similarity threshold along with the query text:\n",
"\n",
"* Formulate the Query: A natural language question defines the search intent.\n",
"* Set Similarity Threshold: The `distance_threshold` parameter sets how close a match must be to be considered relevant.\n",
"* Retrieve Results: The `similarity_search_with_score` method finds items from the vector store that fall within the specified similarity threshold."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rq_results = rvs.similarity_search_with_score(query=query, distance_threshold=0.8)\n",
"pprint.pprint(rq_results)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Perform a Maximal Marginal Relevance (MMR) Search\n",
"\n",
"MMR queries aim to find results that are both relevant to the query and diverse from each other, reducing redundancy in search results.\n",
"\n",
"* Formulate the Query: A natural language question defines the search intent.\n",
"* Balance Relevance and Diversity: The `lambda_mult` parameter controls the trade-off between strict relevance and promoting variety in the results.\n",
"* Retrieve MMR Results: The `max_marginal_relevance_search` method returns items that optimize the combination of relevance and diversity based on the `lambda_mult` setting."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mmr_results = rvs.max_marginal_relevance_search(query=query, lambda_mult=0.90)\n",
"pprint.pprint(mmr_results)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use the Vector Store as a Retriever\n",
"\n",
"For seamless integration with other LangChain components, a vector store can be converted into a Retriever. This offers several advantages:\n",
"\n",
"* LangChain Compatibility: Many LangChain tools and methods are designed to directly interact with retrievers.\n",
"* Ease of Use: The `as_retriever()` method converts the vector store into a format that simplifies querying."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retriever = rvs.as_retriever()\n",
"results = retriever.invoke(query)\n",
"pprint.pprint(results)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete Documents from the Vector Store\n",
"\n",
"Occasionally, it's necessary to remove documents (and their associated vectors) from the vector store. The `delete` method provides this functionality."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rvs.delete(ids)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete a Vector Index\n",
"\n",
"Sometimes an existing vector index must be deleted. Common reasons include:\n",
"\n",
"* Index Configuration Changes: If index parameters need modification, it's often required to delete and recreate the index.\n",
"* Storage Management: Removing unused indices can help free up space within the Redis instance.\n",
"\n",
"Caution: Vector index deletion is an irreversible operation. Be certain that the stored vectors and search functionality are no longer required before proceeding."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"# Delete the vector index\n",
"RedisVectorStore.drop_index(client=redis_client, index_name=\"my_vector_index\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}