docs: add Cloud SQL for MySQL vector store integration docs (#20278)

Adding docs page for `Google Cloud SQL for MySQL` vector store
integration. This was recently released as part of the Cloud SQL for
MySQL LangChain package
([release](https://github.com/googleapis/langchain-google-cloud-sql-mysql-python/releases/tag/v0.2.0))

Co-authored-by: Erick Friis <erick@langchain.dev>
This commit is contained in:
Jack Wotherspoon 2024-04-11 17:57:46 -04:00 committed by GitHub
parent 7cf2d2759d
commit 204a16addc
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
2 changed files with 603 additions and 3 deletions

View File

@ -229,7 +229,7 @@ pip install langchain-google-cloud-sql-pg
See [usage example](/docs/integrations/document_loaders/google_cloud_sql_pg).
```python
from langchain_google_cloud_sql_pg import PostgreSQLEngine, PostgreSQLLoader
from langchain_google_cloud_sql_pg import PostgresEngine, PostgresLoader
```
### Cloud Storage
@ -486,6 +486,21 @@ See [usage example](/docs/integrations/vectorstores/google_spanner).
from langchain_google_spanner import SpannerVectorStore
```
### Cloud SQL for MySQL
> [Google Cloud SQL for MySQL](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your MySQL relational databases on Google Cloud.
Install the python package:
```bash
pip install langchain-google-cloud-sql-mysql
```
See [usage example](/docs/integrations/vectorstores/google_cloud_sql_mysql).
```python
from langchain_google_cloud_sql_mysql import MySQLEngine, MySQLVectorStore
```
### Cloud SQL for PostgreSQL
> [Google Cloud SQL for PostgreSQL](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your PostgreSQL relational databases on Google Cloud.
@ -498,7 +513,7 @@ pip install langchain-google-cloud-sql-pg
See [usage example](/docs/integrations/vectorstores/google_cloud_sql_pg).
```python
from langchain_google_cloud_sql_pg import PostgreSQLEngine, PostgresVectorStore
from langchain_google_cloud_sql_pg import PostgresEngine, PostgresVectorStore
```
### Vertex AI Vector Search
@ -783,7 +798,7 @@ See [usage example](/docs/integrations/memory/google_sql_pg).
```python
from langchain_google_cloud_sql_pg import PostgreSQLEngine, PostgreSQLChatMessageHistory
from langchain_google_cloud_sql_pg import PostgresEngine, PostgresChatMessageHistory
```
### Cloud SQL for MySQL

View File

@ -0,0 +1,585 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Cloud SQL for MySQL\n",
"\n",
"> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers PostgreSQL, MySQL, and SQL Server database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's LangChain integrations.\n",
"\n",
"This notebook goes over how to use `Cloud SQL for MySQL` to store vector embeddings with the `MySQLVectorStore` class.\n",
"\n",
"Learn more about the package on [GitHub](https://github.com/googleapis/langchain-google-cloud-sql-mysql-python/).\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-cloud-sql-mysql-python/blob/main/docs/vector_store.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before you begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
" * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
" * [Enable the Cloud SQL Admin API.](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com)\n",
" * [Create a Cloud SQL instance.](https://cloud.google.com/sql/docs/mysql/connect-instance-auth-proxy#create-instance) (version must be >= **8.0.36** with **cloudsql_vector** database flag configured to \"On\")\n",
" * [Create a Cloud SQL database.](https://cloud.google.com/sql/docs/mysql/create-manage-databases)\n",
" * [Add a User to the database.](https://cloud.google.com/sql/docs/mysql/create-manage-users)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"Install the integration library, `langchain-google-cloud-sql-mysql`, and the library for the embedding service, `langchain-google-vertexai`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0ZITIDE160OD"
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-google-cloud-sql-mysql langchain-google-vertexai"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "v40bB_GMcr9f"
},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "v6jBDnYnNM08"
},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "yygMe6rPWxHS"
},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "PTXN1_DSXj2b"
},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NEvB9BoLEulY"
},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "gfkS3yVRE4_W"
},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f8f2830ee9ca1e01"
},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OMvzMWRrR6n7"
},
"source": [
"### Set Cloud SQL database values\n",
"Find your database values, in the [Cloud SQL Instances page](https://console.cloud.google.com/sql?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687).\n",
"\n",
"**Note:** MySQL vector support is only available on MySQL instances with version **>= 8.0.36**.\n",
"\n",
"For existing instances, you may need to perform a [self-service maintenance update](https://cloud.google.com/sql/docs/mysql/self-service-maintenance) to update your maintenance version to **MYSQL_8_0_36.R20240401.03_00** or greater. Once updated, [configure your database flags](https://cloud.google.com/sql/docs/mysql/flags) to have the new **cloudsql_vector** flag to \"On\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @title Set Your Values Here { display-mode: \"form\" }\n",
"REGION = \"us-central1\" # @param {type: \"string\"}\n",
"INSTANCE = \"my-mysql-instance\" # @param {type: \"string\"}\n",
"DATABASE = \"my-database\" # @param {type: \"string\"}\n",
"TABLE_NAME = \"vector_store\" # @param {type: \"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QuQigs4UoFQ2"
},
"source": [
"### MySQLEngine Connection Pool\n",
"\n",
"One of the requirements and arguments to establish Cloud SQL as a vector store is a `MySQLEngine` object. The `MySQLEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n",
"\n",
"To create a `MySQLEngine` using `MySQLEngine.from_instance()` you need to provide only 4 things:\n",
"\n",
"1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n",
"1. `region` : Region where the Cloud SQL instance is located.\n",
"1. `instance` : The name of the Cloud SQL instance.\n",
"1. `database` : The name of the database to connect to on the Cloud SQL instance.\n",
"\n",
"By default, [IAM database authentication](https://cloud.google.com/sql/docs/mysql/iam-authentication#iam-db-auth) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the envionment.\n",
"\n",
"For more informatin on IAM database authentication please see:\n",
"\n",
"* [Configure an instance for IAM database authentication](https://cloud.google.com/sql/docs/mysql/create-edit-iam-instances)\n",
"* [Manage users with IAM database authentication](https://cloud.google.com/sql/docs/mysql/add-manage-iam-users)\n",
"\n",
"Optionally, [built-in database authentication](https://cloud.google.com/sql/docs/mysql/built-in-authentication) using a username and password to access the Cloud SQL database can also be used. Just provide the optional `user` and `password` arguments to `MySQLEngine.from_instance()`:\n",
"\n",
"* `user` : Database user to use for built-in database authentication and login\n",
"* `password` : Database password to use for built-in database authentication and login.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "guVURf1QLL53"
},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mysql import MySQLEngine\n",
"\n",
"engine = MySQLEngine.from_instance(\n",
" project_id=PROJECT_ID, region=REGION, instance=INSTANCE, database=DATABASE\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialize a table\n",
"The `MySQLVectorStore` class requires a database table. The `MySQLEngine` class has a helper method `init_vectorstore_table()` that can be used to create a table with the proper schema for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "avlyHEMn6gzU"
},
"outputs": [],
"source": [
"engine.init_vectorstore_table(\n",
" table_name=TABLE_NAME,\n",
" vector_size=768, # Vector size for VertexAI model(textembedding-gecko@latest)\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an embedding class instance\n",
"\n",
"You can use any [LangChain embeddings model](/docs/integrations/text_embedding/).\n",
"You may need to enable the Vertex AI API to use `VertexAIEmbeddings`.\n",
"\n",
"We recommend pinning the embedding model's version for production, learn more about the [Text embeddings models](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text-embeddings)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5utKIdq7KYi5"
},
"outputs": [],
"source": [
"# enable Vertex AI API\n",
"!gcloud services enable aiplatform.googleapis.com"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Vb2RJocV9_LQ"
},
"outputs": [],
"source": [
"from langchain_google_vertexai import VertexAIEmbeddings\n",
"\n",
"embedding = VertexAIEmbeddings(\n",
" model_name=\"textembedding-gecko@latest\", project=PROJECT_ID\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "e1tl0aNx7SWy"
},
"source": [
"### Initialize a default MySQLVectorStore\n",
"\n",
"To initialize a `MySQLVectorStore` class you need to provide only 3 things:\n",
"\n",
"1. `engine` - An instance of a `MySQLEngine` engine.\n",
"1. `embedding_service` - An instance of a LangChain embedding model.\n",
"1. `table_name` : The name of the table within the Cloud SQL database to use as the vector store."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "z-AZyzAQ7bsf"
},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mysql import MySQLVectorStore\n",
"\n",
"store = MySQLVectorStore(\n",
" engine=engine,\n",
" embedding_service=embedding,\n",
" table_name=TABLE_NAME,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add texts"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "nrDvGWIOLL54"
},
"outputs": [],
"source": [
"import uuid\n",
"\n",
"all_texts = [\"Apples and oranges\", \"Cars and airplanes\", \"Pineapple\", \"Train\", \"Banana\"]\n",
"metadatas = [{\"len\": len(t)} for t in all_texts]\n",
"ids = [str(uuid.uuid4()) for _ in all_texts]\n",
"\n",
"store.add_texts(all_texts, metadatas=metadatas, ids=ids)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete texts\n",
"\n",
"Delete vectors from the vector store by ID."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"store.delete([ids[1]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Search for documents"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"id": "fpqeZgUeLL54",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "f674a3af-452c-4d58-bb62-cbf514a9e1e3"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Pineapple\n"
]
}
],
"source": [
"query = \"I'd like a fruit.\"\n",
"docs = store.similarity_search(query)\n",
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Search for documents by vector\n",
"\n",
"It is also possible to do a search for documents similar to a given embedding vector using `similarity_search_by_vector` which accepts an embedding vector as a parameter instead of a string."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "N-NC5jgGLL55",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "69a1f9de-a830-450d-8a5e-118b36815a46"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[Document(page_content='Pineapple', metadata={'len': 9}), Document(page_content='Banana', metadata={'len': 6})]\n"
]
}
],
"source": [
"query_vector = embedding.embed_query(query)\n",
"docs = store.similarity_search_by_vector(query_vector, k=2)\n",
"print(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add an index\n",
"Speed up vector search queries by applying a vector index. Learn more about [MySQL vector indexes](https://github.com/googleapis/langchain-google-cloud-sql-mysql-python/blob/main/src/langchain_google_cloud_sql_mysql/indexes.py).\n",
"\n",
"**Note:** For IAM database authentication (default usage), the IAM database user will need to be granted the following permissions by a privileged database user for full control of vector indexes.\n",
"\n",
"```\n",
"GRANT EXECUTE ON PROCEDURE mysql.create_vector_index TO '<IAM_DB_USER>'@'%';\n",
"GRANT EXECUTE ON PROCEDURE mysql.alter_vector_index TO '<IAM_DB_USER>'@'%';\n",
"GRANT EXECUTE ON PROCEDURE mysql.drop_vector_index TO '<IAM_DB_USER>'@'%';\n",
"GRANT SELECT ON mysql.vector_indexes TO '<IAM_DB_USER>'@'%';\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mysql import VectorIndex\n",
"\n",
"store.apply_vector_index(VectorIndex())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Remove an index"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"store.drop_vector_index()"
]
},
{
"cell_type": "markdown",
"source": [
"## Advanced Usage"
],
"metadata": {}
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a MySQLVectorStore with custom metadata\n",
"\n",
"A vector store can take advantage of relational data to filter similarity searches.\n",
"\n",
"Create a table and `MySQLVectorStore` instance with custom metadata columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "eANG7_8qLL55"
},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mysql import Column\n",
"\n",
"# set table name\n",
"CUSTOM_TABLE_NAME = \"vector_store_custom\"\n",
"\n",
"engine.init_vectorstore_table(\n",
" table_name=CUSTOM_TABLE_NAME,\n",
" vector_size=768, # VertexAI model: textembedding-gecko@latest\n",
" metadata_columns=[Column(\"len\", \"INTEGER\")],\n",
")\n",
"\n",
"\n",
"# initialize MySQLVectorStore with custom metadata columns\n",
"custom_store = MySQLVectorStore(\n",
" engine=engine,\n",
" embedding_service=embedding,\n",
" table_name=CUSTOM_TABLE_NAME,\n",
" metadata_columns=[\"len\"],\n",
" # connect to an existing VectorStore by customizing the table schema:\n",
" # id_column=\"uuid\",\n",
" # content_column=\"documents\",\n",
" # embedding_column=\"vectors\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bj2d-c2sLL5-"
},
"source": [
"### Search for documents with metadata filter\n",
"\n",
"It can be helpful to narrow down the documents before working with them.\n",
"\n",
"For example, documents can be filtered on metadata using the `filter` argument."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Sqfgk6EOLL5-",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "a10c74e2-fe48-4cf9-ba2f-85aecb2490d0"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[Document(page_content='Pineapple', metadata={'len': 9}), Document(page_content='Banana', metadata={'len': 6}), Document(page_content='Apples and oranges', metadata={'len': 18}), Document(page_content='Cars and airplanes', metadata={'len': 18})]\n"
]
}
],
"source": [
"import uuid\n",
"\n",
"# add texts to the vector store\n",
"all_texts = [\"Apples and oranges\", \"Cars and airplanes\", \"Pineapple\", \"Train\", \"Banana\"]\n",
"metadatas = [{\"len\": len(t)} for t in all_texts]\n",
"ids = [str(uuid.uuid4()) for _ in all_texts]\n",
"custom_store.add_texts(all_texts, metadatas=metadatas, ids=ids)\n",
"\n",
"# use filter on search\n",
"query_vector = embedding.embed_query(\"I'd like a fruit.\")\n",
"docs = custom_store.similarity_search_by_vector(query_vector, filter=\"len >= 6\")\n",
"\n",
"print(docs)"
]
}
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 0
}