mirror of
https://github.com/hwchase17/langchain
synced 2024-11-16 06:13:16 +00:00
docs: add Cloud SQL for MySQL vector store integration docs (#20278)
Adding docs page for `Google Cloud SQL for MySQL` vector store integration. This was recently released as part of the Cloud SQL for MySQL LangChain package ([release](https://github.com/googleapis/langchain-google-cloud-sql-mysql-python/releases/tag/v0.2.0)) Co-authored-by: Erick Friis <erick@langchain.dev>
This commit is contained in:
parent
7cf2d2759d
commit
204a16addc
@ -229,7 +229,7 @@ pip install langchain-google-cloud-sql-pg
|
||||
See [usage example](/docs/integrations/document_loaders/google_cloud_sql_pg).
|
||||
|
||||
```python
|
||||
from langchain_google_cloud_sql_pg import PostgreSQLEngine, PostgreSQLLoader
|
||||
from langchain_google_cloud_sql_pg import PostgresEngine, PostgresLoader
|
||||
```
|
||||
|
||||
### Cloud Storage
|
||||
@ -486,6 +486,21 @@ See [usage example](/docs/integrations/vectorstores/google_spanner).
|
||||
from langchain_google_spanner import SpannerVectorStore
|
||||
```
|
||||
|
||||
### Cloud SQL for MySQL
|
||||
|
||||
> [Google Cloud SQL for MySQL](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your MySQL relational databases on Google Cloud.
|
||||
Install the python package:
|
||||
|
||||
```bash
|
||||
pip install langchain-google-cloud-sql-mysql
|
||||
```
|
||||
|
||||
See [usage example](/docs/integrations/vectorstores/google_cloud_sql_mysql).
|
||||
|
||||
```python
|
||||
from langchain_google_cloud_sql_mysql import MySQLEngine, MySQLVectorStore
|
||||
```
|
||||
|
||||
### Cloud SQL for PostgreSQL
|
||||
|
||||
> [Google Cloud SQL for PostgreSQL](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your PostgreSQL relational databases on Google Cloud.
|
||||
@ -498,7 +513,7 @@ pip install langchain-google-cloud-sql-pg
|
||||
See [usage example](/docs/integrations/vectorstores/google_cloud_sql_pg).
|
||||
|
||||
```python
|
||||
from langchain_google_cloud_sql_pg import PostgreSQLEngine, PostgresVectorStore
|
||||
from langchain_google_cloud_sql_pg import PostgresEngine, PostgresVectorStore
|
||||
```
|
||||
|
||||
### Vertex AI Vector Search
|
||||
@ -783,7 +798,7 @@ See [usage example](/docs/integrations/memory/google_sql_pg).
|
||||
|
||||
|
||||
```python
|
||||
from langchain_google_cloud_sql_pg import PostgreSQLEngine, PostgreSQLChatMessageHistory
|
||||
from langchain_google_cloud_sql_pg import PostgresEngine, PostgresChatMessageHistory
|
||||
```
|
||||
|
||||
### Cloud SQL for MySQL
|
||||
|
585
docs/docs/integrations/vectorstores/google_cloud_sql_mysql.ipynb
Normal file
585
docs/docs/integrations/vectorstores/google_cloud_sql_mysql.ipynb
Normal file
@ -0,0 +1,585 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Google Cloud SQL for MySQL\n",
|
||||
"\n",
|
||||
"> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers PostgreSQL, MySQL, and SQL Server database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's LangChain integrations.\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use `Cloud SQL for MySQL` to store vector embeddings with the `MySQLVectorStore` class.\n",
|
||||
"\n",
|
||||
"Learn more about the package on [GitHub](https://github.com/googleapis/langchain-google-cloud-sql-mysql-python/).\n",
|
||||
"\n",
|
||||
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-cloud-sql-mysql-python/blob/main/docs/vector_store.ipynb)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Before you begin\n",
|
||||
"\n",
|
||||
"To run this notebook, you will need to do the following:\n",
|
||||
"\n",
|
||||
" * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
|
||||
" * [Enable the Cloud SQL Admin API.](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com)\n",
|
||||
" * [Create a Cloud SQL instance.](https://cloud.google.com/sql/docs/mysql/connect-instance-auth-proxy#create-instance) (version must be >= **8.0.36** with **cloudsql_vector** database flag configured to \"On\")\n",
|
||||
" * [Create a Cloud SQL database.](https://cloud.google.com/sql/docs/mysql/create-manage-databases)\n",
|
||||
" * [Add a User to the database.](https://cloud.google.com/sql/docs/mysql/create-manage-users)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 🦜🔗 Library Installation\n",
|
||||
"Install the integration library, `langchain-google-cloud-sql-mysql`, and the library for the embedding service, `langchain-google-vertexai`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "0ZITIDE160OD"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install --upgrade --quiet langchain-google-cloud-sql-mysql langchain-google-vertexai"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "v40bB_GMcr9f"
|
||||
},
|
||||
"source": [
|
||||
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "v6jBDnYnNM08"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
|
||||
"# import IPython\n",
|
||||
"\n",
|
||||
"# app = IPython.Application.instance()\n",
|
||||
"# app.kernel.do_shutdown(True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "yygMe6rPWxHS"
|
||||
},
|
||||
"source": [
|
||||
"### 🔐 Authentication\n",
|
||||
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
|
||||
"\n",
|
||||
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
|
||||
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "PTXN1_DSXj2b"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from google.colab import auth\n",
|
||||
"\n",
|
||||
"auth.authenticate_user()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "NEvB9BoLEulY"
|
||||
},
|
||||
"source": [
|
||||
"### ☁ Set Your Google Cloud Project\n",
|
||||
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
|
||||
"\n",
|
||||
"If you don't know your project ID, try the following:\n",
|
||||
"\n",
|
||||
"* Run `gcloud config list`.\n",
|
||||
"* Run `gcloud projects list`.\n",
|
||||
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "gfkS3yVRE4_W"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
|
||||
"\n",
|
||||
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
|
||||
"\n",
|
||||
"# Set the project id\n",
|
||||
"!gcloud config set project {PROJECT_ID}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "f8f2830ee9ca1e01"
|
||||
},
|
||||
"source": [
|
||||
"## Basic Usage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "OMvzMWRrR6n7"
|
||||
},
|
||||
"source": [
|
||||
"### Set Cloud SQL database values\n",
|
||||
"Find your database values, in the [Cloud SQL Instances page](https://console.cloud.google.com/sql?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687).\n",
|
||||
"\n",
|
||||
"**Note:** MySQL vector support is only available on MySQL instances with version **>= 8.0.36**.\n",
|
||||
"\n",
|
||||
"For existing instances, you may need to perform a [self-service maintenance update](https://cloud.google.com/sql/docs/mysql/self-service-maintenance) to update your maintenance version to **MYSQL_8_0_36.R20240401.03_00** or greater. Once updated, [configure your database flags](https://cloud.google.com/sql/docs/mysql/flags) to have the new **cloudsql_vector** flag to \"On\"."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# @title Set Your Values Here { display-mode: \"form\" }\n",
|
||||
"REGION = \"us-central1\" # @param {type: \"string\"}\n",
|
||||
"INSTANCE = \"my-mysql-instance\" # @param {type: \"string\"}\n",
|
||||
"DATABASE = \"my-database\" # @param {type: \"string\"}\n",
|
||||
"TABLE_NAME = \"vector_store\" # @param {type: \"string\"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "QuQigs4UoFQ2"
|
||||
},
|
||||
"source": [
|
||||
"### MySQLEngine Connection Pool\n",
|
||||
"\n",
|
||||
"One of the requirements and arguments to establish Cloud SQL as a vector store is a `MySQLEngine` object. The `MySQLEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n",
|
||||
"\n",
|
||||
"To create a `MySQLEngine` using `MySQLEngine.from_instance()` you need to provide only 4 things:\n",
|
||||
"\n",
|
||||
"1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n",
|
||||
"1. `region` : Region where the Cloud SQL instance is located.\n",
|
||||
"1. `instance` : The name of the Cloud SQL instance.\n",
|
||||
"1. `database` : The name of the database to connect to on the Cloud SQL instance.\n",
|
||||
"\n",
|
||||
"By default, [IAM database authentication](https://cloud.google.com/sql/docs/mysql/iam-authentication#iam-db-auth) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the envionment.\n",
|
||||
"\n",
|
||||
"For more informatin on IAM database authentication please see:\n",
|
||||
"\n",
|
||||
"* [Configure an instance for IAM database authentication](https://cloud.google.com/sql/docs/mysql/create-edit-iam-instances)\n",
|
||||
"* [Manage users with IAM database authentication](https://cloud.google.com/sql/docs/mysql/add-manage-iam-users)\n",
|
||||
"\n",
|
||||
"Optionally, [built-in database authentication](https://cloud.google.com/sql/docs/mysql/built-in-authentication) using a username and password to access the Cloud SQL database can also be used. Just provide the optional `user` and `password` arguments to `MySQLEngine.from_instance()`:\n",
|
||||
"\n",
|
||||
"* `user` : Database user to use for built-in database authentication and login\n",
|
||||
"* `password` : Database password to use for built-in database authentication and login.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "guVURf1QLL53"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_google_cloud_sql_mysql import MySQLEngine\n",
|
||||
"\n",
|
||||
"engine = MySQLEngine.from_instance(\n",
|
||||
" project_id=PROJECT_ID, region=REGION, instance=INSTANCE, database=DATABASE\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Initialize a table\n",
|
||||
"The `MySQLVectorStore` class requires a database table. The `MySQLEngine` class has a helper method `init_vectorstore_table()` that can be used to create a table with the proper schema for you."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "avlyHEMn6gzU"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"engine.init_vectorstore_table(\n",
|
||||
" table_name=TABLE_NAME,\n",
|
||||
" vector_size=768, # Vector size for VertexAI model(textembedding-gecko@latest)\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create an embedding class instance\n",
|
||||
"\n",
|
||||
"You can use any [LangChain embeddings model](/docs/integrations/text_embedding/).\n",
|
||||
"You may need to enable the Vertex AI API to use `VertexAIEmbeddings`.\n",
|
||||
"\n",
|
||||
"We recommend pinning the embedding model's version for production, learn more about the [Text embeddings models](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text-embeddings)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "5utKIdq7KYi5"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# enable Vertex AI API\n",
|
||||
"!gcloud services enable aiplatform.googleapis.com"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "Vb2RJocV9_LQ"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_google_vertexai import VertexAIEmbeddings\n",
|
||||
"\n",
|
||||
"embedding = VertexAIEmbeddings(\n",
|
||||
" model_name=\"textembedding-gecko@latest\", project=PROJECT_ID\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "e1tl0aNx7SWy"
|
||||
},
|
||||
"source": [
|
||||
"### Initialize a default MySQLVectorStore\n",
|
||||
"\n",
|
||||
"To initialize a `MySQLVectorStore` class you need to provide only 3 things:\n",
|
||||
"\n",
|
||||
"1. `engine` - An instance of a `MySQLEngine` engine.\n",
|
||||
"1. `embedding_service` - An instance of a LangChain embedding model.\n",
|
||||
"1. `table_name` : The name of the table within the Cloud SQL database to use as the vector store."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "z-AZyzAQ7bsf"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_google_cloud_sql_mysql import MySQLVectorStore\n",
|
||||
"\n",
|
||||
"store = MySQLVectorStore(\n",
|
||||
" engine=engine,\n",
|
||||
" embedding_service=embedding,\n",
|
||||
" table_name=TABLE_NAME,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Add texts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "nrDvGWIOLL54"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import uuid\n",
|
||||
"\n",
|
||||
"all_texts = [\"Apples and oranges\", \"Cars and airplanes\", \"Pineapple\", \"Train\", \"Banana\"]\n",
|
||||
"metadatas = [{\"len\": len(t)} for t in all_texts]\n",
|
||||
"ids = [str(uuid.uuid4()) for _ in all_texts]\n",
|
||||
"\n",
|
||||
"store.add_texts(all_texts, metadatas=metadatas, ids=ids)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Delete texts\n",
|
||||
"\n",
|
||||
"Delete vectors from the vector store by ID."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"store.delete([ids[1]])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Search for documents"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"id": "fpqeZgUeLL54",
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"outputId": "f674a3af-452c-4d58-bb62-cbf514a9e1e3"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "stream",
|
||||
"name": "stdout",
|
||||
"text": [
|
||||
"Pineapple\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"I'd like a fruit.\"\n",
|
||||
"docs = store.similarity_search(query)\n",
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Search for documents by vector\n",
|
||||
"\n",
|
||||
"It is also possible to do a search for documents similar to a given embedding vector using `similarity_search_by_vector` which accepts an embedding vector as a parameter instead of a string."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "N-NC5jgGLL55",
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"outputId": "69a1f9de-a830-450d-8a5e-118b36815a46"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "stream",
|
||||
"name": "stdout",
|
||||
"text": [
|
||||
"[Document(page_content='Pineapple', metadata={'len': 9}), Document(page_content='Banana', metadata={'len': 6})]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query_vector = embedding.embed_query(query)\n",
|
||||
"docs = store.similarity_search_by_vector(query_vector, k=2)\n",
|
||||
"print(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Add an index\n",
|
||||
"Speed up vector search queries by applying a vector index. Learn more about [MySQL vector indexes](https://github.com/googleapis/langchain-google-cloud-sql-mysql-python/blob/main/src/langchain_google_cloud_sql_mysql/indexes.py).\n",
|
||||
"\n",
|
||||
"**Note:** For IAM database authentication (default usage), the IAM database user will need to be granted the following permissions by a privileged database user for full control of vector indexes.\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"GRANT EXECUTE ON PROCEDURE mysql.create_vector_index TO '<IAM_DB_USER>'@'%';\n",
|
||||
"GRANT EXECUTE ON PROCEDURE mysql.alter_vector_index TO '<IAM_DB_USER>'@'%';\n",
|
||||
"GRANT EXECUTE ON PROCEDURE mysql.drop_vector_index TO '<IAM_DB_USER>'@'%';\n",
|
||||
"GRANT SELECT ON mysql.vector_indexes TO '<IAM_DB_USER>'@'%';\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_google_cloud_sql_mysql import VectorIndex\n",
|
||||
"\n",
|
||||
"store.apply_vector_index(VectorIndex())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Remove an index"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"store.drop_vector_index()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Advanced Usage"
|
||||
],
|
||||
"metadata": {}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create a MySQLVectorStore with custom metadata\n",
|
||||
"\n",
|
||||
"A vector store can take advantage of relational data to filter similarity searches.\n",
|
||||
"\n",
|
||||
"Create a table and `MySQLVectorStore` instance with custom metadata columns."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "eANG7_8qLL55"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_google_cloud_sql_mysql import Column\n",
|
||||
"\n",
|
||||
"# set table name\n",
|
||||
"CUSTOM_TABLE_NAME = \"vector_store_custom\"\n",
|
||||
"\n",
|
||||
"engine.init_vectorstore_table(\n",
|
||||
" table_name=CUSTOM_TABLE_NAME,\n",
|
||||
" vector_size=768, # VertexAI model: textembedding-gecko@latest\n",
|
||||
" metadata_columns=[Column(\"len\", \"INTEGER\")],\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# initialize MySQLVectorStore with custom metadata columns\n",
|
||||
"custom_store = MySQLVectorStore(\n",
|
||||
" engine=engine,\n",
|
||||
" embedding_service=embedding,\n",
|
||||
" table_name=CUSTOM_TABLE_NAME,\n",
|
||||
" metadata_columns=[\"len\"],\n",
|
||||
" # connect to an existing VectorStore by customizing the table schema:\n",
|
||||
" # id_column=\"uuid\",\n",
|
||||
" # content_column=\"documents\",\n",
|
||||
" # embedding_column=\"vectors\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "bj2d-c2sLL5-"
|
||||
},
|
||||
"source": [
|
||||
"### Search for documents with metadata filter\n",
|
||||
"\n",
|
||||
"It can be helpful to narrow down the documents before working with them.\n",
|
||||
"\n",
|
||||
"For example, documents can be filtered on metadata using the `filter` argument."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "Sqfgk6EOLL5-",
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"outputId": "a10c74e2-fe48-4cf9-ba2f-85aecb2490d0"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "stream",
|
||||
"name": "stdout",
|
||||
"text": [
|
||||
"[Document(page_content='Pineapple', metadata={'len': 9}), Document(page_content='Banana', metadata={'len': 6}), Document(page_content='Apples and oranges', metadata={'len': 18}), Document(page_content='Cars and airplanes', metadata={'len': 18})]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import uuid\n",
|
||||
"\n",
|
||||
"# add texts to the vector store\n",
|
||||
"all_texts = [\"Apples and oranges\", \"Cars and airplanes\", \"Pineapple\", \"Train\", \"Banana\"]\n",
|
||||
"metadatas = [{\"len\": len(t)} for t in all_texts]\n",
|
||||
"ids = [str(uuid.uuid4()) for _ in all_texts]\n",
|
||||
"custom_store.add_texts(all_texts, metadatas=metadatas, ids=ids)\n",
|
||||
"\n",
|
||||
"# use filter on search\n",
|
||||
"query_vector = embedding.embed_query(\"I'd like a fruit.\")\n",
|
||||
"docs = custom_store.similarity_search_by_vector(query_vector, filter=\"len >= 6\")\n",
|
||||
"\n",
|
||||
"print(docs)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"toc_visible": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.5"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
Loading…
Reference in New Issue
Block a user