fix: Update Google Cloud Enterprise Search to Vertex AI Search (#10513)

- Description: Google Cloud Enterprise Search was renamed to Vertex AI
Search
-
https://cloud.google.com/blog/products/ai-machine-learning/vertex-ai-search-and-conversation-is-now-generally-available
- This PR updates the documentation and Retriever class to use the new
terminology.
- Changed retriever class from `GoogleCloudEnterpriseSearchRetriever` to
`GoogleVertexAISearchRetriever`
- Updated documentation to specify that `extractive_segments` requires
the new [Enterprise
edition](https://cloud.google.com/generative-ai-app-builder/docs/about-advanced-features#enterprise-features)
to be enabled.
  - Fixed spelling errors in documentation.
- Change parameter for Retriever from `search_engine_id` to
`data_store_id`
- When this retriever was originally implemented, there was no
distinction between a data store and search engine, but now these have
been split.
- Fixed an issue blocking some users where the api_endpoint can't be set
This commit is contained in:
Holt Skinner 2023-10-05 12:47:47 -05:00 committed by GitHub
parent 1d678f805f
commit 9f73fec057
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
11 changed files with 694 additions and 577 deletions

File diff suppressed because one or more lines are too long

View File

@ -3739,6 +3739,10 @@
{
"source": "/docs/ecosystem/dependents",
"destination": "/docs/additional_resources/dependents"
},
{
"source": "docs/integrations/retrievers/google_cloud_enterprise_search",
"destination": "docs/integrations/retrievers/google_vertex_ai_search"
}
]
}

View File

@ -34,7 +34,7 @@ from langchain.chat_models import ChatVertexAI
## Document Loader
### Google BigQuery
>[Google BigQuery](https://cloud.google.com/bigquery) is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data.
> [Google BigQuery](https://cloud.google.com/bigquery) is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data.
`BigQuery` is a part of the `Google Cloud Platform`.
First, we need to install `google-cloud-bigquery` python package.
@ -134,6 +134,23 @@ See a [usage example](/docs/integrations/vectorstores/scann).
from langchain.vectorstores import ScaNN
```
## Retrievers
### Vertex AI Search
> [Google Cloud Vertex AI Search](https://cloud.google.com/generative-ai-app-builder/docs/introduction)
> allows developers to quickly build generative AI powered search engines for customers and employees.
First, you need to install the `google-cloud-discoveryengine` Python package.
```bash
pip install google-cloud-discoveryengine
```
See a [usage example](/docs/integrations/retrievers/google_vertex_ai_search).
```python
from langchain.retrievers import GoogleVertexAISearchRetriever
```
## Tools
### Google Search
@ -142,7 +159,7 @@ from langchain.vectorstores import ScaNN
- Set up a Custom Search Engine, following [these instructions](https://stackoverflow.com/questions/37083058/programmatically-searching-google-in-python-using-custom-search)
- Get an API Key and Custom Search Engine ID from the previous step, and set them as environment variables `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` respectively
There exists a GoogleSearchAPIWrapper utility which wraps this API. To import this utility:
There exists a `GoogleSearchAPIWrapper` utility which wraps this API. To import this utility:
```python
from langchain.utilities import GoogleSearchAPIWrapper

View File

@ -1,272 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Cloud Enterprise Search\n",
"\n",
"\n",
"[Enterprise Search](https://cloud.google.com/enterprise-search) is a part of the Generative AI App Builder suite of tools offered by Google Cloud.\n",
"\n",
"Gen AI App Builder lets developers, even those with limited machine learning skills, quickly and easily tap into the power of Googles foundation models, search expertise, and conversational AI technologies to create enterprise-grade generative AI applications. \n",
"\n",
"Enterprise Search lets organizations quickly build generative AI powered search engines for customers and employees.Enterprise Search is underpinned by a variety of Google Search technologies, including semantic search, which helps deliver more relevant results than traditional keyword-based search techniques by using natural language processing and machine learning techniques to infer relationships within the content and intent from the users query input. Enterprise Search also benefits from Googles expertise in understanding how users search and factors in content relevance to order displayed results. \n",
"\n",
"Google Cloud offers Enterprise Search via Gen App Builder in Google Cloud Console and via an API for enterprise workflow integration. \n",
"\n",
"This notebook demonstrates how to configure Enterprise Search and use the Enterprise Search retriever. The Enterprise Search retriever encapsulates the [Generative AI App Builder Python client library](https://cloud.google.com/generative-ai-app-builder/docs/libraries#client-libraries-install-python) and uses it to access the Enterprise Search [Search Service API](https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1beta.services.search_service)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install pre-requisites\n",
"\n",
"You need to install the `google-cloud-discoverengine` package to use the Enterprise Search retriever."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install google-cloud-discoveryengine"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure access to Google Cloud and Google Cloud Enterprise Search\n",
"\n",
"Enterprise Search is generally available for the allowlist (which means customers need to be approved for access) as of June 6, 2023. Contact your Google Cloud sales team for access and pricing details. We are previewing additional features that are coming soon to the generally available offering as part of our [Trusted Tester](https://cloud.google.com/ai/earlyaccess/join?hl=en) program. Sign up for [Trusted Tester](https://cloud.google.com/ai/earlyaccess/join?hl=en) and contact your Google Cloud sales team for an expedited trial.\n",
"\n",
"Before you can run this notebook you need to:\n",
"- Set or create a Google Cloud project and turn on Gen App Builder\n",
"- Create and populate an unstructured data store\n",
"- Set credentials to access `Enterprise Search API`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set or create a Google Cloud poject and turn on Gen App Builder\n",
"\n",
"Follow the instructions in the [Enterprise Search Getting Started guide](https://cloud.google.com/generative-ai-app-builder/docs/before-you-begin) to set/create a GCP project and enable Gen App Builder.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create and populate an unstructured data store\n",
"\n",
"[Use Google Cloud Console to create an unstructured data store](https://cloud.google.com/generative-ai-app-builder/docs/create-engine-es#unstructured-data) and populate it with the example PDF documents from the `gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs` Cloud Storage folder. Make sure to use the `Cloud Storage (without metadata)` option."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set credentials to access Enterprise Search API\n",
"\n",
"The [Gen App Builder client libraries](https://cloud.google.com/generative-ai-app-builder/docs/libraries) used by the Enterprise Search retriever provide high-level language support for authenticating to Gen App Builder programmatically. Client libraries support [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials); the libraries look for credentials in a set of defined locations and use those credentials to authenticate requests to the API. With ADC, you can make credentials available to your application in a variety of environments, such as local development or production, without needing to modify your application code.\n",
"\n",
"If running in [Google Colab](https://colab.google) authenticate with `google.colab.google.auth` otherwise follow one of the [supported methods](https://cloud.google.com/docs/authentication/application-default-credentials) to make sure that you Application Default Credentials are properly set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"if \"google.colab\" in sys.modules:\n",
" from google.colab import auth as google_auth\n",
"\n",
" google_auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure and use the Enterprise Search retriever\n",
"\n",
"The Enterprise Search retriever is implemented in the `langchain.retriever.GoogleCloudEntepriseSearchRetriever` class. The `get_relevant_documents` method returns a list of `langchain.schema.Document` documents where the `page_content` field of each document is populated the document content.\n",
"Depending on the data type used in Enterprise search (structured or unstructured) the `page_content` field is populated as follows:\n",
"- Structured data source: either an `extractive segment` or an `extractive answer` that matches a query. The `metadata` field is populated with metadata (if any) of the document from which the segments or answers were extracted.\n",
"- Unstructured data source: a string json containing all the fields returned from the structured data source. The `metadata` field is populated with metadata (if any) of the document \n",
"\n",
"### Only for Unstructured data sources:\n",
"An extractive answer is verbatim text that is returned with each search result. It is extracted directly from the original document. Extractive answers are typically displayed near the top of web pages to provide an end user with a brief answer that is contextually relevant to their query. Extractive answers are available for website and unstructured search.\n",
"\n",
"An extractive segment is verbatim text that is returned with each search result. An extractive segment is usually more verbose than an extractive answer. Extractive segments can be displayed as an answer to a query, and can be used to perform post-processing tasks and as input for large language models to generate answers or new text. Extractive segments are available for unstructured search.\n",
"\n",
"For more information about extractive segments and extractive answers refer to [product documentation](https://cloud.google.com/generative-ai-app-builder/docs/snippets).\n",
"\n",
"When creating an instance of the retriever you can specify a number of parameters that control which Enterprise data store to access and how a natural language query is processed, including configurations for extractive answers and segments.\n",
"\n",
"\n",
"### The mandatory parameters are:\n",
"\n",
"- `project_id` - Your Google Cloud PROJECT_ID\n",
"- `search_engine_id` - The ID of the data store you want to use. \n",
"\n",
"The `project_id` and `search_engine_id` parameters can be provided explicitly in the retriever's constructor or through the environment variables - `PROJECT_ID` and `SEARCH_ENGINE_ID`.\n",
"\n",
"You can also configure a number of optional parameters, including:\n",
"\n",
"- `max_documents` - The maximum number of documents used to provide extractive segments or extractive answers\n",
"- `get_extractive_answers` - By default, the retriever is configured to return extractive segments. Set this field to `True` to return extractive answers. This is used only when `engine_data_type` set to 0 (unstructured) \n",
"- `max_extractive_answer_count` - The maximum number of extractive answers returned in each search result.\n",
" At most 5 answers will be returned. This is used only when `engine_data_type` set to 0 (unstructured) \n",
"- `max_extractive_segment_count` - The maximum number of extractive segments returned in each search result.\n",
" Currently one segment will be returned. This is used only when `engine_data_type` set to 0 (unstructured) \n",
"- `filter` - The filter expression that allows you filter the search results based on the metadata associated with the documents in the searched data store. \n",
"- `query_expansion_condition` - Specification to determine under which conditions query expansion should occur.\n",
" 0 - Unspecified query expansion condition. In this case, server behavior defaults to disabled.\n",
" 1 - Disabled query expansion. Only the exact search query is used, even if SearchResponse.total_size is zero.\n",
" 2 - Automatic query expansion built by the Search API.\n",
"- `engine_data_type` - Defines the enterprise search data type\n",
" 0 - Unstructured data \n",
" 1 - Structured data\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure and use the retriever for **unstructured** data with extractve segments "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.retrievers import GoogleCloudEnterpriseSearchRetriever\n",
"\n",
"PROJECT_ID = \"<YOUR PROJECT ID>\" # Set to your Project ID\n",
"SEARCH_ENGINE_ID = \"<YOUR SEARCH ENGINE ID>\" # Set to your data store ID"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retriever = GoogleCloudEnterpriseSearchRetriever(\n",
" project_id=PROJECT_ID,\n",
" search_engine_id=SEARCH_ENGINE_ID,\n",
" max_documents=3,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query = \"What are Alphabet's Other Bets?\"\n",
"\n",
"result = retriever.get_relevant_documents(query)\n",
"for doc in result:\n",
" print(doc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure and use the retriever for **unstructured** data with extractve answers "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retriever = GoogleCloudEnterpriseSearchRetriever(\n",
" project_id=PROJECT_ID,\n",
" search_engine_id=SEARCH_ENGINE_ID,\n",
" max_documents=3,\n",
" max_extractive_answer_count=3,\n",
" get_extractive_answers=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query = \"What are Alphabet's Other Bets?\"\n",
"\n",
"result = retriever.get_relevant_documents(query)\n",
"for doc in result:\n",
" print(doc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure and use the retriever for **structured** data with extractve answers "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retriever = GoogleCloudEnterpriseSearchRetriever(\n",
" project_id=PROJECT_ID,\n",
" search_engine_id=SEARCH_ENGINE_ID,\n",
" max_documents=3,\n",
" engine_data_type=1\n",
")\n",
"\n",
"result = retriever.get_relevant_documents(query)\n",
"for doc in result:\n",
" print(doc)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@ -0,0 +1,274 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Vertex AI Search\n",
"\n",
"[Vertex AI Search](https://cloud.google.com/enterprise-search) (formerly known as Enterprise Search on Generative AI App Builder) is a part of the [Vertex AI](https://cloud.google.com/vertex-ai) machine learning platform offered by Google Cloud.\n",
"\n",
"Vertex AI Search lets organizations quickly build generative AI powered search engines for customers and employees. It's underpinned by a variety of Google Search technologies, including semantic search, which helps deliver more relevant results than traditional keyword-based search techniques by using natural language processing and machine learning techniques to infer relationships within the content and intent from the users query input. Vertex AI Search also benefits from Googles expertise in understanding how users search and factors in content relevance to order displayed results.\n",
"\n",
"Vertex AI Search is available in the Google Cloud Console and via an API for enterprise workflow integration.\n",
"\n",
"This notebook demonstrates how to configure Vertex AI Search and use the Vertex AI Search retriever. The Vertex AI Search retriever encapsulates the [Python client library](https://cloud.google.com/generative-ai-app-builder/docs/libraries#client-libraries-install-python) and uses it to access the [Search Service API](https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1beta.services.search_service).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install pre-requisites\n",
"\n",
"You need to install the `google-cloud-discoveryengine` package to use the Vertex AI Search retriever.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install google-cloud-discoveryengine"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure access to Google Cloud and Vertex AI Search\n",
"\n",
"Vertex AI Search is generally available without allowlist as of August 2023.\n",
"\n",
"Before you can use the retriever, you need to complete the following steps:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a search engine and populate an unstructured data store\n",
"\n",
"- Follow the instructions in the [Vertex AI Search Getting Started guide](https://cloud.google.com/generative-ai-app-builder/docs/try-enterprise-search) to set up a Google Cloud project and Vertex AI Search.\n",
"- [Use the Google Cloud Console to create an unstructured data store](https://cloud.google.com/generative-ai-app-builder/docs/create-engine-es#unstructured-data)\n",
" - Populate it with the example PDF documents from the `gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs` Cloud Storage folder.\n",
" - Make sure to use the `Cloud Storage (without metadata)` option.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set credentials to access Vertex AI Search API\n",
"\n",
"The [Vertex AI Search client libraries](https://cloud.google.com/generative-ai-app-builder/docs/libraries) used by the Vertex AI Search retriever provide high-level language support for authenticating to Google Cloud programmatically.\n",
"Client libraries support [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials); the libraries look for credentials in a set of defined locations and use those credentials to authenticate requests to the API.\n",
"With ADC, you can make credentials available to your application in a variety of environments, such as local development or production, without needing to modify your application code.\n",
"\n",
"If running in [Google Colab](https://colab.google) authenticate with `google.colab.google.auth` otherwise follow one of the [supported methods](https://cloud.google.com/docs/authentication/application-default-credentials) to make sure that you Application Default Credentials are properly set.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"if \"google.colab\" in sys.modules:\n",
" from google.colab import auth as google_auth\n",
"\n",
" google_auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure and use the Vertex AI Search retriever\n",
"\n",
"The Vertex AI Search retriever is implemented in the `langchain.retriever.GoogleVertexAISearchRetriever` class. The `get_relevant_documents` method returns a list of `langchain.schema.Document` documents where the `page_content` field of each document is populated the document content.\n",
"Depending on the data type used in Vertex AI Search (structured or unstructured) the `page_content` field is populated as follows:\n",
"\n",
"- Structured data source: either an `extractive segment` or an `extractive answer` that matches a query. The `metadata` field is populated with metadata (if any) of the document from which the segments or answers were extracted.\n",
"- Unstructured data source: a string json containing all the fields returned from the structured data source. The `metadata` field is populated with metadata (if any) of the document\n",
"\n",
"### Only for Unstructured data sources:\n",
"\n",
"An extractive answer is verbatim text that is returned with each search result. It is extracted directly from the original document. Extractive answers are typically displayed near the top of web pages to provide an end user with a brief answer that is contextually relevant to their query. Extractive answers are available for website and unstructured search.\n",
"\n",
"An extractive segment is verbatim text that is returned with each search result. An extractive segment is usually more verbose than an extractive answer. Extractive segments can be displayed as an answer to a query, and can be used to perform post-processing tasks and as input for large language models to generate answers or new text. Extractive segments are available for unstructured search.\n",
"\n",
"For more information about extractive segments and extractive answers refer to [product documentation](https://cloud.google.com/generative-ai-app-builder/docs/snippets).\n",
"\n",
"NOTE: Extractive segments require the [Enterprise edition](https://cloud.google.com/generative-ai-app-builder/docs/about-advanced-features#enterprise-features) features to be enabled.\n",
"\n",
"When creating an instance of the retriever you can specify a number of parameters that control which data store to access and how a natural language query is processed, including configurations for extractive answers and segments.\n",
"\n",
"### The mandatory parameters are:\n",
"\n",
"- `project_id` - Your Google Cloud Project ID.\n",
"- `location_id` - The location of the data store.\n",
" - `global` (default)\n",
" - `us`\n",
" - `eu`\n",
"- `data_store_id` - The ID of the data store you want to use.\n",
" - Note: This was called `search_engine_id` in previous versions of the retriever.\n",
"\n",
"The `project_id` and `data_store_id` parameters can be provided explicitly in the retriever's constructor or through the environment variables - `PROJECT_ID` and `DATA_STORE_ID`.\n",
"\n",
"You can also configure a number of optional parameters, including:\n",
"\n",
"- `max_documents` - The maximum number of documents used to provide extractive segments or extractive answers\n",
"- `get_extractive_answers` - By default, the retriever is configured to return extractive segments.\n",
" - Set this field to `True` to return extractive answers. This is used only when `engine_data_type` set to `0` (unstructured)\n",
"- `max_extractive_answer_count` - The maximum number of extractive answers returned in each search result.\n",
" - At most 5 answers will be returned. This is used only when `engine_data_type` set to `0` (unstructured).\n",
"- `max_extractive_segment_count` - The maximum number of extractive segments returned in each search result.\n",
" - Currently one segment will be returned. This is used only when `engine_data_type` set to `0` (unstructured).\n",
"- `filter` - The filter expression for the search results based on the metadata associated with the documents in the data store.\n",
"- `query_expansion_condition` - Specification to determine under which conditions query expansion should occur.\n",
" - `0` - Unspecified query expansion condition. In this case, server behavior defaults to disabled.\n",
" - `1` - Disabled query expansion. Only the exact search query is used, even if SearchResponse.total_size is zero.\n",
" - `2` - Automatic query expansion built by the Search API.\n",
"- `engine_data_type` - Defines the Vertex AI Search data type\n",
" - `0` - Unstructured data\n",
" - `1` - Structured data\n",
"\n",
"### Migration guide for `GoogleCloudEnterpriseSearchRetriever`\n",
"\n",
"In previous versions, this retriever was called `GoogleCloudEnterpriseSearchRetriever`. Some backwards-incompatible changes had to be made to the retriever after the General Availability launch due to changes in the product behavior.\n",
"\n",
"To update to the new retriever, make the following changes:\n",
"\n",
"- Change the import from: `from langchain.retrievers import GoogleCloudEnterpriseSearchRetriever` -> `from langchain.retrievers import GoogleVertexAISearchRetriever`.\n",
"- Change all class references from `GoogleCloudEnterpriseSearchRetriever` -> `GoogleVertexAISearchRetriever`.\n",
"- Upon class initialization, change the `search_engine_id` parameter name to `data_store_id`.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure and use the retriever for **unstructured** data with extractive segments\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.retrievers import GoogleVertexAISearchRetriever\n",
"\n",
"PROJECT_ID = \"<YOUR PROJECT ID>\" # Set to your Project ID\n",
"LOCATION_ID = \"<YOUR LOCATION>\" # Set to your data store location\n",
"DATA_STORE_ID = \"<YOUR DATA STORE ID>\" # Set to your data store ID"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retriever = GoogleVertexAISearchRetriever(\n",
" project_id=PROJECT_ID,\n",
" location_id=LOCATION_ID,\n",
" data_store_id=DATA_STORE_ID,\n",
" max_documents=3,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query = \"What are Alphabet's Other Bets?\"\n",
"\n",
"result = retriever.get_relevant_documents(query)\n",
"for doc in result:\n",
" print(doc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure and use the retriever for **unstructured** data with extractive answers\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retriever = GoogleVertexAISearchRetriever(\n",
" project_id=PROJECT_ID,\n",
" location_id=LOCATION_ID,\n",
" data_store_id=DATA_STORE_ID,\n",
" max_documents=3,\n",
" max_extractive_answer_count=3,\n",
" get_extractive_answers=True,\n",
")\n",
"\n",
"result = retriever.get_relevant_documents(query)\n",
"for doc in result:\n",
" print(doc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure and use the retriever for **structured** data\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retriever = GoogleVertexAISearchRetriever(\n",
" project_id=PROJECT_ID,\n",
" location_id=LOCATION_ID,\n",
" data_store_id=DATA_STORE_ID,\n",
" max_documents=3,\n",
" engine_data_type=1,\n",
")\n",
"\n",
"result = retriever.get_relevant_documents(query)\n",
"for doc in result:\n",
" print(doc)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@ -940,7 +940,7 @@
"- DocArrayRetriever\n",
"- ElasticSearchBM25Retriever\n",
"- EnsembleRetriever\n",
"- GoogleCloudEnterpriseSearchRetriever\n",
"- GoogleVertexAISearchRetriever\n",
"- AmazonKendraRetriever\n",
"- KNNRetriever\n",
"- LlamaIndexGraphRetriever and LlamaIndexRetriever\n",
@ -992,7 +992,7 @@
{
"data": {
"text/plain": [
"{'question': 'LangChain possesses a variety of retrievers including:\\n\\n1. ArxivRetriever\\n2. AzureCognitiveSearchRetriever\\n3. BM25Retriever\\n4. ChaindeskRetriever\\n5. ChatGPTPluginRetriever\\n6. ContextualCompressionRetriever\\n7. DocArrayRetriever\\n8. ElasticSearchBM25Retriever\\n9. EnsembleRetriever\\n10. GoogleCloudEnterpriseSearchRetriever\\n11. AmazonKendraRetriever\\n12. KNNRetriever\\n13. LlamaIndexGraphRetriever\\n14. LlamaIndexRetriever\\n15. MergerRetriever\\n16. MetalRetriever\\n17. MilvusRetriever\\n18. MultiQueryRetriever\\n19. ParentDocumentRetriever\\n20. PineconeHybridSearchRetriever\\n21. PubMedRetriever\\n22. RePhraseQueryRetriever\\n23. RemoteLangChainRetriever\\n24. SelfQueryRetriever\\n25. SVMRetriever\\n26. TFIDFRetriever\\n27. TimeWeightedVectorStoreRetriever\\n28. VespaRetriever\\n29. WeaviateHybridSearchRetriever\\n30. WebResearchRetriever\\n31. WikipediaRetriever\\n32. ZepRetriever\\n33. ZillizRetriever\\n\\nIt also includes self query translators like:\\n\\n1. ChromaTranslator\\n2. DeepLakeTranslator\\n3. MyScaleTranslator\\n4. PineconeTranslator\\n5. QdrantTranslator\\n6. WeaviateTranslator\\n\\nAnd remote retrievers like:\\n\\n1. RemoteLangChainRetriever'}"
"{'question': 'LangChain possesses a variety of retrievers including:\\n\\n1. ArxivRetriever\\n2. AzureCognitiveSearchRetriever\\n3. BM25Retriever\\n4. ChaindeskRetriever\\n5. ChatGPTPluginRetriever\\n6. ContextualCompressionRetriever\\n7. DocArrayRetriever\\n8. ElasticSearchBM25Retriever\\n9. EnsembleRetriever\\n10. GoogleVertexAISearchRetriever\\n11. AmazonKendraRetriever\\n12. KNNRetriever\\n13. LlamaIndexGraphRetriever\\n14. LlamaIndexRetriever\\n15. MergerRetriever\\n16. MetalRetriever\\n17. MilvusRetriever\\n18. MultiQueryRetriever\\n19. ParentDocumentRetriever\\n20. PineconeHybridSearchRetriever\\n21. PubMedRetriever\\n22. RePhraseQueryRetriever\\n23. RemoteLangChainRetriever\\n24. SelfQueryRetriever\\n25. SVMRetriever\\n26. TFIDFRetriever\\n27. TimeWeightedVectorStoreRetriever\\n28. VespaRetriever\\n29. WeaviateHybridSearchRetriever\\n30. WebResearchRetriever\\n31. WikipediaRetriever\\n32. ZepRetriever\\n33. ZillizRetriever\\n\\nIt also includes self query translators like:\\n\\n1. ChromaTranslator\\n2. DeepLakeTranslator\\n3. MyScaleTranslator\\n4. PineconeTranslator\\n5. QdrantTranslator\\n6. WeaviateTranslator\\n\\nAnd remote retrievers like:\\n\\n1. RemoteLangChainRetriever'}"
]
},
"execution_count": 31,
@ -1124,7 +1124,7 @@
"- DocArrayRetriever\n",
"- ElasticSearchBM25Retriever\n",
"- EnsembleRetriever\n",
"- GoogleCloudEnterpriseSearchRetriever\n",
"- GoogleVertexAISearchRetriever\n",
"- AmazonKendraRetriever\n",
"- KNNRetriever\n",
"- LlamaIndexGraphRetriever and LlamaIndexRetriever\n",

View File

@ -30,6 +30,9 @@ from langchain.retrievers.ensemble import EnsembleRetriever
from langchain.retrievers.google_cloud_enterprise_search import (
GoogleCloudEnterpriseSearchRetriever,
)
from langchain.retrievers.google_vertex_ai_search import (
GoogleVertexAISearchRetriever,
)
from langchain.retrievers.kay import KayAiRetriever
from langchain.retrievers.kendra import AmazonKendraRetriever
from langchain.retrievers.knn import KNNRetriever
@ -70,6 +73,7 @@ __all__ = [
"ChaindeskRetriever",
"ElasticSearchBM25Retriever",
"GoogleCloudEnterpriseSearchRetriever",
"GoogleVertexAISearchRetriever",
"KayAiRetriever",
"KNNRetriever",
"LlamaIndexGraphRetriever",

View File

@ -1,275 +1,22 @@
"""Retriever wrapper for Google Cloud Enterprise Search on Gen App Builder."""
from __future__ import annotations
"""Retriever wrapper for Google Vertex AI Search.
DEPRECATED: Maintained for backwards compatibility.
"""
from typing import Any
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Sequence
from langchain.callbacks.manager import CallbackManagerForRetrieverRun
from langchain.pydantic_v1 import Extra, Field, root_validator
from langchain.schema import BaseRetriever, Document
from langchain.utils import get_from_dict_or_env
if TYPE_CHECKING:
from google.cloud.discoveryengine_v1beta import (
SearchRequest,
SearchResult,
SearchServiceClient,
)
from langchain.retrievers.google_vertex_ai_search import GoogleVertexAISearchRetriever
class GoogleCloudEnterpriseSearchRetriever(BaseRetriever):
"""`Google Cloud Enterprise Search API` retriever.
For a detailed explanation of the Enterprise Search concepts
and configuration parameters, refer to the product documentation.
https://cloud.google.com/generative-ai-app-builder/docs/enterprise-search-introduction
class GoogleCloudEnterpriseSearchRetriever(GoogleVertexAISearchRetriever):
"""`Google Vertex Search API` retriever alias for backwards compatibility.
DEPRECATED: Use `GoogleVertexAISearchRetriever` instead.
"""
project_id: str
"""Google Cloud Project ID."""
search_engine_id: str
"""Enterprise Search engine ID."""
serving_config_id: str = "default_config"
"""Enterprise Search serving config ID."""
location_id: str = "global"
"""Enterprise Search engine location."""
filter: Optional[str] = None
"""Filter expression."""
get_extractive_answers: bool = False
"""If True return Extractive Answers, otherwise return Extractive Segments."""
max_documents: int = Field(default=5, ge=1, le=100)
"""The maximum number of documents to return."""
max_extractive_answer_count: int = Field(default=1, ge=1, le=5)
"""The maximum number of extractive answers returned in each search result.
At most 5 answers will be returned for each SearchResult.
"""
max_extractive_segment_count: int = Field(default=1, ge=1, le=1)
"""The maximum number of extractive segments returned in each search result.
Currently one segment will be returned for each SearchResult.
"""
query_expansion_condition: int = Field(default=1, ge=0, le=2)
"""Specification to determine under which conditions query expansion should occur.
0 - Unspecified query expansion condition. In this case, server behavior defaults
to disabled
1 - Disabled query expansion. Only the exact search query is used, even if
SearchResponse.total_size is zero.
2 - Automatic query expansion built by the Search API.
"""
spell_correction_mode: int = Field(default=2, ge=0, le=2)
"""Specification to determine under which conditions query expansion should occur.
0 - Unspecified spell correction mode. In this case, server behavior defaults
to auto.
1 - Suggestion only. Search API will try to find a spell suggestion if there is any
and put in the `SearchResponse.corrected_query`.
The spell suggestion will not be used as the search query.
2 - Automatic spell correction built by the Search API.
Search will be based on the corrected query if found.
"""
credentials: Any = None
"""The default custom credentials (google.auth.credentials.Credentials) to use
when making API calls. If not provided, credentials will be ascertained from
the environment."""
def __init__(self, **data: Any):
import warnings
# TODO: Add extra data type handling for type website
engine_data_type: int = Field(default=0, ge=0, le=1)
""" Defines the enterprise search data type
0 - Unstructured data
1 - Structured data
"""
_client: SearchServiceClient
_serving_config: str
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
arbitrary_types_allowed = True
underscore_attrs_are_private = True
@root_validator(pre=True)
def validate_environment(cls, values: Dict) -> Dict:
"""Validates the environment."""
try:
from google.cloud import discoveryengine_v1beta # noqa: F401
except ImportError as exc:
raise ImportError(
"google.cloud.discoveryengine is not installed."
"Please install it with pip install google-cloud-discoveryengine"
) from exc
try:
from google.api_core.exceptions import InvalidArgument # noqa: F401
except ImportError as exc:
raise ImportError(
"google.api_core.exceptions is not installed. "
"Please install it with pip install google-api-core"
) from exc
values["project_id"] = get_from_dict_or_env(values, "project_id", "PROJECT_ID")
values["search_engine_id"] = get_from_dict_or_env(
values, "search_engine_id", "SEARCH_ENGINE_ID"
warnings.warn(
"GoogleCloudEnterpriseSearchRetriever is deprecated, use GoogleVertexAISearchRetriever", # noqa: E501
DeprecationWarning,
)
return values
def __init__(self, **data: Any) -> None:
"""Initializes private fields."""
try:
from google.cloud.discoveryengine_v1beta import SearchServiceClient
except ImportError:
raise ImportError(
"google.cloud.discoveryengine is not installed."
"Please install it with pip install google-cloud-discoveryengine"
)
super().__init__(**data)
self._client = SearchServiceClient(credentials=self.credentials)
self._serving_config = self._client.serving_config_path(
project=self.project_id,
location=self.location_id,
data_store=self.search_engine_id,
serving_config=self.serving_config_id,
)
def _convert_unstructured_search_response(
self, results: Sequence[SearchResult]
) -> List[Document]:
"""Converts a sequence of search results to a list of LangChain documents."""
from google.protobuf.json_format import MessageToDict
documents: List[Document] = []
for result in results:
document_dict = MessageToDict(
result.document._pb, preserving_proto_field_name=True
)
derived_struct_data = document_dict.get("derived_struct_data")
if not derived_struct_data:
continue
doc_metadata = document_dict.get("struct_data", {})
doc_metadata["id"] = document_dict["id"]
chunk_type = (
"extractive_answers"
if self.get_extractive_answers
else "extractive_segments"
)
if chunk_type not in derived_struct_data:
continue
for chunk in derived_struct_data[chunk_type]:
doc_metadata["source"] = derived_struct_data.get("link", "")
if chunk_type == "extractive_answers":
doc_metadata["source"] += f":{chunk.get('pageNumber', '')}"
documents.append(
Document(
page_content=chunk.get("content", ""), metadata=doc_metadata
)
)
return documents
def _convert_structured_search_response(
self, results: Sequence[SearchResult]
) -> List[Document]:
"""Converts a sequence of search results to a list of LangChain documents."""
import json
from google.protobuf.json_format import MessageToDict
documents: List[Document] = []
for result in results:
document_dict = MessageToDict(
result.document._pb, preserving_proto_field_name=True
)
documents.append(
Document(
page_content=json.dumps(document_dict.get("struct_data", {})),
metadata={"id": document_dict["id"], "name": document_dict["name"]},
)
)
return documents
def _create_search_request(self, query: str) -> SearchRequest:
"""Prepares a SearchRequest object."""
from google.cloud.discoveryengine_v1beta import SearchRequest
query_expansion_spec = SearchRequest.QueryExpansionSpec(
condition=self.query_expansion_condition,
)
spell_correction_spec = SearchRequest.SpellCorrectionSpec(
mode=self.spell_correction_mode
)
if self.engine_data_type == 0:
if self.get_extractive_answers:
extractive_content_spec = (
SearchRequest.ContentSearchSpec.ExtractiveContentSpec(
max_extractive_answer_count=self.max_extractive_answer_count,
)
)
else:
extractive_content_spec = (
SearchRequest.ContentSearchSpec.ExtractiveContentSpec(
max_extractive_segment_count=self.max_extractive_segment_count,
)
)
content_search_spec = SearchRequest.ContentSearchSpec(
extractive_content_spec=extractive_content_spec
)
elif self.engine_data_type == 1:
content_search_spec = None
else:
# TODO: Add extra data type handling for type website
raise NotImplementedError(
"Only engine data type 0 (Unstructured) or 1 (Structured)"
+ " are supported currently."
+ f" Got {self.engine_data_type}"
)
return SearchRequest(
query=query,
filter=self.filter,
serving_config=self._serving_config,
page_size=self.max_documents,
content_search_spec=content_search_spec,
query_expansion_spec=query_expansion_spec,
spell_correction_spec=spell_correction_spec,
)
def _get_relevant_documents(
self, query: str, *, run_manager: CallbackManagerForRetrieverRun
) -> List[Document]:
"""Get documents relevant for a query."""
from google.api_core.exceptions import InvalidArgument
search_request = self._create_search_request(query)
try:
response = self._client.search(search_request)
except InvalidArgument as e:
raise type(e)(
e.message + " This might be due to engine_data_type not set correctly."
)
if self.engine_data_type == 0:
documents = self._convert_unstructured_search_response(response.results)
elif self.engine_data_type == 1:
documents = self._convert_structured_search_response(response.results)
else:
# TODO: Add extra data type handling for type website
raise NotImplementedError(
"Only engine data type 0 (Unstructured) or 1 (Structured)"
+ " are supported currently."
+ f" Got {self.engine_data_type}"
)
return documents

View File

@ -0,0 +1,314 @@
"""Retriever wrapper for Google Vertex AI Search."""
from __future__ import annotations
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Sequence
from langchain.callbacks.manager import CallbackManagerForRetrieverRun
from langchain.pydantic_v1 import Extra, Field, root_validator
from langchain.schema import BaseRetriever, Document
from langchain.utils import get_from_dict_or_env
if TYPE_CHECKING:
from google.cloud.discoveryengine_v1beta import (
SearchRequest,
SearchResult,
SearchServiceClient,
)
class GoogleVertexAISearchRetriever(BaseRetriever):
"""`Google Vertex AI Search` retriever.
For a detailed explanation of the Vertex AI Search concepts
and configuration parameters, refer to the product documentation.
https://cloud.google.com/generative-ai-app-builder/docs/enterprise-search-introduction
"""
project_id: str
"""Google Cloud Project ID."""
data_store_id: str
"""Vertex AI Search data store ID."""
serving_config_id: str = "default_config"
"""Vertex AI Search serving config ID."""
location_id: str = "global"
"""Vertex AI Search data store location."""
filter: Optional[str] = None
"""Filter expression."""
get_extractive_answers: bool = False
"""If True return Extractive Answers, otherwise return Extractive Segments."""
max_documents: int = Field(default=5, ge=1, le=100)
"""The maximum number of documents to return."""
max_extractive_answer_count: int = Field(default=1, ge=1, le=5)
"""The maximum number of extractive answers returned in each search result.
At most 5 answers will be returned for each SearchResult.
"""
max_extractive_segment_count: int = Field(default=1, ge=1, le=1)
"""The maximum number of extractive segments returned in each search result.
Currently one segment will be returned for each SearchResult.
"""
query_expansion_condition: int = Field(default=1, ge=0, le=2)
"""Specification to determine under which conditions query expansion should occur.
0 - Unspecified query expansion condition. In this case, server behavior defaults
to disabled
1 - Disabled query expansion. Only the exact search query is used, even if
SearchResponse.total_size is zero.
2 - Automatic query expansion built by the Search API.
"""
spell_correction_mode: int = Field(default=2, ge=0, le=2)
"""Specification to determine under which conditions query expansion should occur.
0 - Unspecified spell correction mode. In this case, server behavior defaults
to auto.
1 - Suggestion only. Search API will try to find a spell suggestion if there is any
and put in the `SearchResponse.corrected_query`.
The spell suggestion will not be used as the search query.
2 - Automatic spell correction built by the Search API.
Search will be based on the corrected query if found.
"""
credentials: Any = None
"""The default custom credentials (google.auth.credentials.Credentials) to use
when making API calls. If not provided, credentials will be ascertained from
the environment."""
# TODO: Add extra data type handling for type website
engine_data_type: int = Field(default=0, ge=0, le=1)
""" Defines the Vertex AI Search data type
0 - Unstructured data
1 - Structured data
"""
_client: SearchServiceClient
_serving_config: str
class Config:
"""Configuration for this pydantic object."""
extra = Extra.ignore
arbitrary_types_allowed = True
underscore_attrs_are_private = True
@root_validator(pre=True)
def validate_environment(cls, values: Dict) -> Dict:
"""Validates the environment."""
try:
from google.cloud import discoveryengine_v1beta # noqa: F401
except ImportError as exc:
raise ImportError(
"google.cloud.discoveryengine is not installed."
"Please install it with pip install google-cloud-discoveryengine"
) from exc
try:
from google.api_core.exceptions import InvalidArgument # noqa: F401
except ImportError as exc:
raise ImportError(
"google.api_core.exceptions is not installed. "
"Please install it with pip install google-api-core"
) from exc
values["project_id"] = get_from_dict_or_env(values, "project_id", "PROJECT_ID")
try:
# For backwards compatibility
search_engine_id = get_from_dict_or_env(
values, "search_engine_id", "SEARCH_ENGINE_ID"
)
if search_engine_id:
import warnings
warnings.warn(
"The `search_engine_id` parameter is deprecated. Use `data_store_id` instead.", # noqa: E501
DeprecationWarning,
)
values["data_store_id"] = search_engine_id
except: # noqa: E722
pass
values["data_store_id"] = get_from_dict_or_env(
values, "data_store_id", "DATA_STORE_ID"
)
return values
def __init__(self, **data: Any) -> None:
"""Initializes private fields."""
try:
from google.cloud.discoveryengine_v1beta import SearchServiceClient
except ImportError as exc:
raise ImportError(
"google.cloud.discoveryengine is not installed."
"Please install it with pip install google-cloud-discoveryengine"
) from exc
try:
from google.api_core.client_options import ClientOptions
except ImportError as exc:
raise ImportError(
"google.api_core.client_options is not installed."
"Please install it with pip install google-api-core"
) from exc
super().__init__(**data)
# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
api_endpoint = (
"discoveryengine.googleapis.com"
if self.location_id == "global"
else f"{self.location_id}-discoveryengine.googleapis.com"
)
self._client = SearchServiceClient(
credentials=self.credentials,
client_options=ClientOptions(api_endpoint=api_endpoint),
)
self._serving_config = self._client.serving_config_path(
project=self.project_id,
location=self.location_id,
data_store=self.data_store_id,
serving_config=self.serving_config_id,
)
def _convert_unstructured_search_response(
self, results: Sequence[SearchResult]
) -> List[Document]:
"""Converts a sequence of search results to a list of LangChain documents."""
from google.protobuf.json_format import MessageToDict
documents: List[Document] = []
for result in results:
document_dict = MessageToDict(
result.document._pb, preserving_proto_field_name=True
)
derived_struct_data = document_dict.get("derived_struct_data")
if not derived_struct_data:
continue
doc_metadata = document_dict.get("struct_data", {})
doc_metadata["id"] = document_dict["id"]
chunk_type = (
"extractive_answers"
if self.get_extractive_answers
else "extractive_segments"
)
if chunk_type not in derived_struct_data:
continue
for chunk in derived_struct_data[chunk_type]:
doc_metadata["source"] = derived_struct_data.get("link", "")
if chunk_type == "extractive_answers":
doc_metadata["source"] += f":{chunk.get('pageNumber', '')}"
documents.append(
Document(
page_content=chunk.get("content", ""), metadata=doc_metadata
)
)
return documents
def _convert_structured_search_response(
self, results: Sequence[SearchResult]
) -> List[Document]:
"""Converts a sequence of search results to a list of LangChain documents."""
import json
from google.protobuf.json_format import MessageToDict
documents: List[Document] = []
for result in results:
document_dict = MessageToDict(
result.document._pb, preserving_proto_field_name=True
)
documents.append(
Document(
page_content=json.dumps(document_dict.get("struct_data", {})),
metadata={"id": document_dict["id"], "name": document_dict["name"]},
)
)
return documents
def _create_search_request(self, query: str) -> SearchRequest:
"""Prepares a SearchRequest object."""
from google.cloud.discoveryengine_v1beta import SearchRequest
query_expansion_spec = SearchRequest.QueryExpansionSpec(
condition=self.query_expansion_condition,
)
spell_correction_spec = SearchRequest.SpellCorrectionSpec(
mode=self.spell_correction_mode
)
if self.engine_data_type == 0:
if self.get_extractive_answers:
extractive_content_spec = (
SearchRequest.ContentSearchSpec.ExtractiveContentSpec(
max_extractive_answer_count=self.max_extractive_answer_count,
)
)
else:
extractive_content_spec = (
SearchRequest.ContentSearchSpec.ExtractiveContentSpec(
max_extractive_segment_count=self.max_extractive_segment_count,
)
)
content_search_spec = SearchRequest.ContentSearchSpec(
extractive_content_spec=extractive_content_spec
)
elif self.engine_data_type == 1:
content_search_spec = None
else:
# TODO: Add extra data type handling for type website
raise NotImplementedError(
"Only engine data type 0 (Unstructured) or 1 (Structured)"
+ " are supported currently."
+ f" Got {self.engine_data_type}"
)
return SearchRequest(
query=query,
filter=self.filter,
serving_config=self._serving_config,
page_size=self.max_documents,
content_search_spec=content_search_spec,
query_expansion_spec=query_expansion_spec,
spell_correction_spec=spell_correction_spec,
)
def _get_relevant_documents(
self, query: str, *, run_manager: CallbackManagerForRetrieverRun
) -> List[Document]:
"""Get documents relevant for a query."""
from google.api_core.exceptions import InvalidArgument
search_request = self._create_search_request(query)
try:
response = self._client.search(search_request)
except InvalidArgument as exc:
raise type(exc)(
exc.message
+ " This might be due to engine_data_type not set correctly."
)
if self.engine_data_type == 0:
documents = self._convert_unstructured_search_response(response.results)
elif self.engine_data_type == 1:
documents = self._convert_structured_search_response(response.results)
else:
# TODO: Add extra data type handling for type website
raise NotImplementedError(
"Only engine data type 0 (Unstructured) or 1 (Structured)"
+ " are supported currently."
+ f" Got {self.engine_data_type}"
)
return documents

View File

@ -1,32 +0,0 @@
"""Test Google Cloud Enterprise Search retriever.
You need to create a Gen App Builder search app and populate it
with data to run the integration tests.
Follow the instructions in the example notebook:
google_cloud_enterprise_search.ipynb
to set up the app and configure authentication.
Set the following environment variables before the tests:
PROJECT_ID - set to your Google Cloud project ID
SEARCH_ENGINE_ID - the ID of the search engine to use for the test
"""
import pytest
from langchain.retrievers.google_cloud_enterprise_search import (
GoogleCloudEnterpriseSearchRetriever,
)
from langchain.schema import Document
@pytest.mark.requires("google_api_core")
def test_google_cloud_enterprise_search_get_relevant_documents() -> None:
"""Test the get_relevant_documents() method."""
retriever = GoogleCloudEnterpriseSearchRetriever()
documents = retriever.get_relevant_documents("What are Alphabet's Other Bets?")
assert len(documents) > 0
for doc in documents:
assert isinstance(doc, Document)
assert doc.page_content
assert doc.metadata["id"]
assert doc.metadata["source"]

View File

@ -0,0 +1,61 @@
"""Test Google Vertex AI Search retriever.
You need to create a Vertex AI Search app and populate it
with data to run the integration tests.
Follow the instructions in the example notebook:
google_vertex_ai_search.ipynb
to set up the app and configure authentication.
Set the following environment variables before the tests:
PROJECT_ID - set to your Google Cloud project ID
DATA_STORE_ID - the ID of the search engine to use for the test
"""
import os
import pytest
from langchain.retrievers.google_cloud_enterprise_search import (
GoogleCloudEnterpriseSearchRetriever,
)
from langchain.retrievers.google_vertex_ai_search import GoogleVertexAISearchRetriever
from langchain.schema import Document
@pytest.mark.requires("google_api_core")
def test_google_vertex_ai_search_get_relevant_documents() -> None:
"""Test the get_relevant_documents() method."""
retriever = GoogleVertexAISearchRetriever()
documents = retriever.get_relevant_documents("What are Alphabet's Other Bets?")
assert len(documents) > 0
for doc in documents:
assert isinstance(doc, Document)
assert doc.page_content
assert doc.metadata["id"]
assert doc.metadata["source"]
@pytest.mark.requires("google_api_core")
def test_google_vertex_ai_search_enterprise_search_deprecation() -> None:
"""Test the deprecation of GoogleCloudEnterpriseSearchRetriever."""
with pytest.warns(
DeprecationWarning,
match="GoogleCloudEnterpriseSearchRetriever is deprecated, use GoogleVertexAISearchRetriever", # noqa: E501
):
retriever = GoogleCloudEnterpriseSearchRetriever()
os.environ["SEARCH_ENGINE_ID"] = os.getenv("DATA_STORE_ID", "data_store_id")
with pytest.warns(
DeprecationWarning,
match="The `search_engine_id` parameter is deprecated. Use `data_store_id` instead.", # noqa: E501
):
retriever = GoogleCloudEnterpriseSearchRetriever()
# Check that mapped methods still work.
documents = retriever.get_relevant_documents("What are Alphabet's Other Bets?")
assert len(documents) > 0
for doc in documents:
assert isinstance(doc, Document)
assert doc.page_content
assert doc.metadata["id"]
assert doc.metadata["source"]