community[minor]: Adds a vector store for Azure Cosmos DB for NoSQL (#21676)

This PR adds support for an Azure Cosmos DB for NoSQL vector store.

Summary:

Description: adds a vector store integration for Azure Cosmos DB for NoSQL.
Dependencies: `azure-cosmos`
Tag maintainer: @hwchase17, @baskaryan, @efriis, @eyurtsev

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Aayush Kataria 2024-06-11 10:34:01 -07:00 committed by GitHub
parent 36cad5d25c
commit 71811e0547
14 changed files with 917 additions and 97 deletions


@@ -60,7 +60,7 @@
" * document addition by id (`add_documents` method with `ids` argument)\n",
" * delete by id (`delete` method with `ids` argument)\n",
"\n",
"Compatible Vectorstores: `Aerospike`, `AnalyticDB`, `AstraDB`, `AwaDB`, `Bagel`, `Cassandra`, `Chroma`, `CouchbaseVectorStore`, `DashVector`, `DatabricksVectorSearch`, `DeepLake`, `Dingo`, `ElasticVectorSearch`, `ElasticsearchStore`, `FAISS`, `HanaDB`, `Milvus`, `MyScale`, `OpenSearchVectorSearch`, `PGVector`, `Pinecone`, `Qdrant`, `Redis`, `Rockset`, `ScaNN`, `SupabaseVectorStore`, `SurrealDBStore`, `TimescaleVector`, `Vald`, `VDMS`, `Vearch`, `VespaStore`, `Weaviate`, `Yellowbrick`, `ZepVectorStore`, `TencentVectorDB`, `OpenSearchVectorSearch`.\n",
"Compatible Vectorstores: `Aerospike`, `AnalyticDB`, `AstraDB`, `AwaDB`, `AzureCosmosDBNoSqlVectorSearch`, `AzureCosmosDBVectorSearch`, `Bagel`, `Cassandra`, `Chroma`, `CouchbaseVectorStore`, `DashVector`, `DatabricksVectorSearch`, `DeepLake`, `Dingo`, `ElasticVectorSearch`, `ElasticsearchStore`, `FAISS`, `HanaDB`, `Milvus`, `MyScale`, `OpenSearchVectorSearch`, `PGVector`, `Pinecone`, `Qdrant`, `Redis`, `Rockset`, `ScaNN`, `SupabaseVectorStore`, `SurrealDBStore`, `TimescaleVector`, `Vald`, `VDMS`, `Vearch`, `VespaStore`, `Weaviate`, `Yellowbrick`, `ZepVectorStore`, `TencentVectorDB`, `OpenSearchVectorSearch`.\n",
" \n",
"## Caution\n",
"\n",


@@ -225,7 +225,7 @@ from langchain_community.document_loaders.onenote import OneNoteLoader
## Vector stores
### Azure Cosmos DB
### Azure Cosmos DB MongoDB vCore
>[Azure Cosmos DB for MongoDB vCore](https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/) makes it easy to create a database with full native MongoDB support.
> You can apply your MongoDB experience and continue to use your favorite MongoDB drivers, SDKs, and tools by pointing your application to the API for MongoDB vCore account's connection string.
@@ -255,6 +255,38 @@ See a [usage example](/docs/integrations/vectorstores/azure_cosmos_db).
from langchain_community.vectorstores import AzureCosmosDBVectorSearch
```
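A minimal sketch, assuming a vCore cluster connection string and OpenAI credentials in environment variables; the variable name, database, and collection below are illustrative, and `namespace` is assumed to be the `"<database>.<collection>"` pair:
```python
import os

from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import AzureCosmosDBVectorSearch

# "namespace" identifies the target "<database>.<collection>" on the cluster.
vector_store = AzureCosmosDBVectorSearch.from_connection_string(
    connection_string=os.environ["AZURE_COSMOSDB_CONNECTION_STRING"],
    namespace="langchain_db.langchain_collection",
    embedding=OpenAIEmbeddings(),
)
docs = vector_store.similarity_search("What is a sandwich?", k=3)
```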
### Azure Cosmos DB NoSQL
>[Azure Cosmos DB for NoSQL](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/vector-search) now offers vector indexing and search in preview.
This feature is designed to handle high-dimensional vectors, enabling efficient and accurate vector search at any scale. You can now store vectors
directly in the documents alongside your data, so each document in your database can contain not only traditional schema-free data
but also high-dimensional vectors as additional properties. This colocation of data and vectors allows for efficient indexing and searching,
as the vectors are stored in the same logical unit as the data they represent. It simplifies data management and AI application architectures
and improves the efficiency of vector-based operations.
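For instance, an item written by `AzureCosmosDBNoSqlVectorSearch` looks roughly like this (a sketch: the `text` key is the integration's default, the vector path comes from your vector embedding policy, and the values are illustrative):
```python
# Illustrative item shape: id is a generated UUID, and metadata keys are
# stored alongside the text and its embedding vector.
item = {
    "id": "f2a7...",                  # generated UUID (truncated here)
    "text": "Dogs are tough.",        # raw page content
    "embedding": [0.01, -0.12, ...],  # vector stored at the policy's path
    "a": 1,                           # metadata keys merged into the document
}
```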
#### Installation and Setup
See [detailed configuration instructions](/docs/integrations/vectorstores/azure_cosmos_db_no_sql).
Install the `azure-cosmos` Python package:
```bash
pip install azure-cosmos
```
#### Deploy Azure Cosmos DB on Microsoft Azure
Azure Cosmos DB supports modern apps and intelligent workloads with dynamic and elastic autoscale. It is available
in every Azure region and can automatically replicate data closer to users, with SLA-backed low latency and high availability.
[Sign Up](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/quickstart-python?pivots=devcontainer-codespace) for free to get started today.
See a [usage example](/docs/integrations/vectorstores/azure_cosmos_db_no_sql).
```python
from langchain_community.vectorstores import AzureCosmosDBNoSqlVectorSearch
```
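A minimal end-to-end sketch, assuming an existing Cosmos DB for NoSQL account and OpenAI credentials; the endpoint and key variables, database and container names, vector path, index type, and dimensions are illustrative values to adjust for your account and embedding model:
```python
import os

from azure.cosmos import CosmosClient, PartitionKey
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import AzureCosmosDBNoSqlVectorSearch

# Example policies for a vector stored at "/embedding".
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": '/"_etag"/?'}],
    "vectorIndexes": [{"path": "/embedding", "type": "flat"}],
}
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": 1536,  # must match the embedding model's output size
        }
    ]
}

client = CosmosClient(os.environ["COSMOS_ENDPOINT"], os.environ["COSMOS_KEY"])
vector_store = AzureCosmosDBNoSqlVectorSearch.from_texts(
    texts=["Dogs are tough.", "Cats have fluff."],
    embedding=OpenAIEmbeddings(),
    cosmos_client=client,
    database_name="langchain_python_db",
    container_name="langchain_python_container",
    vector_embedding_policy=vector_embedding_policy,
    indexing_policy=indexing_policy,
    cosmos_container_properties={"partition_key": PartitionKey(path="/id")},
    cosmos_database_properties={},
)
print(vector_store.similarity_search("Dogs", k=1))
```
With the default `create_container=True`, the database and container are created on first use if they do not already exist.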
## Retrievers
### Azure AI Search


@@ -3,11 +3,9 @@
{
"cell_type": "markdown",
"id": "245c0aa70db77606",
"metadata": {
"collapsed": false
},
"metadata": {},
"source": [
"# Azure Cosmos DB\n",
"# Azure Cosmos DB Mongo vCore\n",
"\n",
"This notebook shows you how to leverage this integrated [vector database](https://learn.microsoft.com/en-us/azure/cosmos-db/vector-database) to store documents in collections, create indicies and perform vector search queries using approximate nearest neighbor algorithms such as COS (cosine distance), L2 (Euclidean distance), and IP (inner product) to locate documents close to the query vectors. \n",
" \n",
@@ -22,9 +20,7 @@
{
"cell_type": "markdown",
"id": "8c493e205ce1dda5",
"metadata": {
"collapsed": false
},
"metadata": {},
"source": []
},
{
@@ -35,8 +31,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:25:05.278480Z",
"start_time": "2024-02-08T18:24:51.560677Z"
},
"collapsed": false
}
},
"outputs": [
{
@@ -62,8 +57,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:25:56.926147Z",
"start_time": "2024-02-08T18:25:56.900087Z"
},
"collapsed": false
}
},
"outputs": [],
"source": [
@@ -78,9 +72,7 @@
{
"cell_type": "markdown",
"id": "f2e66b097c6ce2e3",
"metadata": {
"collapsed": false
},
"metadata": {},
"source": [
"We want to use `OpenAIEmbeddings` so we need to set up our Azure OpenAI API Key alongside other environment variables. "
]
@@ -93,8 +85,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:26:06.558294Z",
"start_time": "2024-02-08T18:26:06.550008Z"
},
"collapsed": false
}
},
"outputs": [],
"source": [
@@ -114,9 +105,7 @@
{
"cell_type": "markdown",
"id": "ebaa28c6e2b35063",
"metadata": {
"collapsed": false
},
"metadata": {},
"source": [
"Now, we need to load the documents into the collection, create the index and then run our queries against the index to retrieve matches.\n",
"\n",
@@ -131,8 +120,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:27:00.782280Z",
"start_time": "2024-02-08T18:26:47.339151Z"
},
"collapsed": false
}
},
"outputs": [],
"source": [
@@ -172,8 +160,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:31:13.486173Z",
"start_time": "2024-02-08T18:30:54.175890Z"
},
"collapsed": false
}
},
"outputs": [
{
@@ -236,8 +223,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:31:47.468902Z",
"start_time": "2024-02-08T18:31:46.053602Z"
},
"collapsed": false
}
},
"outputs": [],
"source": [
@@ -254,8 +240,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:31:50.982598Z",
"start_time": "2024-02-08T18:31:50.977605Z"
},
"collapsed": false
}
},
"outputs": [
{
@@ -279,9 +264,7 @@
{
"cell_type": "markdown",
"id": "37e4df8c7d7db851",
"metadata": {
"collapsed": false
},
"metadata": {},
"source": [
"Once the documents have been loaded and the index has been created, you can now instantiate the vector store directly and run queries against the index"
]
@@ -294,8 +277,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:32:14.299599Z",
"start_time": "2024-02-08T18:32:12.923464Z"
},
"collapsed": false
}
},
"outputs": [
{
@@ -332,8 +314,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:32:24.021434Z",
"start_time": "2024-02-08T18:32:22.867658Z"
},
"collapsed": false
}
},
"outputs": [
{
@@ -366,30 +347,28 @@
"cell_type": "code",
"execution_count": null,
"id": "b63c73c7e905001c",
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,

File diff suppressed because one or more lines are too long


@@ -1,5 +1,5 @@
"""Wrapper for Rememberizer APIs."""
from typing import Dict, List, Optional
from typing import Dict, List, Optional, cast
import requests
from langchain_core.documents import Document
@@ -26,7 +26,9 @@ class RememberizerAPIWrapper(BaseModel):
def search(self, query: str) -> dict:
"""Search for a query in the Rememberizer API."""
url = f"https://api.rememberizer.ai/api/v1/documents/search?q={query}&n={self.top_k_results}"
response = requests.get(url, headers={"x-api-key": self.rememberizer_api_key})
response = requests.get(
url, headers={"x-api-key": cast(str, self.rememberizer_api_key)}
)
data = response.json()
if response.status_code != 200:


@@ -55,6 +55,9 @@ if TYPE_CHECKING:
from langchain_community.vectorstores.azure_cosmos_db import (
AzureCosmosDBVectorSearch,
)
from langchain_community.vectorstores.azure_cosmos_db_no_sql import (
AzureCosmosDBNoSqlVectorSearch,
)
from langchain_community.vectorstores.azuresearch import (
AzureSearch,
)
@@ -311,6 +314,7 @@ __all__ = [
"AstraDB",
"AtlasDB",
"AwaDB",
"AzureCosmosDBNoSqlVectorSearch",
"AzureCosmosDBVectorSearch",
"AzureSearch",
"BESVectorStore",
@@ -412,7 +416,8 @@ _module_lookup = {
"AstraDB": "langchain_community.vectorstores.astradb",
"AtlasDB": "langchain_community.vectorstores.atlas",
"AwaDB": "langchain_community.vectorstores.awadb",
"AzureCosmosDBVectorSearch": "langchain_community.vectorstores.azure_cosmos_db",
"AzureCosmosDBNoSqlVectorSearch": "langchain_community.vectorstores.azure_cosmos_db_no_sql", # noqa: E501
"AzureCosmosDBVectorSearch": "langchain_community.vectorstores.azure_cosmos_db", # noqa: E501
"AzureSearch": "langchain_community.vectorstores.azuresearch",
"BaiduVectorDB": "langchain_community.vectorstores.baiduvectordb",
"BESVectorStore": "langchain_community.vectorstores.baiducloud_vector_search",


@@ -11,7 +11,6 @@ from typing import (
List,
Optional,
Tuple,
TypeVar,
Union,
)
@@ -47,8 +46,6 @@ class CosmosDBVectorSearchType(str, Enum):
"""HNSW vector index"""
CosmosDBDocumentType = TypeVar("CosmosDBDocumentType", bound=Dict[str, Any])
logger = logging.getLogger(__name__)
DEFAULT_INSERT_BATCH_SIZE = 128
@@ -64,7 +61,8 @@ class AzureCosmosDBVectorSearch(VectorStore):
Example:
.. code-block:: python
from langchain_community.vectorstores import AzureCosmosDBVectorSearch
from langchain_community.vectorstores import (
AzureCosmosDBVectorSearch,
)
from langchain_community.embeddings.openai import OpenAIEmbeddings
from pymongo import MongoClient
@@ -76,12 +74,13 @@ class AzureCosmosDBVectorSearch(VectorStore):
def __init__(
self,
collection: Collection[CosmosDBDocumentType],
collection: Collection,
embedding: Embeddings,
*,
index_name: str = "vectorSearchIndex",
text_key: str = "textContent",
embedding_key: str = "vectorContent",
application_name: str = "LANGCHAIN_PYTHON",
):
"""Constructor for AzureCosmosDBVectorSearch
@@ -99,6 +98,7 @@ class AzureCosmosDBVectorSearch(VectorStore):
self._index_name = index_name
self._text_key = text_key
self._embedding_key = embedding_key
self._application_name = application_name
@property
def embeddings(self) -> Embeddings:
@@ -122,7 +122,8 @@ class AzureCosmosDBVectorSearch(VectorStore):
application_name: str = "LANGCHAIN_PYTHON",
**kwargs: Any,
) -> AzureCosmosDBVectorSearch:
"""Creates an Instance of AzureCosmosDBVectorSearch from a Connection String
"""Creates an Instance of AzureCosmosDBVectorSearch
from a Connection String
Args:
connection_string: The MongoDB vCore instance connection string
@@ -357,7 +358,7 @@ class AzureCosmosDBVectorSearch(VectorStore):
texts: List[str],
embedding: Embeddings,
metadatas: Optional[List[dict]] = None,
collection: Optional[Collection[CosmosDBDocumentType]] = None,
collection: Optional[Collection] = None,
**kwargs: Any,
) -> AzureCosmosDBVectorSearch:
if collection is None:
@@ -581,5 +582,5 @@ class AzureCosmosDBVectorSearch(VectorStore):
)
return docs
def get_collection(self) -> Collection[CosmosDBDocumentType]:
def get_collection(self) -> Collection:
return self._collection


@@ -0,0 +1,337 @@
from __future__ import annotations
import uuid
import warnings
from typing import TYPE_CHECKING, Any, Dict, Iterable, List, Optional, Tuple
import numpy as np
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.vectorstores import VectorStore
from langchain_community.vectorstores.utils import maximal_marginal_relevance
if TYPE_CHECKING:
from azure.cosmos.cosmos_client import CosmosClient
class AzureCosmosDBNoSqlVectorSearch(VectorStore):
"""`Azure Cosmos DB for NoSQL` vector store.
To use, you should have the ``azure-cosmos`` python package installed.
You can read more about vector search using AzureCosmosDBNoSQL here:
https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/vector-search
"""
def __init__(
self,
*,
cosmos_client: CosmosClient,
embedding: Embeddings,
vector_embedding_policy: Dict[str, Any],
indexing_policy: Dict[str, Any],
cosmos_container_properties: Dict[str, Any],
cosmos_database_properties: Dict[str, Any],
database_name: str = "vectorSearchDB",
container_name: str = "vectorSearchContainer",
create_container: bool = True,
):
"""
Constructor for AzureCosmosDBNoSqlVectorSearch
Args:
cosmos_client: Client used to connect to the Azure Cosmos DB NoSQL account.
database_name: Name of the database to be created.
container_name: Name of the container to be created.
embedding: Text embedding model to use.
vector_embedding_policy: Vector Embedding Policy for the container.
indexing_policy: Indexing Policy for the container.
cosmos_container_properties: Container Properties for the container.
cosmos_database_properties: Database Properties for the database.
"""
self._cosmos_client = cosmos_client
self._database_name = database_name
self._container_name = container_name
self._embedding = embedding
self._vector_embedding_policy = vector_embedding_policy
self._indexing_policy = indexing_policy
self._cosmos_container_properties = cosmos_container_properties
self._cosmos_database_properties = cosmos_database_properties
self._create_container = create_container
if self._create_container:
if (
indexing_policy["vectorIndexes"] is None
or len(indexing_policy["vectorIndexes"]) == 0
):
raise ValueError(
"vectorIndexes cannot be null or empty in the indexing_policy."
)
if (
vector_embedding_policy is None
or len(vector_embedding_policy["vectorEmbeddings"]) == 0
):
raise ValueError(
"vectorEmbeddings cannot be null "
"or empty in the vector_embedding_policy."
)
if self._cosmos_container_properties["partition_key"] is None:
raise ValueError(
"partition_key cannot be null or empty for a container."
)
# Create the database if it doesn't already exist
self._database = self._cosmos_client.create_database_if_not_exists(
id=self._database_name,
offer_throughput=self._cosmos_database_properties.get("offer_throughput"),
session_token=self._cosmos_database_properties.get("session_token"),
initial_headers=self._cosmos_database_properties.get("initial_headers"),
etag=self._cosmos_database_properties.get("etag"),
match_condition=self._cosmos_database_properties.get("match_condition"),
)
# Create the container if it doesn't already exist
self._container = self._database.create_container_if_not_exists(
id=self._container_name,
partition_key=self._cosmos_container_properties["partition_key"],
indexing_policy=self._indexing_policy,
default_ttl=self._cosmos_container_properties.get("default_ttl"),
offer_throughput=self._cosmos_container_properties.get("offer_throughput"),
unique_key_policy=self._cosmos_container_properties.get(
"unique_key_policy"
),
conflict_resolution_policy=self._cosmos_container_properties.get(
"conflict_resolution_policy"
),
analytical_storage_ttl=self._cosmos_container_properties.get(
"analytical_storage_ttl"
),
computed_properties=self._cosmos_container_properties.get(
"computed_properties"
),
etag=self._cosmos_container_properties.get("etag"),
match_condition=self._cosmos_container_properties.get("match_condition"),
session_token=self._cosmos_container_properties.get("session_token"),
initial_headers=self._cosmos_container_properties.get("initial_headers"),
vector_embedding_policy=self._vector_embedding_policy,
)
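# The embedding field name is the first vector path in the policy with the
# leading "/" stripped (e.g. "/embedding" -> "embedding").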
self._embedding_key = self._vector_embedding_policy["vectorEmbeddings"][0][
"path"
][1:]
def add_texts(
self,
texts: Iterable[str],
metadatas: Optional[List[dict]] = None,
**kwargs: Any,
) -> List[str]:
"""Run more texts through the embeddings and add to the vectorstore.
Args:
texts: Iterable of strings to add to the vectorstore.
metadatas: Optional list of metadatas associated with the texts.
Returns:
List of ids from adding the texts into the vectorstore.
"""
_metadatas = list(metadatas if metadatas is not None else ({} for _ in texts))
return self._insert_texts(list(texts), _metadatas)
def _insert_texts(
self, texts: List[str], metadatas: List[Dict[str, Any]]
) -> List[str]:
"""Used to Load Documents into the collection
Args:
texts: The list of documents strings to load
metadatas: The list of metadata objects associated with each document
Returns:
List of ids from adding the texts into the vectorstore.
"""
# If texts is empty, raise an error
if not texts:
raise Exception("Texts cannot be null or empty")
# Embed and create the documents
embeddings = self._embedding.embed_documents(texts)
text_key = "text"
to_insert = [
{"id": str(uuid.uuid4()), text_key: t, self._embedding_key: embedding, **m}
for t, m, embedding in zip(texts, metadatas, embeddings)
]
# insert the documents in CosmosDB No Sql
doc_ids: List[str] = []
for item in to_insert:
created_doc = self._container.create_item(item)
doc_ids.append(created_doc["id"])
return doc_ids
@classmethod
def _from_kwargs(
cls,
embedding: Embeddings,
*,
cosmos_client: CosmosClient,
vector_embedding_policy: Dict[str, Any],
indexing_policy: Dict[str, Any],
cosmos_container_properties: Dict[str, Any],
cosmos_database_properties: Dict[str, Any],
database_name: str = "vectorSearchDB",
container_name: str = "vectorSearchContainer",
**kwargs: Any,
) -> AzureCosmosDBNoSqlVectorSearch:
if kwargs:
warnings.warn(
"Method 'from_texts' of AzureCosmosDBNoSql vector "
"store invoked with "
f"unsupported arguments "
f"({', '.join(sorted(kwargs))}), "
"which will be ignored."
)
return cls(
embedding=embedding,
cosmos_client=cosmos_client,
vector_embedding_policy=vector_embedding_policy,
indexing_policy=indexing_policy,
cosmos_container_properties=cosmos_container_properties,
cosmos_database_properties=cosmos_database_properties,
database_name=database_name,
container_name=container_name,
)
@classmethod
def from_texts(
cls,
texts: List[str],
embedding: Embeddings,
metadatas: Optional[List[dict]] = None,
**kwargs: Any,
) -> AzureCosmosDBNoSqlVectorSearch:
"""Create an AzureCosmosDBNoSqlVectorSearch vectorstore from raw texts.
Args:
texts: the texts to insert.
embedding: the embedding function to use in the store.
metadatas: metadata dicts for the texts.
**kwargs: you can pass any argument that you would
to :meth:`~add_texts` and/or to the
'AzureCosmosDBNoSqlVectorSearch' constructor (see these methods for
details). These arguments will be routed to the respective
methods as they are.
Returns:
an `AzureCosmosDBNoSqlVectorSearch` vectorstore.
"""
vectorstore = AzureCosmosDBNoSqlVectorSearch._from_kwargs(embedding, **kwargs)
vectorstore.add_texts(
texts=texts,
metadatas=metadatas,
)
return vectorstore
def delete(self, ids: Optional[List[str]] = None, **kwargs: Any) -> Optional[bool]:
if ids is None:
raise ValueError("No document ids provided to delete.")
for document_id in ids:
# delete_item requires a partition key; this assumes the container is
# partitioned by id, matching delete_document_by_id below
self._container.delete_item(document_id, partition_key=document_id)
return True
def delete_document_by_id(self, document_id: Optional[str] = None) -> None:
"""Removes a Specific Document by id
Args:
document_id: The document identifier
"""
if document_id is None:
raise ValueError("No document ids provided to delete.")
self._container.delete_item(document_id, partition_key=document_id)
def _similarity_search_with_score(
self,
embeddings: List[float],
k: int = 4,
) -> List[Tuple[Document, float]]:
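# Build a Cosmos DB SQL query that returns the top-k items ordered by
# VectorDistance to the query embedding; note the embedding vector is
# inlined into the query text.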
query = (
"SELECT TOP {} c.id, c.{}, c.text, VectorDistance(c.{}, {}) AS "
"SimilarityScore FROM c ORDER BY VectorDistance(c.{}, {})".format(
k,
self._embedding_key,
self._embedding_key,
embeddings,
self._embedding_key,
embeddings,
)
)
docs_and_scores = []
items = list(
self._container.query_items(query=query, enable_cross_partition_query=True)
)
for item in items:
text = item["text"]
score = item["SimilarityScore"]
docs_and_scores.append((Document(page_content=text, metadata=item), score))
return docs_and_scores
def similarity_search_with_score(
self,
query: str,
k: int = 4,
) -> List[Tuple[Document, float]]:
embeddings = self._embedding.embed_query(query)
docs_and_scores = self._similarity_search_with_score(embeddings=embeddings, k=k)
return docs_and_scores
def similarity_search(
self, query: str, k: int = 4, **kwargs: Any
) -> List[Document]:
docs_and_scores = self.similarity_search_with_score(query, k=k)
return [doc for doc, _ in docs_and_scores]
def max_marginal_relevance_search_by_vector(
self,
embedding: List[float],
k: int = 4,
fetch_k: int = 20,
lambda_mult: float = 0.5,
**kwargs: Any,
) -> List[Document]:
# Retrieves the docs with similarity scores
docs = self._similarity_search_with_score(embeddings=embedding, k=fetch_k)
# Re-ranks the docs using MMR
mmr_doc_indexes = maximal_marginal_relevance(
np.array(embedding),
[doc.metadata[self._embedding_key] for doc, _ in docs],
k=k,
lambda_mult=lambda_mult,
)
mmr_docs = [docs[i][0] for i in mmr_doc_indexes]
return mmr_docs
def max_marginal_relevance_search(
self,
query: str,
k: int = 4,
fetch_k: int = 20,
lambda_mult: float = 0.5,
**kwargs: Any,
) -> List[Document]:
# compute the embeddings vector from the query string
embeddings = self._embedding.embed_query(query)
docs = self.max_marginal_relevance_search_by_vector(
embeddings,
k=k,
fetch_k=fetch_k,
lambda_mult=lambda_mult,
)
return docs


@@ -1,4 +1,4 @@
# This file is automatically @generated by Poetry 1.7.1 and should not be changed by hand.
# This file is automatically @generated by Poetry 1.5.1 and should not be changed by hand.
[[package]]
name = "aiohttp"
@@ -2116,7 +2116,7 @@ files = [
[[package]]
name = "langchain"
version = "0.2.2"
version = "0.2.3"
description = "Building applications with LLMs through composability"
optional = false
python-versions = ">=3.8.1,<4.0"
@@ -2125,6 +2125,7 @@ develop = true
[package.dependencies]
aiohttp = "^3.8.3"
async-timeout = {version = "^4.0.0", markers = "python_version < \"3.11\""}
langchain-core = "^0.2.0"
langchain-text-splitters = "^0.2.0"
langsmith = "^0.1.17"
@@ -2141,7 +2142,7 @@ url = "../langchain"
[[package]]
name = "langchain-core"
version = "0.2.4"
version = "0.2.5"
description = "Building applications with LLMs through composability"
optional = false
python-versions = ">=3.8.1,<4.0"
@@ -2150,7 +2151,7 @@ develop = true
[package.dependencies]
jsonpatch = "^1.33"
langsmith = "^0.1.66"
langsmith = "^0.1.75"
packaging = "^23.2"
pydantic = ">=1,<3"
PyYAML = ">=5.3"
@@ -2178,13 +2179,13 @@ url = "../text-splitters"
[[package]]
name = "langsmith"
version = "0.1.73"
version = "0.1.75"
description = "Client library to connect to the LangSmith LLM Tracing and Evaluation Platform."
optional = false
python-versions = "<4.0,>=3.8.1"
files = [
{file = "langsmith-0.1.73-py3-none-any.whl", hash = "sha256:38bfcce2cfcf0b2da2e9628b903c9e768e1ce59d450e8a584514c1638c595e93"},
{file = "langsmith-0.1.73.tar.gz", hash = "sha256:0055471cb1fddb76ec65499716764ad0b0314affbdf33ff1f72ad5e2d6a3b224"},
{file = "langsmith-0.1.75-py3-none-any.whl", hash = "sha256:d08b08dd6b3fa4da170377f95123d77122ef4c52999d10fff4ae08ff70d07aed"},
{file = "langsmith-0.1.75.tar.gz", hash = "sha256:61274e144ea94c297dd78ce03e6dfae18459fe9bd8ab5094d61a0c4816561279"},
]
[package.dependencies]
@@ -3022,7 +3023,7 @@ files = [
[package.dependencies]
numpy = [
{version = ">=1.20.3", markers = "python_version < \"3.10\""},
{version = ">=1.21.0", markers = "python_version >= \"3.10\" and python_version < \"3.11\""},
{version = ">=1.21.0", markers = "python_version >= \"3.10\""},
{version = ">=1.23.2", markers = "python_version >= \"3.11\""},
]
python-dateutil = ">=2.8.2"
@@ -4520,7 +4521,7 @@ files = [
]
[package.dependencies]
greenlet = {version = "!=0.4.17", markers = "platform_machine == \"aarch64\" or platform_machine == \"ppc64le\" or platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"win32\" or platform_machine == \"WIN32\""}
greenlet = {version = "!=0.4.17", markers = "platform_machine == \"win32\" or platform_machine == \"WIN32\" or platform_machine == \"AMD64\" or platform_machine == \"amd64\" or platform_machine == \"x86_64\" or platform_machine == \"ppc64le\" or platform_machine == \"aarch64\""}
typing-extensions = ">=4.6.0"
[package.extras]
@@ -5098,20 +5099,6 @@ files = [
[package.dependencies]
types-urllib3 = "*"
[[package]]
name = "types-requests"
version = "2.32.0.20240602"
description = "Typing stubs for requests"
optional = false
python-versions = ">=3.8"
files = [
{file = "types-requests-2.32.0.20240602.tar.gz", hash = "sha256:3f98d7bbd0dd94ebd10ff43a7fbe20c3b8528acace6d8efafef0b6a184793f06"},
{file = "types_requests-2.32.0.20240602-py3-none-any.whl", hash = "sha256:ed3946063ea9fbc6b5fc0c44fa279188bae42d582cb63760be6cb4b9d06c3de8"},
]
[package.dependencies]
urllib3 = ">=2"
[[package]]
name = "types-setuptools"
version = "70.0.0.20240524"
@@ -5212,23 +5199,6 @@ brotli = ["brotli (==1.0.9)", "brotli (>=1.0.9)", "brotlicffi (>=0.8.0)", "brotl
secure = ["certifi", "cryptography (>=1.3.4)", "idna (>=2.0.0)", "ipaddress", "pyOpenSSL (>=0.14)", "urllib3-secure-extra"]
socks = ["PySocks (>=1.5.6,!=1.5.7,<2.0)"]
[[package]]
name = "urllib3"
version = "2.2.1"
description = "HTTP library with thread-safe connection pooling, file post, and more."
optional = false
python-versions = ">=3.8"
files = [
{file = "urllib3-2.2.1-py3-none-any.whl", hash = "sha256:450b20ec296a467077128bff42b73080516e71b56ff59a60a02bef2232c4fa9d"},
{file = "urllib3-2.2.1.tar.gz", hash = "sha256:d0570876c61ab9e520d776c38acbbb5b05a776d3f9ff98a5c8fd5162a444cf19"},
]
[package.extras]
brotli = ["brotli (>=1.0.9)", "brotlicffi (>=0.8.0)"]
h2 = ["h2 (>=4,<5)"]
socks = ["pysocks (>=1.5.6,!=1.5.7,<2.0)"]
zstd = ["zstandard (>=0.18.0)"]
[[package]]
name = "vcrpy"
version = "6.0.1"
@@ -5651,4 +5621,4 @@ test = ["big-O", "importlib-resources", "jaraco.functools", "jaraco.itertools",
[metadata]
lock-version = "2.0"
python-versions = ">=3.8.1,<4.0"
content-hash = "21ead159a299fcc5cc0a9038ddcee5b4f355c893f23ef80456b72941ad3122fd"
content-hash = "deaf3c8d3c816e32d176b2ccc8646e3100be1f96e497833c2a528e274b50bfef"


@@ -20,7 +20,6 @@ tenacity = "^8.1.0"
dataclasses-json = ">= 0.5.7, < 0.7"
langsmith = "^0.1.0"
[tool.poetry.group.test]
optional = true
@@ -59,6 +58,7 @@ optional = true
# Instead read the following link:
# https://python.langchain.com/docs/contributing/code#working-with-optional-dependencies
pytest-vcr = "^1.0.2"
vcrpy = "^6"
wrapt = "^1.15.0"
openai = "^1"
python-dotenv = "^1.0.0"
@@ -96,7 +96,7 @@ optional = true
[tool.poetry.group.dev.dependencies]
jupyter = "^1.0.0"
setuptools = "^67.6.1"
langchain-core = { path = "../core", develop = true }
langchain-core = {path = "../core", develop = true}
[tool.ruff]
exclude = [


@@ -0,0 +1,155 @@
"""Test AzureCosmosDBNoSqlVectorSearch functionality."""
import logging
import os
from time import sleep
from typing import Any
import pytest
from langchain_core.documents import Document
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores.azure_cosmos_db_no_sql import (
AzureCosmosDBNoSqlVectorSearch,
)
logging.basicConfig(level=logging.DEBUG)
model_deployment = os.getenv(
"OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada"
)
model_name = os.getenv("OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002")
# Host and key for the Cosmos DB NoSQL account
HOST = os.environ.get("HOST")
KEY = os.environ.get("KEY")
database_name = "langchain_python_db"
container_name = "langchain_python_container"
@pytest.fixture()
def cosmos_client() -> Any:
from azure.cosmos import CosmosClient
return CosmosClient(HOST, KEY)
@pytest.fixture()
def partition_key() -> Any:
from azure.cosmos import PartitionKey
return PartitionKey(path="/id")
@pytest.fixture()
def azure_openai_embeddings() -> Any:
openai_embeddings: OpenAIEmbeddings = OpenAIEmbeddings(
deployment=model_deployment, model=model_name, chunk_size=1
)
return openai_embeddings
def safe_delete_database(cosmos_client: Any) -> None:
cosmos_client.delete_database(database_name)
def get_vector_indexing_policy(embedding_type: str) -> dict:
return {
"indexingMode": "consistent",
"includedPaths": [{"path": "/*"}],
"excludedPaths": [{"path": '/"_etag"/?'}],
"vectorIndexes": [{"path": "/embedding", "type": embedding_type}],
}
def get_vector_embedding_policy(
distance_function: str, data_type: str, dimensions: int
) -> dict:
return {
"vectorEmbeddings": [
{
"path": "/embedding",
"dataType": data_type,
"dimensions": dimensions,
"distanceFunction": distance_function,
}
]
}
class TestAzureCosmosDBNoSqlVectorSearch:
def test_from_documents_cosine_distance(
self,
cosmos_client: Any,
partition_key: Any,
azure_openai_embeddings: OpenAIEmbeddings,
) -> None:
"""Test end to end construction and search."""
documents = [
Document(page_content="Dogs are tough.", metadata={"a": 1}),
Document(page_content="Cats have fluff.", metadata={"b": 1}),
Document(page_content="What is a sandwich?", metadata={"c": 1}),
Document(page_content="That fence is purple.", metadata={"d": 1, "e": 2}),
]
store = AzureCosmosDBNoSqlVectorSearch.from_documents(
documents,
azure_openai_embeddings,
cosmos_client=cosmos_client,
database_name=database_name,
container_name=container_name,
vector_embedding_policy=get_vector_embedding_policy(
"cosine", "float32", 400
),
indexing_policy=get_vector_indexing_policy("flat"),
cosmos_container_properties={"partition_key": partition_key},
)
sleep(1) # waits for Cosmos DB to save contents to the collection
output = store.similarity_search("Dogs", k=2)
assert output
assert output[0].page_content == "Dogs are tough."
safe_delete_database(cosmos_client)
def test_from_texts_cosine_distance_delete_one(
self,
cosmos_client: Any,
partition_key: Any,
azure_openai_embeddings: OpenAIEmbeddings,
) -> None:
texts = [
"Dogs are tough.",
"Cats have fluff.",
"What is a sandwich?",
"That fence is purple.",
]
metadatas = [{"a": 1}, {"b": 1}, {"c": 1}, {"d": 1, "e": 2}]
store = AzureCosmosDBNoSqlVectorSearch.from_texts(
texts,
azure_openai_embeddings,
metadatas,
cosmos_client=cosmos_client,
database_name=database_name,
container_name=container_name,
vector_embedding_policy=get_vector_embedding_policy(
"cosine", "float32", 400
),
indexing_policy=get_vector_indexing_policy("flat"),
cosmos_container_properties={"partition_key": partition_key},
)
sleep(1) # waits for Cosmos DB to save contents to the collection
output = store.similarity_search("Dogs", k=1)
assert output
assert output[0].page_content == "Dogs are tough."
# delete one document
store.delete_document_by_id(str(output[0].metadata["id"]))
sleep(2)
output2 = store.similarity_search("Dogs", k=1)
assert output2
assert output2[0].page_content != "Dogs are tough."
safe_delete_database(cosmos_client)


@@ -20,5 +20,6 @@ def test_vectorstores() -> None:
"AlibabaCloudOpenSearchSettings",
"ClickhouseSettings",
"MyScaleSettings",
"AzureCosmosDBVectorSearch",
]:
assert issubclass(getattr(vectorstores, cls), VectorStore)


@@ -13,6 +13,7 @@ EXPECTED_ALL = [
"AstraDB",
"AtlasDB",
"AwaDB",
"AzureCosmosDBNoSqlVectorSearch",
"AzureCosmosDBVectorSearch",
"AzureSearch",
"BESVectorStore",


@@ -50,6 +50,7 @@ def test_compatible_vectorstore_documentation() -> None:
"AnalyticDB",
"AstraDB",
"AzureCosmosDBVectorSearch",
"AzureCosmosDBNoSqlVectorSearch",
"AzureSearch",
"AwaDB",
"Bagel",