mirror of https://github.com/hwchase17/langchain
community[minor]: Adds a vector store for Azure Cosmos DB for NoSQL (#21676)
This PR adds support for the Azure Cosmos DB for NoSQL vector store. Summary: Description: added vector store integration for Azure Cosmos DB for NoSQL Vector Store, Dependencies: azure-cosmos dependency, Tag maintainer: @hwchase17, @baskaryan @efriis @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>pull/22781/head
parent
36cad5d25c
commit
71811e0547
File diff suppressed because one or more lines are too long
@ -0,0 +1,337 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import uuid
|
||||
import warnings
|
||||
from typing import TYPE_CHECKING, Any, Dict, Iterable, List, Optional, Tuple
|
||||
|
||||
import numpy as np
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.embeddings import Embeddings
|
||||
from langchain_core.vectorstores import VectorStore
|
||||
|
||||
from langchain_community.vectorstores.utils import maximal_marginal_relevance
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from azure.cosmos.cosmos_client import CosmosClient
|
||||
|
||||
|
||||
class AzureCosmosDBNoSqlVectorSearch(VectorStore):
    """`Azure Cosmos DB for NoSQL` vector store.

    To use, you should have:
        - the ``azure-cosmos`` python package installed

    You can read more about vector search using AzureCosmosDBNoSQL here:
    https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/vector-search
    """

    def __init__(
        self,
        *,
        cosmos_client: CosmosClient,
        embedding: Embeddings,
        vector_embedding_policy: Dict[str, Any],
        indexing_policy: Dict[str, Any],
        cosmos_container_properties: Dict[str, Any],
        cosmos_database_properties: Dict[str, Any],
        database_name: str = "vectorSearchDB",
        container_name: str = "vectorSearchContainer",
        create_container: bool = True,
    ):
        """
        Constructor for AzureCosmosDBNoSqlVectorSearch

        Args:
            cosmos_client: Client used to connect to azure cosmosdb no sql account.
            database_name: Name of the database to be created.
            container_name: Name of the container to be created.
            embedding: Text embedding model to use.
            vector_embedding_policy: Vector Embedding Policy for the container.
            indexing_policy: Indexing Policy for the container.
            cosmos_container_properties: Container Properties for the container.
            cosmos_database_properties: Database Properties for the container.
            create_container: Whether to validate the vector policies before
                creating the container. Defaults to True.

        Raises:
            ValueError: If ``create_container`` is True and the indexing policy
                has no vector indexes, the embedding policy has no vector
                embeddings, or no partition key is supplied.
        """
        self._cosmos_client = cosmos_client
        self._database_name = database_name
        self._container_name = container_name
        self._embedding = embedding
        self._vector_embedding_policy = vector_embedding_policy
        self._indexing_policy = indexing_policy
        self._cosmos_container_properties = cosmos_container_properties
        self._cosmos_database_properties = cosmos_database_properties
        self._create_container = create_container

        if self._create_container:
            # Use .get() so a dict that is missing the key entirely is
            # rejected with the intended ValueError rather than a KeyError.
            if not indexing_policy.get("vectorIndexes"):
                raise ValueError(
                    "vectorIndexes cannot be null or empty in the indexing_policy."
                )
            if vector_embedding_policy is None or not vector_embedding_policy.get(
                "vectorEmbeddings"
            ):
                raise ValueError(
                    "vectorEmbeddings cannot be null "
                    "or empty in the vector_embedding_policy."
                )
            if self._cosmos_container_properties.get("partition_key") is None:
                raise ValueError(
                    "partition_key cannot be null or empty for a container."
                )

        # NOTE(review): the database/container are created even when
        # create_container is False -- the flag only gates the validation
        # above. Confirm whether creation should also be conditional.

        # Create the database if it already doesn't exist
        self._database = self._cosmos_client.create_database_if_not_exists(
            id=self._database_name,
            offer_throughput=self._cosmos_database_properties.get("offer_throughput"),
            session_token=self._cosmos_database_properties.get("session_token"),
            initial_headers=self._cosmos_database_properties.get("initial_headers"),
            etag=self._cosmos_database_properties.get("etag"),
            match_condition=self._cosmos_database_properties.get("match_condition"),
        )

        # Create the collection if it already doesn't exist
        self._container = self._database.create_container_if_not_exists(
            id=self._container_name,
            partition_key=self._cosmos_container_properties["partition_key"],
            indexing_policy=self._indexing_policy,
            default_ttl=self._cosmos_container_properties.get("default_ttl"),
            offer_throughput=self._cosmos_container_properties.get("offer_throughput"),
            unique_key_policy=self._cosmos_container_properties.get(
                "unique_key_policy"
            ),
            conflict_resolution_policy=self._cosmos_container_properties.get(
                "conflict_resolution_policy"
            ),
            analytical_storage_ttl=self._cosmos_container_properties.get(
                "analytical_storage_ttl"
            ),
            computed_properties=self._cosmos_container_properties.get(
                "computed_properties"
            ),
            etag=self._cosmos_container_properties.get("etag"),
            match_condition=self._cosmos_container_properties.get("match_condition"),
            session_token=self._cosmos_container_properties.get("session_token"),
            initial_headers=self._cosmos_container_properties.get("initial_headers"),
            vector_embedding_policy=self._vector_embedding_policy,
        )

        # Vector paths are given as "/field"; strip the leading slash to get
        # the document property name used for stored embeddings.
        self._embedding_key = self._vector_embedding_policy["vectorEmbeddings"][0][
            "path"
        ][1:]

    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        **kwargs: Any,
    ) -> List[str]:
        """Run more texts through the embeddings and add to the vectorstore.

        Args:
            texts: Iterable of strings to add to the vectorstore.
            metadatas: Optional list of metadatas associated with the texts.

        Returns:
            List of ids from adding the texts into the vectorstore.
        """
        # Default to one empty metadata dict per text when none are given.
        _metadatas = list(metadatas if metadatas is not None else ({} for _ in texts))

        return self._insert_texts(list(texts), _metadatas)

    def _insert_texts(
        self, texts: List[str], metadatas: List[Dict[str, Any]]
    ) -> List[str]:
        """Used to Load Documents into the collection

        Args:
            texts: The list of documents strings to load
            metadatas: The list of metadata objects associated with each document

        Returns:
            List of ids from adding the texts into the vectorstore.

        Raises:
            ValueError: If ``texts`` is empty.
        """
        # If the texts is empty, throw an error
        if not texts:
            # ValueError is the conventional type for bad arguments; callers
            # that caught Exception still catch this.
            raise ValueError("Texts can not be null or empty")

        # Embed and create the documents
        embeddings = self._embedding.embed_documents(texts)
        text_key = "text"

        to_insert = [
            {"id": str(uuid.uuid4()), text_key: t, self._embedding_key: embedding, **m}
            for t, m, embedding in zip(texts, metadatas, embeddings)
        ]
        # insert the documents in CosmosDB No Sql
        doc_ids: List[str] = []
        for item in to_insert:
            created_doc = self._container.create_item(item)
            doc_ids.append(created_doc["id"])
        return doc_ids

    @classmethod
    def _from_kwargs(
        cls,
        embedding: Embeddings,
        *,
        cosmos_client: CosmosClient,
        vector_embedding_policy: Dict[str, Any],
        indexing_policy: Dict[str, Any],
        cosmos_container_properties: Dict[str, Any],
        cosmos_database_properties: Dict[str, Any],
        database_name: str = "vectorSearchDB",
        container_name: str = "vectorSearchContainer",
        **kwargs: Any,
    ) -> AzureCosmosDBNoSqlVectorSearch:
        """Build a store from keyword arguments, warning on unknown ones."""
        if kwargs:
            warnings.warn(
                "Method 'from_texts' of AzureCosmosDBNoSql vector "
                "store invoked with "
                f"unsupported arguments "
                f"({', '.join(sorted(kwargs))}), "
                "which will be ignored."
            )

        return cls(
            embedding=embedding,
            cosmos_client=cosmos_client,
            vector_embedding_policy=vector_embedding_policy,
            indexing_policy=indexing_policy,
            cosmos_container_properties=cosmos_container_properties,
            cosmos_database_properties=cosmos_database_properties,
            database_name=database_name,
            container_name=container_name,
        )

    @classmethod
    def from_texts(
        cls,
        texts: List[str],
        embedding: Embeddings,
        metadatas: Optional[List[dict]] = None,
        **kwargs: Any,
    ) -> AzureCosmosDBNoSqlVectorSearch:
        """Create an AzureCosmosDBNoSqlVectorSearch vectorstore from raw texts.

        Args:
            texts: the texts to insert.
            embedding: the embedding function to use in the store.
            metadatas: metadata dicts for the texts.
            **kwargs: you can pass any argument that you would
                to :meth:`~add_texts` and/or to the
                'AzureCosmosDBNoSqlVectorSearch' constructor
                (see these methods for details). These arguments will be
                routed to the respective methods as they are.

        Returns:
            an `AzureCosmosDBNoSqlVectorSearch` vectorstore.
        """
        vectorstore = AzureCosmosDBNoSqlVectorSearch._from_kwargs(embedding, **kwargs)
        vectorstore.add_texts(
            texts=texts,
            metadatas=metadatas,
        )
        return vectorstore

    def delete(self, ids: Optional[List[str]] = None, **kwargs: Any) -> Optional[bool]:
        """Delete documents from the vectorstore by id.

        Args:
            ids: List of document ids to delete.

        Returns:
            True when all deletions have been issued.

        Raises:
            ValueError: If ``ids`` is None.
        """
        if ids is None:
            raise ValueError("No document ids provided to delete.")

        for document_id in ids:
            # Delegate so the required partition_key is supplied; calling
            # delete_item without it fails in azure-cosmos.
            self.delete_document_by_id(document_id)
        return True

    def delete_document_by_id(self, document_id: Optional[str] = None) -> None:
        """Removes a Specific Document by id

        Args:
            document_id: The document identifier

        Raises:
            ValueError: If ``document_id`` is None.
        """
        if document_id is None:
            raise ValueError("No document ids provided to delete.")
        # NOTE(review): assumes the container is partitioned on the document
        # id (e.g. PartitionKey(path="/id")) -- confirm against callers.
        self._container.delete_item(document_id, partition_key=document_id)

    def _similarity_search_with_score(
        self,
        embeddings: List[float],
        k: int = 4,
    ) -> List[Tuple[Document, float]]:
        """Query the container for the k nearest documents to ``embeddings``.

        Args:
            embeddings: Query embedding vector.
            k: Number of documents to return.

        Returns:
            List of (document, similarity score) tuples ordered by distance.
        """
        # The embedding property name cannot be parameterized in Cosmos SQL,
        # so it is formatted in; the vector and TOP count are passed as query
        # parameters instead of being spliced into the query string.
        query = (
            "SELECT TOP @k c.id, c.{0}, c.text, "
            "VectorDistance(c.{0}, @embeddings) AS SimilarityScore "
            "FROM c ORDER BY VectorDistance(c.{0}, @embeddings)".format(
                self._embedding_key
            )
        )
        parameters: List[Dict[str, Any]] = [
            {"name": "@k", "value": k},
            {"name": "@embeddings", "value": embeddings},
        ]
        docs_and_scores = []
        items = list(
            self._container.query_items(
                query=query,
                parameters=parameters,
                enable_cross_partition_query=True,
            )
        )
        for item in items:
            text = item["text"]
            score = item["SimilarityScore"]
            docs_and_scores.append((Document(page_content=text, metadata=item), score))
        return docs_and_scores

    def similarity_search_with_score(
        self,
        query: str,
        k: int = 4,
    ) -> List[Tuple[Document, float]]:
        """Return the k most similar documents to ``query`` with scores."""
        embeddings = self._embedding.embed_query(query)
        docs_and_scores = self._similarity_search_with_score(embeddings=embeddings, k=k)
        return docs_and_scores

    def similarity_search(
        self, query: str, k: int = 4, **kwargs: Any
    ) -> List[Document]:
        """Return the k most similar documents to ``query``."""
        docs_and_scores = self.similarity_search_with_score(query, k=k)

        return [doc for doc, _ in docs_and_scores]

    def max_marginal_relevance_search_by_vector(
        self,
        embedding: List[float],
        k: int = 4,
        fetch_k: int = 20,
        lambda_mult: float = 0.5,
        **kwargs: Any,
    ) -> List[Document]:
        """Return docs selected using maximal marginal relevance for a vector.

        Args:
            embedding: Query embedding vector.
            k: Number of documents to return.
            fetch_k: Number of candidates fetched before MMR re-ranking.
            lambda_mult: Diversity/relevance trade-off in [0, 1].
        """
        # Retrieves the docs with similarity scores
        docs = self._similarity_search_with_score(embeddings=embedding, k=fetch_k)

        # Re-ranks the docs using MMR
        mmr_doc_indexes = maximal_marginal_relevance(
            np.array(embedding),
            [doc.metadata[self._embedding_key] for doc, _ in docs],
            k=k,
            lambda_mult=lambda_mult,
        )

        mmr_docs = [docs[i][0] for i in mmr_doc_indexes]
        return mmr_docs

    def max_marginal_relevance_search(
        self,
        query: str,
        k: int = 4,
        fetch_k: int = 20,
        lambda_mult: float = 0.5,
        **kwargs: Any,
    ) -> List[Document]:
        """Return docs selected using maximal marginal relevance for a query."""
        # compute the embeddings vector from the query string
        embeddings = self._embedding.embed_query(query)

        docs = self.max_marginal_relevance_search_by_vector(
            embeddings,
            k=k,
            fetch_k=fetch_k,
            lambda_mult=lambda_mult,
        )
        return docs
|
@ -0,0 +1,155 @@
|
||||
"""Test AzureCosmosDBNoSqlVectorSearch functionality."""
|
||||
import logging
|
||||
import os
|
||||
from time import sleep
|
||||
from typing import Any
|
||||
|
||||
import pytest
|
||||
from langchain_core.documents import Document
|
||||
|
||||
from langchain_community.embeddings import OpenAIEmbeddings
|
||||
from langchain_community.vectorstores.azure_cosmos_db_no_sql import (
|
||||
AzureCosmosDBNoSqlVectorSearch,
|
||||
)
|
||||
|
||||
logging.basicConfig(level=logging.DEBUG)

# Azure OpenAI deployment/model used to embed the test documents; both can
# be overridden through environment variables.
model_deployment = os.getenv(
    "OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada"
)
model_name = os.getenv("OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002")

# Host and Key for CosmosDB No SQl
HOST = os.environ.get("HOST")
KEY = os.environ.get("KEY")

# Database/container created by the tests and dropped via safe_delete_database.
database_name = "langchain_python_db"
container_name = "langchain_python_container"
|
||||
|
||||
|
||||
@pytest.fixture()
def cosmos_client() -> Any:
    """Return a CosmosClient for the account given by the HOST/KEY env vars."""
    from azure.cosmos import CosmosClient

    return CosmosClient(HOST, KEY)
|
||||
|
||||
|
||||
@pytest.fixture()
def partition_key() -> Any:
    """Return a PartitionKey on "/id", matching what the store's deletes expect."""
    from azure.cosmos import PartitionKey

    return PartitionKey(path="/id")
|
||||
|
||||
|
||||
@pytest.fixture()
def azure_openai_embeddings() -> Any:
    """Return an OpenAIEmbeddings instance for the configured deployment/model."""
    openai_embeddings: OpenAIEmbeddings = OpenAIEmbeddings(
        deployment=model_deployment, model=model_name, chunk_size=1
    )
    return openai_embeddings
|
||||
|
||||
|
||||
def safe_delete_database(cosmos_client: Any) -> None:
    """Drop the test database so each test leaves a clean account behind.

    NOTE(review): despite the name, errors (e.g. the database not existing)
    are not suppressed here -- confirm whether that is intended.
    """
    cosmos_client.delete_database(database_name)
|
||||
|
||||
|
||||
def get_vector_indexing_policy(embedding_type: str) -> dict:
    """Build a Cosmos DB indexing policy with a vector index of *embedding_type*."""
    policy: dict = {
        "indexingMode": "consistent",
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": '/"_etag"/?'}],
    }
    # The vector index covers the same "/embedding" path the embedding
    # policy declares.
    policy["vectorIndexes"] = [{"path": "/embedding", "type": embedding_type}]
    return policy
|
||||
|
||||
|
||||
def get_vector_embedding_policy(
    distance_function: str, data_type: str, dimensions: int
) -> dict:
    """Build a vector embedding policy for the "/embedding" document path."""
    embedding_spec = {
        "path": "/embedding",
        "dataType": data_type,
        "dimensions": dimensions,
        "distanceFunction": distance_function,
    }
    return {"vectorEmbeddings": [embedding_spec]}
|
||||
|
||||
|
||||
class TestAzureCosmosDBNoSqlVectorSearch:
    def test_from_documents_cosine_distance(
        self,
        cosmos_client: Any,
        partition_key: Any,
        azure_openai_embeddings: OpenAIEmbeddings,
    ) -> None:
        """Test end to end construction and search."""
        docs = [
            Document(page_content="Dogs are tough.", metadata={"a": 1}),
            Document(page_content="Cats have fluff.", metadata={"b": 1}),
            Document(page_content="What is a sandwich?", metadata={"c": 1}),
            Document(page_content="That fence is purple.", metadata={"d": 1, "e": 2}),
        ]

        vector_store = AzureCosmosDBNoSqlVectorSearch.from_documents(
            docs,
            azure_openai_embeddings,
            cosmos_client=cosmos_client,
            database_name=database_name,
            container_name=container_name,
            vector_embedding_policy=get_vector_embedding_policy(
                "cosine", "float32", 400
            ),
            indexing_policy=get_vector_indexing_policy("flat"),
            cosmos_container_properties={"partition_key": partition_key},
        )
        # Give Cosmos DB a moment to persist the inserted documents.
        sleep(1)

        results = vector_store.similarity_search("Dogs", k=2)

        assert results
        assert results[0].page_content == "Dogs are tough."
        safe_delete_database(cosmos_client)

    def test_from_texts_cosine_distance_delete_one(
        self,
        cosmos_client: Any,
        partition_key: Any,
        azure_openai_embeddings: OpenAIEmbeddings,
    ) -> None:
        corpus = [
            "Dogs are tough.",
            "Cats have fluff.",
            "What is a sandwich?",
            "That fence is purple.",
        ]
        corpus_metadatas = [{"a": 1}, {"b": 1}, {"c": 1}, {"d": 1, "e": 2}]

        vector_store = AzureCosmosDBNoSqlVectorSearch.from_texts(
            corpus,
            azure_openai_embeddings,
            corpus_metadatas,
            cosmos_client=cosmos_client,
            database_name=database_name,
            container_name=container_name,
            vector_embedding_policy=get_vector_embedding_policy(
                "cosine", "float32", 400
            ),
            indexing_policy=get_vector_indexing_policy("flat"),
            cosmos_container_properties={"partition_key": partition_key},
        )
        # Give Cosmos DB a moment to persist the inserted documents.
        sleep(1)

        first_hit = vector_store.similarity_search("Dogs", k=1)
        assert first_hit
        assert first_hit[0].page_content == "Dogs are tough."

        # Remove the best match and verify it no longer comes back.
        vector_store.delete_document_by_id(str(first_hit[0].metadata["id"]))
        sleep(2)

        second_hit = vector_store.similarity_search("Dogs", k=1)
        assert second_hit
        assert second_hit[0].page_content != "Dogs are tough."
        safe_delete_database(cosmos_client)
|
Loading…
Reference in New Issue