diff --git a/docs/extras/modules/data_connection/vectorstores/integrations/mongodb_atlas_vector_search.ipynb b/docs/extras/modules/data_connection/vectorstores/integrations/mongodb_atlas_vector_search.ipynb deleted file mode 100644 index 1cba5676f9..0000000000 --- a/docs/extras/modules/data_connection/vectorstores/integrations/mongodb_atlas_vector_search.ipynb +++ /dev/null @@ -1,169 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "683953b3", - "metadata": {}, - "source": [ - "# MongoDB Atlas Vector Search\n", - "\n", - ">[MongoDB Atlas](https://www.mongodb.com/docs/atlas/) is a document database managed in the cloud. It also enables Lucene and its vector search feature.\n", - "\n", - "This notebook shows how to use the functionality related to the `MongoDB Atlas Vector Search` feature where you can store your embeddings in MongoDB documents and create a Lucene vector index to perform a KNN search.\n", - "\n", - "It uses the [knnBeta Operator](https://www.mongodb.com/docs/atlas/atlas-search/knn-beta) available in MongoDB Atlas Search. This feature is in early access and available only for evaluation purposes, to validate functionality, and to gather feedback from a small closed group of early access users. It is not recommended for production deployments as we may introduce breaking changes.\n", - "\n", - "To use MongoDB Atlas, you must have first deployed a cluster. Free clusters are available. \n", - "Here is the MongoDB Atlas [quick start](https://www.mongodb.com/docs/atlas/getting-started/)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b4c41cad-08ef-4f72-a545-2151e4598efe", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "!pip install pymongo" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c1e38361-c1fe-4ac6-86e9-c90ebaf7ae87", - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "MONGODB_ATLAS_URI = os.environ['MONGODB_ATLAS_URI']" - ] - }, - { - "cell_type": "markdown", - "id": "320af802-9271-46ee-948f-d2453933d44b", - "metadata": {}, - "source": [ - "We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key. Make sure the environment variable `OPENAI_API_KEY` is set up before proceeding." - ] - }, - { - "cell_type": "markdown", - "id": "1f3ecc42", - "metadata": {}, - "source": [ - "Now, let's create a Lucene vector index on your cluster. In the below example, `embedding` is the name of the field that contains the embedding vector. Please refer to the [documentation](https://www.mongodb.com/docs/atlas/atlas-search/define-field-mappings-for-vector-search) to get more details on how to define an Atlas Search index.\n", - "You can name the index `langchain_demo` and create the index on the namespace `lanchain_db.langchain_col`. Finally, write the following definition in the JSON editor:\n", - "\n", - "```json\n", - "{\n", - " \"mappings\": {\n", - " \"dynamic\": true,\n", - " \"fields\": {\n", - " \"embedding\": {\n", - " \"dimensions\": 1536,\n", - " \"similarity\": \"cosine\",\n", - " \"type\": \"knnVector\"\n", - " }\n", - " }\n", - " }\n", - "}\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "aac9563e", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "from langchain.embeddings.openai import OpenAIEmbeddings\n", - "from langchain.text_splitter import CharacterTextSplitter\n", - "from langchain.vectorstores import MongoDBAtlasVectorSearch\n", - "from langchain.document_loaders import TextLoader" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "a3c3999a", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.document_loaders import TextLoader\n", - "loader = TextLoader('../../../state_of_the_union.txt')\n", - "documents = loader.load()\n", - "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", - "docs = text_splitter.split_documents(documents)\n", - "\n", - "embeddings = OpenAIEmbeddings()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "6e104aee", - "metadata": {}, - "outputs": [], - "source": [ - "from pymongo import MongoClient\n", - "\n", - "# initialize MongoDB python client\n", - "client = MongoClient(MONGODB_ATLAS_CONNECTION_STRING)\n", - "\n", - "db_name = \"lanchain_db\"\n", - "collection_name = \"langchain_col\"\n", - "collection = client[db_name][collection_name]\n", - "index_name = \"langchain_demo\"\n", - "\n", - "# insert the documents in MongoDB Atlas with their embedding\n", - "docsearch = MongoDBAtlasVectorSearch.from_documents(\n", - " docs,\n", - " embeddings,\n", - " collection=collection,\n", - " index_name=index_name\n", - ")\n", - "\n", - "# perform a similarity search between the embedding of the query and the embeddings of the documents\n", - "query = \"What did the president say about Ketanji Brown Jackson\"\n", - "docs = docsearch.similarity_search(query)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "9c608226", - "metadata": {}, - "outputs": [], - "source": [ - "print(docs[0].page_content)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -}