mirror of https://github.com/hwchase17/langchain
Add back in clickhouse mongo vecstore notebooks (#6949)
parent
73831ef3d8
commit
f780678910
@ -0,0 +1,169 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "683953b3",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# MongoDB Atlas Vector Search\n",
|
||||||
|
"\n",
|
||||||
|
">[MongoDB Atlas](https://www.mongodb.com/docs/atlas/) is a document database managed in the cloud. It also enables Lucene and its vector search feature.\n",
|
||||||
|
"\n",
|
||||||
|
"This notebook shows how to use the functionality related to the `MongoDB Atlas Vector Search` feature where you can store your embeddings in MongoDB documents and create a Lucene vector index to perform a KNN search.\n",
|
||||||
|
"\n",
|
||||||
|
"It uses the [knnBeta Operator](https://www.mongodb.com/docs/atlas/atlas-search/knn-beta) available in MongoDB Atlas Search. This feature is in early access and available only for evaluation purposes, to validate functionality, and to gather feedback from a small closed group of early access users. It is not recommended for production deployments as we may introduce breaking changes.\n",
|
||||||
|
"\n",
|
||||||
|
"To use MongoDB Atlas, you must have first deployed a cluster. Free clusters are available. \n",
|
||||||
|
"Here is the MongoDB Atlas [quick start](https://www.mongodb.com/docs/atlas/getting-started/)."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "b4c41cad-08ef-4f72-a545-2151e4598efe",
|
||||||
|
"metadata": {
|
||||||
|
"tags": []
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"!pip install pymongo"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "c1e38361-c1fe-4ac6-86e9-c90ebaf7ae87",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import os\n",
|
||||||
|
"\n",
|
||||||
|
"MONGODB_ATLAS_URI = os.environ['MONGODB_ATLAS_URI']"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "320af802-9271-46ee-948f-d2453933d44b",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key. Make sure the environment variable `OPENAI_API_KEY` is set up before proceeding."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "1f3ecc42",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Now, let's create a Lucene vector index on your cluster. In the below example, `embedding` is the name of the field that contains the embedding vector. Please refer to the [documentation](https://www.mongodb.com/docs/atlas/atlas-search/define-field-mappings-for-vector-search) to get more details on how to define an Atlas Search index.\n",
|
||||||
|
"You can name the index `langchain_demo` and create the index on the namespace `lanchain_db.langchain_col`. Finally, write the following definition in the JSON editor:\n",
|
||||||
|
"\n",
|
||||||
|
"```json\n",
|
||||||
|
"{\n",
|
||||||
|
" \"mappings\": {\n",
|
||||||
|
" \"dynamic\": true,\n",
|
||||||
|
" \"fields\": {\n",
|
||||||
|
" \"embedding\": {\n",
|
||||||
|
" \"dimensions\": 1536,\n",
|
||||||
|
" \"similarity\": \"cosine\",\n",
|
||||||
|
" \"type\": \"knnVector\"\n",
|
||||||
|
" }\n",
|
||||||
|
" }\n",
|
||||||
|
" }\n",
|
||||||
|
"}\n",
|
||||||
|
"```"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 2,
|
||||||
|
"id": "aac9563e",
|
||||||
|
"metadata": {
|
||||||
|
"tags": []
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||||
|
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||||
|
"from langchain.vectorstores import MongoDBAtlasVectorSearch\n",
|
||||||
|
"from langchain.document_loaders import TextLoader"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 2,
|
||||||
|
"id": "a3c3999a",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from langchain.document_loaders import TextLoader\n",
|
||||||
|
"loader = TextLoader('../../../state_of_the_union.txt')\n",
|
||||||
|
"documents = loader.load()\n",
|
||||||
|
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||||
|
"docs = text_splitter.split_documents(documents)\n",
|
||||||
|
"\n",
|
||||||
|
"embeddings = OpenAIEmbeddings()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "6e104aee",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from pymongo import MongoClient\n",
|
||||||
|
"\n",
|
||||||
|
"# initialize MongoDB python client\n",
|
||||||
|
"client = MongoClient(MONGODB_ATLAS_CONNECTION_STRING)\n",
|
||||||
|
"\n",
|
||||||
|
"db_name = \"lanchain_db\"\n",
|
||||||
|
"collection_name = \"langchain_col\"\n",
|
||||||
|
"collection = client[db_name][collection_name]\n",
|
||||||
|
"index_name = \"langchain_demo\"\n",
|
||||||
|
"\n",
|
||||||
|
"# insert the documents in MongoDB Atlas with their embedding\n",
|
||||||
|
"docsearch = MongoDBAtlasVectorSearch.from_documents(\n",
|
||||||
|
" docs,\n",
|
||||||
|
" embeddings,\n",
|
||||||
|
" collection=collection,\n",
|
||||||
|
" index_name=index_name\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"# perform a similarity search between the embedding of the query and the embeddings of the documents\n",
|
||||||
|
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||||
|
"docs = docsearch.similarity_search(query)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "9c608226",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"print(docs[0].page_content)"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3 (ipykernel)",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.11.3"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 5
|
||||||
|
}
|
Loading…
Reference in New Issue