add docs for chroma persistance (#1202)

searx-query-suffixy
Harrison Chase 1 year ago committed by GitHub
parent 5bdb8dd6fe
commit 047231840d
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -12,7 +12,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"id": "aac9563e",
"metadata": {},
"outputs": [],
@ -25,7 +25,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 2,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
@ -41,7 +41,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 3,
"id": "5eabdb75",
"metadata": {},
"outputs": [
@ -63,7 +63,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 4,
"id": "4b172de8",
"metadata": {},
"outputs": [
@ -89,10 +89,115 @@
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "8061454b",
"metadata": {},
"source": [
"## Persistance\n",
"\n",
"The below steps cover how to persist a ChromaDB instance"
]
},
{
"cell_type": "markdown",
"id": "2b76db26",
"metadata": {},
"source": [
"### Initialize PeristedChromaDB\n",
"Create embeddings for each chunk and insert into the Chroma vector database. The persist_directory argument tells ChromaDB where to store the database when it's persisted.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "cdb86e0d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"No existing DB found in db, skipping load\n",
"No existing DB found in db, skipping load\n"
]
}
],
"source": [
"# Embed and store the texts\n",
"# Supplying a persist_directory will store the embeddings on disk\n",
"persist_directory = 'db'\n",
"\n",
"embedding = OpenAIEmbeddings()\n",
"vectordb = Chroma.from_documents(documents=docs, embedding=embedding, persist_directory=persist_directory)"
]
},
{
"cell_type": "markdown",
"id": "f568a322",
"metadata": {},
"source": [
"### Persist the Database\n",
"In a notebook, we should call persist() to ensure the embeddings are written to disk. This isn't necessary in a script - the database will be automatically persisted when the client object is destroyed."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "74b08cb4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Persisting DB to disk, putting it in the save folder db\n",
"PersistentDuckDB del, about to run persist\n",
"Persisting DB to disk, putting it in the save folder db\n"
]
}
],
"source": [
"vectordb.persist()\n",
"vectordb = None"
]
},
{
"cell_type": "markdown",
"id": "cc9ed900",
"metadata": {},
"source": [
"### Load the Database from disk, and create the chain\n",
"Be sure to pass the same persist_directory and embedding_function as you did when you instantiated the database. Initialize the chain we will use for question answering."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "31fecfe9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"loaded in 4 embeddings\n",
"loaded in 1 collections\n"
]
}
],
"source": [
"# Now we can load the persisted database from disk, and use it as normal. \n",
"vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a359ed74",
"id": "4dde7a0d",
"metadata": {},
"outputs": [],
"source": []

Loading…
Cancel
Save