diff --git a/docs/modules/indexes/vectorstore_examples/chroma.ipynb b/docs/modules/indexes/vectorstore_examples/chroma.ipynb index 3a793a4a..ce9610a4 100644 --- a/docs/modules/indexes/vectorstore_examples/chroma.ipynb +++ b/docs/modules/indexes/vectorstore_examples/chroma.ipynb @@ -12,7 +12,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 1, "id": "aac9563e", "metadata": {}, "outputs": [], @@ -25,7 +25,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 2, "id": "a3c3999a", "metadata": {}, "outputs": [], @@ -41,7 +41,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 3, "id": "5eabdb75", "metadata": {}, "outputs": [ @@ -63,7 +63,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 4, "id": "4b172de8", "metadata": {}, "outputs": [ @@ -89,10 +89,115 @@ "print(docs[0].page_content)" ] }, + { + "cell_type": "markdown", + "id": "8061454b", + "metadata": {}, + "source": [ + "## Persistance\n", + "\n", + "The below steps cover how to persist a ChromaDB instance" + ] + }, + { + "cell_type": "markdown", + "id": "2b76db26", + "metadata": {}, + "source": [ + "### Initialize PeristedChromaDB\n", + "Create embeddings for each chunk and insert into the Chroma vector database. The persist_directory argument tells ChromaDB where to store the database when it's persisted.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "cdb86e0d", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Running Chroma using direct local API.\n", + "No existing DB found in db, skipping load\n", + "No existing DB found in db, skipping load\n" + ] + } + ], + "source": [ + "# Embed and store the texts\n", + "# Supplying a persist_directory will store the embeddings on disk\n", + "persist_directory = 'db'\n", + "\n", + "embedding = OpenAIEmbeddings()\n", + "vectordb = Chroma.from_documents(documents=docs, embedding=embedding, persist_directory=persist_directory)" + ] + }, + { + "cell_type": "markdown", + "id": "f568a322", + "metadata": {}, + "source": [ + "### Persist the Database\n", + "In a notebook, we should call persist() to ensure the embeddings are written to disk. This isn't necessary in a script - the database will be automatically persisted when the client object is destroyed." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "74b08cb4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Persisting DB to disk, putting it in the save folder db\n", + "PersistentDuckDB del, about to run persist\n", + "Persisting DB to disk, putting it in the save folder db\n" + ] + } + ], + "source": [ + "vectordb.persist()\n", + "vectordb = None" + ] + }, + { + "cell_type": "markdown", + "id": "cc9ed900", + "metadata": {}, + "source": [ + "### Load the Database from disk, and create the chain\n", + "Be sure to pass the same persist_directory and embedding_function as you did when you instantiated the database. Initialize the chain we will use for question answering." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "31fecfe9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Running Chroma using direct local API.\n", + "loaded in 4 embeddings\n", + "loaded in 1 collections\n" + ] + } + ], + "source": [ + "# Now we can load the persisted database from disk, and use it as normal. \n", + "vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)\n" + ] + }, { "cell_type": "code", "execution_count": null, - "id": "a359ed74", + "id": "4dde7a0d", "metadata": {}, "outputs": [], "source": []