From d647ff1a9aa150a7553e15953ede3dec94fa7751 Mon Sep 17 00:00:00 2001
From: Shotaro Sano
Date: Sat, 16 Mar 2024 07:27:15 +0900
Subject: [PATCH] docs: Fix execution results of `docs/docs/modules/data_connection/indexing.ipynb` (#19112)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Description

This PR addresses a documentation issue in the [Indexing](https://python.langchain.com/docs/modules/data_connection/indexing) page. Specifically, it corrects the execution results of the Jupyter notebook under the [Source](https://python.langchain.com/docs/modules/data_connection/indexing#source) section, which were broken as detailed below.

## Problem

The execution results following the statement, `This should delete the old versions of documents associated with doggy.txt source and replace them with the new versions.`, appear to be incorrect, as described below.

### Current Behavior

- For some reason, the `index` function fails to add the new content of `doggy.txt`. Although it deletes the document objects associated with the `doggy.txt` source, it does not add the objects in `changed_doggy_docs`. Consequently, the execution result displays `num_added: 0`.
- This unexpected behavior also impacts the results of `vectorstore.similarity_search("dog", k=30)`, showing only the contents of `kitty.txt`. It appears as though the contents of `doggy.txt` have been completely removed from the index:

```
 Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
 Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
 Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]
```

### Expected Behavior

- The `index` function should successfully add the objects in `changed_doggy_docs` after removing the old content of `doggy.txt`. The anticipated execution result is `num_added: 2`.
- Subsequently, the modified content of `doggy.txt` should appear in the results of `vectorstore.similarity_search("dog", k=30)` as follows:

```
[Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),
 Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'}),
 Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
 Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
 Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]
```

## Fix

I reran `docs/docs/modules/data_connection/indexing.ipynb` and have included the diff in this PR.
---
 docs/docs/modules/data_connection/indexing.ipynb | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/docs/modules/data_connection/indexing.ipynb b/docs/docs/modules/data_connection/indexing.ipynb
index 93389a09d2..64de959110 100644
--- a/docs/docs/modules/data_connection/indexing.ipynb
+++ b/docs/docs/modules/data_connection/indexing.ipynb
@@ -85,7 +85,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 1,
    "id": "15f7263e-c82e-4914-874f-9699ea4de93e",
    "metadata": {},
    "outputs": [],
@@ -192,7 +192,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 6,
    "id": "67d2a5c8-f2bd-489a-b58e-2c7ba7fefe6f",
    "metadata": {},
    "outputs": [],
@@ -724,7 +724,7 @@
     {
      "data": {
       "text/plain": [
-       "{'num_added': 0, 'num_updated': 0, 'num_skipped': 2, 'num_deleted': 2}"
+       "{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 2}"
       ]
      },
      "execution_count": 30,
@@ -751,7 +751,9 @@
     {
      "data": {
       "text/plain": [
-       "[Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),\n",
+       "[Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),\n",
+       " Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'}),\n",
+       " Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),\n",
        " Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),\n",
        " Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]"
       ]
@@ -904,7 +906,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.1"
+   "version": "3.9.12"
   }
  },
  "nbformat": 4,