docs: Fix execution results of `docs/docs/modules/data_connection/indexing.ipynb` (#19112)

## Description
This PR addresses a documentation issue in the
[Indexing](https://python.langchain.com/docs/modules/data_connection/indexing)
page. Specifically, it corrects the execution results of the Jupyter
notebook under the
[Source](https://python.langchain.com/docs/modules/data_connection/indexing#source)
section, which were broken as detailed below.

## Problem
The execution results following the statement, `This should delete the
old versions of documents associated with doggy.txt source and replace
them with the new versions.`, appear to be incorrect, as described
below.

### Current Behavior
- For reasons that are unclear, the `index` function fails to add the new
content of `doggy.txt`. Although it deletes the document objects
associated with the `doggy.txt` source, it does not add the objects in
`changed_doggy_docs`. Consequently, the execution result displays
`num_added: 0` and `num_skipped: 2`.
- This unexpected behavior also impacts the results of
`vectorstore.similarity_search("dog", k=30)`, showing only the contents
of `kitty.txt`. It appears as though the contents of `doggy.txt` have
been completely removed from the index:

```
[Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
 Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
 Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]
```

### Expected Behavior
- The `index` function should successfully add the objects in
`changed_doggy_docs` after removing the old content of `doggy.txt`. The
anticipated execution result is `num_added: 2`.
- Subsequently, the modified content of `doggy.txt` should appear in the
results of `vectorstore.similarity_search("dog", k=30)` as follows:

```
[Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),
 Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'}),
 Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
 Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
 Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]
```
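
The expected counts can be reproduced with a small toy model of incremental cleanup. This is only a sketch of the semantics described above, not LangChain's implementation: `toy_incremental_index`, `doc_key`, the dict-based `store`, and the two "old doggy" chunks are hypothetical stand-ins introduced for illustration.

```python
import hashlib


def doc_key(page_content, metadata):
    """Derive a stable key from a document's content and metadata."""
    raw = page_content + "|" + repr(sorted(metadata.items()))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def toy_incremental_index(docs, store, source_key="source"):
    """Toy model of incremental cleanup (an assumption, not the library code).

    docs  : list of (page_content, metadata) tuples to index
    store : dict mapping key -> (page_content, metadata), mutated in place
    """
    incoming = {doc_key(c, m): (c, m) for c, m in docs}
    sources = {m[source_key] for _, m in docs}

    num_added = num_skipped = 0
    for key, doc in incoming.items():
        if key in store:
            num_skipped += 1  # identical document already indexed
        else:
            store[key] = doc
            num_added += 1

    # Remove stored documents that share a source with this batch
    # but are no longer part of it (the "old versions").
    stale = [
        key
        for key, (_, m) in store.items()
        if m[source_key] in sources and key not in incoming
    ]
    for key in stale:
        del store[key]

    return {
        "num_added": num_added,
        "num_updated": 0,
        "num_skipped": num_skipped,
        "num_deleted": len(stale),
    }


store = {}
kitty_docs = [
    ("kitty kit", {"source": "kitty.txt"}),
    ("tty kitty ki", {"source": "kitty.txt"}),
    ("tty kitty", {"source": "kitty.txt"}),
]
# Hypothetical placeholders for the previous doggy.txt chunks.
old_doggy_docs = [
    ("old doggy chunk 1", {"source": "doggy.txt"}),
    ("old doggy chunk 2", {"source": "doggy.txt"}),
]
toy_incremental_index(kitty_docs + old_doggy_docs, store)

changed_doggy_docs = [
    ("woof woof", {"source": "doggy.txt"}),
    ("woof woof woof", {"source": "doggy.txt"}),
]
result = toy_incremental_index(changed_doggy_docs, store)
remaining = sorted(content for content, _ in store.values())
print(result)
# {'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 2}
```

In this model the two new `doggy.txt` chunks are added, the two old ones are deleted, and the `kitty.txt` chunks are left untouched, which matches the expected result above.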

## Fix
I reran `docs/docs/modules/data_connection/indexing.ipynb` and have
included the diff in this PR.
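
One possible way to check the stored outputs after a rerun is to scan the notebook JSON for `text/plain` results. The sketch below assumes the nbformat 4 layout (`cells` → `outputs` → `data`); the helper name `plain_text_outputs` and the inline sample notebook are hypothetical, and a real check would load the `.ipynb` file instead.

```python
import json


def plain_text_outputs(notebook):
    """Return the joined text/plain payload of every code-cell output."""
    results = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            text = output.get("data", {}).get("text/plain")
            if text is None:
                continue
            # nbformat stores multi-line text as a list of strings.
            results.append("".join(text) if isinstance(text, list) else text)
    return results


# Minimal hand-written sample standing in for the real notebook file.
sample = json.loads("""
{
  "nbformat": 4,
  "cells": [
    {"cell_type": "markdown", "source": ["intro"]},
    {
      "cell_type": "code",
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 2}"
            ]
          }
        }
      ]
    }
  ]
}
""")

outputs = plain_text_outputs(sample)
has_expected_count = any("'num_added': 2" in text for text in outputs)
print(has_expected_count)
```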
Shotaro Sano 4 months ago committed by GitHub
parent ebc4a64f9e
commit d647ff1a9a

@@ -85,7 +85,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": 1,
 "id": "15f7263e-c82e-4914-874f-9699ea4de93e",
 "metadata": {},
 "outputs": [],
@@ -192,7 +192,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": 6,
 "id": "67d2a5c8-f2bd-489a-b58e-2c7ba7fefe6f",
 "metadata": {},
 "outputs": [],
@@ -724,7 +724,7 @@
 {
 "data": {
 "text/plain": [
-"{'num_added': 0, 'num_updated': 0, 'num_skipped': 2, 'num_deleted': 2}"
+"{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 2}"
 ]
 },
 "execution_count": 30,
@@ -751,7 +751,9 @@
 {
 "data": {
 "text/plain": [
-"[Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),\n",
+"[Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),\n",
+" Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'}),\n",
+" Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),\n",
 " Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),\n",
 " Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]"
 ]
@@ -904,7 +906,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.1"
+"version": "3.9.12"
 }
 },
 "nbformat": 4,
