mirror of https://github.com/hwchase17/langchain
docs: Fix execution results of `docs/docs/modules/data_connection/indexing.ipynb` (#19112)
## Description This PR addresses a documentation issue in the [Indexing](https://python.langchain.com/docs/modules/data_connection/indexing) page. Specifically, it corrects the execution results of the Jupyter notebook under the [Source](https://python.langchain.com/docs/modules/data_connection/indexing#source) section, which were broken as detailed below. ## Problem The execution results following the statement, `This should delete the old versions of documents associated with doggy.txt source and replace them with the new versions.`, appear to be incorrect, as described below. ### Current Behavior - For some reason, the `index` function fails to add the new content of `doggy.txt`. Although it deletes the document objects associated with the `doggy.txt` source, it does not add the objects in `changed_doggy_docs`. Consequently, the execution result displays `num_added: 0`. - This unexpected behavior also impacts the results of `vectorstore.similarity_search("dog", k=30)`, showing only the contents of `kitty.txt`. It appears as though the contents of `doggy.txt` have been completely removed from the index: ``` Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}), Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}), Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})] ``` ### Expected Behavior - The `index` function should successfully add the objects in `changed_doggy_docs` after removing the old content of `doggy.txt`. The anticipated execution result is `num_added: 2`. - Subsequently, the modified content of `doggy.txt` should appear in the results of `vectorstore.similarity_search("dog", k=30)` as follows: ``` [Document(page_content='woof woof', metadata={'source': 'doggy.txt'}), Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'}), Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}), Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}), Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})] ``` ## Fix I reran `docs/docs/modules/data_connection/indexing.ipynb` and have included the diff in this PR.pull/19073/head^2
parent
ebc4a64f9e
commit
d647ff1a9a
Loading…
Reference in New Issue