Update neo4j vector documentation (#20455)

Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Tomaz Bratanic 6 months ago committed by GitHub
parent 8c08cf4619
commit 3d9b26fc28

@ -104,7 +104,7 @@
"\n",
"url = \"bolt://localhost:7687\"\n",
"username = \"neo4j\"\n",
"password = \"pleaseletmein\"\n",
"password = \"password\"\n",
"\n",
"# You can also use environment variables instead of directly passing named parameters\n",
"# os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n",
@ -128,8 +128,8 @@
"name": "stderr",
"output_type": "stream",
"text": [
"/home/tomaz/neo4j/langchain/libs/langchain/langchain/vectorstores/neo4j_vector.py:165: ExperimentalWarning: The configuration may change in the future.\n",
" self._driver.verify_connectivity()\n"
"/Users/tomazbratanic/anaconda3/lib/python3.11/site-packages/pandas/core/arrays/masked.py:60: UserWarning: Pandas requires version '1.3.6' or newer of 'bottleneck' (version '1.3.5' currently installed).\n",
" from pandas.core import (\n"
]
}
],
@ -161,7 +161,7 @@
"output_type": "stream",
"text": [
"--------------------------------------------------------------------------------\n",
"Score: 0.9099836349487305\n",
"Score: 0.9076285362243652\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
@ -171,14 +171,18 @@
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.9099686145782471\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"Score: 0.8912243843078613\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"We can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"Weve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
"\n",
"Were putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
"\n",
"Were securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n",
"--------------------------------------------------------------------------------\n"
]
}
@ -205,16 +209,7 @@
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/tomaz/neo4j/langchain/libs/langchain/langchain/vectorstores/neo4j_vector.py:165: ExperimentalWarning: The configuration may change in the future.\n",
" self._driver.verify_connectivity()\n"
]
}
],
"outputs": [],
"source": [
"index_name = \"vector\" # default index name\n",
"\n",
@ -252,23 +247,16 @@
],
"source": [
"# First we create sample data in graph\n",
"store.query(\"CREATE (p:Person {name: 'Tomaz', location:'Slovenia', hobby:'Bicycle'})\")"
"store.query(\n",
" \"CREATE (p:Person {name: 'Tomaz', location:'Slovenia', hobby:'Bicycle', age: 33})\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/tomaz/neo4j/langchain/libs/langchain/langchain/vectorstores/neo4j_vector.py:165: ExperimentalWarning: The configuration may change in the future.\n",
" self._driver.verify_connectivity()\n"
]
}
],
"outputs": [],
"source": [
"# Now we initialize from existing graph\n",
"existing_graph = Neo4jVector.from_existing_graph(\n",
@ -292,7 +280,7 @@
{
"data": {
"text/plain": [
"Document(page_content='\\nname: Tomaz\\nlocation: Slovenia', metadata={'hobby': 'Bicycle'})"
"Document(page_content='\\nname: Tomaz\\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})"
]
},
"execution_count": 12,
@ -308,8 +296,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add documents\n",
"We can add documents to the existing vectorstore."
"### Metadata filtering\n",
"\n",
"Neo4j vector store also supports metadata filtering by combining parallel runtime and exact nearest neighbor search.\n",
"_Requires Neo4j 5.18 or greater version._\n",
"\n",
"Equality filtering has the following syntax."
]
},
{
@ -320,7 +312,7 @@
{
"data": {
"text/plain": [
"['187fc53a-5dde-11ee-ad78-1f6b05bf8513']"
"[Document(page_content='\\nname: Tomaz\\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})]"
]
},
"execution_count": 13,
@ -329,13 +321,139 @@
}
],
"source": [
"store.add_documents([Document(page_content=\"foo\")])"
"existing_graph.similarity_search(\n",
" \"Slovenia\",\n",
" filter={\"hobby\": \"Bicycle\", \"name\": \"Tomaz\"},\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Metadata filtering also support the following operators:\n",
"\n",
"* `$eq: Equal`\n",
"* `$ne: Not Equal`\n",
"* `$lt: Less than`\n",
"* `$lte: Less than or equal`\n",
"* `$gt: Greater than`\n",
"* `$gte: Greater than or equal`\n",
"* `$in: In a list of values`\n",
"* `$nin: Not in a list of values`\n",
"* `$between: Between two values`\n",
"* `$like: Text contains value`\n",
"* `$ilike: lowered text contains value`"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='\\nname: Tomaz\\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"existing_graph.similarity_search(\n",
" \"Slovenia\",\n",
" filter={\"hobby\": {\"$eq\": \"Bicycle\"}, \"age\": {\"$gt\": 15}},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='\\nname: Tomaz\\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"existing_graph.similarity_search(\n",
" \"Slovenia\",\n",
" filter={\"hobby\": {\"$eq\": \"Bicycle\"}, \"age\": {\"$gt\": 15}},\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also use `OR` operator between filters"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='\\nname: Tomaz\\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"existing_graph.similarity_search(\n",
" \"Slovenia\",\n",
" filter={\"$or\": [{\"hobby\": {\"$eq\": \"Bicycle\"}}, {\"age\": {\"$gt\": 15}}]},\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add documents\n",
"We can add documents to the existing vectorstore."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['acbd18db4cc2f85cedef654fccc4a4d8']"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store.add_documents([Document(page_content=\"foo\")])"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"docs_with_score = store.similarity_search_with_score(\"foo\")"
@ -343,7 +461,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 19,
"metadata": {
"scrolled": true
},
@ -351,10 +469,10 @@
{
"data": {
"text/plain": [
"(Document(page_content='foo', metadata={}), 1.0)"
"(Document(page_content='foo'), 1.0)"
]
},
"execution_count": 15,
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
@ -367,25 +485,149 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hybrid search (vector + keyword)\n",
"## Customize response with retrieval query\n",
"\n",
"Neo4j integrates both vector and keyword indexes, which allows you to use a hybrid search approach"
"You can also customize responses by using a custom Cypher snippet that can fetch other information from the graph.\n",
"Under the hood, the final Cypher statement is constructed like so:\n",
"\n",
"```\n",
"read_query = (\n",
" \"CALL db.index.vector.queryNodes($index, $k, $embedding) \"\n",
" \"YIELD node, score \"\n",
") + retrieval_query\n",
"```\n",
"\n",
"The retrieval query must return the following three columns:\n",
"\n",
"* `text`: Union[str, Dict] = Value used to populate `page_content` of a document\n",
"* `score`: Float = Similarity score\n",
"* `metadata`: Dict = Additional metadata of a document\n",
"\n",
"Learn more in this [blog post](https://medium.com/neo4j/implementing-rag-how-to-write-a-graph-retrieval-query-in-langchain-74abf13044f2)."
]
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/tomaz/neo4j/langchain/libs/langchain/langchain/vectorstores/neo4j_vector.py:165: ExperimentalWarning: The configuration may change in the future.\n",
" self._driver.verify_connectivity()\n"
]
"data": {
"text/plain": [
"[Document(page_content='Name:Tomaz', metadata={'foo': 'bar'})]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retrieval_query = \"\"\"\n",
"RETURN \"Name:\" + node.name AS text, score, {foo:\"bar\"} AS metadata\n",
"\"\"\"\n",
"retrieval_example = Neo4jVector.from_existing_index(\n",
" OpenAIEmbeddings(),\n",
" url=url,\n",
" username=username,\n",
" password=password,\n",
" index_name=\"person_index\",\n",
" retrieval_query=retrieval_query,\n",
")\n",
"retrieval_example.similarity_search(\"Foo\", k=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is an example of passing all node properties except for `embedding` as a dictionary to `text` column,"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='name: Tomaz\\nage: 33\\nhobby: Bicycle\\n', metadata={'foo': 'bar'})]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retrieval_query = \"\"\"\n",
"RETURN node {.name, .age, .hobby} AS text, score, {foo:\"bar\"} AS metadata\n",
"\"\"\"\n",
"retrieval_example = Neo4jVector.from_existing_index(\n",
" OpenAIEmbeddings(),\n",
" url=url,\n",
" username=username,\n",
" password=password,\n",
" index_name=\"person_index\",\n",
" retrieval_query=retrieval_query,\n",
")\n",
"retrieval_example.similarity_search(\"Foo\", k=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also pass Cypher parameters to the retrieval query.\n",
"Parameters can be used for additional filtering, traversals, etc..."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='location: Slovenia\\nextra: ParamInfo\\nname: Tomaz\\nage: 33\\nhobby: Bicycle\\nembedding: None\\n', metadata={'foo': 'bar'})]"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retrieval_query = \"\"\"\n",
"RETURN node {.*, embedding:Null, extra: $extra} AS text, score, {foo:\"bar\"} AS metadata\n",
"\"\"\"\n",
"retrieval_example = Neo4jVector.from_existing_index(\n",
" OpenAIEmbeddings(),\n",
" url=url,\n",
" username=username,\n",
" password=password,\n",
" index_name=\"person_index\",\n",
" retrieval_query=retrieval_query,\n",
")\n",
"retrieval_example.similarity_search(\"Foo\", k=1, params={\"extra\": \"ParamInfo\"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hybrid search (vector + keyword)\n",
"\n",
"Neo4j integrates both vector and keyword indexes, which allows you to use a hybrid search approach"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"# The Neo4jVector Module will connect to Neo4j and create a vector and keyword indices if needed.\n",
"hybrid_db = Neo4jVector.from_documents(\n",
@ -407,18 +649,9 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/tomaz/neo4j/langchain/libs/langchain/langchain/vectorstores/neo4j_vector.py:165: ExperimentalWarning: The configuration may change in the future.\n",
" self._driver.verify_connectivity()\n"
]
}
],
"outputs": [],
"source": [
"index_name = \"vector\" # default index name\n",
"keyword_index_name = \"keyword\" # default keyword index name\n",
@ -445,7 +678,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 25,
"metadata": {},
"outputs": [
{
@ -454,7 +687,7 @@
"Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': '../../modules/state_of_the_union.txt'})"
]
},
"execution_count": 18,
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
@ -475,7 +708,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
@ -485,7 +718,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
@ -496,17 +729,25 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/tomazbratanic/anaconda3/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `__call__` was deprecated in LangChain 0.1.0 and will be removed in 0.2.0. Use invoke instead.\n",
" warn_deprecated(\n"
]
},
{
"data": {
"text/plain": [
"{'answer': \"The president honored Justice Stephen Breyer, who is retiring from the United States Supreme Court. He thanked him for his service and mentioned that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to continue Justice Breyer's legacy of excellence. \\n\",\n",
"{'answer': 'The president honored Justice Stephen Breyer for his service to the country.\\n',\n",
" 'sources': '../../modules/state_of_the_union.txt'}"
]
},
"execution_count": 21,
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
@ -542,7 +783,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
"version": "3.11.5"
}
},
"nbformat": 4,
