mirror of
https://github.com/hwchase17/langchain
synced 2024-11-08 07:10:35 +00:00
1d55141c50
- new ZepVectorStore class - ZepVectorStore unit tests - ZepVectorStore demo notebook - update zep-python to ~1.0.2 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>
495 lines
19 KiB
Plaintext
495 lines
19 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"# Zep\n",
|
|
"\n",
|
|
"Zep is an open source long-term memory store for LLM applications. Zep makes it easy to add relevant documents,\n",
|
|
"chat history memory & rich user data to your LLM app's prompts.\n",
|
|
"\n",
|
|
"**Note:** The `ZepVectorStore` works with `Documents` and is intended to be used as a `Retriever`.\n",
|
|
"It offers separate functionality to Zep's `ZepMemory` class, which is designed for persisting, enriching\n",
|
|
"and searching your user's chat history.\n",
|
|
"\n",
|
|
"## Why Zep's VectorStore? 🤖🚀\n",
|
|
"Zep automatically embeds documents added to the Zep Vector Store using low-latency models local to the Zep server.\n",
|
|
"The Zep client also offers async interfaces for all document operations. These two together with Zep's chat memory\n",
|
|
" functionality make Zep ideal for building conversational LLM apps where latency and performance are important.\n",
|
|
"\n",
|
|
"## Installation\n",
|
|
"Follow the [Zep Quickstart Guide](https://docs.getzep.com/deployment/quickstart/) to install and get started with Zep.\n",
|
|
"\n",
|
|
"## Usage\n",
|
|
"\n",
|
|
"You'll need your Zep API URL and optionally an API key to use the Zep VectorStore. \n",
|
|
"See the [Zep docs](https://docs.getzep.com) for more information.\n",
|
|
"\n",
|
|
"In the examples below, we're using Zep's auto-embedding feature which automatically embed documents on the Zep server \n",
|
|
"using low-latency embedding models.\n",
|
|
"\n",
|
|
"## Note\n",
|
|
"- These examples use Zep's async interfaces. Call sync interfaces by removing the `a` prefix from the method names.\n",
|
|
"- If you pass in an `Embeddings` instance Zep will use this to embed documents rather than auto-embed them.\n",
|
|
"You must also set your document collection to `isAutoEmbedded === false`. \n",
|
|
"- If you set your collection to `isAutoEmbedded === false`, you must pass in an `Embeddings` instance."
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "9eb8dfa6fdb71ef5"
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"## Load or create a Collection from documents"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "9a3a11aab1412d98"
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"outputs": [],
|
|
"source": [
|
|
"from uuid import uuid4\n",
|
|
"\n",
|
|
"from langchain.document_loaders import WebBaseLoader\n",
|
|
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
|
"from langchain.vectorstores import ZepVectorStore\n",
|
|
"from langchain.vectorstores.zep import CollectionConfig\n",
|
|
"\n",
|
|
"ZEP_API_URL = \"http://localhost:8000\" # this is the API url of your Zep instance\n",
|
|
"ZEP_API_KEY = \"<optional_key>\" # optional API Key for your Zep instance\n",
|
|
"collection_name = f\"babbage{uuid4().hex}\" # a unique collection name. alphanum only\n",
|
|
"\n",
|
|
"# Collection config is needed if we're creating a new Zep Collection\n",
|
|
"config = CollectionConfig(\n",
|
|
" name=collection_name,\n",
|
|
" description=\"<optional description>\",\n",
|
|
" metadata={\"optional_metadata\": \"associated with the collection\"},\n",
|
|
" is_auto_embedded=True, # we'll have Zep embed our documents using its low-latency embedder\n",
|
|
" embedding_dimensions=1536 # this should match the model you've configured Zep to use.\n",
|
|
")\n",
|
|
"\n",
|
|
"# load the document\n",
|
|
"article_url = \"https://www.gutenberg.org/cache/epub/71292/pg71292.txt\"\n",
|
|
"loader = WebBaseLoader(article_url)\n",
|
|
"documents = loader.load()\n",
|
|
"\n",
|
|
"# split it into chunks\n",
|
|
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
|
|
"docs = text_splitter.split_documents(documents)\n",
|
|
"\n",
|
|
"# Instantiate the VectorStore. Since the collection does not already exist in Zep,\n",
|
|
"# it will be created and populated with the documents we pass in.\n",
|
|
"vs = ZepVectorStore.from_documents(docs,\n",
|
|
" collection_name=collection_name,\n",
|
|
" config=config,\n",
|
|
" api_url=ZEP_API_URL,\n",
|
|
" api_key=ZEP_API_KEY\n",
|
|
" )"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-08-13T01:07:50.672390Z",
|
|
"start_time": "2023-08-13T01:07:48.777799Z"
|
|
}
|
|
},
|
|
"id": "519418421a32e4d"
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Embedding status: 0/402 documents embedded\n",
|
|
"Embedding status: 0/402 documents embedded\n",
|
|
"Embedding status: 402/402 documents embedded\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# wait for the collection embedding to complete\n",
|
|
"\n",
|
|
"async def wait_for_ready(collection_name: str) -> None:\n",
|
|
" from zep_python import ZepClient\n",
|
|
" import time\n",
|
|
"\n",
|
|
" client = ZepClient(ZEP_API_URL, ZEP_API_KEY)\n",
|
|
"\n",
|
|
" while True:\n",
|
|
" c = await client.document.aget_collection(collection_name)\n",
|
|
" print(\n",
|
|
" \"Embedding status: \"\n",
|
|
" f\"{c.document_embedded_count}/{c.document_count} documents embedded\"\n",
|
|
" )\n",
|
|
" time.sleep(1)\n",
|
|
" if c.status == \"ready\":\n",
|
|
" break\n",
|
|
"\n",
|
|
"\n",
|
|
"await wait_for_ready(collection_name)"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-08-13T01:07:53.807663Z",
|
|
"start_time": "2023-08-13T01:07:50.671241Z"
|
|
}
|
|
},
|
|
"id": "201dc57b124cb6d7"
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"## Simarility Search Query over the Collection"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "94ca9dfa7d0ecaa5"
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Tables necessary to determine the places of the planets are not less\r\n",
|
|
"necessary than those for the sun, moon, and stars. Some notion of the\r\n",
|
|
"number and complexity of these tables may be formed, when we state that\r\n",
|
|
"the positions of the two principal planets, (and these the most\r\n",
|
|
"necessary for the navigator,) Jupiter and Saturn, require each not less\r\n",
|
|
"than one hundred and sixteen tables. Yet it is not only necessary to\r\n",
|
|
"predict the position of these bodies, but it is likewise expedient to -> 0.8998482592744614 \n",
|
|
"====\n",
|
|
"\n",
|
|
"tabulate the motions of the four satellites of Jupiter, to predict the\r\n",
|
|
"exact times at which they enter his shadow, and at which their shadows\r\n",
|
|
"cross his disc, as well as the times at which they are interposed\r\n",
|
|
"between him and the Earth, and he between them and the Earth.\r\n",
|
|
"\r\n",
|
|
"Among the extensive classes of tables here enumerated, there are several\r\n",
|
|
"which are in their nature permanent and unalterable, and would never\r\n",
|
|
"require to be recomputed, if they could once be computed with perfect -> 0.8976143854195493 \n",
|
|
"====\n",
|
|
"\n",
|
|
"the scheme of notation thus applied, immediately suggested the\r\n",
|
|
"advantages which must attend it as an instrument for expressing the\r\n",
|
|
"structure, operation, and circulation of the animal system; and we\r\n",
|
|
"entertain no doubt of its adequacy for that purpose. Not only the\r\n",
|
|
"mechanical connexion of the solid members of the bodies of men and\r\n",
|
|
"animals, but likewise the structure and operation of the softer parts,\r\n",
|
|
"including the muscles, integuments, membranes, &c. the nature, motion, -> 0.889982614061763 \n",
|
|
"====\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# query it\n",
|
|
"query = \"what is the structure of our solar system?\"\n",
|
|
"docs_scores = await vs.asimilarity_search_with_relevance_scores(query, k=3)\n",
|
|
"\n",
|
|
"# print results\n",
|
|
"for d, s in docs_scores:\n",
|
|
" print(d.page_content, \" -> \", s, \"\\n====\\n\")"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-08-13T01:07:54.195988Z",
|
|
"start_time": "2023-08-13T01:07:53.808550Z"
|
|
}
|
|
},
|
|
"id": "1998de0a96fe89c3"
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"## Search over Collection Re-ranked by MMR"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "e02b61a9af0b2c80"
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Tables necessary to determine the places of the planets are not less\r\n",
|
|
"necessary than those for the sun, moon, and stars. Some notion of the\r\n",
|
|
"number and complexity of these tables may be formed, when we state that\r\n",
|
|
"the positions of the two principal planets, (and these the most\r\n",
|
|
"necessary for the navigator,) Jupiter and Saturn, require each not less\r\n",
|
|
"than one hundred and sixteen tables. Yet it is not only necessary to\r\n",
|
|
"predict the position of these bodies, but it is likewise expedient to \n",
|
|
"====\n",
|
|
"\n",
|
|
"the scheme of notation thus applied, immediately suggested the\r\n",
|
|
"advantages which must attend it as an instrument for expressing the\r\n",
|
|
"structure, operation, and circulation of the animal system; and we\r\n",
|
|
"entertain no doubt of its adequacy for that purpose. Not only the\r\n",
|
|
"mechanical connexion of the solid members of the bodies of men and\r\n",
|
|
"animals, but likewise the structure and operation of the softer parts,\r\n",
|
|
"including the muscles, integuments, membranes, &c. the nature, motion, \n",
|
|
"====\n",
|
|
"\n",
|
|
"tabulate the motions of the four satellites of Jupiter, to predict the\r\n",
|
|
"exact times at which they enter his shadow, and at which their shadows\r\n",
|
|
"cross his disc, as well as the times at which they are interposed\r\n",
|
|
"between him and the Earth, and he between them and the Earth.\r\n",
|
|
"\r\n",
|
|
"Among the extensive classes of tables here enumerated, there are several\r\n",
|
|
"which are in their nature permanent and unalterable, and would never\r\n",
|
|
"require to be recomputed, if they could once be computed with perfect \n",
|
|
"====\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"query = \"what is the structure of our solar system?\"\n",
|
|
"docs = await vs.asearch(query, search_type=\"mmr\", k=3)\n",
|
|
"\n",
|
|
"for d in docs:\n",
|
|
" print(d.page_content, \"\\n====\\n\")"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-08-13T01:07:54.394873Z",
|
|
"start_time": "2023-08-13T01:07:54.180901Z"
|
|
}
|
|
},
|
|
"id": "488112da752b1d58"
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"# Filter by Metadata\n",
|
|
"\n",
|
|
"Use a metadata filter to narrow down results. First, load another book: \"Adventures of Sherlock Holmes\""
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "42455e31d4ab0d68"
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Embedding status: 402/1692 documents embedded\n",
|
|
"Embedding status: 402/1692 documents embedded\n",
|
|
"Embedding status: 552/1692 documents embedded\n",
|
|
"Embedding status: 702/1692 documents embedded\n",
|
|
"Embedding status: 1002/1692 documents embedded\n",
|
|
"Embedding status: 1002/1692 documents embedded\n",
|
|
"Embedding status: 1152/1692 documents embedded\n",
|
|
"Embedding status: 1302/1692 documents embedded\n",
|
|
"Embedding status: 1452/1692 documents embedded\n",
|
|
"Embedding status: 1602/1692 documents embedded\n",
|
|
"Embedding status: 1692/1692 documents embedded\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# Let's add more content to the existing Collection\n",
|
|
"article_url = \"https://www.gutenberg.org/files/48320/48320-0.txt\"\n",
|
|
"loader = WebBaseLoader(article_url)\n",
|
|
"documents = loader.load()\n",
|
|
"\n",
|
|
"# split it into chunks\n",
|
|
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
|
|
"docs = text_splitter.split_documents(documents)\n",
|
|
"\n",
|
|
"await vs.aadd_documents(docs)\n",
|
|
"\n",
|
|
"await wait_for_ready(collection_name)"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-08-13T01:08:06.323569Z",
|
|
"start_time": "2023-08-13T01:07:54.381822Z"
|
|
}
|
|
},
|
|
"id": "146c8a96201c0ab9"
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"### We see results from both books. Note the `source` metadata"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "5b225f3ae1e61de8"
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"by that body to Mr Babbage:--'In no department of science, or of the\r\n",
|
|
"arts, does this discovery promise to be so eminently useful as in that\r\n",
|
|
"of astronomy, and its kindred sciences, with the various arts dependent\r\n",
|
|
"on them. In none are computations more operose than those which\r\n",
|
|
"astronomy in particular requires;--in none are preparatory facilities\r\n",
|
|
"more needful;--in none is error more detrimental. The practical\r\n",
|
|
"astronomer is interrupted in his pursuit, and diverted from his task of -> {'source': 'https://www.gutenberg.org/cache/epub/71292/pg71292.txt'} \n",
|
|
"====\n",
|
|
"\n",
|
|
"possess all knowledge which is likely to be useful to him in his work,\r\n",
|
|
"and this I have endeavored in my case to do. If I remember rightly, you\r\n",
|
|
"on one occasion, in the early days of our friendship, defined my limits\r\n",
|
|
"in a very precise fashion.”\r\n",
|
|
"\r\n",
|
|
"“Yes,” I answered, laughing. “It was a singular document. Philosophy,\r\n",
|
|
"astronomy, and politics were marked at zero, I remember. Botany\r\n",
|
|
"variable, geology profound as regards the mud-stains from any region -> {'source': 'https://www.gutenberg.org/files/48320/48320-0.txt'} \n",
|
|
"====\n",
|
|
"\n",
|
|
"in all its relations; but above all, with Astronomy and Navigation. So\r\n",
|
|
"important have they been considered, that in many instances large sums\r\n",
|
|
"have been appropriated by the most enlightened nations in the production\r\n",
|
|
"of them; and yet so numerous and insurmountable have been the\r\n",
|
|
"difficulties attending the attainment of this end, that after all, even\r\n",
|
|
"navigators, putting aside every other department of art and science,\r\n",
|
|
"have, until very recently, been scantily and imperfectly supplied with -> {'source': 'https://www.gutenberg.org/cache/epub/71292/pg71292.txt'} \n",
|
|
"====\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"query = \"Was he interested in astronomy?\"\n",
|
|
"docs = await vs.asearch(query, search_type=\"similarity\", k=3)\n",
|
|
"\n",
|
|
"for d in docs:\n",
|
|
" print(d.page_content, \" -> \", d.metadata, \"\\n====\\n\")"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-08-13T01:08:06.504769Z",
|
|
"start_time": "2023-08-13T01:08:06.325435Z"
|
|
}
|
|
},
|
|
"id": "53700a9cd817cde4"
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"### Let's try again using a filter for only the Sherlock Holmes document."
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "7b81d7cae351a1ec"
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"possess all knowledge which is likely to be useful to him in his work,\r\n",
|
|
"and this I have endeavored in my case to do. If I remember rightly, you\r\n",
|
|
"on one occasion, in the early days of our friendship, defined my limits\r\n",
|
|
"in a very precise fashion.”\r\n",
|
|
"\r\n",
|
|
"“Yes,” I answered, laughing. “It was a singular document. Philosophy,\r\n",
|
|
"astronomy, and politics were marked at zero, I remember. Botany\r\n",
|
|
"variable, geology profound as regards the mud-stains from any region -> {'source': 'https://www.gutenberg.org/files/48320/48320-0.txt'} \n",
|
|
"====\n",
|
|
"\n",
|
|
"the light shining upon his strong-set aquiline features. So he sat as I\r\n",
|
|
"dropped off to sleep, and so he sat when a sudden ejaculation caused me\r\n",
|
|
"to wake up, and I found the summer sun shining into the apartment. The\r\n",
|
|
"pipe was still between his lips, the smoke still curled upward, and the\r\n",
|
|
"room was full of a dense tobacco haze, but nothing remained of the heap\r\n",
|
|
"of shag which I had seen upon the previous night.\r\n",
|
|
"\r\n",
|
|
"“Awake, Watson?” he asked.\r\n",
|
|
"\r\n",
|
|
"“Yes.”\r\n",
|
|
"\r\n",
|
|
"“Game for a morning drive?” -> {'source': 'https://www.gutenberg.org/files/48320/48320-0.txt'} \n",
|
|
"====\n",
|
|
"\n",
|
|
"“I glanced at the books upon the table, and in spite of my ignorance\r\n",
|
|
"of German I could see that two of them were treatises on science, the\r\n",
|
|
"others being volumes of poetry. Then I walked across to the window,\r\n",
|
|
"hoping that I might catch some glimpse of the country-side, but an oak\r\n",
|
|
"shutter, heavily barred, was folded across it. It was a wonderfully\r\n",
|
|
"silent house. There was an old clock ticking loudly somewhere in the\r\n",
|
|
"passage, but otherwise everything was deadly still. A vague feeling of -> {'source': 'https://www.gutenberg.org/files/48320/48320-0.txt'} \n",
|
|
"====\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"filter = {\n",
|
|
" \"where\": {\"jsonpath\": \"$[*] ? (@.source == 'https://www.gutenberg.org/files/48320/48320-0.txt')\"},\n",
|
|
"}\n",
|
|
"\n",
|
|
"docs = await vs.asearch(query, search_type=\"similarity\", metadata=filter, k=3)\n",
|
|
"\n",
|
|
"for d in docs:\n",
|
|
" print(d.page_content, \" -> \", d.metadata, \"\\n====\\n\")"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-08-13T01:08:06.672836Z",
|
|
"start_time": "2023-08-13T01:08:06.505944Z"
|
|
}
|
|
},
|
|
"id": "8f1bdcba03979d22"
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 2
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython2",
|
|
"version": "2.7.6"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|