forked from Archives/langchain
You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
252 lines
6.9 KiB
Plaintext
252 lines
6.9 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "a90b7557",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Time Weighted VectorStore\n",
|
|
"\n",
|
|
"This retriever uses a combination of semantic similarity and a time decay.\n",
|
|
"\n",
|
|
"The algorithm for scoring them is:\n",
|
|
"\n",
|
|
"```python\n",
|
|
"semantic_similarity + (1.0 - decay_rate) ** hours_passed\n",
|
|
"```\n",
|
|
"\n",
|
|
"Notably, `hours_passed` refers to the hours passed since the object in the retriever **was last accessed**, not since it was created. This means that frequently accessed objects remain \"fresh.\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"id": "f22cc96b",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import faiss\n",
|
|
"\n",
|
|
"from datetime import datetime, timedelta\n",
|
|
"from langchain.docstore import InMemoryDocstore\n",
|
|
"from langchain.embeddings import OpenAIEmbeddings\n",
|
|
"from langchain.retrievers import TimeWeightedVectorStoreRetriever\n",
|
|
"from langchain.schema import Document\n",
|
|
"from langchain.vectorstores import FAISS\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "6af7ea6b",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Low Decay Rate\n",
|
|
"\n",
|
|
"A low `decay rate` (in this, to be extreme, we will set close to 0) means memories will be \"remembered\" for longer. A `decay rate` of 0 means memories never be forgotten, making this retriever equivalent to the vector lookup."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"id": "c10e7696",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Define your embedding model\n",
|
|
"embeddings_model = OpenAIEmbeddings()\n",
|
|
"# Initialize the vectorstore as empty\n",
|
|
"embedding_size = 1536\n",
|
|
"index = faiss.IndexFlatL2(embedding_size)\n",
|
|
"vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})\n",
|
|
"retriever = TimeWeightedVectorStoreRetriever(vectorstore=vectorstore, decay_rate=.0000000000000000000000001, k=1) "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"id": "86dbadb9",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"['d7f85756-2371-4bdf-9140-052780a0f9b3']"
|
|
]
|
|
},
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"yesterday = datetime.now() - timedelta(days=1)\n",
|
|
"retriever.add_documents([Document(page_content=\"hello world\", metadata={\"last_accessed_at\": yesterday})])\n",
|
|
"retriever.add_documents([Document(page_content=\"hello foo\")])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"id": "a580be32",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[Document(page_content='hello world', metadata={'last_accessed_at': datetime.datetime(2023, 5, 13, 21, 0, 27, 678341), 'created_at': datetime.datetime(2023, 5, 13, 21, 0, 27, 279596), 'buffer_idx': 0})]"
|
|
]
|
|
},
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# \"Hello World\" is returned first because it is most salient, and the decay rate is close to 0., meaning it's still recent enough\n",
|
|
"retriever.get_relevant_documents(\"hello world\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ca056896",
|
|
"metadata": {},
|
|
"source": [
|
|
"## High Decay Rate\n",
|
|
"\n",
|
|
"With a high `decay rate` (e.g., several 9's), the `recency score` quickly goes to 0! If you set this all the way to 1, `recency` is 0 for all objects, once again making this equivalent to a vector lookup.\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"id": "dc37669b",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Define your embedding model\n",
|
|
"embeddings_model = OpenAIEmbeddings()\n",
|
|
"# Initialize the vectorstore as empty\n",
|
|
"embedding_size = 1536\n",
|
|
"index = faiss.IndexFlatL2(embedding_size)\n",
|
|
"vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})\n",
|
|
"retriever = TimeWeightedVectorStoreRetriever(vectorstore=vectorstore, decay_rate=.999, k=1) "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"id": "fa284384",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"['40011466-5bbe-4101-bfd1-e22e7f505de2']"
|
|
]
|
|
},
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"yesterday = datetime.now() - timedelta(days=1)\n",
|
|
"retriever.add_documents([Document(page_content=\"hello world\", metadata={\"last_accessed_at\": yesterday})])\n",
|
|
"retriever.add_documents([Document(page_content=\"hello foo\")])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"id": "7558f94d",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[Document(page_content='hello foo', metadata={'last_accessed_at': datetime.datetime(2023, 4, 16, 22, 9, 2, 494798), 'created_at': datetime.datetime(2023, 4, 16, 22, 9, 2, 178722), 'buffer_idx': 1})]"
|
|
]
|
|
},
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# \"Hello Foo\" is returned first because \"hello world\" is mostly forgotten\n",
|
|
"retriever.get_relevant_documents(\"hello world\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "32e0131e",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Virtual Time\n",
|
|
"\n",
|
|
"Using some utils in LangChain, you can mock out the time component"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"id": "da080d40",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain.utils import mock_now\n",
|
|
"import datetime"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"id": "7c7deff1",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"[Document(page_content='hello world', metadata={'last_accessed_at': MockDateTime(2011, 2, 3, 10, 11), 'created_at': datetime.datetime(2023, 5, 13, 21, 0, 27, 279596), 'buffer_idx': 0})]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# Notice the last access time is that date time\n",
|
|
"with mock_now(datetime.datetime(2011, 2, 3, 10, 11)):\n",
|
|
" print(retriever.get_relevant_documents(\"hello world\"))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "c78d367d",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.6"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|