2023-03-15 04:13:58 +00:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PGVector\n",
"\n",
">[PGVector](https://github.com/pgvector/pgvector) is an open-source vector similarity search extension for `Postgres`.\n",
"\n",
"It supports:\n",
"- exact and approximate nearest neighbor search\n",
"- L2 distance, inner product, and cosine distance\n",
"\n",
"This notebook shows how to use the Postgres vector database (`PGVector`)."
]
},
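The three distance measures listed above can be illustrated without a database. The sketch below is plain Python, not pgvector's implementation; the function names `l2_distance`, `inner_product`, and `cosine_distance` are chosen for this example only.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance: smaller means the vectors are closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner (dot) product: larger means the vectors are more aligned."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus cosine similarity: 0 for vectors pointing the same way."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


# Two orthogonal unit vectors as a toy example.
u = [1.0, 0.0]
v = [0.0, 1.0]
print("L2 distance:    ", l2_distance(u, v))
print("Inner product:  ", inner_product(u, v))
print("Cosine distance:", cosine_distance(u, v))
```

For L2 and cosine distance a lower value means a closer match, while for the inner product a higher value does.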
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See the [installation instructions](https://github.com/pgvector/pgvector)."
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pgvector in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (0.1.8)\n",
"Requirement already satisfied: numpy in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from pgvector) (1.24.3)\n",
"Requirement already satisfied: openai in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (0.27.7)\n",
"Requirement already satisfied: requests>=2.20 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from openai) (2.28.2)\n",
"Requirement already satisfied: tqdm in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from openai) (4.65.0)\n",
"Requirement already satisfied: aiohttp in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from openai) (3.8.4)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.20->openai) (3.1.0)\n",
"Requirement already satisfied: idna<4,>=2.5 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.20->openai) (3.4)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.20->openai) (1.26.15)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.20->openai) (2023.5.7)\n",
"Requirement already satisfied: attrs>=17.3.0 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (23.1.0)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (6.0.4)\n",
"Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (4.0.2)\n",
"Requirement already satisfied: yarl<2.0,>=1.0 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (1.9.2)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (1.3.3)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (1.3.1)\n",
"Requirement already satisfied: psycopg2-binary in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (2.9.6)\n",
"Requirement already satisfied: tiktoken in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (0.4.0)\n",
"Requirement already satisfied: regex>=2022.1.18 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from tiktoken) (2023.5.5)\n",
"Requirement already satisfied: requests>=2.26.0 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from tiktoken) (2.28.2)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (3.1.0)\n",
"Requirement already satisfied: idna<4,>=2.5 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (3.4)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (1.26.15)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (2023.5.7)\n"
]
}
],
"source": [
"# Install the necessary packages with pip\n",
"!pip install pgvector\n",
"!pip install openai\n",
"!pip install psycopg2-binary\n",
"!pip install tiktoken"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We want to use `OpenAIEmbeddings`, so we need an OpenAI API key."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OpenAI API Key:········\n"
]
}
],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## Loading Environment Variables\n",
"from typing import List, Tuple\n",
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores.pgvector import PGVector\n",
"from langchain.document_loaders import TextLoader\n",
"from langchain.docstore.document import Document"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# PGVector needs the connection string to the database.\n",
"CONNECTION_STRING = \"postgresql+psycopg2://harrisonchase@localhost:5432/test3\"\n",
"\n",
"# # Alternatively, you can create it from environment variables.\n",
"# import os\n",
"\n",
"# CONNECTION_STRING = PGVector.connection_string_from_db_params(\n",
"# driver=os.environ.get(\"PGVECTOR_DRIVER\", \"psycopg2\"),\n",
"# host=os.environ.get(\"PGVECTOR_HOST\", \"localhost\"),\n",
"# port=int(os.environ.get(\"PGVECTOR_PORT\", \"5432\")),\n",
"# database=os.environ.get(\"PGVECTOR_DATABASE\", \"postgres\"),\n",
"# user=os.environ.get(\"PGVECTOR_USER\", \"postgres\"),\n",
"# password=os.environ.get(\"PGVECTOR_PASSWORD\", \"postgres\"),\n",
Fix `make docs_build` and related scripts (#7276)
**Description: a description of the change**
Fixed `make docs_build` and related scripts which caused errors. There
are several changes.
First, I made the build of the documentation and the API Reference into
two separate commands. This is because it takes less time to build. The
commands for documents are `make docs_build`, `make docs_clean`, and
`make docs_linkcheck`. The commands for API Reference are `make
api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`.
It looked like `docs/.local_build.sh` could be used to build the
documentation, so I used that. Since `.local_build.sh` was also building
the API Reference internally, I removed that step. I also added some Bash
options to `.local_build.sh` so that it stops on error, and added `cd
"${SCRIPT_DIR}"` at the beginning so that the script works no matter
which directory it is executed from.
`docs/api_reference/api_reference.rst` is removed because it is
generated by `docs/api_reference/create_api_rst.py`, and it was added to
.gitignore.
Finally, the description of CONTRIBUTING.md was modified.
**Issue: the issue # it fixes (if applicable)**
https://github.com/hwchase17/langchain/issues/6413
**Dependencies: any dependencies required for this change**
`nbdoc` was missing in group docs so it was added. I installed it with
the `poetry add --group docs nbdoc` command. I am concerned if any
modifications are needed to poetry.lock. I would greatly appreciate it
if you could pay close attention to this file during the review.
**Tag maintainer**
- General / Misc / if you don't know who to tag: @baskaryan
If this PR needs any additional changes, I'll be happy to make them!
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
"# )"
]
},
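The commented-out alternative above assembles the connection string from environment variables. A minimal sketch of the same idea using only the standard library follows. The helpers `build_connection_string` and `connection_string_from_env` are hypothetical stand-ins written for this example, not part of LangChain, and the URL-encoding of the user name and password is an assumption added here so that special characters survive in the URL.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus


def build_connection_string(
    driver: str, host: str, port: int, database: str, user: str, password: str
) -> str:
    """Assemble a SQLAlchemy-style PostgreSQL URL (hypothetical helper)."""
    # URL-encode the credentials so characters such as "@" do not break the URL.
    return (
        f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def connection_string_from_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the PGVECTOR_* variables, falling back to the notebook's defaults."""
    env = os.environ if env is None else env
    return build_connection_string(
        driver=env.get("PGVECTOR_DRIVER", "psycopg2"),
        host=env.get("PGVECTOR_HOST", "localhost"),
        port=int(env.get("PGVECTOR_PORT", "5432")),
        database=env.get("PGVECTOR_DATABASE", "postgres"),
        user=env.get("PGVECTOR_USER", "postgres"),
        password=env.get("PGVECTOR_PASSWORD", "postgres"),
    )


# Passing an empty mapping shows the defaults without touching os.environ.
print(connection_string_from_env({}))
```

Keeping credentials in environment variables avoids hard-coding them in the notebook.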
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Similarity Search with Euclidean Distance (Default)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# The PGVector Module will try to create a table with the name of the collection.\n",
"# So, make sure that the collection name is unique and the user has the permission to create a table.\n",
"\n",
"COLLECTION_NAME = \"state_of_the_union_test\"\n",
"\n",
"db = PGVector.from_documents(\n",
" embedding=embeddings,\n",
" documents=docs,\n",
" collection_name=COLLECTION_NAME,\n",
" connection_string=CONNECTION_STRING,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs_with_score = db.similarity_search_with_score(query)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------------------------------------------\n",
"Score: 0.18460171628856903\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’ re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, I’ d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’ s top legal minds, who will continue Justice Breyer’ s legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.18460171628856903\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’ re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, I’ d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’ s top legal minds, who will continue Justice Breyer’ s legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.18470284560586236\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’ re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, I’ d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’ s top legal minds, who will continue Justice Breyer’ s legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.21730864082247825\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’ s been nominated, she’ s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
"\n",
"We can do both. At our border, we’ ve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
"\n",
"We’ ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
"\n",
"We’ re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
"\n",
"We’ re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n",
"--------------------------------------------------------------------------------\n"
]
}
],
"source": [
"for doc, score in docs_with_score:\n",
" print(\"-\" * 80)\n",
" print(\"Score: \", score)\n",
" print(doc.page_content)\n",
" print(\"-\" * 80)"
]
},
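The output above lists the same passage three times with nearly identical scores. One plausible cause, which the notebook itself does not confirm, is that the collection was populated more than once. Since the score here is a distance, lower means closer. The sketch below shows one way to collapse such repeats. It uses plain `(text, score)` tuples as stand-ins for the `(Document, score)` pairs, and `dedupe_by_content` is a hypothetical helper written for this example.

```python
from typing import Dict, List, Tuple


def dedupe_by_content(results: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep the lowest-distance entry for each distinct text, closest first."""
    best: Dict[str, float] = {}
    for text, score in results:
        # Remember only the smallest distance seen for this text.
        if text not in best or score < best[text]:
            best[text] = score
    return sorted(best.items(), key=lambda item: item[1])


# Toy data shaped like the output above: three near-identical hits, one distinct.
results = [
    ("chunk about Justice Breyer", 0.1846),
    ("chunk about Justice Breyer", 0.1846),
    ("chunk about Justice Breyer", 0.1847),
    ("chunk about immigration", 0.2173),
]
for text, score in dedupe_by_content(results):
    print(f"{score:.4f}  {text}")
```

With real results, the same idea applies by keying on `doc.page_content` instead of the text string.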
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Working with vectorstore\n",
"\n",
"Above, we created a vectorstore from scratch. However, we often want to work with an existing vectorstore.\n",
"To do that, we can initialize it directly."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"store = PGVector(\n",
" collection_name=COLLECTION_NAME,\n",
" connection_string=CONNECTION_STRING,\n",
" embedding_function=embeddings,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add documents\n",
"We can add documents to the existing vectorstore."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['048c2e14-1cf3-11ee-8777-e65801318980']"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store.add_documents([Document(page_content=\"foo\")])"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"docs_with_score = db.similarity_search_with_score(\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='foo', metadata={}), 3.3203430005457335e-09)"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs_with_score[0]"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’ s been nominated, she’ s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, we’ ve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWe’ ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWe’ re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWe’ re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
" 0.2404395365581814)"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs_with_score[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Overriding a vectorstore\n",
"\n",
"If you have an existing collection, you can override it by calling `from_documents` with `pre_delete_collection=True`."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"db = PGVector.from_documents(\n",
" documents=docs,\n",
" embedding=embeddings,\n",
" collection_name=COLLECTION_NAME,\n",
" connection_string=CONNECTION_STRING,\n",
" pre_delete_collection=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"docs_with_score = db.similarity_search_with_score(\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’ s been nominated, she’ s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, we’ ve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWe’ ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWe’ re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWe’ re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
" 0.2404115088144465)"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs_with_score[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using a VectorStore as a Retriever"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"retriever = store.as_retriever()"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tags=None metadata=None vectorstore=<langchain.vectorstores.pgvector.PGVector object at 0x29f94f880> search_type='similarity' search_kwargs={}\n"
]
}
],
"source": [
"print(retriever)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}