mirror of
https://github.com/hwchase17/langchain
synced 2024-10-31 15:20:26 +00:00
171b0b183b
Tiny PR: Since we've released version 1.0.0 of the python SDK, we no longer need to specify the pre-release version when pip installing.
241 lines
6.5 KiB
Plaintext
241 lines
6.5 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Xata\n",
|
|
"\n",
|
|
"> [Xata](https://xata.io) is a serverless data platform, based on PostgreSQL. It provides a Python SDK for interacting with your database, and a UI for managing your data.\n",
|
|
"> Xata has a native vector type, which can be added to any table, and supports similarity search. LangChain inserts vectors directly to Xata, and queries it for the nearest neighbors of a given vector, so that you can use all the LangChain Embeddings integrations with Xata."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"This notebook guides you how to use Xata as a VectorStore."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Setup\n",
|
|
"\n",
|
|
"### Create a database to use as a vector store\n",
|
|
"\n",
|
|
"In the [Xata UI](https://app.xata.io) create a new database. You can name it whatever you want, in this notepad we'll use `langchain`.\n",
|
|
"Create a table, again you can name it anything, but we will use `vectors`. Add the following columns via the UI:\n",
|
|
"\n",
|
|
"* `content` of type \"Text\". This is used to store the `Document.pageContent` values.\n",
|
|
"* `embedding` of type \"Vector\". Use the dimension used by the model you plan to use. In this notebook we use OpenAI embeddings, which have 1536 dimensions.\n",
|
|
"* `search` of type \"Text\". This is used as a metadata column by this example.\n",
|
|
"* any other columns you want to use as metadata. They are populated from the `Document.metadata` object. For example, if in the `Document.metadata` object you have a `title` property, you can create a `title` column in the table and it will be populated.\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Let's first install our dependencies:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"!pip install xata openai tiktoken langchain"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Let's load the OpenAI key to the environemnt. If you don't have one you can create an OpenAI account and create a key on this [page](https://platform.openai.com/account/api-keys)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import os\n",
|
|
"import getpass\n",
|
|
"\n",
|
|
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Similarly, we need to get the environment variables for Xata. You can create a new API key by visiting your [account settings](https://app.xata.io/settings). To find the database URL, go to the Settings page of the database that you have created. The database URL should look something like this: `https://demo-uni3q8.eu-west-1.xata.sh/db/langchain`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"api_key = getpass.getpass(\"Xata API key: \")\n",
|
|
"db_url = input(\"Xata database URL (copy it from your DB settings):\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
|
"from langchain.text_splitter import CharacterTextSplitter\n",
|
|
"from langchain.document_loaders import TextLoader\n",
|
|
"from langchain.vectorstores.xata import XataVectorStore\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create the Xata vector store\n",
|
|
"Let's import our test dataset:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
|
|
"documents = loader.load()\n",
|
|
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
|
"docs = text_splitter.split_documents(documents)\n",
|
|
"\n",
|
|
"embeddings = OpenAIEmbeddings()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now create the actual vector store, backed by the Xata table."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"vector_store = XataVectorStore.from_documents(docs, embeddings, api_key=api_key, db_url=db_url, table_name=\"vectors\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"After running the above command, if you go to the Xata UI, you should see the documents loaded together with their embeddings."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Similarity Search"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
|
"found_docs = vector_store.similarity_search(query)\n",
|
|
"print(found_docs)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Similarity Search with score (vector distance)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
|
"result = vector_store.similarity_search_with_score(query)\n",
|
|
"for doc, score in result:\n",
|
|
" print(f\"document={doc}, score={score}\")"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.12"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
}
|