community[minor]: Add KDBAI vector store (#12797)

Addition of KDBAI vector store (https://kdb.ai).

Dependencies: `kdbai_client` v0.1.2 Python package.

Sample notebook: `docs/docs/integrations/vectorstores/kdbai.ipynb`

Tag maintainer: @bu2kx
Twitter handle: @kxsystems
pull/16568/head
bu2kx 8 months ago committed by GitHub
parent 4ec3fe4680
commit ff3163297b

@ -0,0 +1,24 @@
# KDB.AI
>[KDB.AI](https://kdb.ai) is a powerful knowledge-based vector database and search engine that allows you to build scalable, reliable AI applications, using real-time data, by providing advanced search, recommendation and personalization.
## Installation and Setup
Install the Python SDK:
```bash
pip install kdbai-client
```
## Vector store
LangChain provides a wrapper around KDB.AI indexes that lets you use them as a vector store,
whether for semantic search or example selection.
```python
from langchain_community.vectorstores import KDBAI
```
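According to the wrapper source added in this change, `add_texts` embeds and inserts texts in batches of `batch_size` (32 by default). The following is a minimal standalone sketch of that slicing arithmetic, not the library code itself; `batch_bounds` is a hypothetical helper name that does not exist in `langchain_community`.

```python
from typing import Iterator, Tuple


def batch_bounds(n_items: int, batch_size: int = 32) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) slice bounds that cover n_items in batches."""
    # Ceiling division written the same way as in the wrapper: 0 items -> 0 batches.
    n_batches = (n_items - 1) // batch_size + 1
    for i in range(n_batches):
        start = i * batch_size
        # Clamp the last batch so the bounds never run past n_items.
        end = min((i + 1) * batch_size, n_items)
        yield start, end


# Illustrative data only: 70 made-up text chunks.
texts = [f"chunk {i}" for i in range(70)]
batches = [texts[s:e] for s, e in batch_bounds(len(texts))]
print([len(b) for b in batches])  # [32, 32, 6]
```

A smaller `batch_size` means more, smaller inserts; a larger one means fewer, bigger inserts per call.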
For a more detailed walkthrough of the KDB.AI vectorstore, see [this notebook](/docs/integrations/vectorstores/kdbai).
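The wrapper's search methods turn each returned row into a document plus a distance: the `text` column becomes the page content, the `__nn_distance` column becomes the score, and every remaining column is kept as metadata. Below is a rough sketch of that split, using a plain dictionary as a stand-in for a result row; `split_row` and the sample values are made up for illustration.

```python
from typing import Any, Dict, Tuple


def split_row(row: Dict[str, Any]) -> Tuple[Any, Dict[str, Any], float]:
    """Split one search-result row into (text, metadata, score)."""
    remaining = dict(row)  # work on a copy so the caller's row is untouched
    text = remaining.pop("text")
    score = remaining.pop("__nn_distance")
    # Whatever is left (id, tag, title, ...) is treated as metadata.
    return text, remaining, score


# Hypothetical row; the wrapper stores text as UTF-8 bytes on insert.
row = {"id": "a1", "text": b"Men are born free.", "tag": "law", "__nn_distance": 0.12}
text, metadata, score = split_row(row)
print(text, metadata, score)
```

With the default `L2` metric used in the notebook's schema, a smaller distance means a closer match.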

@ -0,0 +1,510 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "08b3f3a3-7542-4d39-a9a1-f66e50ec3c0f",
"metadata": {},
"source": [
"# KDB.AI\n",
"\n",
"> [KDB.AI](https://kdb.ai/) is a powerful knowledge-based vector database and search engine that allows you to build scalable, reliable AI applications, using real-time data, by providing advanced search, recommendation and personalization.\n",
"\n",
"[This example](https://github.com/KxSystems/kdbai-samples/blob/main/document_search/document_search.ipynb) demonstrates how to use KDB.AI to run semantic search on unstructured text documents.\n",
"\n",
"To access your end point and API keys, [sign up to KDB.AI here](https://kdb.ai/get-started/).\n",
"\n",
"To set up your development environment, follow the instructions on the [KDB.AI pre-requisites page](https://code.kx.com/kdbai/pre-requisites.html).\n",
"\n",
"The following examples demonstrate some of the ways you can interact with KDB.AI through LangChain.\n",
"\n",
"## Import required packages"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2704194d-c42d-463d-b162-fb95262e052c",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import time\n",
"from getpass import getpass\n",
"\n",
"import kdbai_client as kdbai\n",
"import pandas as pd\n",
"import requests\n",
"from langchain.chains import RetrievalQA\n",
"from langchain_community.document_loaders import PyPDFLoader\n",
"from langchain_community.vectorstores import KDBAI\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "04848fcf-e128-4d63-af6c-b3991531d62e",
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"KDB.AI endpoint: https://ui.qa.cld.kx.com/instance/pcnvlmi860\n",
"KDB.AI API key: ········\n",
"OpenAI API Key: ········\n"
]
}
],
"source": [
"KDBAI_ENDPOINT = input(\"KDB.AI endpoint: \")\n",
"KDBAI_API_KEY = getpass(\"KDB.AI API key: \")\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API Key: \")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d08a1468-6bff-4a65-8b4a-9835cfa997ad",
"metadata": {},
"outputs": [],
"source": [
"TEMP = 0.0\n",
"K = 3"
]
},
{
"cell_type": "markdown",
"id": "63a111d8-2422-4d33-85c0-bc95d25e330a",
"metadata": {},
"source": [
"## Create a KDB.AI Session"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "9ffe4fee-2dc3-4943-917b-28adc3a69472",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Create a KDB.AI session...\n"
]
}
],
"source": [
"print(\"Create a KDB.AI session...\")\n",
"session = kdbai.Session(endpoint=KDBAI_ENDPOINT, api_key=KDBAI_API_KEY)"
]
},
{
"cell_type": "markdown",
"id": "a2ea7e87-f65c-43d9-bc67-be7bda86def2",
"metadata": {},
"source": [
"## Create a table"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "da27f31c-890e-46c0-8e01-1b8474ee3a70",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Create table \"documents\"...\n"
]
}
],
"source": [
"print('Create table \"documents\"...')\n",
"schema = {\n",
" \"columns\": [\n",
" {\"name\": \"id\", \"pytype\": \"str\"},\n",
" {\"name\": \"text\", \"pytype\": \"bytes\"},\n",
" {\n",
" \"name\": \"embeddings\",\n",
" \"pytype\": \"float32\",\n",
" \"vectorIndex\": {\"dims\": 1536, \"metric\": \"L2\", \"type\": \"hnsw\"},\n",
" },\n",
" {\"name\": \"tag\", \"pytype\": \"str\"},\n",
" {\"name\": \"title\", \"pytype\": \"bytes\"},\n",
" ]\n",
"}\n",
"table = session.create_table(\"documents\", schema)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "930ba64a-1cf9-4892-9335-8745c830497c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 44.1 ms, sys: 6.04 ms, total: 50.2 ms\n",
"Wall time: 213 ms\n"
]
},
{
"data": {
"text/plain": [
"562978"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"URL = 'https://www.conseil-constitutionnel.fr/node/3850/pdf'\n",
"PDF = 'Déclaration_des_droits_de_l_homme_et_du_citoyen.pdf'\n",
"open(PDF, 'wb').write(requests.get(URL).content)"
]
},
{
"cell_type": "markdown",
"id": "0f7da153-e7d4-4a4c-b044-ad7b4d893c7f",
"metadata": {},
"source": [
"## Read a PDF"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "00873e6b-f204-4dca-b82b-1c45d0b83ee5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Read a PDF...\n",
"CPU times: user 156 ms, sys: 12.5 ms, total: 169 ms\n",
"Wall time: 183 ms\n"
]
},
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"print('Read a PDF...')\n",
"loader = PyPDFLoader(PDF)\n",
"pages = loader.load_and_split()\n",
"len(pages)"
]
},
{
"cell_type": "markdown",
"id": "3536c7db-0db7-446a-b61e-149fd3c2d1d8",
"metadata": {},
"source": [
"## Create a Vector Database from PDF Text"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "b06d4a96-c3d5-426b-9e22-12925b14e5e6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Create a Vector Database from PDF text...\n",
"CPU times: user 211 ms, sys: 18.4 ms, total: 229 ms\n",
"Wall time: 2.23 s\n"
]
},
{
"data": {
"text/plain": [
"['3ef27d23-47cf-419b-8fe9-5dfae9e8e895',\n",
" 'd3a9a69d-28f5-434b-b95b-135db46695c8',\n",
" 'd2069bda-c0b8-4791-b84d-0c6f84f4be34']"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"print('Create a Vector Database from PDF text...')\n",
"embeddings = OpenAIEmbeddings(model='text-embedding-ada-002')\n",
"texts = [p.page_content for p in pages]\n",
"metadata = pd.DataFrame(index=list(range(len(texts))))\n",
"metadata['tag'] = 'law'\n",
"metadata['title'] = 'Déclaration des Droits de l\\'Homme et du Citoyen de 1789'.encode('utf-8')\n",
"vectordb = KDBAI(table, embeddings)\n",
"vectordb.add_texts(texts=texts, metadatas=metadata)"
]
},
{
"cell_type": "markdown",
"id": "3b658f9a-61dd-4a88-9bcb-4651992f610d",
"metadata": {},
"source": [
"## Create LangChain Pipeline"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "6d848577-1192-4bb0-b721-37f52be5d9d0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Create LangChain Pipeline...\n",
"CPU times: user 40.8 ms, sys: 4.69 ms, total: 45.5 ms\n",
"Wall time: 44.7 ms\n"
]
}
],
"source": [
"%%time\n",
"print('Create LangChain Pipeline...')\n",
"qabot = RetrievalQA.from_chain_type(chain_type='stuff',\n",
" llm=ChatOpenAI(model='gpt-3.5-turbo-16k', temperature=TEMP), \n",
" retriever=vectordb.as_retriever(search_kwargs=dict(k=K)),\n",
" return_source_documents=True)"
]
},
{
"cell_type": "markdown",
"id": "21113a5e-d72d-4a44-9714-6b23ec95b755",
"metadata": {},
"source": [
"## Summarize the document in English"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "81668f8f-a416-4b58-93d2-8e0924ceca23",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"Summarize the document in English:\n",
"\n",
"The document is the Declaration of the Rights of Man and of the Citizen of 1789. It was written by the representatives of the French people and aims to declare the natural, inalienable, and sacred rights of every individual. These rights include freedom, property, security, and resistance to oppression. The document emphasizes the importance of equality and the principle that sovereignty resides in the nation. It also highlights the role of law in protecting individual rights and ensuring the common good. The document asserts the right to freedom of thought, expression, and religion, as long as it does not disturb public order. It emphasizes the need for a public force to guarantee the rights of all citizens and the importance of a fair and equal distribution of public contributions. The document also recognizes the right of citizens to hold public officials accountable and states that any society without the guarantee of rights and separation of powers does not have a constitution. Finally, it affirms the inviolable and sacred nature of property, stating that it can only be taken away for public necessity and with just compensation.\n",
"CPU times: user 144 ms, sys: 50.2 ms, total: 194 ms\n",
"Wall time: 4.96 s\n"
]
}
],
"source": [
"%%time\n",
"Q = 'Summarize the document in English:'\n",
"print(f'\\n\\n{Q}\\n')\n",
"print(qabot.invoke(dict(query=Q))['result'])"
]
},
{
"cell_type": "markdown",
"id": "9ce7667e-8c89-466c-8040-9ba62f3e57ec",
"metadata": {},
"source": [
"## Query the Data"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "e02a7acb-99ac-48f8-b93c-d95a8f9e87d4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"Is it a fair law and why ?\n",
"\n",
"As an AI language model, I don't have personal opinions. However, I can provide some analysis based on the given context. The text provided is an excerpt from the Declaration of the Rights of Man and of the Citizen of 1789, which is considered a foundational document in the history of human rights. It outlines the natural and inalienable rights of individuals, such as freedom, property, security, and resistance to oppression. It also emphasizes the principles of equality, the rule of law, and the separation of powers. \n",
"\n",
"Whether or not this law is considered fair is subjective and can vary depending on individual perspectives and societal norms. However, many consider the principles and rights outlined in this declaration to be fundamental and just. It is important to note that this declaration was a significant step towards establishing principles of equality and individual rights in France and has influenced subsequent human rights documents worldwide.\n",
"CPU times: user 85.1 ms, sys: 5.93 ms, total: 91.1 ms\n",
"Wall time: 5.11 s\n"
]
}
],
"source": [
"%%time\n",
"Q = 'Is it a fair law and why ?'\n",
"print(f'\\n\\n{Q}\\n')\n",
"print(qabot.invoke(dict(query=Q))['result'])"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "24dc85bd-cd35-4fb3-9d01-e00a896fd9a1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"What are the rights and duties of the man, the citizen and the society ?\n",
"\n",
"According to the Declaration of the Rights of Man and of the Citizen of 1789, the rights and duties of man, citizen, and society are as follows:\n",
"\n",
"Rights of Man:\n",
"1. Men are born and remain free and equal in rights. Social distinctions can only be based on common utility.\n",
"2. The purpose of political association is the preservation of the natural and imprescriptible rights of man, which are liberty, property, security, and resistance to oppression.\n",
"3. The principle of sovereignty resides essentially in the nation. No body or individual can exercise any authority that does not emanate expressly from it.\n",
"4. Liberty consists of being able to do anything that does not harm others. The exercise of natural rights of each man has no limits other than those that ensure the enjoyment of these same rights by other members of society. These limits can only be determined by law.\n",
"5. The law has the right to prohibit only actions harmful to society. Anything not prohibited by law cannot be prevented, and no one can be compelled to do what it does not command.\n",
"6. The law is the expression of the general will. All citizens have the right to participate personally, or through their representatives, in its formation. It must be the same for all, whether it protects or punishes. All citizens, being equal in its eyes, are equally eligible to all public dignities, places, and employments, according to their abilities, and without other distinction than that of their virtues and talents.\n",
"7. No man can be accused, arrested, or detained except in cases determined by law and according to the forms it has prescribed. Those who solicit, expedite, execute, or cause to be executed arbitrary orders must be punished. But any citizen called or seized in virtue of the law must obey instantly; he renders himself culpable by resistance.\n",
"8. The law should establish only strictly and evidently necessary penalties, and no one can be punished except in virtue of a law established and promulgated prior to the offense, and legally applied.\n",
"9. Every man being presumed innocent until he has been declared guilty, if it is judged indispensable to arrest him, any rigor that is not necessary to secure his person must be severely repressed by the law.\n",
"10. No one should be disturbed for his opinions, even religious ones, as long as their manifestation does not disturb the established public order by law.\n",
"11. The free communication of ideas and opinions is one of the most precious rights of man. Every citizen may therefore speak, write, and print freely, except to respond to the abuse of this liberty in cases determined by law.\n",
"12. The guarantee of the rights of man and of the citizen requires a public force. This force is therefore instituted for the advantage of all and not for the particular utility of those to whom it is entrusted.\n",
"13. For the maintenance of the public force and for the expenses of administration, a common contribution is necessary. It must be equally distributed among all citizens, in proportion to their abilities.\n",
"14. All citizens have the right to ascertain, by themselves or through their representatives, the necessity of the public contribution, to consent to it freely, to follow its use, and to determine its amount, basis, collection, and duration.\n",
"15. Society has the right to ask any public agent for an account of his administration.\n",
"16. Any society in which the guarantee of rights is not assured, nor the separation of powers determined, has no constitution.\n",
"17. Property being an inviolable and sacred right, no one can be deprived of it, except when public necessity, legally determined, obviously requires it, and under the condition of a just and prior indemnity.\n",
"\n",
"Duties of the Citizen:\n",
"The Declaration does not explicitly mention the duties of the citizen, but it emphasizes the importance of the law and the need for citizens to obey it. It also highlights the responsibility of citizens to participate in the formation of the law and to hold public agents accountable for their administration.\n",
"\n",
"Duties of Society:\n",
"The Declaration does not explicitly mention the duties of society, but it emphasizes the importance of guaranteeing the rights of man and of the citizen. Society is responsible for ensuring the preservation of these rights and the separation of powers, as well as providing a public force for the advantage of all.\n",
"CPU times: user 86.5 ms, sys: 5.45 ms, total: 92 ms\n",
"Wall time: 14.9 s\n"
]
}
],
"source": [
"%%time\n",
"Q = 'What are the rights and duties of the man, the citizen and the society ?'\n",
"print(f'\\n\\n{Q}\\n')\n",
"print(qabot.invoke(dict(query=Q))['result'])"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "d55b7980-a3d3-45d7-a310-99492edb098b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"Is this law practical ?\n",
"\n",
"As an AI language model, I cannot provide personal opinions or subjective judgments on whether a law is practical or not. The texts provided are excerpts from the French Constitution and the Declaration of the Rights of Man and of the Citizen of 1789. These texts outline fundamental rights and principles that form the basis of the French legal system. The practicality of a law is often a matter of interpretation and can vary depending on the context and specific circumstances. It is ultimately up to legal experts, lawmakers, and the judiciary to determine the practicality and application of these laws in specific cases.\n",
"CPU times: user 91.4 ms, sys: 5.89 ms, total: 97.3 ms\n",
"Wall time: 2.78 s\n"
]
}
],
"source": [
"%%time\n",
"Q = 'Is this law practical ?'\n",
"print(f'\\n\\n{Q}\\n')\n",
"print(qabot.invoke(dict(query=Q))['result'])"
]
},
{
"cell_type": "markdown",
"id": "5f9d0a3c-4941-4f65-b6b8-aefe4f6abd14",
"metadata": {},
"source": [
"## Clean up the Documents table"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "cdddda29-e28d-423f-b1c6-f77d39acc3dd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Clean up the KDB.AI \"documents\" table and its similarity search index\n",
"# so that this notebook can be rerun from scratch\n",
"session.table(\"documents\").drop()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "23cb1359-f32c-4b47-a885-cbf3cbae5b14",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -210,6 +210,12 @@ def _import_hologres() -> Any:
    return Hologres


def _import_kdbai() -> Any:
    from langchain_community.vectorstores.kdbai import KDBAI

    return KDBAI


def _import_lancedb() -> Any:
    from langchain_community.vectorstores.lancedb import LanceDB

@ -523,6 +529,8 @@ def __getattr__(name: str) -> Any:
        return _import_faiss()
    elif name == "Hologres":
        return _import_hologres()
    elif name == "KDBAI":
        return _import_kdbai()
    elif name == "LanceDB":
        return _import_lancedb()
    elif name == "LLMRails":
@ -638,6 +646,7 @@ __all__ = [
    "Epsilla",
    "FAISS",
    "Hologres",
    "KDBAI",
    "LanceDB",
    "LLMRails",
    "Marqo",

@ -0,0 +1,267 @@
from __future__ import annotations

import logging
import uuid
from typing import Any, Iterable, List, Optional, Tuple

from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.vectorstores import VectorStore

from langchain_community.vectorstores.utils import DistanceStrategy

logger = logging.getLogger(__name__)


class KDBAI(VectorStore):
    """`KDB.AI` vector store [https://kdb.ai](https://kdb.ai)

    To use, you should have the `kdbai_client` python package installed.

    Args:
        table: kdbai_client.Table object to use as storage,
        embedding: Any embedding function implementing
            `langchain_core.embeddings.Embeddings` interface,
        distance_strategy: One option from DistanceStrategy.EUCLIDEAN_DISTANCE,
            DistanceStrategy.DOT_PRODUCT or DistanceStrategy.COSINE.

    See the example [notebook](https://github.com/KxSystems/langchain/blob/KDB.AI/docs/docs/integrations/vectorstores/kdbai.ipynb).
    """

    def __init__(
        self,
        table: Any,
        embedding: Embeddings,
        distance_strategy: Optional[
            DistanceStrategy
        ] = DistanceStrategy.EUCLIDEAN_DISTANCE,
    ):
        try:
            import kdbai_client  # noqa
        except ImportError:
            raise ImportError(
                "Could not import kdbai_client python package. "
                "Please install it with `pip install kdbai_client`."
            )
        self._table = table
        self._embedding = embedding
        self.distance_strategy = distance_strategy

    @property
    def embeddings(self) -> Optional[Embeddings]:
        if isinstance(self._embedding, Embeddings):
            return self._embedding
        return None

    def _embed_documents(self, texts: Iterable[str]) -> List[List[float]]:
        if isinstance(self._embedding, Embeddings):
            return self._embedding.embed_documents(list(texts))
        return [self._embedding(t) for t in texts]

    def _embed_query(self, text: str) -> List[float]:
        if isinstance(self._embedding, Embeddings):
            return self._embedding.embed_query(text)
        return self._embedding(text)

    def _insert(
        self,
        texts: List[str],
        ids: Optional[List[str]],
        metadata: Optional[Any] = None,
    ) -> None:
        try:
            import numpy as np
        except ImportError:
            raise ImportError(
                "Could not import numpy python package. "
                "Please install it with `pip install numpy`."
            )
        try:
            import pandas as pd
        except ImportError:
            raise ImportError(
                "Could not import pandas python package. "
                "Please install it with `pip install pandas`."
            )
        embeds = self._embedding.embed_documents(texts)
        df = pd.DataFrame()
        df["id"] = ids
        df["text"] = [t.encode("utf-8") for t in texts]
        df["embeddings"] = [np.array(e, dtype="float32") for e in embeds]
        if metadata is not None:
            df = pd.concat([df, metadata], axis=1)
        self._table.insert(df, warn=False)

    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        ids: Optional[List[str]] = None,
        batch_size: int = 32,
        **kwargs: Any,
    ) -> List[str]:
        """Run more texts through the embeddings and add to the vectorstore.

        Args:
            texts (Iterable[str]): Texts to add to the vectorstore.
            metadatas (Optional[List[dict]]): List of metadata corresponding to each
                chunk of text.
            ids (Optional[List[str]]): List of IDs corresponding to each chunk of text.
            batch_size (Optional[int]): Size of batch of chunks of text to insert at
                once.

        Returns:
            List[str]: List of IDs of the added texts.
        """
        try:
            import pandas as pd
        except ImportError:
            raise ImportError(
                "Could not import pandas python package. "
                "Please install it with `pip install pandas`."
            )
        texts = list(texts)
        metadf: pd.DataFrame = None
        if metadatas is not None:
            if isinstance(metadatas, pd.DataFrame):
                metadf = metadatas
            else:
                metadf = pd.DataFrame(metadatas)
        out_ids: List[str] = []
        nbatches = (len(texts) - 1) // batch_size + 1
        for i in range(nbatches):
            istart = i * batch_size
            iend = (i + 1) * batch_size
            batch = texts[istart:iend]
            if ids:
                batch_ids = ids[istart:iend]
            else:
                batch_ids = [str(uuid.uuid4()) for _ in range(len(batch))]
            if metadf is not None:
                batch_meta = metadf.iloc[istart:iend].reset_index(drop=True)
            else:
                batch_meta = None
            self._insert(batch, batch_ids, batch_meta)
            out_ids = out_ids + batch_ids
        return out_ids

    def add_documents(
        self, documents: List[Document], batch_size: int = 32, **kwargs: Any
    ) -> List[str]:
        """Run more documents through the embeddings and add to the vectorstore.

        Args:
            documents (List[Document]): Documents to add to the vectorstore.
            batch_size (Optional[int]): Size of batch of documents to insert at once.

        Returns:
            List[str]: List of IDs of the added texts.
        """
        try:
            import pandas as pd
        except ImportError:
            raise ImportError(
                "Could not import pandas python package. "
                "Please install it with `pip install pandas`."
            )
        texts = [x.page_content for x in documents]
        metadata = pd.DataFrame([x.metadata for x in documents])
        # add_texts takes `metadatas`; passing `metadata=` would be swallowed by
        # **kwargs and the document metadata would be silently dropped.
        return self.add_texts(texts, metadatas=metadata, batch_size=batch_size)

    def similarity_search_with_score(
        self,
        query: str,
        k: int = 1,
        filter: Optional[List] = [],
        **kwargs: Any,
    ) -> List[Tuple[Document, float]]:
        """Run similarity search with distance from a query string.

        Args:
            query (str): Query string.
            k (Optional[int]): number of neighbors to retrieve.
            filter (Optional[List]): KDB.AI metadata filter clause: https://code.kx.com/kdbai/use/filter.html

        Returns:
            List[Tuple[Document, float]]: List of similar documents with their
                distances.
        """
        return self.similarity_search_by_vector_with_score(
            self._embed_query(query), k=k, filter=filter, **kwargs
        )

    def similarity_search_by_vector_with_score(
        self,
        embedding: List[float],
        *,
        k: int = 1,
        filter: Optional[List] = [],
        **kwargs: Any,
    ) -> List[Tuple[Document, float]]:
        """Return KDB.AI documents most similar to embedding, along with scores.

        Args:
            embedding (List[float]): query vector.
            k (Optional[int]): number of neighbors to retrieve.
            filter (Optional[List]): KDB.AI metadata filter clause: https://code.kx.com/kdbai/use/filter.html

        Returns:
            List[Tuple[Document, float]]: List of similar documents with their
                distances.
        """
        if "n" in kwargs:
            k = kwargs.pop("n")
        matches = self._table.search(vectors=[embedding], n=k, filter=filter, **kwargs)[
            0
        ]
        docs = []
        for row in matches.to_dict(orient="records"):
            text = row.pop("text")
            score = row.pop("__nn_distance")
            docs.append(
                (
                    Document(
                        page_content=text,
                        metadata={k: v for k, v in row.items() if k != "text"},
                    ),
                    score,
                )
            )
        return docs

    def similarity_search(
        self,
        query: str,
        k: int = 1,
        filter: Optional[List] = [],
        **kwargs: Any,
    ) -> List[Document]:
        """Run similarity search from a query string.

        Args:
            query (str): Query string.
            k (Optional[int]): number of neighbors to retrieve.
            filter (Optional[List]): KDB.AI metadata filter clause: https://code.kx.com/kdbai/use/filter.html

        Returns:
            List[Document]: List of similar documents.
        """
        docs_and_scores = self.similarity_search_with_score(
            query, k=k, filter=filter, **kwargs
        )
        return [doc for doc, _ in docs_and_scores]

    @classmethod
    def from_texts(
        cls: Any,
        texts: List[str],
        embedding: Embeddings,
        metadatas: Optional[List[dict]] = None,
        **kwargs: Any,
    ) -> Any:
        """Not implemented."""
        raise Exception("Not implemented.")

@ -28,6 +28,7 @@ _EXPECTED = [
    "Epsilla",
    "FAISS",
    "Hologres",
    "KDBAI",
    "LanceDB",
    "Lantern",
    "LLMRails",
