Qdrant self query (#5567)

Add self query abilities to qdrant vectorstore
searx_updates
Davis Chase 12 months ago committed by GitHub
parent 47c2ec2d0b
commit 6afb463e9b
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -0,0 +1,396 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "13afcae7",
"metadata": {},
"source": [
"# Self-querying with Qdrant\n",
"\n",
">[Qdrant](https://qdrant.tech/documentation/) (read: quadrant ) is a vector similarity search engine. It provides a production-ready service with a convenient API to store, search, and manage points - vectors with an additional payload. `Qdrant` is tailored to extended filtering support. It makes it useful \n",
"\n",
"In the notebook we'll demo the `SelfQueryRetriever` wrapped around a Qdrant vector store. "
]
},
{
"cell_type": "markdown",
"id": "68e75fb9",
"metadata": {},
"source": [
"## Creating a Qdrant vectorstore\n",
"First we'll want to create a Chroma VectorStore and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n",
"\n",
"NOTE: The self-query retriever requires you to have `lark` installed (`pip install lark`). We also need the `qdrant-client` package."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "63a8af5b",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#!pip install lark qdrant-client"
]
},
{
"cell_type": "markdown",
"id": "83811610-7df3-4ede-b268-68a6a83ba9e2",
"metadata": {},
"source": [
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "dd01b61b-7d32-4a55-85d6-b2d2d4f18840",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# import os\n",
"# import getpass\n",
"\n",
"# os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "cb4a5787",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.schema import Document\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Qdrant\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "bcbe04d9",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"docs = [\n",
" Document(page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\", metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\"}),\n",
" Document(page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\", metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2}),\n",
" Document(page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\", metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6}),\n",
" Document(page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\", metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3}),\n",
" Document(page_content=\"Toys come alive and have a blast doing so\", metadata={\"year\": 1995, \"genre\": \"animated\"}),\n",
" Document(page_content=\"Three men walk into the Zone, three men walk out of the Zone\", metadata={\"year\": 1979, \"rating\": 9.9, \"director\": \"Andrei Tarkovsky\", \"genre\": \"science fiction\", \"rating\": 9.9})\n",
"]\n",
"vectorstore = Qdrant.from_documents(\n",
" docs, \n",
" embeddings, \n",
" location=\":memory:\", # Local mode with in-memory storage only\n",
" collection_name=\"my_documents\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "5ecaab6d",
"metadata": {},
"source": [
"## Creating our self-querying retriever\n",
"Now we can instantiate our retriever. To do this we'll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "86e34dbf",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
"from langchain.chains.query_constructor.base import AttributeInfo\n",
"\n",
"metadata_field_info=[\n",
" AttributeInfo(\n",
" name=\"genre\",\n",
" description=\"The genre of the movie\", \n",
" type=\"string or list[string]\", \n",
" ),\n",
" AttributeInfo(\n",
" name=\"year\",\n",
" description=\"The year the movie was released\", \n",
" type=\"integer\", \n",
" ),\n",
" AttributeInfo(\n",
" name=\"director\",\n",
" description=\"The name of the movie director\", \n",
" type=\"string\", \n",
" ),\n",
" AttributeInfo(\n",
" name=\"rating\",\n",
" description=\"A 1-10 rating for the movie\",\n",
" type=\"float\"\n",
" ),\n",
"]\n",
"document_content_description = \"Brief summary of a movie\"\n",
"llm = OpenAI(temperature=0)\n",
"retriever = SelfQueryRetriever.from_llm(llm, vectorstore, document_content_description, metadata_field_info, verbose=True)"
]
},
{
"cell_type": "markdown",
"id": "ea9df8d4",
"metadata": {},
"source": [
"## Testing it out\n",
"And now we can try actually using our retriever!"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "38a126e9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"query='dinosaur' filter=None limit=None\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'rating': 7.7, 'genre': 'science fiction'}),\n",
" Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'}),\n",
" Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction'}),\n",
" Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6})]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This example only specifies a relevant query\n",
"retriever.get_relevant_documents(\"What are some movies about dinosaurs\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "fc3f1e6e",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"query=' ' filter=Comparison(comparator=<Comparator.GT: 'gt'>, attribute='rating', value=8.5) limit=None\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction'}),\n",
" Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6})]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This example only specifies a filter\n",
"retriever.get_relevant_documents(\"I want to watch a movie rated higher than 8.5\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b19d4da0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"query='women' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='director', value='Greta Gerwig') limit=None\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'year': 2019, 'director': 'Greta Gerwig', 'rating': 8.3})]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This example specifies a query and a filter\n",
"retriever.get_relevant_documents(\"Has Greta Gerwig directed any movies about women\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "f900e40e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"query=' ' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.GT: 'gt'>, attribute='rating', value=8.5), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre', value='science fiction')]) limit=None\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction'})]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This example specifies a composite filter\n",
"retriever.get_relevant_documents(\"What's a highly rated (above 8.5) science fiction film?\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "12a51522",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"query='toys' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.GT: 'gt'>, attribute='year', value=1990), Comparison(comparator=<Comparator.LT: 'lt'>, attribute='year', value=2005), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre', value='animated')]) limit=None\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'})]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This example specifies a query and composite filter\n",
"retriever.get_relevant_documents(\"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated\")"
]
},
{
"cell_type": "markdown",
"id": "39bd1de1-b9fe-4a98-89da-58d8a7a6ae51",
"metadata": {},
"source": [
"## Filter k\n",
"\n",
"We can also use the self query retriever to specify `k`: the number of documents to fetch.\n",
"\n",
"We can do this by passing `enable_limit=True` to the constructor."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "bff36b88-b506-4877-9c63-e5a1a8d78e64",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"retriever = SelfQueryRetriever.from_llm(\n",
" llm, \n",
" vectorstore, \n",
" document_content_description, \n",
" metadata_field_info, \n",
" enable_limit=True,\n",
" verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "2758d229-4f97-499c-819f-888acaf8ee10",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"query='dinosaur' filter=None limit=2\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'rating': 7.7, 'genre': 'science fiction'}),\n",
" Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'})]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This example only specifies a relevant query\n",
"retriever.get_relevant_documents(\"what are two movies about dinosaurs\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -401,17 +401,18 @@
},
{
"cell_type": "markdown",
"id": "525e3582",
"metadata": {},
"source": [
"### Metadata filtering\n",
"\n",
"Qdrant has an [extensive filtering system](https://qdrant.tech/documentation/concepts/filtering/) with rich type support. It is also possible to use the filters in Langchain, by passing an additional param to both the `similarity_search_with_score` and `similarity_search` methods."
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "markdown",
"id": "1c2c58dc",
"metadata": {},
"source": [
"```python\n",
"from qdrant_client.http import models as rest\n",
@ -419,10 +420,7 @@
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"found_docs = qdrant.similarity_search_with_score(query, filter=rest.Filter(...))\n",
"```"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "markdown",
@ -683,7 +681,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.11.3"
}
},
"nbformat": 4,

@ -33,6 +33,7 @@ class StructuredQueryOutputParser(BaseOutputParser[StructuredQuery]):
def parse(self, text: str) -> StructuredQuery:
try:
expected_keys = ["query", "filter"]
allowed_keys = ["query", "filter", "limit"]
parsed = parse_and_check_json_markdown(text, expected_keys)
if len(parsed["query"]) == 0:
parsed["query"] = " "
@ -40,10 +41,10 @@ class StructuredQueryOutputParser(BaseOutputParser[StructuredQuery]):
parsed["filter"] = None
else:
parsed["filter"] = self.ast_parse(parsed["filter"])
if not parsed.get("limit"):
parsed.pop("limit", None)
return StructuredQuery(
query=parsed["query"],
filter=parsed["filter"],
limit=parsed.get("limit"),
**{k: v for k, v in parsed.items() if k in allowed_keys}
)
except Exception as e:
raise OutputParserException(

@ -3,7 +3,7 @@ from __future__ import annotations
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, List, Optional, Sequence
from typing import Any, List, Optional, Sequence, Union
from pydantic import BaseModel
@ -14,6 +14,20 @@ class Visitor(ABC):
allowed_comparators: Optional[Sequence[Comparator]] = None
allowed_operators: Optional[Sequence[Operator]] = None
def _validate_func(self, func: Union[Operator, Comparator]) -> None:
if isinstance(func, Operator) and self.allowed_operators is not None:
if func not in self.allowed_operators:
raise ValueError(
f"Received disallowed operator {func}. Allowed "
f"comparators are {self.allowed_operators}"
)
if isinstance(func, Comparator) and self.allowed_comparators is not None:
if func not in self.allowed_comparators:
raise ValueError(
f"Received disallowed comparator {func}. Allowed "
f"comparators are {self.allowed_comparators}"
)
@abstractmethod
def visit_operation(self, operation: Operation) -> Any:
"""Translate an Operation."""

@ -10,23 +10,28 @@ from langchain.chains.query_constructor.ir import StructuredQuery, Visitor
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.retrievers.self_query.chroma import ChromaTranslator
from langchain.retrievers.self_query.pinecone import PineconeTranslator
from langchain.retrievers.self_query.qdrant import QdrantTranslator
from langchain.retrievers.self_query.weaviate import WeaviateTranslator
from langchain.schema import BaseRetriever, Document
from langchain.vectorstores import Chroma, Pinecone, VectorStore, Weaviate
from langchain.vectorstores import Chroma, Pinecone, Qdrant, VectorStore, Weaviate
def _get_builtin_translator(vectorstore_cls: Type[VectorStore]) -> Visitor:
def _get_builtin_translator(vectorstore: VectorStore) -> Visitor:
"""Get the translator class corresponding to the vector store class."""
vectorstore_cls = vectorstore.__class__
BUILTIN_TRANSLATORS: Dict[Type[VectorStore], Type[Visitor]] = {
Pinecone: PineconeTranslator,
Chroma: ChromaTranslator,
Weaviate: WeaviateTranslator,
Qdrant: QdrantTranslator,
}
if vectorstore_cls not in BUILTIN_TRANSLATORS:
raise ValueError(
f"Self query retriever with Vector Store type {vectorstore_cls}"
f" not supported."
)
if isinstance(vectorstore, Qdrant):
return QdrantTranslator(metadata_key=vectorstore.metadata_payload_key)
return BUILTIN_TRANSLATORS[vectorstore_cls]()
@ -55,9 +60,8 @@ class SelfQueryRetriever(BaseRetriever, BaseModel):
def validate_translator(cls, values: Dict) -> Dict:
"""Validate translator."""
if "structured_query_translator" not in values:
vectorstore_cls = values["vectorstore"].__class__
values["structured_query_translator"] = _get_builtin_translator(
vectorstore_cls
values["vectorstore"]
)
return values
@ -102,7 +106,7 @@ class SelfQueryRetriever(BaseRetriever, BaseModel):
**kwargs: Any,
) -> "SelfQueryRetriever":
if structured_query_translator is None:
structured_query_translator = _get_builtin_translator(vectorstore.__class__)
structured_query_translator = _get_builtin_translator(vectorstore)
chain_kwargs = chain_kwargs or {}
if "allowed_comparators" not in chain_kwargs:

@ -18,18 +18,7 @@ class ChromaTranslator(Visitor):
"""Subset of allowed logical operators."""
def _format_func(self, func: Union[Operator, Comparator]) -> str:
if isinstance(func, Operator) and self.allowed_operators is not None:
if func not in self.allowed_operators:
raise ValueError(
f"Received disallowed operator {func}. Allowed "
f"comparators are {self.allowed_operators}"
)
if isinstance(func, Comparator) and self.allowed_comparators is not None:
if func not in self.allowed_comparators:
raise ValueError(
f"Received disallowed comparator {func}. Allowed "
f"comparators are {self.allowed_comparators}"
)
self._validate_func(func)
return f"${func.value}"
def visit_operation(self, operation: Operation) -> Dict:

@ -18,18 +18,7 @@ class PineconeTranslator(Visitor):
"""Subset of allowed logical operators."""
def _format_func(self, func: Union[Operator, Comparator]) -> str:
if isinstance(func, Operator) and self.allowed_operators is not None:
if func not in self.allowed_operators:
raise ValueError(
f"Received disallowed operator {func}. Allowed "
f"comparators are {self.allowed_operators}"
)
if isinstance(func, Comparator) and self.allowed_comparators is not None:
if func not in self.allowed_comparators:
raise ValueError(
f"Received disallowed comparator {func}. Allowed "
f"comparators are {self.allowed_comparators}"
)
self._validate_func(func)
return f"${func.value}"
def visit_operation(self, operation: Operation) -> Dict:

@ -0,0 +1,66 @@
"""Logic for converting internal query language to a valid Qdrant query."""
from __future__ import annotations
from typing import TYPE_CHECKING, Tuple
from langchain.chains.query_constructor.ir import (
Comparator,
Comparison,
Operation,
Operator,
StructuredQuery,
Visitor,
)
if TYPE_CHECKING:
from qdrant_client.http import models as rest
class QdrantTranslator(Visitor):
"""Logic for converting internal query language elements to valid filters."""
def __init__(self, metadata_key: str):
self.metadata_key = metadata_key
def visit_operation(self, operation: Operation) -> rest.Filter:
from qdrant_client.http import models as rest
args = [arg.accept(self) for arg in operation.arguments]
operator = {
Operator.AND: "must",
Operator.OR: "should",
Operator.NOT: "must_not",
}[operation.operator]
return rest.Filter(**{operator: args})
def visit_comparison(self, comparison: Comparison) -> rest.FieldCondition:
from qdrant_client.http import models as rest
self._validate_func(comparison.comparator)
attribute = self.metadata_key + "." + comparison.attribute
if comparison.comparator == Comparator.EQ:
return rest.FieldCondition(
key=attribute, match=rest.MatchValue(value=comparison.value)
)
kwargs = {comparison.comparator.value: comparison.value}
return rest.FieldCondition(key=attribute, range=rest.Range(**kwargs))
def visit_structured_query(
self, structured_query: StructuredQuery
) -> Tuple[str, dict]:
try:
from qdrant_client.http import models as rest
except ImportError as e:
raise ImportError(
"Cannot import qdrant_client. Please install with `pip install "
"qdrant-client`."
) from e
if structured_query.filter is None:
kwargs = {}
else:
filter = structured_query.filter.accept(self)
if isinstance(filter, rest.FieldCondition):
filter = rest.Filter(must=[filter])
kwargs = {"filter": filter}
return structured_query.query, kwargs

@ -19,26 +19,12 @@ class WeaviateTranslator(Visitor):
allowed_comparators = [Comparator.EQ]
def _map_func(self, func: Union[Operator, Comparator]) -> str:
def _format_func(self, func: Union[Operator, Comparator]) -> str:
self._validate_func(func)
# https://weaviate.io/developers/weaviate/api/graphql/filters
map_dict = {Operator.AND: "And", Operator.OR: "Or", Comparator.EQ: "Equal"}
return map_dict[func]
def _format_func(self, func: Union[Operator, Comparator]) -> str:
if isinstance(func, Operator) and self.allowed_operators is not None:
if func not in self.allowed_operators:
raise ValueError(
f"Received disallowed operator {func}. Allowed "
f"comparators are {self.allowed_operators}"
)
if isinstance(func, Comparator) and self.allowed_comparators is not None:
if func not in self.allowed_comparators:
raise ValueError(
f"Received disallowed comparator {func}. Allowed "
f"comparators are {self.allowed_comparators}"
)
return self._map_func(func)
def visit_operation(self, operation: Operation) -> Dict:
args = [arg.accept(self) for arg in operation.arguments]
return {"operator": self._format_func(operation.operator), "operands": args}

@ -218,7 +218,7 @@ class Qdrant(VectorStore):
Returns:
List of Documents most similar to the query.
"""
results = self.similarity_search_with_score(query, k, filter)
results = self.similarity_search_with_score(query, k, filter=filter)
return list(map(itemgetter(0), results))
def similarity_search_with_score(
@ -245,7 +245,6 @@ class Qdrant(VectorStore):
qdrant_filter = self._qdrant_filter_from_dict(filter)
else:
qdrant_filter = filter
results = self.client.search(
collection_name=self.collection_name,
query_vector=self._embed_query(query),

Loading…
Cancel
Save