Add template for self-query-qdrant (#12795)

This PR adds a self-querying template that uses Qdrant as the vector store.
The template ships with an artificial dataset and is structured so that it is
easy to pass in different components and choose the LLM and embedding
providers.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>

.gitignore
@@ -0,0 +1,2 @@
.idea
tests

README.md
@@ -0,0 +1,161 @@
# self-query-qdrant
This template performs [self-querying](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/)
using Qdrant and OpenAI. By default, it uses an artificial dataset of 10 documents, but you can replace it with your own dataset.
## Environment Setup
Set the `OPENAI_API_KEY` environment variable to access the OpenAI models.
Set the `QDRANT_URL` to the URL of your Qdrant instance. If you use [Qdrant Cloud](https://cloud.qdrant.io),
you also have to set the `QDRANT_API_KEY` environment variable. If you set neither of them,
the template will try to connect to a local Qdrant instance at `http://localhost:6333`.
```shell
export QDRANT_URL=
export QDRANT_API_KEY=
export OPENAI_API_KEY=
```
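If neither variable points to a remote instance, you can start a local Qdrant with Docker (the official `qdrant/qdrant` image; 6333 is Qdrant's default HTTP port):
```shell
docker run -p 6333:6333 qdrant/qdrant
```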
## Usage
To use this package, install the LangChain CLI first:
```shell
pip install -U "langchain-cli[serve]"
```
To create a new LangChain project and install this as the only package, run:
```shell
langchain app new my-app --package self-query-qdrant
```
To add this to an existing project, run:
```shell
langchain app add self-query-qdrant
```
### Defaults
Before you launch the server, you need to create a Qdrant collection and index the documents.
You can do this by running the following:
```python
from self_query_qdrant.chain import initialize
initialize()
```
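The same can be done as a shell one-liner (a sketch, assuming the package is importable, e.g. after `langchain app add`):
```shell
python -c "from self_query_qdrant.chain import initialize; initialize()"
```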
Add the following code to your `app/server.py` file:
```python
from self_query_qdrant.chain import chain
add_routes(app, chain, path="/self-query-qdrant")
```
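For reference, a minimal `app/server.py` then looks roughly like this (a sketch; `langchain app new` scaffolds the `FastAPI` app and imports for you):
```python
from fastapi import FastAPI
from langserve import add_routes

from self_query_qdrant.chain import chain

app = FastAPI()

add_routes(app, chain, path="/self-query-qdrant")
```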
The default dataset consists of 10 documents about dishes, along with their price and restaurant information.
You can find the documents in the `packages/self-query-qdrant/self_query_qdrant/defaults.py` file.
Here is one of the documents:
```python
from langchain.schema import Document
Document(
page_content="Spaghetti with meatballs and tomato sauce",
metadata={
"price": 12.99,
"restaurant": {
"name": "Olive Garden",
"location": ["New York", "Chicago", "Los Angeles"],
},
},
)
```
Self-querying performs semantic search over the documents, combined with additional filtering
based on their metadata. For example, you can search for dishes that cost less than $15 and are served in New York.
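For example (a sketch; the query string is illustrative, and any natural-language question works):
```python
from self_query_qdrant.chain import chain

# Under the hood, the self-query retriever turns this into a semantic query
# plus a metadata filter, roughly: price < 15 AND "New York" in restaurant.location
answer = chain.invoke("Which dishes under $15 can I get in New York?")
print(answer)
```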
### Customization
All the examples above assume that you want to launch the template with just the defaults.
If you want to customize the template, you can do it by passing the parameters to the `create_chain` function
in the `app/server.py` file:
```python
from langchain.llms import Cohere
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains.query_constructor.schema import AttributeInfo
from self_query_qdrant.chain import create_chain
chain = create_chain(
llm=Cohere(),
embeddings=HuggingFaceEmbeddings(),
document_contents="Descriptions of cats, along with their names and breeds.",
metadata_field_info=[
AttributeInfo(name="name", description="Name of the cat", type="string"),
AttributeInfo(name="breed", description="Cat's breed", type="string"),
],
collection_name="cats",
)
```
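Keep in mind that swapped-in providers bring their own requirements: for example, `Cohere` reads the `COHERE_API_KEY` environment variable and needs the `cohere` package, while `HuggingFaceEmbeddings` requires `sentence-transformers` to be installed.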
The same goes for the `initialize` function that creates a Qdrant collection and indexes the documents:
```python
from langchain.schema import Document
from langchain.embeddings import HuggingFaceEmbeddings
from self_query_qdrant.chain import initialize
initialize(
embeddings=HuggingFaceEmbeddings(),
collection_name="cats",
documents=[
Document(
page_content="A mean lazy old cat who destroys furniture and eats lasagna",
metadata={"name": "Garfield", "breed": "Tabby"},
),
...
]
)
```
The template is flexible and can easily be adapted to different sets of documents.
### LangSmith
(Optional) If you have access to LangSmith, configure it to help trace, monitor and debug LangChain applications. If you don't have access, skip this section.
```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project> # if not specified, defaults to "default"
```
### Local Server
If you are inside this directory, you can spin up a LangServe instance directly by running:
```shell
langchain serve
```
This will start the FastAPI app with a server running locally at
[http://localhost:8000](http://localhost:8000)
You can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
Access the playground at [http://127.0.0.1:8000/self-query-qdrant/playground](http://127.0.0.1:8000/self-query-qdrant/playground)
Access the template from code with:
```python
from langserve.client import RemoteRunnable
runnable = RemoteRunnable("http://localhost:8000/self-query-qdrant")
```
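Calling, for example, `runnable.invoke("Which dishes under $15 can I get in New York?")` then returns the generated answer.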

File diff suppressed because it is too large

pyproject.toml
@@ -0,0 +1,32 @@
[tool.poetry]
name = "self-query-qdrant"
version = "0.1.0"
description = "Self-querying retriever using Qdrant"
authors = ["Kacper Łukawski <lukawski.kacper@gmail.com>"]
license = "Apache 2.0"
readme = "README.md"
packages = [{include = "self_query_qdrant"}]
[tool.poetry.dependencies]
python = ">=3.9,<3.13"
langchain = ">=0.0.325"
openai = "^0.28.1"
qdrant-client = ">=1.6"
lark = "^1.1.8"
tiktoken = "^0.5.1"
[tool.poetry.group.dev.dependencies]
langchain-cli = ">=0.0.15"
[tool.poetry.group.dev.dependencies.python-dotenv]
extras = [
"cli",
]
version = "^1.0.0"
[tool.langserve]
export_module = "self_query_qdrant"
export_attr = "chain"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

self_query_qdrant/__init__.py
@@ -0,0 +1,3 @@
from self_query_qdrant.chain import chain
__all__ = ["chain"]

self_query_qdrant/chain.py
@@ -0,0 +1,92 @@
import os
from typing import List, Optional
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import BaseLLM
from langchain.llms.openai import OpenAI
from langchain.pydantic_v1 import BaseModel
from langchain.retrievers import SelfQueryRetriever
from langchain.schema import Document, StrOutputParser
from langchain.schema.embeddings import Embeddings
from langchain.schema.runnable import RunnableParallel, RunnablePassthrough
from langchain.vectorstores.qdrant import Qdrant
from qdrant_client import QdrantClient
from self_query_qdrant import defaults, helper, prompts
class Query(BaseModel):
__root__: str
def create_chain(
llm: Optional[BaseLLM] = None,
embeddings: Optional[Embeddings] = None,
document_contents: str = defaults.DEFAULT_DOCUMENT_CONTENTS,
metadata_field_info: List[AttributeInfo] = defaults.DEFAULT_METADATA_FIELD_INFO,
collection_name: str = defaults.DEFAULT_COLLECTION_NAME,
):
"""
Create a chain that can be used to query a Qdrant vector store with a self-querying
capability. By default, this chain will use the OpenAI LLM and OpenAIEmbeddings, and
work with the default document contents and metadata field info. You can override
these defaults by passing in your own values.
    :param llm: an LLM to use for query construction and answer generation
    :param embeddings: an Embeddings model used to embed the documents and queries
    :param document_contents: a description of the document set
    :param metadata_field_info: a list of metadata attributes to filter on
    :param collection_name: name of the Qdrant collection to use
    :return: a runnable chain that answers queries over the collection
"""
llm = llm or OpenAI()
embeddings = embeddings or OpenAIEmbeddings()
# Set up a vector store to store your vectors and metadata
client = QdrantClient(
url=os.environ.get("QDRANT_URL", "http://localhost:6333"),
api_key=os.environ.get("QDRANT_API_KEY"),
)
vectorstore = Qdrant(
client=client,
collection_name=collection_name,
embeddings=embeddings,
)
# Set up a retriever to query your vector store with self-querying capabilities
retriever = SelfQueryRetriever.from_llm(
llm, vectorstore, document_contents, metadata_field_info, verbose=True
)
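    # Retrieve relevant documents and pass the raw query through in parallel;
    # both feed the answer-generation prompt and the LLM below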
context = RunnableParallel(
context=retriever | helper.combine_documents,
query=RunnablePassthrough(),
)
pipeline = context | prompts.LLM_CONTEXT_PROMPT | llm | StrOutputParser()
return pipeline.with_types(input_type=Query)
def initialize(
embeddings: Optional[Embeddings] = None,
collection_name: str = defaults.DEFAULT_COLLECTION_NAME,
documents: List[Document] = defaults.DEFAULT_DOCUMENTS,
):
"""
Initialize a vector store with a set of documents. By default, the documents will be
compatible with the default metadata field info. You can override these defaults by
passing in your own values.
    :param embeddings: an Embeddings model used to embed the documents
:param collection_name: name of the Qdrant collection to use
:param documents: a list of documents to initialize the vector store with
:return:
"""
embeddings = embeddings or OpenAIEmbeddings()
    # Set up a vector store and index the documents in the configured Qdrant
    # instance (without the url, the documents would not land in the instance
    # that the chain later queries)
    Qdrant.from_documents(
        documents,
        embedding=embeddings,
        collection_name=collection_name,
        url=os.environ.get("QDRANT_URL", "http://localhost:6333"),
        api_key=os.environ.get("QDRANT_API_KEY"),
    )
# Create the default chain
chain = create_chain()

self_query_qdrant/defaults.py
@@ -0,0 +1,134 @@
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.schema import Document
# Qdrant collection name
DEFAULT_COLLECTION_NAME = "restaurants"
# Here is a description of the dataset and metadata attributes. Metadata attributes will
# be used to filter the results of the query beyond the semantic search.
DEFAULT_DOCUMENT_CONTENTS = (
"Dishes served at different restaurants, along with the restaurant information"
)
DEFAULT_METADATA_FIELD_INFO = [
AttributeInfo(
name="price",
description="The price of the dish",
type="float",
),
AttributeInfo(
name="restaurant.name",
description="The name of the restaurant",
type="string",
),
AttributeInfo(
name="restaurant.location",
description="Name of the city where the restaurant is located",
type="string or list[string]",
),
]
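# Illustrative sketch (not exact syntax): a query like "vegan dishes under $13
# in New York" would be translated by the self-query retriever into a semantic
# query plus a structured filter, roughly:
#   and(lt("price", 13), contain("restaurant.location", "New York"))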
# A default set of documents to use for the vector store. This is a list of Document
# objects, which have a page_content field and a metadata field. The metadata field is a
# dictionary of metadata attributes compatible with the metadata field info above.
DEFAULT_DOCUMENTS = [
Document(
page_content="Pepperoni pizza with extra cheese, crispy crust",
metadata={
"price": 10.99,
"restaurant": {
"name": "Pizza Hut",
"location": ["New York", "Chicago"],
},
},
),
Document(
page_content="Spaghetti with meatballs and tomato sauce",
metadata={
"price": 12.99,
"restaurant": {
"name": "Olive Garden",
"location": ["New York", "Chicago", "Los Angeles"],
},
},
),
Document(
page_content="Chicken tikka masala with naan",
metadata={
"price": 14.99,
"restaurant": {
"name": "Indian Oven",
"location": ["New York", "Los Angeles"],
},
},
),
Document(
page_content="Chicken teriyaki with rice",
metadata={
"price": 11.99,
"restaurant": {
"name": "Sakura",
"location": ["New York", "Chicago", "Los Angeles"],
},
},
),
Document(
page_content="Scabbard fish with banana and passion fruit sauce",
metadata={
"price": 19.99,
"restaurant": {
"name": "A Concha",
"location": ["San Francisco"],
},
},
),
Document(
page_content="Pielmieni with sour cream",
metadata={
"price": 13.99,
"restaurant": {
"name": "Russian House",
"location": ["New York", "Chicago"],
},
},
),
Document(
page_content="Chicken biryani with raita",
metadata={
"price": 14.99,
"restaurant": {
"name": "Indian Oven",
"location": ["Los Angeles"],
},
},
),
Document(
page_content="Tomato soup with croutons",
metadata={
"price": 7.99,
"restaurant": {
"name": "Olive Garden",
"location": ["New York", "Chicago", "Los Angeles"],
},
},
),
Document(
page_content="Vegan burger with sweet potato fries",
metadata={
"price": 12.99,
"restaurant": {
"name": "Burger King",
"location": ["New York", "Los Angeles"],
},
},
),
Document(
page_content="Chicken nuggets with french fries",
metadata={
"price": 9.99,
"restaurant": {
"name": "McDonald's",
"location": ["San Francisco", "New York", "Los Angeles"],
},
},
),
]

self_query_qdrant/helper.py
@@ -0,0 +1,27 @@
from string import Formatter
from typing import List
from langchain.schema import Document
document_template = """
PASSAGE: {page_content}
METADATA: {metadata}
"""
def combine_documents(documents: List[Document]) -> str:
"""
    Combine a list of documents into a single string that can be passed further down
    to a language model.
    :param documents: list of documents to combine
    :return: a single string with one PASSAGE/METADATA block per document
"""
formatter = Formatter()
return "\n\n".join(
formatter.format(
document_template,
page_content=document.page_content,
metadata=document.metadata,
)
for document in documents
)

self_query_qdrant/prompts.py
@@ -0,0 +1,16 @@
from langchain.prompts import PromptTemplate
llm_context_prompt_template = """
Answer the user query using the provided passages. Each passage has metadata given as
a nested JSON object that you can also use. When answering, cite the source names of the
passages you are answering from, below the answer, in a unique bullet point list.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----
{context}
----
Query: {query}
""" # noqa: E501
LLM_CONTEXT_PROMPT = PromptTemplate.from_template(llm_context_prompt_template)