Add template for self-query-qdrant (#12795)

This PR adds a self-querying template that uses Qdrant as the vector store.
The template ships with an artificial dataset and is structured so that it is
easy to pass in different components and choose the LLM and embedding
providers.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>

.gitignore
@@ -0,0 +1,2 @@
.idea
tests

README.md
@@ -0,0 +1,161 @@
# self-query-qdrant
This template performs [self-querying](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/)
using Qdrant and OpenAI. By default, it uses an artificial dataset of 10 documents, but you can replace it with your own dataset.
## Environment Setup
Set the `OPENAI_API_KEY` environment variable to access the OpenAI models.
Set the `QDRANT_URL` to the URL of your Qdrant instance. If you use [Qdrant Cloud](https://cloud.qdrant.io),
you also have to set the `QDRANT_API_KEY` environment variable. If you set neither of them,
the template will try to connect to a local Qdrant instance at `http://localhost:6333`.
```shell
export QDRANT_URL=
export QDRANT_API_KEY=
export OPENAI_API_KEY=
```
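If neither variable points to a remote instance, you can start a local Qdrant with Docker (the official `qdrant/qdrant` image; 6333 is Qdrant's default HTTP port):
```shell
docker run -p 6333:6333 qdrant/qdrant
```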
## Usage
To use this package, install the LangChain CLI first:
```shell
pip install -U "langchain-cli[serve]"
```
To create a new LangChain project and install this as the only package, run:
```shell
langchain app new my-app --package self-query-qdrant
```
To add this to an existing project, run:
```shell
langchain app add self-query-qdrant
```
### Defaults
Before you launch the server, you need to create a Qdrant collection and index the documents.
You can do this by running the following:
```python
from self_query_qdrant.chain import initialize
initialize()
```
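The same can be done as a shell one-liner (a sketch, assuming the package is importable, e.g. after `langchain app add`):
```shell
python -c "from self_query_qdrant.chain import initialize; initialize()"
```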
Add the following code to your `app/server.py` file:
```python
from self_query_qdrant.chain import chain
add_routes(app, chain, path="/self-query-qdrant")
```
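For reference, a minimal `app/server.py` then looks roughly like this (a sketch; `langchain app new` scaffolds the `FastAPI` app and imports for you):
```python
from fastapi import FastAPI
from langserve import add_routes

from self_query_qdrant.chain import chain

app = FastAPI()

add_routes(app, chain, path="/self-query-qdrant")
```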
The default dataset consists of 10 documents about dishes, along with their price and restaurant information.
You can find the documents in the `packages/self-query-qdrant/self_query_qdrant/defaults.py` file.
Here is one of the documents:
```python
from langchain.schema import Document
Document(
page_content="Spaghetti with meatballs and tomato sauce",
metadata={
"price": 12.99,
"restaurant": {
"name": "Olive Garden",
"location": ["New York", "Chicago", "Los Angeles"],
},
},
)
```
Self-querying performs semantic search over the documents, combined with additional filtering
based on their metadata. For example, you can search for dishes that cost less than $15 and are served in New York.
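For example (a sketch; the query string is illustrative, and any natural-language question works):
```python
from self_query_qdrant.chain import chain

# Under the hood, the self-query retriever turns this into a semantic query
# plus a metadata filter, roughly: price < 15 AND "New York" in restaurant.location
answer = chain.invoke("Which dishes under $15 can I get in New York?")
print(answer)
```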
### Customization
All the examples above assume that you want to launch the template with just the defaults.
If you want to customize the template, you can do it by passing the parameters to the `create_chain` function
in the `app/server.py` file:
```python
from langchain.llms import Cohere
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains.query_constructor.schema import AttributeInfo
from self_query_qdrant.chain import create_chain
chain = create_chain(
llm=Cohere(),
embeddings=HuggingFaceEmbeddings(),
document_contents="Descriptions of cats, along with their names and breeds.",
metadata_field_info=[
AttributeInfo(name="name", description="Name of the cat", type="string"),
AttributeInfo(name="breed", description="Cat's breed", type="string"),
],
collection_name="cats",
)
```
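Keep in mind that swapped-in providers bring their own requirements: for example, `Cohere` reads the `COHERE_API_KEY` environment variable and needs the `cohere` package, while `HuggingFaceEmbeddings` requires `sentence-transformers` to be installed.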
The same goes for the `initialize` function that creates a Qdrant collection and indexes the documents:
```python
from langchain.schema import Document
from langchain.embeddings import HuggingFaceEmbeddings
from self_query_qdrant.chain import initialize
initialize(
embeddings=HuggingFaceEmbeddings(),
collection_name="cats",
documents=[
Document(
page_content="A mean lazy old cat who destroys furniture and eats lasagna",
metadata={"name": "Garfield", "breed": "Tabby"},
),
...
]
)
```
The template is flexible and can easily be adapted to different sets of documents.
### LangSmith
(Optional) If you have access to LangSmith, configure it to help trace, monitor and debug LangChain applications. If you don't have access, skip this section.
```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project> # if not specified, defaults to "default"
```
### Local Server
If you are inside this directory, you can spin up a LangServe instance directly by running:
```shell
langchain serve
```
This will start the FastAPI app with a server running locally at
[http://localhost:8000](http://localhost:8000)
You can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
Access the playground at [http://127.0.0.1:8000/self-query-qdrant/playground](http://127.0.0.1:8000/self-query-qdrant/playground)
Access the template from code with:
```python
from langserve.client import RemoteRunnable
runnable = RemoteRunnable("http://localhost:8000/self-query-qdrant")
```
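Calling, for example, `runnable.invoke("Which dishes under $15 can I get in New York?")` then returns the generated answer.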

File diff suppressed because it is too large

pyproject.toml
@@ -0,0 +1,32 @@
[tool.poetry]
name = "self-query-qdrant"
version = "0.1.0"
description = "Self-querying retriever using Qdrant"
authors = ["Kacper Łukawski <lukawski.kacper@gmail.com>"]
license = "Apache 2.0"
readme = "README.md"
packages = [{include = "self_query_qdrant"}]
[tool.poetry.dependencies]
python = ">=3.9,<3.13"
langchain = ">=0.0.325"
openai = "^0.28.1"
qdrant-client = ">=1.6"
lark = "^1.1.8"
tiktoken = "^0.5.1"
[tool.poetry.group.dev.dependencies]
langchain-cli = ">=0.0.15"
[tool.poetry.group.dev.dependencies.python-dotenv]
extras = [
"cli",
]
version = "^1.0.0"
[tool.langserve]
export_module = "self_query_qdrant"
export_attr = "chain"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

self_query_qdrant/__init__.py
@@ -0,0 +1,3 @@
from self_query_qdrant.chain import chain
__all__ = ["chain"]

self_query_qdrant/chain.py
@@ -0,0 +1,92 @@
import os
from typing import List, Optional
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import BaseLLM
from langchain.llms.openai import OpenAI
from langchain.pydantic_v1 import BaseModel
from langchain.retrievers import SelfQueryRetriever
from langchain.schema import Document, StrOutputParser
from langchain.schema.embeddings import Embeddings
from langchain.schema.runnable import RunnableParallel, RunnablePassthrough
from langchain.vectorstores.qdrant import Qdrant
from qdrant_client import QdrantClient
from self_query_qdrant import defaults, helper, prompts
class Query(BaseModel):
__root__: str
def create_chain(
llm: Optional[BaseLLM] = None,
embeddings: Optional[Embeddings] = None,
document_contents: str = defaults.DEFAULT_DOCUMENT_CONTENTS,
metadata_field_info: List[AttributeInfo] = defaults.DEFAULT_METADATA_FIELD_INFO,
collection_name: str = defaults.DEFAULT_COLLECTION_NAME,
):
"""
Create a chain that can be used to query a Qdrant vector store with a self-querying
capability. By default, this chain will use the OpenAI LLM and OpenAIEmbeddings, and
work with the default document contents and metadata field info. You can override
these defaults by passing in your own values.
    :param llm: an LLM to use for query construction and answer generation
    :param embeddings: an Embeddings model used to embed the documents and queries
    :param document_contents: a description of the document set
    :param metadata_field_info: a list of metadata attributes to filter on
    :param collection_name: name of the Qdrant collection to use
    :return: a runnable chain that answers queries over the collection
"""
llm = llm or OpenAI()
embeddings = embeddings or OpenAIEmbeddings()
# Set up a vector store to store your vectors and metadata
client = QdrantClient(
url=os.environ.get("QDRANT_URL", "http://localhost:6333"),
api_key=os.environ.get("QDRANT_API_KEY"),
)
vectorstore = Qdrant(
client=client,
collection_name=collection_name,
embeddings=embeddings,
)
# Set up a retriever to query your vector store with self-querying capabilities
retriever = SelfQueryRetriever.from_llm(
llm, vectorstore, document_contents, metadata_field_info, verbose=True
)
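    # Retrieve relevant documents and pass the raw query through in parallel;
    # both feed the answer-generation prompt and the LLM below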
context = RunnableParallel(
context=retriever | helper.combine_documents,
query=RunnablePassthrough(),
)
pipeline = context | prompts.LLM_CONTEXT_PROMPT | llm | StrOutputParser()
return pipeline.with_types(input_type=Query)
def initialize(
embeddings: Optional[Embeddings] = None,
collection_name: str = defaults.DEFAULT_COLLECTION_NAME,
documents: List[Document] = defaults.DEFAULT_DOCUMENTS,
):
"""
Initialize a vector store with a set of documents. By default, the documents will be
compatible with the default metadata field info. You can override these defaults by
passing in your own values.
    :param embeddings: an Embeddings model used to embed the documents
:param collection_name: name of the Qdrant collection to use
:param documents: a list of documents to initialize the vector store with
:return:
"""
embeddings = embeddings or OpenAIEmbeddings()
    # Set up a vector store and index the documents in the configured Qdrant
    # instance (without the url, the documents would not land in the instance
    # that the chain later queries)
    Qdrant.from_documents(
        documents,
        embedding=embeddings,
        collection_name=collection_name,
        url=os.environ.get("QDRANT_URL", "http://localhost:6333"),
        api_key=os.environ.get("QDRANT_API_KEY"),
    )
# Create the default chain
chain = create_chain()

self_query_qdrant/defaults.py
@@ -0,0 +1,134 @@
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.schema import Document
# Qdrant collection name
DEFAULT_COLLECTION_NAME = "restaurants"
# Here is a description of the dataset and metadata attributes. Metadata attributes will
# be used to filter the results of the query beyond the semantic search.
DEFAULT_DOCUMENT_CONTENTS = (
"Dishes served at different restaurants, along with the restaurant information"
)
DEFAULT_METADATA_FIELD_INFO = [
AttributeInfo(
name="price",
description="The price of the dish",
type="float",
),
AttributeInfo(
name="restaurant.name",
description="The name of the restaurant",
type="string",
),
AttributeInfo(
name="restaurant.location",
description="Name of the city where the restaurant is located",
type="string or list[string]",
),
]
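# Illustrative sketch (not exact syntax): a query like "vegan dishes under $13
# in New York" would be translated by the self-query retriever into a semantic
# query plus a structured filter, roughly:
#   and(lt("price", 13), contain("restaurant.location", "New York"))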
# A default set of documents to use for the vector store. This is a list of Document
# objects, which have a page_content field and a metadata field. The metadata field is a
# dictionary of metadata attributes compatible with the metadata field info above.
DEFAULT_DOCUMENTS = [
Document(
page_content="Pepperoni pizza with extra cheese, crispy crust",
metadata={
"price": 10.99,
"restaurant": {
"name": "Pizza Hut",
"location": ["New York", "Chicago"],
},
},
),
Document(
page_content="Spaghetti with meatballs and tomato sauce",
metadata={
"price": 12.99,
"restaurant": {
"name": "Olive Garden",
"location": ["New York", "Chicago", "Los Angeles"],
},
},
),
Document(
page_content="Chicken tikka masala with naan",
metadata={
"price": 14.99,
"restaurant": {
"name": "Indian Oven",
"location": ["New York", "Los Angeles"],
},
},
),
Document(
page_content="Chicken teriyaki with rice",
metadata={
"price": 11.99,
"restaurant": {
"name": "Sakura",
"location": ["New York", "Chicago", "Los Angeles"],
},
},
),
Document(
page_content="Scabbard fish with banana and passion fruit sauce",
metadata={
"price": 19.99,
"restaurant": {
"name": "A Concha",
"location": ["San Francisco"],
},
},
),
Document(
page_content="Pielmieni with sour cream",
metadata={
"price": 13.99,
"restaurant": {
"name": "Russian House",
"location": ["New York", "Chicago"],
},
},
),
Document(
page_content="Chicken biryani with raita",
metadata={
"price": 14.99,
"restaurant": {
"name": "Indian Oven",
"location": ["Los Angeles"],
},
},
),
Document(
page_content="Tomato soup with croutons",
metadata={
"price": 7.99,
"restaurant": {
"name": "Olive Garden",
"location": ["New York", "Chicago", "Los Angeles"],
},
},
),
Document(
page_content="Vegan burger with sweet potato fries",
metadata={
"price": 12.99,
"restaurant": {
"name": "Burger King",
"location": ["New York", "Los Angeles"],
},
},
),
Document(
page_content="Chicken nuggets with french fries",
metadata={
"price": 9.99,
"restaurant": {
"name": "McDonald's",
"location": ["San Francisco", "New York", "Los Angeles"],
},
},
),
]

self_query_qdrant/helper.py
@@ -0,0 +1,27 @@
from string import Formatter
from typing import List
from langchain.schema import Document
document_template = """
PASSAGE: {page_content}
METADATA: {metadata}
"""
def combine_documents(documents: List[Document]) -> str:
"""
    Combine a list of documents into a single string that can be passed further down
    to a language model.
    :param documents: list of documents to combine
    :return: a single string with one PASSAGE/METADATA block per document
"""
formatter = Formatter()
return "\n\n".join(
formatter.format(
document_template,
page_content=document.page_content,
metadata=document.metadata,
)
for document in documents
)

self_query_qdrant/prompts.py
@@ -0,0 +1,16 @@
from langchain.prompts import PromptTemplate
llm_context_prompt_template = """
Answer the user query using the provided passages. Each passage has metadata given as
a nested JSON object that you can also use. When answering, cite the source names of the
passages you are answering from, below the answer, in a unique bullet point list.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----
{context}
----
Query: {query}
""" # noqa: E501
LLM_CONTEXT_PROMPT = PromptTemplate.from_template(llm_context_prompt_template)