mirror of https://github.com/hwchase17/langchain
Add template for self-query-qdrant (#12795)
This PR adds a self-querying template using Qdrant as a vector store. The template uses an artificial dataset and was implemented in a way that simplifies passing different components and choosing LLM and embedding providers. --------- Co-authored-by: Erick Friis <erick@langchain.dev>
pull/12975/head v0.0.330
parent
f41f4c5e37
commit
66c41c0dbf
@@ -0,0 +1,2 @@
.idea
tests
@@ -0,0 +1,161 @@
# self-query-qdrant

This template performs [self-querying](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/)
using Qdrant and OpenAI. By default, it uses an artificial dataset of 10 documents, but you can replace it with your own dataset.

## Environment Setup

Set the `OPENAI_API_KEY` environment variable to access the OpenAI models.

Set the `QDRANT_URL` environment variable to the URL of your Qdrant instance. If you use [Qdrant Cloud](https://cloud.qdrant.io),
you also have to set the `QDRANT_API_KEY` environment variable. If neither is set,
the template will try to connect to a local Qdrant instance at `http://localhost:6333`.

```shell
export QDRANT_URL=
export QDRANT_API_KEY=

export OPENAI_API_KEY=
```
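
If you don't have a Qdrant instance running yet, the quickest way to start one locally is the official Docker image (assuming you have Docker installed; the container serves the REST API on port 6333):

```shell
docker run -p 6333:6333 qdrant/qdrant
```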

## Usage

To use this package, install the LangChain CLI first:

```shell
pip install -U "langchain-cli[serve]"
```

Create a new LangChain project and install this package as the only one:

```shell
langchain app new my-app --package self-query-qdrant
```

To add this to an existing project, run:

```shell
langchain app add self-query-qdrant
```

### Defaults

Before you launch the server, you need to create a Qdrant collection and index the documents.
You can do this by running the following command:

```python
from self_query_qdrant.chain import initialize

initialize()
```
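
For example, you can run it as a one-off from the project directory (a sketch; it assumes the environment variables above are already exported):

```shell
python -c "from self_query_qdrant.chain import initialize; initialize()"
```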

Add the following code to your `app/server.py` file:

```python
from self_query_qdrant.chain import chain

add_routes(app, chain, path="/self-query-qdrant")
```
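
For reference, a minimal `app/server.py` that the snippet above slots into might look as follows (a sketch; `langchain app new` generates a similar scaffold for you):

```python
from fastapi import FastAPI
from langserve import add_routes

from self_query_qdrant.chain import chain

app = FastAPI()

# Expose the chain under /self-query-qdrant (invoke, batch, stream + playground)
add_routes(app, chain, path="/self-query-qdrant")

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
```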

The default dataset consists of 10 documents about dishes, along with their price and restaurant information.
You can find the documents in the `packages/self-query-qdrant/self_query_qdrant/defaults.py` file.
Here is one of the documents:

```python
from langchain.schema import Document

Document(
    page_content="Spaghetti with meatballs and tomato sauce",
    metadata={
        "price": 12.99,
        "restaurant": {
            "name": "Olive Garden",
            "location": ["New York", "Chicago", "Los Angeles"],
        },
    },
)
```

Self-querying combines semantic search over the documents with additional filtering
based on their metadata. For example, you can search for dishes that cost less than $15 and are served in New York.
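A minimal sketch of such a query against the default chain, assuming the collection has been initialized as described above:

```python
from self_query_qdrant.chain import chain

# The self-query retriever translates this question into a semantic query plus
# metadata filters (price < 15, restaurant.location containing "New York")
# before searching Qdrant.
answer = chain.invoke("Which dishes cost less than $15 and are served in New York?")
print(answer)
```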
### Customization

All the examples above assume that you want to launch the template with just the defaults.
If you want to customize the template, you can do so by passing parameters to the `create_chain` function
in the `app/server.py` file:

```python
from langchain.llms import Cohere
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains.query_constructor.schema import AttributeInfo

from self_query_qdrant.chain import create_chain

chain = create_chain(
    llm=Cohere(),
    embeddings=HuggingFaceEmbeddings(),
    document_contents="Descriptions of cats, along with their names and breeds.",
    metadata_field_info=[
        AttributeInfo(name="name", description="Name of the cat", type="string"),
        AttributeInfo(name="breed", description="Cat's breed", type="string"),
    ],
    collection_name="cats",
)
```
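
Note that swapping providers like this requires extra packages that the template does not install by default (and the `Cohere` LLM additionally expects a `COHERE_API_KEY` environment variable); presumably something along these lines:

```shell
pip install cohere sentence-transformers
```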

The same goes for the `initialize` function that creates a Qdrant collection and indexes the documents:

```python
from langchain.schema import Document
from langchain.embeddings import HuggingFaceEmbeddings

from self_query_qdrant.chain import initialize

initialize(
    embeddings=HuggingFaceEmbeddings(),
    collection_name="cats",
    documents=[
        Document(
            page_content="A mean lazy old cat who destroys furniture and eats lasagna",
            metadata={"name": "Garfield", "breed": "Tabby"},
        ),
        ...
    ]
)
```

The template is flexible and can easily be used with different sets of documents.

### LangSmith

(Optional) If you have access to LangSmith, configure it to help trace, monitor and debug LangChain applications. If you don't have access, skip this section.

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project>  # if not specified, defaults to "default"
```

### Local Server

If you are inside this directory, you can spin up a LangServe instance directly by running:

```shell
langchain serve
```

This will start the FastAPI app with a server running locally at
[http://localhost:8000](http://localhost:8000)

You can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
Access the playground at [http://127.0.0.1:8000/self-query-qdrant/playground](http://127.0.0.1:8000/self-query-qdrant/playground)

Access the template from code with:

```python
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/self-query-qdrant")
```
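
The remote chain behaves like any other runnable; a minimal usage sketch, assuming the server above is running and the collection has been initialized:

```python
answer = runnable.invoke("Which dishes cost less than $15 and are served in New York?")
print(answer)
```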
File diff suppressed because it is too large
@@ -0,0 +1,32 @@
[tool.poetry]
name = "self-query-qdrant"
version = "0.1.0"
description = "Self-querying retriever using Qdrant"
authors = ["Kacper Łukawski <lukawski.kacper@gmail.com>"]
license = "Apache 2.0"
readme = "README.md"
packages = [{include = "self_query_qdrant"}]

[tool.poetry.dependencies]
python = ">=3.9,<3.13"
langchain = ">=0.0.325"
openai = "^0.28.1"
qdrant-client = ">=1.6"
lark = "^1.1.8"
tiktoken = "^0.5.1"

[tool.poetry.group.dev.dependencies]
langchain-cli = ">=0.0.15"

[tool.poetry.group.dev.dependencies.python-dotenv]
extras = [
    "cli",
]
version = "^1.0.0"

[tool.langserve]
export_module = "self_query_qdrant"
export_attr = "chain"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
@@ -0,0 +1,3 @@
from self_query_qdrant.chain import chain

__all__ = ["chain"]
@@ -0,0 +1,92 @@
import os
from typing import List, Optional

from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import BaseLLM
from langchain.llms.openai import OpenAI
from langchain.pydantic_v1 import BaseModel
from langchain.retrievers import SelfQueryRetriever
from langchain.schema import Document, StrOutputParser
from langchain.schema.embeddings import Embeddings
from langchain.schema.runnable import RunnableParallel, RunnablePassthrough
from langchain.vectorstores.qdrant import Qdrant
from qdrant_client import QdrantClient

from self_query_qdrant import defaults, helper, prompts


class Query(BaseModel):
    __root__: str


def create_chain(
    llm: Optional[BaseLLM] = None,
    embeddings: Optional[Embeddings] = None,
    document_contents: str = defaults.DEFAULT_DOCUMENT_CONTENTS,
    metadata_field_info: List[AttributeInfo] = defaults.DEFAULT_METADATA_FIELD_INFO,
    collection_name: str = defaults.DEFAULT_COLLECTION_NAME,
):
    """
    Create a chain that can be used to query a Qdrant vector store with a self-querying
    capability. By default, this chain will use the OpenAI LLM and OpenAIEmbeddings, and
    work with the default document contents and metadata field info. You can override
    these defaults by passing in your own values.
    :param llm: an LLM to use for generating text
    :param embeddings: an Embeddings to use for generating queries
    :param document_contents: a description of the document set
    :param metadata_field_info: list of metadata attributes
    :param collection_name: name of the Qdrant collection to use
    :return: a runnable chain answering queries based on the retrieved documents
    """
    llm = llm or OpenAI()
    embeddings = embeddings or OpenAIEmbeddings()

    # Set up a vector store to store your vectors and metadata
    client = QdrantClient(
        url=os.environ.get("QDRANT_URL", "http://localhost:6333"),
        api_key=os.environ.get("QDRANT_API_KEY"),
    )
    vectorstore = Qdrant(
        client=client,
        collection_name=collection_name,
        embeddings=embeddings,
    )

    # Set up a retriever to query your vector store with self-querying capabilities
    retriever = SelfQueryRetriever.from_llm(
        llm, vectorstore, document_contents, metadata_field_info, verbose=True
    )

    context = RunnableParallel(
        context=retriever | helper.combine_documents,
        query=RunnablePassthrough(),
    )
    pipeline = context | prompts.LLM_CONTEXT_PROMPT | llm | StrOutputParser()
    return pipeline.with_types(input_type=Query)


def initialize(
    embeddings: Optional[Embeddings] = None,
    collection_name: str = defaults.DEFAULT_COLLECTION_NAME,
    documents: List[Document] = defaults.DEFAULT_DOCUMENTS,
):
    """
    Initialize a vector store with a set of documents. By default, the documents will be
    compatible with the default metadata field info. You can override these defaults by
    passing in your own values.
    :param embeddings: an Embeddings to use for generating queries
    :param collection_name: name of the Qdrant collection to use
    :param documents: a list of documents to initialize the vector store with
    :return: None
    """
    embeddings = embeddings or OpenAIEmbeddings()

    # Set up a vector store to store your vectors and metadata, targeting the same
    # Qdrant instance that create_chain connects to
    Qdrant.from_documents(
        documents,
        embedding=embeddings,
        collection_name=collection_name,
        url=os.environ.get("QDRANT_URL", "http://localhost:6333"),
        api_key=os.environ.get("QDRANT_API_KEY"),
    )


# Create the default chain
chain = create_chain()
@@ -0,0 +1,134 @@
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.schema import Document

# Qdrant collection name
DEFAULT_COLLECTION_NAME = "restaurants"

# Here is a description of the dataset and metadata attributes. Metadata attributes will
# be used to filter the results of the query beyond the semantic search.
DEFAULT_DOCUMENT_CONTENTS = (
    "Dishes served at different restaurants, along with the restaurant information"
)
DEFAULT_METADATA_FIELD_INFO = [
    AttributeInfo(
        name="price",
        description="The price of the dish",
        type="float",
    ),
    AttributeInfo(
        name="restaurant.name",
        description="The name of the restaurant",
        type="string",
    ),
    AttributeInfo(
        name="restaurant.location",
        description="Name of the city where the restaurant is located",
        type="string or list[string]",
    ),
]

# A default set of documents to use for the vector store. This is a list of Document
# objects, which have a page_content field and a metadata field. The metadata field is a
# dictionary of metadata attributes compatible with the metadata field info above.
DEFAULT_DOCUMENTS = [
    Document(
        page_content="Pepperoni pizza with extra cheese, crispy crust",
        metadata={
            "price": 10.99,
            "restaurant": {
                "name": "Pizza Hut",
                "location": ["New York", "Chicago"],
            },
        },
    ),
    Document(
        page_content="Spaghetti with meatballs and tomato sauce",
        metadata={
            "price": 12.99,
            "restaurant": {
                "name": "Olive Garden",
                "location": ["New York", "Chicago", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Chicken tikka masala with naan",
        metadata={
            "price": 14.99,
            "restaurant": {
                "name": "Indian Oven",
                "location": ["New York", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Chicken teriyaki with rice",
        metadata={
            "price": 11.99,
            "restaurant": {
                "name": "Sakura",
                "location": ["New York", "Chicago", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Scabbard fish with banana and passion fruit sauce",
        metadata={
            "price": 19.99,
            "restaurant": {
                "name": "A Concha",
                "location": ["San Francisco"],
            },
        },
    ),
    Document(
        page_content="Pelmeni with sour cream",
        metadata={
            "price": 13.99,
            "restaurant": {
                "name": "Russian House",
                "location": ["New York", "Chicago"],
            },
        },
    ),
    Document(
        page_content="Chicken biryani with raita",
        metadata={
            "price": 14.99,
            "restaurant": {
                "name": "Indian Oven",
                "location": ["Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Tomato soup with croutons",
        metadata={
            "price": 7.99,
            "restaurant": {
                "name": "Olive Garden",
                "location": ["New York", "Chicago", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Vegan burger with sweet potato fries",
        metadata={
            "price": 12.99,
            "restaurant": {
                "name": "Burger King",
                "location": ["New York", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Chicken nuggets with french fries",
        metadata={
            "price": 9.99,
            "restaurant": {
                "name": "McDonald's",
                "location": ["San Francisco", "New York", "Los Angeles"],
            },
        },
    ),
]
@@ -0,0 +1,27 @@
from string import Formatter
from typing import List

from langchain.schema import Document

document_template = """
PASSAGE: {page_content}
METADATA: {metadata}
"""


def combine_documents(documents: List[Document]) -> str:
    """
    Combine a list of documents into a single string that might be passed further down
    to a language model.
    :param documents: list of documents to combine
    :return: a single string with all documents rendered using document_template
    """
    formatter = Formatter()
    return "\n\n".join(
        formatter.format(
            document_template,
            page_content=document.page_content,
            metadata=document.metadata,
        )
        for document in documents
    )
@@ -0,0 +1,16 @@
from langchain.prompts import PromptTemplate

llm_context_prompt_template = """
Answer the user query using provided passages. Each passage has metadata given as
a nested JSON object you can also use. When answering, cite source name of the passages
you are answering from below the answer in a unique bullet point list.

If you don't know the answer, just say that you don't know, don't try to make up an answer.

----
{context}
----
Query: {query}
"""  # noqa: E501

LLM_CONTEXT_PROMPT = PromptTemplate.from_template(llm_context_prompt_template)