added filter kwarg to VectorStoreIndexWrapper query and query_with_so… (#8844)

- Description: added filter to query methods in VectorStoreIndexWrapper
for filtering by metadata (i.e. search_kwargs)
- Tag maintainer: @rlancemartin, @eyurtsev

Updated the doc snippet on this topic as well. It took me a long while
to figure out how to filter the vectorstore by filename, so this might
help someone else out.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
pull/8927/head
Chris Pappalardo 11 months ago committed by GitHub
parent 4a63533216
commit beab637f04
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -159,6 +159,22 @@ index.vectorstore.as_retriever()
</CodeOutputBlock>
It can also be convenient to filter the vectorstore by the metadata associated with documents, particularly when your vectorstore has multiple sources. This can be done using the `query` method like so:
```python
index.query("Summarize the general content of this document.", retriever_kwargs={"search_kwargs": {"filter": {"source": "../state_of_the_union.txt"}}})
```
<CodeOutputBlock lang="python">
```
" The document is a speech given by President Trump to the nation on the occasion of his 245th birthday. The speech highlights the importance of American values and the challenges facing the country, including the ongoing conflict in Ukraine, the ongoing trade war with China, and the ongoing conflict in Syria. The speech also discusses the importance of investing in emerging technologies and American manufacturing, and calls on Congress to pass the Bipartisan Innovation Act and other important legislation."
```
</CodeOutputBlock>
## Walkthrough
Okay, so what's actually going on? How is this index getting created?

@ -1,4 +1,4 @@
from typing import Any, List, Optional, Type
from typing import Any, Dict, List, Optional, Type
from pydantic import BaseModel, Extra, Field
@ -31,22 +31,32 @@ class VectorStoreIndexWrapper(BaseModel):
arbitrary_types_allowed = True
def query(
self, question: str, llm: Optional[BaseLanguageModel] = None, **kwargs: Any
self,
question: str,
llm: Optional[BaseLanguageModel] = None,
retriever_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> str:
"""Query the vectorstore."""
llm = llm or OpenAI(temperature=0)
retriever_kwargs = retriever_kwargs or {}
chain = RetrievalQA.from_chain_type(
llm, retriever=self.vectorstore.as_retriever(), **kwargs
llm, retriever=self.vectorstore.as_retriever(**retriever_kwargs), **kwargs
)
return chain.run(question)
def query_with_sources(
self, question: str, llm: Optional[BaseLanguageModel] = None, **kwargs: Any
self,
question: str,
llm: Optional[BaseLanguageModel] = None,
retriever_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> dict:
"""Query the vectorstore and get back sources."""
llm = llm or OpenAI(temperature=0)
retriever_kwargs = retriever_kwargs or {}
chain = RetrievalQAWithSourcesChain.from_chain_type(
llm, retriever=self.vectorstore.as_retriever(), **kwargs
llm, retriever=self.vectorstore.as_retriever(**retriever_kwargs), **kwargs
)
return chain({chain.question_key: question})

Loading…
Cancel
Save