Introducing Enhanced Functionality to WeaviateHybridSearchRetriever: Accepting Additional Keyword Arguments (#10802)

**Description:** 
This commit enriches the `WeaviateHybridSearchRetriever` class by
introducing a new parameter, `hybrid_search_kwargs`, within the
`_get_relevant_documents` method. This parameter accommodates arbitrary
keyword arguments (`**kwargs`) which can be channeled to the inherited
public method, `get_relevant_documents`, originating from the
`BaseRetriever` class.

This modification facilitates more intricate querying capabilities,
allowing users to convey supplementary arguments to the `.with_hybrid()`
method. This expansion not only makes it possible to perform a more
nuanced search targeting specific properties but also grants the ability
to boost the weight of searched properties, to carry out a search with a
custom vector, and to apply the Fusion ranking method. The documentation
has been updated accordingly to delineate these new possibilities in
detail.

In light of the layered approach in which this search operates,
initiating with `query.get()` and then transitioning to
`.with_hybrid()`, several advantageous opportunities are unlocked for
the hybrid component that were previously unattainable.

Here’s a representative example showcasing a query structure that was
formerly unfeasible:

[Specific Properties
Only](https://weaviate.io/developers/weaviate/search/hybrid#selected-properties-only)
"The example below illustrates a BM25 search targeting the keyword
'food' exclusively within the 'question' property, integrated with
vector search results corresponding to 'food'."
```python
response = (
    client.query
    .get("JeopardyQuestion", ["question", "answer"])
    .with_hybrid(
        query="food",
        properties=["question"], # Will now be possible moving forward
        alpha=0.25
    )
    .with_limit(3)
    .do()
)
```
This functionality is now accessible through my alterations, by
conveying `hybrid_search_kwargs={"properties": ["question", "answer"]}`
as an argument to
`WeaviateHybridSearchRetriever.get_relevant_documents()`. For example:

```python
import os
from weaviate import Client
from langchain.retrievers import WeaviateHybridSearchRetriever

client = Client(
        url=os.getenv("WEAVIATE_CLIENT_URL"),
        additional_headers={
            "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY"),
            "Authorization": f"Bearer {os.getenv('WEAVIATE_API_KEY')}",
        },
    )

index_name = "Document"
text_key = "content"
attributes = ["title", "summary", "header", "url"]

retriever = ExtendedWeaviateHybridSearchRetriever(
        client=client,
        index_name=index_name,
        text_key=text_key,
        attributes=attributes,
    )

# Warning: to utilize properties in this way, each use property must also be in the list `attributes + [text_key]`.
hybrid_search_kwargs = {"properties": ["summary^2", "content"]}
query_text = "Some Query Text"

relevant_docs = retriever.get_relevant_documents(
        query=query_text,
        hybrid_search_kwargs=hybrid_search_kwargs
    )
```
In my experience working with the `weaviate-client` library, I have
found that these supplementary options stand as vital tools for
refining/finetuning searches, notably within multifaceted datasets. As a
final note, this implementation supports both backwards and forward
(within reason) compatiblity. It accommodates any future additional
parameters Weaviate may add to `.with_hybrid()`, without necessitating
further alterations.

**Additional Documentation:**
For a more comprehensive understanding and to explore a myriad of useful
options that are now accessible, please refer to the Weaviate
documentation:
- [Fusion Ranking
Method](https://weaviate.io/developers/weaviate/search/hybrid#fusion-ranking-method)
- [Selected Properties
Only](https://weaviate.io/developers/weaviate/search/hybrid#selected-properties-only)
- [Weight Boost Searched
Properties](https://weaviate.io/developers/weaviate/search/hybrid#weight-boost-searched-properties)
- [With a Custom
Vector](https://weaviate.io/developers/weaviate/search/hybrid#with-a-custom-vector)

**Tag Maintainer:** 
@hwchase17 - I have tagged you based on your frequent contributions to
the pertinent file, `/retrievers/weaviate_hybrid_search.py`. My
apologies if this was not the appropriate choice.

Thank you for considering my contribution, I look forward to your
feedback, and to future collaboration.
pull/9890/head^2
Douglas Monsky 11 months ago committed by GitHub
parent 61cecf8b1b
commit d5f1969d55
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -99,8 +99,43 @@ class WeaviateHybridSearchRetriever(BaseRetriever):
run_manager: CallbackManagerForRetrieverRun,
where_filter: Optional[Dict[str, object]] = None,
score: bool = False,
hybrid_search_kwargs: Optional[Dict[str, object]] = None,
) -> List[Document]:
"""Look up similar documents in Weaviate."""
"""Look up similar documents in Weaviate.
query: The query to search for relevant documents
of using weviate hybrid search.
where_filter: A filter to apply to the query.
https://weaviate.io/developers/weaviate/guides/querying/#filtering
score: Whether to include the score, and score explanation
in the returned Documents meta_data.
hybrid_search_kwargs: Used to pass additional arguments
to the .with_hybrid() method.
The primary uses cases for this are:
1) Search specific properties only -
specify which properties to be used during hybrid search portion.
Note: this is not the same as the (self.attributes) to be returned.
Example - hybrid_search_kwargs={"properties": ["question", "answer"]}
https://weaviate.io/developers/weaviate/search/hybrid#selected-properties-only
2) Weight boosted searched properties -
Boost the weight of certain properties during the hybrid search portion.
Example - hybrid_search_kwargs={"properties": ["question^2", "answer"]}
https://weaviate.io/developers/weaviate/search/hybrid#weight-boost-searched-properties
3) Search with a custom vector - Define a different vector
to be used during the hybrid search portion.
Example - hybrid_search_kwargs={"vector": [0.1, 0.2, 0.3, ...]}
https://weaviate.io/developers/weaviate/search/hybrid#with-a-custom-vector
4) Use Fusion ranking method
Example - from weaviate.gql.get import HybridFusion
hybrid_search_kwargs={"fusion": fusion_type=HybridFusion.RELATIVE_SCORE}
https://weaviate.io/developers/weaviate/search/hybrid#fusion-ranking-method
"""
query_obj = self.client.query.get(self.index_name, self.attributes)
if where_filter:
query_obj = query_obj.with_where(where_filter)
@ -108,7 +143,14 @@ class WeaviateHybridSearchRetriever(BaseRetriever):
if score:
query_obj = query_obj.with_additional(["score", "explainScore"])
result = query_obj.with_hybrid(query, alpha=self.alpha).with_limit(self.k).do()
if hybrid_search_kwargs is None:
hybrid_search_kwargs = {}
result = (
query_obj.with_hybrid(query, alpha=self.alpha, **hybrid_search_kwargs)
.with_limit(self.k)
.do()
)
if "errors" in result:
raise ValueError(f"Error during query: {result['errors']}")

Loading…
Cancel
Save