From d5f1969d550edf6f007846a9ceaf52081e19f29c Mon Sep 17 00:00:00 2001
From: Douglas Monsky
Date: Tue, 19 Sep 2023 17:56:22 -0500
Subject: [PATCH] Introducing Enhanced Functionality to WeaviateHybridSearchRetriever: Accepting Additional Keyword Arguments (#10802)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

**Description:**
This commit adds a new parameter, `hybrid_search_kwargs`, to the `_get_relevant_documents` method of the `WeaviateHybridSearchRetriever` class. The parameter takes a dictionary of keyword arguments. Callers supply it through the inherited public method, `get_relevant_documents` (from the `BaseRetriever` class), and the retriever unpacks it into the `.with_hybrid()` call.

This makes more detailed queries possible. Users can now search specific properties only, boost the weight of searched properties, search with a custom vector, and apply the Fusion ranking method. The method's docstring has been updated to describe each of these options.

Because the query is built in layers, starting with `query.get()` and then chaining `.with_hybrid()`, forwarding these arguments exposes options for the hybrid step that the retriever could not previously reach. Here is an example of a query that was not possible before:

[Specific Properties Only](https://weaviate.io/developers/weaviate/search/hybrid#selected-properties-only)

"The example below illustrates a BM25 search targeting the keyword 'food' exclusively within the 'question' property, integrated with vector search results corresponding to 'food'."

```python
response = (
    client.query
    .get("JeopardyQuestion", ["question", "answer"])
    .with_hybrid(
        query="food",
        properties=["question"],  # Will now be possible moving forward
        alpha=0.25
    )
    .with_limit(3)
    .do()
)
```

This functionality is now available through my changes by passing `hybrid_search_kwargs={"properties": ["question", "answer"]}` as an argument to `WeaviateHybridSearchRetriever.get_relevant_documents()`.

For example:

```python
import os

from weaviate import Client
from langchain.retrievers import WeaviateHybridSearchRetriever

client = Client(
    url=os.getenv("WEAVIATE_CLIENT_URL"),
    additional_headers={
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY"),
        "Authorization": f"Bearer {os.getenv('WEAVIATE_API_KEY')}",
    },
)

index_name = "Document"
text_key = "content"
attributes = ["title", "summary", "header", "url"]

retriever = WeaviateHybridSearchRetriever(
    client=client,
    index_name=index_name,
    text_key=text_key,
    attributes=attributes,
)

# Warning: to use properties in this way, each property used must also be
# in the list `attributes + [text_key]`.
hybrid_search_kwargs = {"properties": ["summary^2", "content"]}
query_text = "Some Query Text"

relevant_docs = retriever.get_relevant_documents(
    query=query_text,
    hybrid_search_kwargs=hybrid_search_kwargs,
)
```

In my experience working with the `weaviate-client` library, these additional options are vital tools for refining and fine-tuning searches, notably within multifaceted datasets.

As a final note, this implementation supports both backward and (within reason) forward compatibility. It accommodates any parameters Weaviate may add to `.with_hybrid()` in the future, without requiring further changes here.

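The forwarding pattern described above can be sketched in isolation. This is a minimal illustration, not the retriever itself: `FakeHybridQuery` and `run_hybrid` are hypothetical names, and the fake query object accepts any keyword argument, whereas the real `.with_hybrid()` only accepts the parameters Weaviate defines.

```python
from typing import Any, Dict, Optional


class FakeHybridQuery:
    """Hypothetical stand-in for a query builder; it only records its arguments."""

    def __init__(self) -> None:
        self.hybrid_args: Dict[str, Any] = {}

    def with_hybrid(
        self, query: str, alpha: Optional[float] = None, **kwargs: Any
    ) -> "FakeHybridQuery":
        # Record everything that was passed so the caller can inspect it.
        self.hybrid_args = {"query": query, "alpha": alpha, **kwargs}
        return self


def run_hybrid(
    query_obj: FakeHybridQuery,
    query: str,
    alpha: float = 0.5,
    hybrid_search_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    # A None default (rather than {}) avoids sharing one mutable dict across calls.
    if hybrid_search_kwargs is None:
        hybrid_search_kwargs = {}
    # Unpack the caller's dictionary into keyword arguments of with_hybrid().
    return query_obj.with_hybrid(query, alpha=alpha, **hybrid_search_kwargs).hybrid_args


# Without extra arguments, the call behaves as it did before the change.
default_call = run_hybrid(FakeHybridQuery(), "food")

# With extra arguments, they reach with_hybrid() unchanged.
boosted_call = run_hybrid(
    FakeHybridQuery(),
    "food",
    hybrid_search_kwargs={"properties": ["summary^2", "content"]},
)
```

One consequence of this design: a key that duplicates an argument already passed explicitly (here `alpha`) raises a `TypeError` in Python, so `alpha` should keep being set on the retriever rather than inside `hybrid_search_kwargs`.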
**Additional Documentation:**
For a fuller picture of the options that are now available, please refer to the Weaviate documentation:
- [Fusion Ranking Method](https://weaviate.io/developers/weaviate/search/hybrid#fusion-ranking-method)
- [Selected Properties Only](https://weaviate.io/developers/weaviate/search/hybrid#selected-properties-only)
- [Weight Boost Searched Properties](https://weaviate.io/developers/weaviate/search/hybrid#weight-boost-searched-properties)
- [With a Custom Vector](https://weaviate.io/developers/weaviate/search/hybrid#with-a-custom-vector)

**Tag Maintainer:**
@hwchase17 - I have tagged you based on your frequent contributions to the pertinent file, `/retrievers/weaviate_hybrid_search.py`. My apologies if this was not the appropriate choice.

Thank you for considering my contribution. I look forward to your feedback and to future collaboration.
---
 .../retrievers/weaviate_hybrid_search.py | 46 ++++++++++++++++++-
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/libs/langchain/langchain/retrievers/weaviate_hybrid_search.py b/libs/langchain/langchain/retrievers/weaviate_hybrid_search.py
index 8c2191b166..e0a366406f 100644
--- a/libs/langchain/langchain/retrievers/weaviate_hybrid_search.py
+++ b/libs/langchain/langchain/retrievers/weaviate_hybrid_search.py
@@ -99,8 +99,43 @@ class WeaviateHybridSearchRetriever(BaseRetriever):
         run_manager: CallbackManagerForRetrieverRun,
         where_filter: Optional[Dict[str, object]] = None,
         score: bool = False,
+        hybrid_search_kwargs: Optional[Dict[str, object]] = None,
     ) -> List[Document]:
-        """Look up similar documents in Weaviate."""
+        """Look up similar documents in Weaviate.
+
+        query: The query to search for relevant documents
+            using Weaviate hybrid search.
+
+        where_filter: A filter to apply to the query.
+            https://weaviate.io/developers/weaviate/guides/querying/#filtering
+
+        score: Whether to include the score, and score explanation
+            in the returned Documents metadata.
+
+        hybrid_search_kwargs: Used to pass additional arguments
+            to the .with_hybrid() method.
+            The primary use cases for this are:
+            1) Search specific properties only -
+                specify which properties to use during the hybrid search portion.
+                Note: this is not the same as the (self.attributes) to be returned.
+                Example - hybrid_search_kwargs={"properties": ["question", "answer"]}
+            https://weaviate.io/developers/weaviate/search/hybrid#selected-properties-only
+
+            2) Weight boosted searched properties -
+                Boost the weight of certain properties during the hybrid search portion.
+                Example - hybrid_search_kwargs={"properties": ["question^2", "answer"]}
+            https://weaviate.io/developers/weaviate/search/hybrid#weight-boost-searched-properties
+
+            3) Search with a custom vector - Define a different vector
+                to be used during the hybrid search portion.
+                Example - hybrid_search_kwargs={"vector": [0.1, 0.2, 0.3, ...]}
+            https://weaviate.io/developers/weaviate/search/hybrid#with-a-custom-vector
+
+            4) Use Fusion ranking method
+                Example - from weaviate.gql.get import HybridFusion
+                hybrid_search_kwargs={"fusion_type": HybridFusion.RELATIVE_SCORE}
+            https://weaviate.io/developers/weaviate/search/hybrid#fusion-ranking-method
+        """
         query_obj = self.client.query.get(self.index_name, self.attributes)
         if where_filter:
             query_obj = query_obj.with_where(where_filter)
@@ -108,7 +143,14 @@ class WeaviateHybridSearchRetriever(BaseRetriever):
         if score:
             query_obj = query_obj.with_additional(["score", "explainScore"])

-        result = query_obj.with_hybrid(query, alpha=self.alpha).with_limit(self.k).do()
+        if hybrid_search_kwargs is None:
+            hybrid_search_kwargs = {}
+
+        result = (
+            query_obj.with_hybrid(query, alpha=self.alpha, **hybrid_search_kwargs)
+            .with_limit(self.k)
+            .do()
+        )
         if "errors" in result:
             raise ValueError(f"Error during query: {result['errors']}")