Tweaks to the PowerBI toolkit and utility (#4442)

Fixes several bugs found while testing with more advanced datasets and queries. The query tool now also parses the error output returned by PowerBI and passes it back to the LLM.
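
As a rough illustration of the error handling described above, the following sketch pulls the most specific message it can find out of an error payload. The function name `describe_error` is hypothetical, and the value of `BAD_REQUEST_RESPONSE` is a placeholder because the real constant's text is not shown in this diff; only the nested keys (`error`, `pbi.error`, `details`, `detail`) are taken from the change to `tool.py`.

```python
from typing import Any, Dict

# Placeholder value (assumption): the real constant's wording is not part of this diff.
BAD_REQUEST_RESPONSE = "Bad request."


def describe_error(response: Dict[str, Any]) -> str:
    """Build a short error message for the LLM from a PowerBI error payload."""
    error = response.get("error", {})
    pbi_error = error.get("pbi.error", {}) if isinstance(error, dict) else {}
    details = pbi_error.get("details") if isinstance(pbi_error, dict) else None
    if details:
        # Prefer the detailed message when PowerBI supplies one.
        return f"{BAD_REQUEST_RESPONSE} Error was {details[0]['detail']}"
    # Otherwise fall back to whatever sits under the "error" key.
    return f"{BAD_REQUEST_RESPONSE} Error was {error}"


# Hypothetical payload, shaped like the keys accessed in tool.py.
sample = {"error": {"pbi.error": {"details": [{"detail": "Syntax error in query"}]}}}
message = describe_error(sample)
```

Returning the text instead of raising lets the agent read the error and try a rephrased question, which appears to be why `raise_for_status()` is dropped in `utilities/powerbi.py`.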
1 year ago · 47657fe01a
parent e363e709cb
commit 47657fe01a
4 changed files with 63 additions and 35 deletions
--- a/langchain/agents/agent_toolkits/powerbi/prompt.py
+++ b/langchain/agents/agent_toolkits/powerbi/prompt.py
@@ -3,14 +3,12 @@


 POWERBI_PREFIX = """You are an agent designed to interact with a Power BI Dataset.
-Given an input question, create a syntactically correct DAX query to run, then look at the results of the query and return the answer.
-Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
-You can order the results by a relevant column to return the most interesting examples in the database.
-Never query for all the columns from a specific table, only ask for a the few relevant columns given the question.

-You have access to tools for interacting with the Power BI Dataset. Only use the below tools. Only use the information returned by the below tools to construct your final answer. Usually I should first ask which tables I have, then how each table is defined and then ask the question to query tool to create a query for me and then I should ask the query tool to execute it, finally create a nice sentence that answers the question. If you receive an error back that mentions that the query was wrong try to phrase the question differently and get a new query from the question to query tool.
+Assistant has access to tools that can give context, write queries and execute those queries against PowerBI, Microsoft's business intelligence tool. The questions from the users should be interpreted as related to the dataset that is available and not general questions about the world. If the question does not seem related to the dataset, just return "I don't know" as the answer. The query language that PowerBI uses is called DAX and it is quite particular and complex, so make sure to use the right tools to get the answers the user is looking for.

-If the question does not seem related to the dataset, just return "I don't know" as the answer.
+Given an input question, create a syntactically correct DAX query to run, then look at the results and return the answer. Sometimes the results indicate that something is wrong with the query, or that there were errors in the JSON serialization. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.
+
+Assistant never just starts querying. Assistant should first find out which tables there are, then how each table is defined, then ask the question to query tool to create a query, and then ask the query tool to execute it. Finally, create a complete sentence that answers the question; if multiple rows are asked for, find a way to write them in a format that is easily readable for a human. Assistant has tools that can get more context about the tables, which helps it write correct queries.
 """

 POWERBI_SUFFIX = """Begin!
@@ -19,17 +17,13 @@ Question: {input}
 Thought: I should first ask which tables I have, then how each table is defined and then ask the question to query tool to create a query for me and then I should ask the query tool to execute it, finally create a nice sentence that answers the question.
 {agent_scratchpad}"""

-POWERBI_CHAT_PREFIX = """Assistant is a large language model trained by OpenAI built to help users interact with a PowerBI Dataset.
-
-Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.
-
-Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics. 
+POWERBI_CHAT_PREFIX = """Assistant is a large language model built to help users interact with a PowerBI Dataset.

-Given an input question, create a syntactically correct DAX query to run, then look at the results of the query and return the answer. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.
+Assistant has access to tools that can give context, write queries and execute those queries against PowerBI, Microsoft's business intelligence tool. The questions from the users should be interpreted as related to the dataset that is available and not general questions about the world. If the question does not seem related to the dataset, just return "I don't know" as the answer. The query language that PowerBI uses is called DAX and it is quite particular and complex, so make sure to use the right tools to get the answers the user is looking for.

-Overall, Assistant is a powerful system that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.
+Given an input question, create a syntactically correct DAX query to run, then look at the results and return the answer. Sometimes the results indicate that something is wrong with the query, or that there were errors in the JSON serialization. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.

-Usually I should first ask which tables I have, then how each table is defined and then ask the question to query tool to create a query for me and then I should ask the query tool to execute it, finally create a complete sentence that answers the question. If you receive an error back that mentions that the query was wrong try to phrase the question differently and get a new query from the question to query tool.
+Assistant never just starts querying. Assistant should first find out which tables there are, then how each table is defined, then ask the question to query tool to create a query, and then ask the query tool to execute it. Finally, create a complete sentence that answers the question; if multiple rows are asked for, find a way to write them in a format that is easily readable for a human. Assistant has tools that can get more context about the tables, which helps it write correct queries.
 """

 POWERBI_CHAT_SUFFIX = """TOOLS
--- a/langchain/tools/powerbi/prompt.py
+++ b/langchain/tools/powerbi/prompt.py
@@ -31,6 +31,8 @@ DATEDIFF(date1, date2, <interval>) - Returns the difference between two date val
 DATEVALUE(<date_text>) - Returns a date value that represents the specified date.
 YEAR(<date>), QUARTER(<date>), MONTH(<date>), DAY(<date>), HOUR(<date>), MINUTE(<date>), SECOND(<date>) - Returns the part of the date for the specified date.

+Finally, make sure to escape double quotes with a single backslash, and make sure that only table names have single quotes around them, while names of measures or the values of columns that you want to compare against are in escaped double quotes. Newlines are not necessary and can be skipped. The queries are serialized as JSON, so they must comply with JSON syntax.
+
 The following tables exist: {tables}

 and the schema's for some are given here:
@@ -38,19 +40,20 @@ and the schema's for some are given here:

 Examples:
 {examples}
+
 Question: {tool_input}
 DAX: 
 """

 DEFAULT_FEWSHOT_EXAMPLES = """
 Question: How many rows are in the table <table>?
-DAX: EVALUATE ROW("Number of rows", COUNTROWS(<table>))
+DAX: EVALUATE ROW(\"Number of rows\", COUNTROWS(<table>))
 ----
 Question: How many rows are in the table <table> where <column> is not empty?
-DAX: EVALUATE ROW("Number of rows", COUNTROWS(FILTER(<table>, <table>[<column>] <> "")))
+DAX: EVALUATE ROW(\"Number of rows\", COUNTROWS(FILTER(<table>, <table>[<column>] <> \"\")))
 ----
 Question: What was the average of <column> in <table>?
-DAX: EVALUATE ROW("Average", AVERAGE(<table>[<column>]))
+DAX: EVALUATE ROW(\"Average\", AVERAGE(<table>[<column>]))
 ----
 """
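
To make the JSON constraint in the prompt above concrete, here is a minimal sketch of how a DAX query ends up inside the request body. The helper `build_payload` and the table name `Sales` are hypothetical; the payload keys mirror those used by `_create_json_content` in this commit. Note that the standard library's `json.dumps` escapes raw double quotes on its own.

```python
import json
from typing import Any, Dict, Optional


def build_payload(command: str, impersonated_user_name: Optional[str] = None) -> Dict[str, Any]:
    """Wrap a DAX command in a request body with the same keys as the commit uses."""
    return {
        "queries": [{"query": command}],
        "impersonatedUserName": impersonated_user_name,
        "serializerSettings": {"includeNulls": True},
    }


# Table names take single quotes; the column label takes double quotes.
dax = "EVALUATE ROW(\"Number of rows\", COUNTROWS('Sales'))"

# Serializing turns each double quote inside the query into \" in the JSON text.
body = json.dumps(build_payload(dax))

# Parsing the body again recovers the original query unchanged.
round_tripped = json.loads(body)["queries"][0]["query"]
```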

--- a/langchain/tools/powerbi/tool.py
+++ b/langchain/tools/powerbi/tool.py
@@ -73,6 +73,19 @@ class QueryPowerBITool(BaseTool):
            self.session_cache[tool_input] = json_to_md(
                self.session_cache[tool_input]["results"][0]["tables"][0]["rows"]
            )
+            return self.session_cache[tool_input]
+        if (
+            "error" in self.session_cache[tool_input]
+            and "pbi.error" in self.session_cache[tool_input]["error"]
+            and "details" in self.session_cache[tool_input]["error"]["pbi.error"]
+        ):
+            self.session_cache[
+                tool_input
+            ] = f'{BAD_REQUEST_RESPONSE} Error was {self.session_cache[tool_input]["error"]["pbi.error"]["details"][0]["detail"]}'  # noqa: E501
+            return self.session_cache[tool_input]
+        self.session_cache[
+            tool_input
+        ] = f'{BAD_REQUEST_RESPONSE} Error was {self.session_cache[tool_input]["error"]}'  # noqa: E501
        return self.session_cache[tool_input]

    async def _arun(
@@ -99,6 +112,19 @@ class QueryPowerBITool(BaseTool):
            self.session_cache[tool_input] = json_to_md(
                self.session_cache[tool_input]["results"][0]["tables"][0]["rows"]
            )
+            return self.session_cache[tool_input]
+        if (
+            "error" in self.session_cache[tool_input]
+            and "pbi.error" in self.session_cache[tool_input]["error"]
+            and "details" in self.session_cache[tool_input]["error"]["pbi.error"]
+        ):
+            self.session_cache[
+                tool_input
+            ] = f'{BAD_REQUEST_RESPONSE} Error was {self.session_cache[tool_input]["error"]["pbi.error"]["details"][0]["detail"]}'  # noqa: E501
+            return self.session_cache[tool_input]
+        self.session_cache[
+            tool_input
+        ] = f'{BAD_REQUEST_RESPONSE} Error was {self.session_cache[tool_input]["error"]}'  # noqa: E501
        return self.session_cache[tool_input]


--- a/langchain/utilities/powerbi.py
+++ b/langchain/utilities/powerbi.py
@@ -120,6 +120,10 @@ class PowerBIDataset(BaseModel):
        """Get the tables that still need to be queried."""
        todo = deepcopy(tables_todo)
        for table in todo:
+            if table not in self.table_names:
+                _LOGGER.warning("Table %s not found in dataset.", table)
+                todo.remove(table)
+                continue
            if table in self.schemas:
                todo.remove(table)
        return todo
@@ -179,43 +183,44 @@ class PowerBIDataset(BaseModel):
            self.schemas[table] = json_to_md(result["results"][0]["tables"][0]["rows"])
        return self._get_schema_for_tables(tables_requested)

+    def _create_json_content(self, command: str) -> dict[str, Any]:
+        """Create the json content for the request."""
+        return {
+            "queries": [{"query": rf"{command}"}],
+            "impersonatedUserName": self.impersonated_user_name,
+            "serializerSettings": {"includeNulls": True},
+        }
+
    def run(self, command: str) -> Any:
        """Execute a DAX command and return a json representing the results."""
-
+        _LOGGER.debug("Running command: %s", command)
        result = requests.post(
            self.request_url,
-            json={
-                "queries": [{"query": command}],
-                "impersonatedUserName": self.impersonated_user_name,
-                "serializerSettings": {"includeNulls": True},
-            },
+            json=self._create_json_content(command),
            headers=self.headers,
            timeout=10,
        )
-        result.raise_for_status()
        return result.json()

    async def arun(self, command: str) -> Any:
        """Execute a DAX command and return the result asynchronously."""
-        json_content = (
-            {
-                "queries": [{"query": command}],
-                "impersonatedUserName": self.impersonated_user_name,
-                "serializerSettings": {"includeNulls": True},
-            },
-        )
+        _LOGGER.debug("Running command: %s", command)
        if self.aiosession:
            async with self.aiosession.post(
-                self.request_url, headers=self.headers, json=json_content, timeout=10
+                self.request_url,
+                headers=self.headers,
+                json=self._create_json_content(command),
+                timeout=10,
            ) as response:
-                response.raise_for_status()
                response_json = await response.json()
                return response_json
        async with aiohttp.ClientSession() as session:
            async with session.post(
-                self.request_url, headers=self.headers, json=json_content, timeout=10
+                self.request_url,
+                headers=self.headers,
+                json=self._create_json_content(command),
+                timeout=10,
            ) as response:
-                response.raise_for_status()
                response_json = await response.json()
                return response_json
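
The table check added in `utilities/powerbi.py` can also be sketched as a standalone function. This is not the library's code: the function name and its arguments are hypothetical, and it builds a new list instead of removing items from the list being iterated, since removing during iteration can skip elements in Python.

```python
import logging
from typing import Dict, List

_LOGGER = logging.getLogger(__name__)


def tables_still_to_query(
    requested: List[str], known_tables: List[str], schemas: Dict[str, str]
) -> List[str]:
    """Return the requested tables that exist in the dataset and have no cached schema."""
    todo = []
    for table in requested:
        if table not in known_tables:
            # Unknown tables are logged and dropped rather than queried.
            _LOGGER.warning("Table %s not found in dataset.", table)
            continue
        if table in schemas:
            # The schema is already cached, so nothing needs to be fetched.
            continue
        todo.append(table)
    return todo


remaining = tables_still_to_query(
    ["a", "b", "c", "d"], known_tables=["a", "b", "c"], schemas={"a": "cached schema"}
)
```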