Use JSON rather than JSON5 (#2520)

Evaluation so far has shown that agents do a reasonable job of emitting
`json` blocks as arguments when cued (instead of TypeScript), and Python's
`json` module accepts a `strict=False` flag that allows control characters
inside strings, which are particularly likely to appear in the response.
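A minimal sketch of why `strict=False` matters, assuming Python 3's standard `json` module. `load_json_block` is a standalone copy (without `self`) of the `_load_json_block` helper this PR adds to `APIRequesterOutputParser`; the sample JSON text is made up for illustration.

```python
import json


def load_json_block(serialized_block: str) -> str:
    """Standalone copy of the PR's `_load_json_block` (minus `self`).

    `strict=False` lets `json.loads` accept raw control characters (code
    points 0-31, e.g. a literal newline or tab) inside string values;
    `json.dumps` then writes them back out as escape sequences.
    """
    try:
        return json.dumps(json.loads(serialized_block, strict=False))
    except json.JSONDecodeError:
        return "ERROR serializing request."


# Hypothetical LLM output: the "note" value contains a literal newline.
raw = '{"q": "laptop", "note": "line one\nline two"}'

try:
    json.loads(raw)  # strict=True is the default, so this raises
except json.JSONDecodeError as err:
    print("strict parse failed:", err.msg)

# The literal newline comes back as the two-character escape \n.
print(load_json_block(raw))
# Malformed JSON is still rejected and mapped to the error string.
print(load_json_block("{not json"))
```

Because strict mode is the default, a single unescaped newline in a long response string is enough to make `json.loads` fail; `strict=False` relaxes only that check and still rejects malformed JSON.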

This PR makes this change in the request and response synthesizer
chains, and sets a fixed `temperature=0` (with `max_tokens=1000`) on the
OpenAI LLM in the eval notebook. It also adds a `raise_error = False` flag
to the notebook to facilitate debugging.
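A minimal sketch of the `raise_error` toggle in the notebook's evaluation loop. It assumes, as in the notebook, that `scores` is a dict holding a `"completed"` list (its construction is outside this diff); `run_eval` and `flaky_chain` are hypothetical names, with `flaky_chain` standing in for `api_chain`.

```python
from typing import Callable, Dict, List, Tuple


def run_eval(
    questions: List[str],
    chain: Callable[[str], Dict],
    raise_error: bool = False,
) -> Tuple[List[Dict], List[Dict], Dict[str, List[float]]]:
    """Run `chain` over `questions`, recording a 1.0/0.0 completion score.

    With raise_error=True the first exception propagates (useful while
    debugging); otherwise the failure is recorded and the loop continues.
    """
    chain_outputs: List[Dict] = []
    failed_examples: List[Dict] = []
    scores: Dict[str, List[float]] = {"completed": []}
    for question in questions:
        try:
            chain_outputs.append(chain(question))
            scores["completed"].append(1.0)
        except Exception as e:
            if raise_error:
                raise  # re-raise the same exception, traceback intact
            failed_examples.append({"q": question, "error": e})
            scores["completed"].append(0.0)
    return chain_outputs, failed_examples, scores


def flaky_chain(question: str) -> Dict:
    """Hypothetical stand-in for `api_chain` that fails on one question."""
    if "skirt" in question:
        raise ValueError("could not parse request")
    return {"question": question, "output": "ok"}


questions = ["What iPhone models are available?", "Any skirts?"]
outputs, failures, scores = run_eval(questions, flaky_chain)
print(scores)  # {'completed': [1.0, 0.0]}
```

The sketch uses a bare `raise` where the notebook uses `raise e`; both re-raise the same exception object.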
doc
William FH 1 year ago committed by GitHub
parent f8e4048cd8
commit 629fda3957

@@ -7,7 +7,7 @@
"source": [
"# Evaluating an OpenAPI Chain\n",
"\n",
-"This notebook goes over ways to semantically evaluate an OpenAPI Chain, which calls an endpoint defined by the OpenAPI specification using purely natural language."
+"This notebook goes over ways to semantically evaluate an [OpenAPI Chain](openapi.ipynb), which calls an endpoint defined by the OpenAPI specification using purely natural language."
]
},
{
@@ -54,7 +54,7 @@
"operation = APIOperation.from_openapi_spec(spec, '/public/openai/v0/products', \"get\")\n",
"verbose = False\n",
"# Select any LangChain LLM\n",
-"llm = OpenAI()\n",
+"llm = OpenAI(temperature=0, max_tokens=1000)\n",
"# Create the endpoint chain\n",
"api_chain = OpenAPIEndpointChain.from_api_operation(\n",
" operation, \n",
@@ -166,6 +166,7 @@
"outputs": [],
"source": [
"## Run the the API chain itself\n",
+"raise_error = False # Stop on first failed example - useful for development\n",
"chain_outputs = []\n",
"failed_examples = []\n",
"for question in questions:\n",
@@ -173,6 +174,8 @@
" chain_outputs.append(api_chain(question))\n",
" scores[\"completed\"].append(1.0)\n",
" except Exception as e:\n",
+" if raise_error:\n",
+" raise e\n",
" failed_examples.append({'q': question, 'error': e})\n",
" scores[\"completed\"].append(0.0)"
]
@@ -208,16 +211,16 @@
{
"data": {
"text/plain": [
"['Currently, Apple iPhone 14 Pro Max 256GB, Apple iPhone 12 128GB, Apple iPhone 13 128GB, Apple iPhone 14 Pro 128GB, Apple iPhone 14 Pro 256GB, Apple iPhone 14 Pro Max 128GB, Apple iPhone 13 Pro Max 128GB, Apple iPhone 14 128GB, Apple iPhone 12 Pro 512GB, and Apple iPhone 12 mini 64GB models are available.',\n",
" 'Yes, there are budget laptops available from the API response. For example, the HP 14-dq0055dx is priced at $199.99 and the HP 15-dw0083wm is priced at $244.99.',\n",
" 'The Alarco Gaming PC (X_BLACK_GTX750) is the cheapest gaming PC listed at $499.99. You can find more information at this link: https://www.klarna.com/us/shopping/pl/cl223/3203154750/Desktop-Computers/Alarco-Gaming-PC-%28X_BLACK_GTX750%29/?utm_source=openai&ref-site=openai_plugin',\n",
" 'Yes, there are several tablets under $400. You can find Apple iPad 10.2\" 32GB (2019) at $249.99, Samsung Galaxy Tab A8 10.5 SM-X200 32GB at $178.90, Samsung Galaxy Tab A7 Lite 8.7 SM-T220 32GB at $119.99, and Amazon Fire HD 8\" 32GB (10th Generation) at $44.99.',\n",
" 'It depends on your needs and budget. The API response contains a list of headphones from Apple, Bose and Beats that range in price from $69.99 to $409.00. The headphones come with a variety of features, such as active noise cancelling and wireless connections, so you can find the perfect pair for your needs.',\n",
" 'The top rated laptops from the API_RESPONSE are the Apple MacBook Air (2020) M1 OC 7C GPU 8GB 256GB SSD 13\", Apple MacBook Air (2022) M2 OC 8C GPU 8GB 256GB SSD 13.6\", Apple MacBook Pro (2020) M1 OC 8C GPU 8GB 256GB SSD 13\", and Apple MacBook Pro (2022) M2 OC 10C GPU 8GB 256GB SSD 13.3\".',\n",
" \"It looks like you're interested in buying shoes from Nike and Adidas. From the API response, I can see that there are several Nike and UGG shoes available. You can click on the provided links to learn more and purchase the shoes if you're interested.\",\n",
" 'I have found several skirts for you to choose from in the API response. Please click on the links to view more information about the product and available sizes.',\n",
" 'Based on the API response, you should consider getting the CyberPowerPC Gamer Master Gaming Desktop as it features a powerful AMD Ryzen 5 processor and 16GB of DDR4 RAM, along with an Nvidia GeForce RTX 3060 graphics card and a 500GB SSD. It is also released in 2022 and comes with Windows 10 Home.',\n",
" 'Based on the API response, the best budget cameras available are the DJI Mini 2 Dog Camera ($448.50), Insta360 Sphere with Landing Pad ($429.99), DJI FPV Gimbal Camera ($121.06), Parrot Camera & Body ($36.19), and the DJI FPV Air Unit ($179.00).']"
"['There are currently 10 Apple iPhone models available: Apple iPhone 14 Pro Max 256GB, Apple iPhone 12 128GB, Apple iPhone 13 128GB, Apple iPhone 14 Pro 128GB, Apple iPhone 14 Pro 256GB, Apple iPhone 14 Pro Max 128GB, Apple iPhone 13 Pro Max 128GB, Apple iPhone 14 128GB, Apple iPhone 12 Pro 512GB, and Apple iPhone 12 mini 64GB.',\n",
" 'Yes, there are several budget laptops in the API_RESPONSE. The HP 14-dq0055dx and HP 14-dq0054dx are both priced at $199.99, and the HP 15-dw0083wm is priced at $244.99.',\n",
" 'The cheapest gaming PC available is the Alarco Gaming PC (X_BLACK_GTX750) for $499.99. You can find more information about it here: https://www.klarna.com/us/shopping/pl/cl223/3203154750/Desktop-Computers/Alarco-Gaming-PC-%28X_BLACK_GTX750%29/?utm_source=openai&ref-site=openai_plugin',\n",
" 'Yes, there are several tablets under $400. These include the Apple iPad 10.2\" 32GB (2019), Amazon Fire HD 8\" 32GB (10th Generation), Amazon Fire HD 8 Kids Tablet 8\" - 32GB, and Samsung Galaxy Tab A8 10.5 SM-X200 32GB.',\n",
" 'It looks like there are several great options available. Based on the API response, some of the best headphones include Apple AirPods Pro (2nd generation) 2022, Apple AirPods Max, and Bose Noise Cancelling Headphones 700.',\n",
" 'The top rated laptops from the API_RESPONSE are the Apple MacBook Air (2020) M1 OC 7C GPU 8GB 256GB SSD 13\", Apple MacBook Air (2022) M2 OC 8C GPU 8GB 256GB SSD 13.6\", Apple MacBook Pro (2022) M2 OC 10C GPU 8GB 256GB SSD 13.3\", and Apple MacBook Pro (2020) M1 OC 8C GPU 8GB 256GB SSD 13\".',\n",
" \"I found several Nike and Adidas shoes in the API response. Here are the links to the products: Nike Dunk Low M - Black/White: https://www.klarna.com/us/shopping/pl/cl337/3200177969/Shoes/Nike-Dunk-Low-M-Black-White/?utm_source=openai&ref-site=openai_plugin, Nike Air Jordan 4 Retro M - Midnight Navy: https://www.klarna.com/us/shopping/pl/cl337/3202929835/Shoes/Nike-Air-Jordan-4-Retro-M-Midnight-Navy/?utm_source=openai&ref-site=openai_plugin, Nike Air Force 1 '07 M - White: https://www.klarna.com/us/shopping/pl/cl337/3979297/Shoes/Nike-Air-Force-1-07-M-White/?utm_source=openai&ref-site=openai_plugin, Nike Dunk Low W - White/Black: https://www.klarna.com/us/shopping/pl/cl337/3200134705/Shoes/Nike-Dunk-Low-W-White-Black/?utm_source=openai&ref-site=openai_plugin, Nike Air Jordan 1 Retro High M - White/University Blue/Black: https://www.klarna.com/us/shopping/pl/cl337/3200383658/Shoes/Nike-Air-Jordan-1-Retro-High-M-White-University-Blue-Black/?utm_source=openai&ref-site=openai_plugin, Nike Air Jordan 1 Retro High OG M - True Blue/Cement Grey/White: https://www.klarna.com/us/shopping/pl/cl337/3204655673/Shoes/Nike-Air-Jordan-1-Retro-High-OG-M-True-Blue-Cement-Grey-White/?utm_source=openai&ref-site=openai_plugin, Nike Air Jordan 11 Retro Cherry - White/Varsity Red/Black: https://www.klarna.com/us/shopping/pl/cl337/3202929696/Shoes/Nike-Air-Jordan-11-Retro-Cherry-White-Varsity-Red-Black/?utm_source=openai&ref-site=openai_plugin, Nike Dunk High W - White/Black: https://www.klarna.com/us/shopping/pl/cl337/3201956448/Shoes/Nike-Dunk-High-W-White-Black/?utm_source=openai&ref-site=openai_plugin, Nike Air Jordan 5 Retro M - Black/Taxi/Aquatone: https://www.klarna.com/us/shopping/pl/cl337/3204923084/Shoes/Nike-Air-Jordan-5-Retro-M-Black-Taxi-Aquatone/?utm_source=openai&ref-site=openai_plugin, Nike Court Legacy Lift W: https://www.klarna.com/us/shopping/pl/cl337/3202103728/Shoes/Nike-Court-Legacy-Lift-W/?utm_source=openai&ref-site=openai_plugin\",\n",
" \"I found several skirts that may interest you. Please take a look at the following products: Avenue Plus Size Denim Stretch Skirt, LoveShackFancy Ruffled Mini Skirt - Antique White, Nike Dri-Fit Club Golf Skirt - Active Pink, Skims Soft Lounge Ruched Long Skirt, French Toast Girl's Front Pleated Skirt with Tabs, Alexia Admor Women's Harmonie Mini Skirt Pink Pink, Vero Moda Long Skirt, Nike Court Dri-FIT Victory Flouncy Tennis Skirt Women - White/Black, Haoyuan Mini Pleated Skirts W, and Zimmermann Lyre Midi Skirt.\",\n",
" 'Based on the API response, you may want to consider the Skytech Archangel Gaming Computer PC Desktop, the CyberPowerPC Gamer Master Gaming Desktop, or the ASUS ROG Strix G10DK-RS756, all of which offer powerful processors and plenty of RAM.',\n",
" 'Based on the API response, the best budget cameras are the DJI Mini 2 Dog Camera ($448.50), Insta360 Sphere with Landing Pad ($429.99), DJI FPV Gimbal Camera ($121.06), Parrot Camera & Body ($36.19), and DJI FPV Air Unit ($179.00).']"
]
},
"execution_count": 9,
@@ -318,15 +321,15 @@
{
"data": {
"text/plain": [
"[' \\nThe original query is asking for all iPhone models, so the \"size\" parameter is not necessary. The same goes for the min_price and max_price parameters, since these are not relevant to the question being asked. \\nTherefore, the predicted query is not semantically the same as the original query and is not likely to produce the same answer. \\nFinal Grade: F',\n",
" ' The query has the same keyword of \"laptops,\" which is semantically the same as the original query, so that\\'s a plus. However, the maximum price is 500, which is more than double the original maximum price of 300. This means that the predicted query is likely to yield more results than the original query, which may or may not include budget laptops. For this reason, I would give the predicted query a Grade B. Final Grade: B',\n",
" \" The first part of the query - 'gaming PC' - is semantically the same as the original query, so that's an A. However, the next parts of the query - the size, min_price, and max_price - are not related to the original query and do not provide any useful information, so these get a D. Final Grade: D\",\n",
" ' The first part of the query, \"q\": \"tablet\", is asking the API to look for tablets. The second part, \"size\": 10, is asking the API to return 10 results. The third part, \"min_price\": 0, is asking the API to return results with a minimum price of 0, and the fourth part, \"max_price\": 400, is asking the API to return results with a maximum price of 400. All of these parts of the query are relevant to the original question, so this query is semantically the same as the original. \\n\\nFinal Grade: A',\n",
" ' The original query is asking for laptops and the max price is 400. The predicted query is asking for headphones and the min and max prices are 0 and 1000 respectively. The predicted query is off topic and could give a range of results that are not necessarily the best headphones. Therefore, the predicted query is not semantically the same and is not likely to produce the same answer as the original query. Final Grade: F',\n",
" ' The original query was simply \"shoe,\" which is very broad and unlikely to give accurate results. The predicted query is looking for laptops, specifying a size of 5, and a price range of $0-$9999. While this query is a bit more specific than the original query, it still fails to provide any information about the laptops. It is missing a rating system or any other criteria to determine which laptops are the \"top rated.\" Therefore, I would give the predicted query a grade of D as it is more specific than the original, but still fails to provide all the necessary information to accurately answer the question. Final Grade: D',\n",
" ' The original question asked for shoes and specified two brands, Adidas and Nike. The predicted query is asking for shoes, but it is also asking for a size and a minimum/maximum price range. While these parameters might help narrow down the results returned, they do not answer the original question. Therefore, the predicted query is not semantically the same as the original query. Final Grade: F',\n",
" ' The original query was for a professional desktop PC with a maximum price of $10,000. The predicted query is for a skirt with a size of 10 and a minimum and maximum price of $0 and $500, respectively. The two queries are not semantically the same as they are asking for different items with different specifications. Final Grade: F',\n",
" ' The original query was asking for a \"camera\" with a maximum price of $300. The predicted query is asking for a \"deskopt pc\" with a minimum price of $0 and no maximum price, as well as a size of 10. The two queries are not semantically the same, as the original query is asking for specific information about a camera, whereas the predicted query is asking for the same type of product but with different parameters. As such, I would give this query a grade of D. Final Grade: D']"
"[' The original query is asking for all iPhone models, so the \"q\" parameter is correct. The \"size\" parameter is not necessary, as it is not relevant to the question. The \"min_price\" and \"max_price\" parameters are also not necessary, as the question does not ask for any pricing information. Therefore, this predicted query is not semantically the same as the original query and does not provide the same answer. Final Grade: F',\n",
" ' The original query is asking for laptops with a maximum price of $300. The predicted query is asking for laptops with a minimum price of $0 and a maximum price of $500. This means that the predicted query is likely to return more results than the original query, as it is asking for a wider range of prices. However, the predicted query is still likely to return some budget laptops, as it is still asking for laptops with a maximum price of $500. Therefore, the predicted query is semantically similar to the original query, but it is likely to return more results. Final Grade: B',\n",
" ' The query is asking for the cheapest gaming PC, so the first two parameters are correct. The third parameter, \"size\", is not necessary for this query, so it should be removed. The fourth parameter, \"min_price\", is also not necessary since the query is asking for the cheapest gaming PC. The fifth parameter, \"max_price\", should be set to null since the query is asking for the cheapest gaming PC and not a specific price range. Final Grade: B',\n",
" ' The original query is asking for any tablets under $400. The predicted query is asking for a tablet, with a size of 10, with a minimum price of 0 and a maximum price of 400. This query is semantically the same as the original query, as it is asking for a tablet with a price range of 0 to 400. Therefore, the predicted query is likely to produce the same answer as the original query. Final Grade: A',\n",
" ' The original query is looking for a laptop with a maximum price of 400. The predicted query is looking for headphones with a minimum price of 0 and a maximum price of 500. The two queries are not semantically the same because they are looking for different items (laptops vs. headphones) and different price ranges. Final Grade: F',\n",
" ' The original query is asking for the top rated laptops, so the first part of the predicted query is correct in that it is asking for laptops. However, the predicted query also includes parameters for size, min_price, and max_price, which are not relevant to the original question. Therefore, the predicted query is not semantically the same as the original query and is not likely to produce the same answer. Final Grade: F',\n",
" ' The original query is asking for shoes, so the predicted query is on the right track. However, the predicted query is adding additional parameters that are not relevant to the original question. The size, min_price, and max_price parameters are not necessary to answer the original question, so they are not semantically the same. Final Grade: C',\n",
" ' The original query is looking for a professional desktop PC with a maximum price of $10,000. The predicted query is looking for a skirt with a size of 10 and a price range of $0 to $500. The two queries are not semantically the same, as they are looking for two different items. The predicted query is not likely to produce the same answer as the original query. Final Grade: F',\n",
" \" The original query is asking for a professional Desktop PC with no price limit. The predicted query is asking for a Desktop PC with a size of 10 and a minimum price of 0, with no maximum price limit. The predicted query is missing the 'professional' keyword, which could lead to results that are not what the original query was asking for. Therefore, the predicted query does not semantically match the original query and should not be used. Final Grade: F\"]"
]
},
"execution_count": 13,
@@ -434,25 +437,25 @@
{
"data": {
"text/plain": [
"[' \\nThe original query is asking for all iPhone models, so the \"size\" parameter is not necessary. The same goes for the min_price and max_price parameters, since these are not relevant to the question being asked. \\nTherefore, the predicted query is not semantically the same as the original query and is not likely to produce the same answer. \\nFinal Grade: F',\n",
" ' The query has the same keyword of \"laptops,\" which is semantically the same as the original query, so that\\'s a plus. However, the maximum price is 500, which is more than double the original maximum price of 300. This means that the predicted query is likely to yield more results than the original query, which may or may not include budget laptops. For this reason, I would give the predicted query a Grade B. Final Grade: B',\n",
" \" The first part of the query - 'gaming PC' - is semantically the same as the original query, so that's an A. However, the next parts of the query - the size, min_price, and max_price - are not related to the original query and do not provide any useful information, so these get a D. Final Grade: D\",\n",
" ' The first part of the query, \"q\": \"tablet\", is asking the API to look for tablets. The second part, \"size\": 10, is asking the API to return 10 results. The third part, \"min_price\": 0, is asking the API to return results with a minimum price of 0, and the fourth part, \"max_price\": 400, is asking the API to return results with a maximum price of 400. All of these parts of the query are relevant to the original question, so this query is semantically the same as the original. \\n\\nFinal Grade: A',\n",
" ' The original query is asking for laptops and the max price is 400. The predicted query is asking for headphones and the min and max prices are 0 and 1000 respectively. The predicted query is off topic and could give a range of results that are not necessarily the best headphones. Therefore, the predicted query is not semantically the same and is not likely to produce the same answer as the original query. Final Grade: F',\n",
" ' The original query was simply \"shoe,\" which is very broad and unlikely to give accurate results. The predicted query is looking for laptops, specifying a size of 5, and a price range of $0-$9999. While this query is a bit more specific than the original query, it still fails to provide any information about the laptops. It is missing a rating system or any other criteria to determine which laptops are the \"top rated.\" Therefore, I would give the predicted query a grade of D as it is more specific than the original, but still fails to provide all the necessary information to accurately answer the question. Final Grade: D',\n",
" ' The original question asked for shoes and specified two brands, Adidas and Nike. The predicted query is asking for shoes, but it is also asking for a size and a minimum/maximum price range. While these parameters might help narrow down the results returned, they do not answer the original question. Therefore, the predicted query is not semantically the same as the original query. Final Grade: F',\n",
" ' The original query was for a professional desktop PC with a maximum price of $10,000. The predicted query is for a skirt with a size of 10 and a minimum and maximum price of $0 and $500, respectively. The two queries are not semantically the same as they are asking for different items with different specifications. Final Grade: F',\n",
" ' The original query was asking for a \"camera\" with a maximum price of $300. The predicted query is asking for a \"deskopt pc\" with a minimum price of $0 and no maximum price, as well as a size of 10. The two queries are not semantically the same, as the original query is asking for specific information about a camera, whereas the predicted query is asking for the same type of product but with different parameters. As such, I would give this query a grade of D. Final Grade: D',\n",
" ' The user asked for an answer to the question \"What iPhone models are available?\" The API provided a response containing a list of 10 iPhone models. The response accurately provides the user with the models that are currently available. Therefore, the answer provided is complete, accurate, and useful.\\n\\nFinal Grade: A',\n",
" \" Does the response answer the user's question? Yes, the response provides a list of budget laptops and their prices. Is the information provided accurate? Yes, the information provided is accurate and matches the API response. Is the response useful? Yes, the response provides the user with a list of budget laptops and their prices.\\nFinal Grade: A\",\n",
" ' The question was \"Show me the cheapest gaming PC.\" The API returned a result of the Alarco Gaming PC (X_BLACK_GTX750) at a price of $499.99. The response to the user accurately answered their question by citing the name of the cheapest gaming PC and its price. The response also included a link to the product page for more information. Therefore, the response meets the criteria of an A grade. Final Grade: A',\n",
" ' First, the answer to the user\\'s question is \"Yes,\" which is accurate. Second, the API results provided are relevant to the user\\'s question, as all products are tablets and all are under $400. Finally, the response provides four specific examples of tablets under $400, which provides helpful detail to the user. Overall, this response is accurate and provides useful information to the user. Final Grade: A',\n",
" \" First, the response accurately answers the user's question by providing a list of headphones and their features. Second, it also provides a range of prices and explains that it depends on the user's needs and budget. The response is comprehensive and provides enough information to the user to make an informed decision.\\n\\nFinal Grade: A\",\n",
" ' The user asked for the top rated laptops, and the API response gave us a list of 5 laptops. The response provided all the relevant information about each laptop, such as the specs and price. This response is accurate and provides the necessary information to help the user make an informed decision. Final Grade: A',\n",
" ' The user asked for shoes from Adidas and Nike, and the API response provided links to shoes from both brands, so the accuracy of the response is good. The response also provides helpful information such as product name, price, and attributes, which makes the response useful. Therefore, I would give this response a grade of A.\\n\\nFinal Grade: A',\n",
" ' The API response was relevant and provided detailed information about the products. The response is accurate and provides useful information. The response also provides links to the products, which is very helpful for the user. All these factors give this response an A. Final Grade: A',\n",
" \" The user asked for a professional Desktop PC with no budget restrictions. The API response provided several options from different brands, with varying features. The CyberPowerPC Gamer Master Gaming Desktop in particular offers impressive specs, such as an AMD Ryzen 5 processor, 16GB of DDR4 RAM, an Nvidia GeForce RTX 3060 graphics card and a 500GB SSD, as well as being released in 2022 and coming with Windows 10 Home. Therefore, this option should meet the user's requirements and provides the best value for money.\\n\\nFinal Grade: A\",\n",
" ' First, the API returned the information asked for. The response contains the names, URLs, and prices of five budget cameras. Then, the response was properly interpreted by the user, and all the information in the response was relayed accurately. As a result, this response fully answers the original question of what are the best budget cameras, so the accuracy of the response is high. In addition, the response is useful to the user, as it provides an accurate list of the best budget cameras in one place. Final Grade: A.']"
"[' The original query is asking for all iPhone models, so the \"q\" parameter is correct. The \"size\" parameter is not necessary, as it is not relevant to the question. The \"min_price\" and \"max_price\" parameters are also not necessary, as the question does not ask for any pricing information. Therefore, this predicted query is not semantically the same as the original query and does not provide the same answer. Final Grade: F',\n",
" ' The original query is asking for laptops with a maximum price of $300. The predicted query is asking for laptops with a minimum price of $0 and a maximum price of $500. This means that the predicted query is likely to return more results than the original query, as it is asking for a wider range of prices. However, the predicted query is still likely to return some budget laptops, as it is still asking for laptops with a maximum price of $500. Therefore, the predicted query is semantically similar to the original query, but it is likely to return more results. Final Grade: B',\n",
" ' The query is asking for the cheapest gaming PC, so the first two parameters are correct. The third parameter, \"size\", is not necessary for this query, so it should be removed. The fourth parameter, \"min_price\", is also not necessary since the query is asking for the cheapest gaming PC. The fifth parameter, \"max_price\", should be set to null since the query is asking for the cheapest gaming PC and not a specific price range. Final Grade: B',\n",
" ' The original query is asking for any tablets under $400. The predicted query is asking for a tablet, with a size of 10, with a minimum price of 0 and a maximum price of 400. This query is semantically the same as the original query, as it is asking for a tablet with a price range of 0 to 400. Therefore, the predicted query is likely to produce the same answer as the original query. Final Grade: A',\n",
" ' The original query is looking for a laptop with a maximum price of 400. The predicted query is looking for headphones with a minimum price of 0 and a maximum price of 500. The two queries are not semantically the same because they are looking for different items (laptops vs. headphones) and different price ranges. Final Grade: F',\n",
" ' The original query is asking for the top rated laptops, so the first part of the predicted query is correct in that it is asking for laptops. However, the predicted query also includes parameters for size, min_price, and max_price, which are not relevant to the original question. Therefore, the predicted query is not semantically the same as the original query and is not likely to produce the same answer. Final Grade: F',\n",
" ' The original query is asking for shoes, so the predicted query is on the right track. However, the predicted query is adding additional parameters that are not relevant to the original question. The size, min_price, and max_price parameters are not necessary to answer the original question, so they are not semantically the same. Final Grade: C',\n",
" ' The original query is looking for a professional desktop PC with a maximum price of $10,000. The predicted query is looking for a skirt with a size of 10 and a price range of $0 to $500. The two queries are not semantically the same, as they are looking for two different items. The predicted query is not likely to produce the same answer as the original query. Final Grade: F',\n",
" \" The original query is asking for a professional Desktop PC with no price limit. The predicted query is asking for a Desktop PC with a size of 10 and a minimum price of 0, with no maximum price limit. The predicted query is missing the 'professional' keyword, which could lead to results that are not what the original query was asking for. Therefore, the predicted query does not semantically match the original query and should not be used. Final Grade: F\",\n",
" ' The user asked a question about what iPhone models are available, and the API returned a response with 10 different models. The response provided by the user accurately listed all 10 models, so the accuracy of the response is A+. The utility of the response is also A+ since the user was able to get the exact information they were looking for. Final Grade: A+',\n",
" ' The API response provided a list of laptops with their prices and attributes. The response was accurate in that it provided the user with a list of budget laptops that are available. The response was also useful in that it provided the user with the information they needed to make an informed decision. Final Grade: A',\n",
" \" The API response provided the name, price, and URL of the product, which is exactly what the user asked for. The response also provided additional information about the product's attributes, which is useful for the user to make an informed decision. Therefore, the response is accurate and useful. Final Grade: A\",\n",
" \" The API response provided a list of tablets that are under $400. The response accurately answered the user's question. The response also provided useful information such as the product name, price, and attributes. Therefore, the response was accurate and useful. Final Grade: A\",\n",
" ' The API response provided a list of headphones with their respective features and prices. The user asked for the best headphones, so the response should include the best options available. The response provided a list of headphones that include features such as noise cancelling and type of headphone. The response also included the prices of the headphones, which is important for the user to know. Therefore, the response was accurate and useful in providing the user with the best options available. Final Grade: A',\n",
" ' The API response provided a list of laptops with their attributes, which is exactly what the user asked for. The response provided a concise list of the top rated laptops, which is what the user was looking for. The response was accurate and useful, so I would give it an A. Final Grade: A',\n",
" ' The API response provided a list of shoes from both Adidas and Nike, which is exactly what the user asked for. The response also included the product name, price, and attributes for each shoe, which is useful information for the user to make an informed decision. The response also included links to the products, which is helpful for the user to purchase the shoes. Overall, the response was accurate and useful, so I would give it an A. Final Grade: A',\n",
" \" The API response provided a list of skirts that could potentially meet the user's needs. The response also included the name, price, and attributes of each skirt. This is a great start, as it provides the user with a variety of options to choose from. However, the response does not provide any images of the skirts, which would have been helpful for the user to make a decision. Additionally, the response does not provide any information about the availability of the skirts, which could be important for the user. \\n\\nFinal Grade: B\",\n",
" ' The user asked for a professional desktop PC with no budget constraints. The API response provided a list of products that fit the criteria, including the Skytech Archangel Gaming Computer PC Desktop, the CyberPowerPC Gamer Master Gaming Desktop, and the ASUS ROG Strix G10DK-RS756. The response accurately suggested these three products as potential options for the user, and provided useful information about their features and prices. Final Grade: A',\n",
" ' The API response provided a list of cameras with their prices, which is exactly what the user asked for. The response was accurate and provided the user with the information they needed to make an informed decision. The response was also useful, as it provided the user with a list of cameras that fit their budget. Final Grade: A']"
]
},
"execution_count": 17,
@ -493,8 +496,8 @@
"text": [
"Metric \tMin \tMean \tMax \n",
"completed \t1.00 \t1.00 \t1.00 \n",
"request_synthesizer \t0.00 \t0.28 \t1.00 \n",
"result_synthesizer \t0.00 \t0.66 \t1.00 \n"
"request_synthesizer \t0.00 \t0.33 \t1.00 \n",
"result_synthesizer \t0.00 \t0.67 \t1.00 \n"
]
}
],

@ -2,9 +2,6 @@
import json
import re
from typing import Dict
from pydantic import root_validator
from langchain.chains.api.openapi.prompts import REQUEST_TEMPLATE
from langchain.chains.llm import LLMChain
@ -16,30 +13,18 @@ from langchain.schema import BaseOutputParser
class APIRequesterOutputParser(BaseOutputParser):
"""Parse the request and error tags."""
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that json5 package exists."""
def _load_json_block(self, serialized_block: str) -> str:
try:
import json5 # noqa: F401
except ImportError:
raise ValueError(
"Could not import json5 python package. "
"Please it install it with `pip install json5`."
)
return values
return json.dumps(json.loads(serialized_block, strict=False))
except json.JSONDecodeError:
return "ERROR serializing request."
def parse(self, llm_output: str) -> str:
"""Parse the request and error tags."""
import json5
json_match = re.search(r"```json(.*?)```", llm_output, re.DOTALL)
if json_match:
typescript_block = json_match.group(1).strip()
try:
return json.dumps(json5.loads(typescript_block))
except json.JSONDecodeError:
return "ERROR serializing request"
return self._load_json_block(json_match.group(1).strip())
message_match = re.search(r"```text(.*?)```", llm_output, re.DOTALL)
if message_match:
return f"MESSAGE: {message_match.group(1).strip()}"
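The new `_load_json_block` helper replaces `json5.loads` with the standard library's `json.loads(..., strict=False)`. With `strict=False`, the decoder accepts raw control characters inside string values, such as a literal newline or tab, which often show up in LLM output. The default strict mode rejects them. Below is a minimal standalone sketch of the requester-side logic. `load_json_block` and the "no json block" error string are hypothetical stand-ins and not part of the LangChain API. The `text`-block message branch is omitted.

```python
import json
import re

# Build the triple-backtick marker programmatically so this example
# does not nest a literal code fence inside a code fence.
FENCE = "`" * 3


def load_json_block(llm_output: str) -> str:
    """Extract the first json-tagged fenced block and re-serialize it (sketch)."""
    match = re.search(FENCE + r"json(.*?)" + FENCE, llm_output, re.DOTALL)
    if not match:
        # Hypothetical error string; the diff does not show this branch.
        return "ERROR: no json block found."
    try:
        # strict=False allows raw control characters (e.g. a literal newline)
        # inside JSON strings; json.dumps then re-escapes them.
        return json.dumps(json.loads(match.group(1).strip(), strict=False))
    except json.JSONDecodeError:
        return "ERROR serializing request."
```

When run on a block whose string value contains a literal newline, this returns a re-serialized string with the newline escaped as `\n`. Without `strict=False`, the same input raises `json.JSONDecodeError`.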

@ -2,9 +2,6 @@
import json
import re
from typing import Dict
from pydantic import root_validator
from langchain.chains.api.openapi.prompts import RESPONSE_TEMPLATE
from langchain.chains.llm import LLMChain
@ -16,34 +13,22 @@ from langchain.schema import BaseOutputParser
class APIResponderOutputParser(BaseOutputParser):
"""Parse the response and error tags."""
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that json5 package exists."""
def _load_json_block(self, serialized_block: str) -> str:
try:
import json5 # noqa: F401
except ImportError:
raise ValueError(
"Could not import json5 python package. "
"Please it install it with `pip install json5`."
)
return values
response_content = json.loads(serialized_block, strict=False)
return response_content.get("response", "ERROR parsing response.")
except json.JSONDecodeError:
return "ERROR parsing response."
except:
raise
def parse(self, llm_output: str) -> str:
"""Parse the response and error tags."""
import json5
json_match = re.search(r"```json(.*?)```", llm_output, re.DOTALL)
if json_match:
try:
response_content = json5.loads(json_match.group(1).strip())
return response_content.get("response", "ERROR parsing response.")
except json.JSONDecodeError:
return "ERROR parsing response."
except:
raise
return self._load_json_block(json_match.group(1).strip())
else:
raise ValueError("No response found in output.")
raise ValueError(f"No response found in output: {llm_output}.")
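The responder applies the same `strict=False` parsing and then reads the `"response"` key from the decoded object. If no fenced block is present, the new error message includes the raw LLM output. Below is a simplified sketch of that logic under a few assumptions. `parse_response` is a hypothetical name. The `isinstance` guard is an addition that does not appear in the diff. The `except: raise` clause is dropped because a bare re-raise has no effect.

```python
import json
import re

# Triple-backtick marker built programmatically to avoid nesting fences.
FENCE = "`" * 3


def parse_response(llm_output: str) -> str:
    """Return the "response" field from a json-tagged fenced block (sketch)."""
    match = re.search(FENCE + r"json(.*?)" + FENCE, llm_output, re.DOTALL)
    if not match:
        # Matches the updated error message, which echoes the LLM output.
        raise ValueError(f"No response found in output: {llm_output}.")
    try:
        content = json.loads(match.group(1).strip(), strict=False)
    except json.JSONDecodeError:
        return "ERROR parsing response."
    # Added guard (not in the diff): .get() only works on a JSON object.
    if not isinstance(content, dict):
        return "ERROR parsing response."
    return content.get("response", "ERROR parsing response.")
```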
class APIResponderChain(LLMChain):
