"# os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.langchain.plus\" # Uncomment this line if you want to use the hosted version\n",
@@ -142,60 +141,59 @@
"name": "stdout",
"output_type": "stream",
"text": [
"39,566,248\n",
"Anwar Hadid is Dua Lipa's boyfriend and his age raised to the 0.43 power is approximately 3.87.\n",
"LLMMathChain._evaluate(\"\n",
"(age ** 0.43)\n",
"\") raised error: 'age'. Please try again with a valid numerical expression\n",
"The distance between Paris and Boston is 3448 miles.\n",
"The total number of points scored in the 2023 super bowl raised to the .23 power is approximately 3.457460415669602.\n",
"LLMMathChain._evaluate(\"\n",
"(total number of points scored in the 2023 super bowl)**0.23\n",
"\") raised error: invalid syntax. Perhaps you forgot a comma? (<expr>, line 1). Please try again with a valid numerical expression\n"
"unknown format from LLM: Sorry, I cannot answer this question as it requires information that is not currently available.\n",
"unknown format from LLM: Sorry, as an AI language model, I do not have access to personal information such as someone's age. Please provide a different math problem.\n",
"unknown format from LLM: As an AI language model, I do not have information on future events such as the 2023 super bowl. Therefore, I cannot provide a solution to this question.\n",
"unknown format from LLM: This is not a math problem and cannot be translated into a mathematical expression.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 63c89b8bad9b172227d890620cdec651 in your message.).\n",
"Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID e3dd37877de500d7defe699f8411b3dd in your message.).\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"0\n",
"1.9347796717823205\n",
"1.2600907451828602 (inches)\n",
"LLMMathChain._evaluate(\"\n",
"round(0.2791714614499425, 2)\n",
"\") raised error: 'VariableNode' object is not callable. Please try again with a valid numerical expression\n"
]
"data": {
"text/plain": [
"['The population of Canada as of 2023 is estimated to be 39,566,248.',\n",
" \"Anwar Hadid's age raised to the 0.43 power is approximately 3.87.\",\n",
" ValueError(\"unknown format from LLM: Sorry, as an AI language model, I do not have access to personal information such as someone's age. Please provide a different math problem.\"),\n",
" 'The distance between Paris and Boston is 3448 miles.',\n",
" ValueError('unknown format from LLM: Sorry, I cannot answer this question as it requires information that is not currently available.'),\n",
" ValueError('unknown format from LLM: As an AI language model, I do not have information on future events such as the 2023 super bowl. Therefore, I cannot provide a solution to this question.'),\n",
" '15 points were scored more in the 2023 Super Bowl than in the 2022 Super Bowl.',\n",
" '1.9347796717823205',\n",
" ValueError('unknown format from LLM: This is not a math problem and cannot be translated into a mathematical expression.'),\n",
" '0.2791714614499425']"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import asyncio\n",
"\n",
"inputs = [\n",
"'How many people live in canada as of 2023?',\n",
" \"who is dua lipa's boyfriend? what is his age raised to the .43 power?\",\n",
" \"what is dua lipa's boyfriend age raised to the .43 power?\",\n",
" 'how far is it from paris to boston in miles',\n",
" 'what was the total number of points scored in the 2023 super bowl? what is that number raised to the .23 power?',\n",
" 'what was the total number of points scored in the 2023 super bowl raised to the .23 power?',\n",
" 'how many more points were scored in the 2023 super bowl than in the 2022 super bowl?',\n",
" 'what is 153 raised to .1312 power?',\n",
" \"who is kendall jenner's boyfriend? what is his height (in inches) raised to .13 power?\",\n",
" 'what is 1213 divided by 4345?'\n",
" \"How many people live in canada as of 2023?\",\n",
" \"who is dua lipa's boyfriend? what is his age raised to the .43 power?\",\n",
" \"what is dua lipa's boyfriend age raised to the .43 power?\",\n",
" \"how far is it from paris to boston in miles\",\n",
" \"what was the total number of points scored in the 2023 super bowl? what is that number raised to the .23 power?\",\n",
" \"what was the total number of points scored in the 2023 super bowl raised to the .23 power?\",\n",
" \"how many more points were scored in the 2023 super bowl than in the 2022 super bowl?\",\n",
" \"what is 153 raised to .1312 power?\",\n",
" \"who is kendall jenner's boyfriend? what is his height (in inches) raised to .13 power?\",\n",
" \"what is 1213 divided by 4345?\",\n",
"]\n",
"results = []\n",
"\n",
"for input_example in inputs:\n",
"async def arun(agent, input_example):\n",
" try:\n",
" print(agent.run(input_example))\n",
" return await agent.arun(input_example)\n",
" except Exception as e:\n",
" # The agent sometimes makes mistakes! These will be captured by the tracing.\n",
"# If your chain is NOT stateful, your lambda can return the object directly\n",
"# to improve runtime performance. For example:\n",
@@ -451,7 +443,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 12,
"id": "a8088b7d-3ab6-4279-94c8-5116fe7cee33",
"metadata": {
"tags": []
@@ -461,314 +453,85 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Processed examples: 1\r"
"Processed examples: 4\r"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Chain failed for example 604fbd32-7cbe-4dd4-9ddd-fd5ab5c01566. Error: LLMMathChain._evaluate(\"\n",
"(age ** 0.43)\n",
"\") raised error: 'age'. Please try again with a valid numerical expression\n"
"Chain failed for example 898af6aa-ea39-4959-9ecd-9b9f1ffee31c. Error: LLMMathChain._evaluate(\"\n",
"round(0.2791714614499425, 2)\n",
"\") raised error: 'VariableNode' object is not callable. Please try again with a valid numerical expression\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Processed examples: 4\r"
"Processed examples: 5\r"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Chain failed for example 4c82b6a4-d8ce-4129-8229-7f4e2f76294c. Error: LLMMathChain._evaluate(\"\n",
"(total number of points scored in the 2023 super bowl)**0.23\n",
"\") raised error: invalid syntax. Perhaps you forgot a comma? (<expr>, line 1). Please try again with a valid numerical expression\n"
"Chain failed for example ffb8071d-60e4-49ca-aa9f-5ec03ea78f2d. Error: unknown format from LLM: This is not a math problem and cannot be translated into a mathematical expression.\n"
"Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 29fc448d09a0f240719eb1dbb95db18d in your message.).\n"
"# You can navigate to the UI by clicking on the link below\n",
"client"
]
},
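The `arun` helper in the notebook's source cell returns exceptions instead of raising them. That is why the results list above mixes answer strings with `ValueError` objects rather than stopping at the first failure. The following is a minimal, self-contained sketch of that pattern. It assumes a hypothetical `EchoAgent` stand-in with an async `arun` method (not a LangChain class) so it runs without an API key, and `run_all` is likewise an illustrative helper name.

```python
import asyncio


class EchoAgent:
    """Hypothetical stand-in for an agent exposing an async `arun` method."""

    async def arun(self, query: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        if "boyfriend" in query:
            # Simulate the agent failing on some inputs.
            raise ValueError(f"unknown format from LLM: {query}")
        return f"answer to: {query}"


async def arun(agent, input_example):
    try:
        return await agent.arun(input_example)
    except Exception as e:
        # Return the exception instead of raising it, so one failure
        # does not cancel the other concurrent runs.
        return e


async def run_all(agent, inputs):
    # Schedule every input at once; gather returns results in input order.
    return await asyncio.gather(*(arun(agent, x) for x in inputs))


if __name__ == "__main__":
    inputs = ["what is 153 raised to .1312 power?", "who is dua lipa's boyfriend?"]
    results = asyncio.run(run_all(EchoAgent(), inputs))
    failures = [r for r in results if isinstance(r, Exception)]
    print(f"{len(results) - len(failures)} succeeded, {len(failures)} failed")
```

`asyncio.gather(..., return_exceptions=True)` gives the same effect without a wrapper. The explicit `try`/`except` simply mirrors the shape of the notebook's helper.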
{
"cell_type": "markdown",
"id": "7896cbeb-345f-430b-ab5e-e108973174f8",
"id": "63ed6561-6574-43b3-a653-fe410aa8a617",
"metadata": {},
"source": [
"## Running an LLM over a Traced Dataset\n",
"## Running an Evaluation Chain\n",
"\n",
"You can run an LLM over a dataset in much the same way as you ran the chain and chat models, provided the dataset you've captured is in the appropriate format. We've cached one for you here, but datasets built from your own application's traces will be far more useful for your use cases."
]
},
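Whichever model you run over a dataset, sending every example at once can trip the provider's rate limits, as the `RateLimitError` retries in the logs above show. One common mitigation is to cap the number of in-flight calls with `asyncio.Semaphore`. The sketch below is built on assumptions: `fake_llm_call` is a hypothetical stand-in for a real model call, `run_bounded` is an illustrative helper, and the default limit of 2 is arbitrary.

```python
import asyncio


async def fake_llm_call(prompt: str, tracker: dict) -> str:
    """Hypothetical model call that records how many calls overlap."""
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    try:
        await asyncio.sleep(0.01)  # stand-in for network latency
        return prompt.upper()
    finally:
        tracker["active"] -= 1


async def run_bounded(prompts, max_concurrency: int = 2):
    semaphore = asyncio.Semaphore(max_concurrency)
    tracker = {"active": 0, "peak": 0}

    async def guarded(prompt: str) -> str:
        # At most `max_concurrency` coroutines hold the semaphore at once.
        async with semaphore:
            return await fake_llm_call(prompt, tracker)

    outputs = await asyncio.gather(*(guarded(p) for p in prompts))
    return outputs, tracker["peak"]


if __name__ == "__main__":
    outputs, peak = asyncio.run(run_bounded(["a", "b", "c", "d", "e"]))
    print(outputs, "peak concurrency:", peak)
```

The semaphore trades some throughput for fewer rejected requests. The right limit depends on your provider's quota, so treat the value here as a placeholder.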
{
"cell_type": "code",
"execution_count": 21,
"id": "d6805d0b-4612-4671-bffb-e6978992bd40",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"Manually comparing the results of chains in the UI is effective, but it can be time-consuming.\n",
"It's easier to leverage AI-assisted feedback to evaluate your agent's performance.\n",