update LangSmith notebook (#7767)

pull/7783/head
Ankush Gola 1 year ago committed by GitHub
parent 0d058d4046
commit c4ece52dac

@ -30,15 +30,9 @@
"source": [
"## Prerequisites\n",
"\n",
"**Run LangSmith locally with docker OR [create a LangSmith account](https://smith.langchain.com/) and connect with an API key.**\n",
"**[Create a LangSmith account](https://smith.langchain.com/) and create an API key (see bottom left corner). Familiarize yourself with the platform by looking through the [docs](https://docs.smith.langchain.com/)**\n",
"\n",
"Note that the hosted version of LangSmith is in gated beta; we're in the process of rolling it out to more users.\n",
"\n",
"To run LangSmith locally, execute the following command in your terminal:\n",
"```\n",
"pip install --upgrade langsmith\n",
"langsmith start\n",
"```\n",
"Note LangSmith is in closed beta; we're in the process of rolling it out to more users. However, you can fill out the form on the website for expedited access.\n",
"\n",
"Now, let's get started!"
]
@ -50,21 +44,21 @@
"tags": []
},
"source": [
"## Log Traces to LangSmith\n",
"## Log runs to LangSmith\n",
"\n",
"First, configure your environment variables to tell LangChain to log traces. This is done by setting the `LANGCHAIN_TRACING_V2` environment variable to true.\n",
"You can tell LangChain which project to log to by setting the `LANGCHAIN_PROJECT` environment variable. This will automatically create a debug project for you.\n",
"You can tell LangChain which project to log to by setting the `LANGCHAIN_PROJECT` environment variable (if this isn't set, runs will be logged to the `default` project). This will automatically create the project for you if it doesn't exist. You must also set the `LANGCHAIN_ENDPOINT` and `LANGCHAIN_API_KEY` environment variables.\n",
"\n",
"For more information on other ways to set up tracing, please reference the [LangSmith documentation](https://docs.smith.langchain.com/docs/)\n",
"\n",
"**NOTE:** You must also set your `OPENAI_API_KEY` and `SERPAPI_API_KEY` environment variables in order to run the following tutorial.\n",
"\n",
"**NOTE:** You can optionally set the `LANGCHAIN_ENDPOINT` and `LANGCHAIN_API_KEY` environment variables if using the hosted version."
"**NOTE:** You can only access an API key when you first create it. Keep it somewhere safe."
]
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 21,
"id": "904db9a5-f387-4a57-914c-c8af8d39e249",
"metadata": {
"tags": []
@ -77,12 +71,8 @@
"unique_id = uuid4().hex[0:8]\n",
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"os.environ[\"LANGCHAIN_PROJECT\"] = f\"Tracing Walkthrough - {unique_id}\"\n",
"os.environ[\n",
" \"LANGCHAIN_ENDPOINT\"\n",
"] = \"\" # Update to \"https://api.smith.langchain.com\" to use the hosted version.\n",
"os.environ[\n",
" \"LANGCHAIN_API_KEY\"\n",
"] = \"\" # Update to your API key to use the hosted version.\n",
"os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
"os.environ[\"LANGCHAIN_API_KEY\"] = \"\" # Update to your API key\n",
"\n",
"# Used by the agent in this tutorial\n",
"# os.environ[\"OPENAI_API_KEY\"] = \"<YOUR-OPENAI-API-KEY>\"\n",
@ -101,7 +91,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 12,
"id": "510b5ca0",
"metadata": {
"tags": []
@ -118,12 +108,12 @@
"id": "ca27fa11-ddce-4af0-971e-c5c37d5b92ef",
"metadata": {},
"source": [
"Now, start prototyping your agent. We will use a math example using an older ReACT-style agent."
"Create a LangChain component and log runs to the platform. In this example, we will create a ReAct-style agent with access to Search and Calculator as tools. However, LangSmith works regardless of which type of LangChain component you use (LLMs, Chat Models, Tools, Retrievers, Agents are all supported)."
]
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 13,
"id": "7c801853-8e96-404d-984c-51ace59cbbef",
"metadata": {
"tags": []
@ -140,9 +130,17 @@
")"
]
},
{
"cell_type": "markdown",
"id": "cab51e1e-8270-452c-ba22-22b5b5951899",
"metadata": {},
"source": [
"We are running the agent concurrently on multiple inputs to reduce latency. Runs get logged to LangSmith in the background so execution latency is unaffected."
]
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 14,
"id": "19537902-b95c-4390-80a4-f6c9a937081e",
"metadata": {
"tags": []
@ -181,7 +179,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 15,
"id": "0405ff30-21fe-413d-85cf-9fa3c649efec",
"metadata": {
"tags": []
@ -203,9 +201,7 @@
"id": "9decb964-be07-4b6c-9802-9825c8be7b64",
"metadata": {},
"source": [
"Assuming you've successfully configured the server earlier, your agent traces should show up in your web app.\n",
"\n",
"Navigate to the web app to see the results: [local app](http://localhost:80) or [hosted app](https://smith.langchain.com/)"
"Assuming you've successfully set up your environment, your agent traces should show up in the `Projects` section in the [app](https://smith.langchain.com/). Congrats!"
]
},
{
@ -213,16 +209,16 @@
"id": "6c43c311-4e09-4d57-9ef3-13afb96ff430",
"metadata": {},
"source": [
"## Evaluate a New Agent\n",
"## Evaluate another agent implementation\n",
"\n",
"Once you've debugged and customized your LLM component, you will want to create tests and benchmark evaluations to measure its performance before putting it into a production environment.\n",
"In addition to logging runs, LangSmith also allows you to test and evaluate your LLM applications.\n",
"\n",
"In this notebook, you will run evaluators to test an agent. You will do so in a few steps:\n",
"In this section, you will leverage LangSmith to create a benchmark dataset and run AI-assisted evaluators on an agent. You will do so in a few steps:\n",
"\n",
"1. Create a dataset\n",
"2. Select or create evaluators to measure performance\n",
"3. Define the LLM or Chain initializer to test\n",
"4. Run the chain and evaluators using the helper functions"
"1. Create a dataset from pre-existing run inputs and outputs\n",
"2. Initialize a new agent to benchmark\n",
"3. Configure evaluators to grade an agent's output\n",
"4. Run the agent over the dataset and evaluate the results"
]
},
{
@ -231,16 +227,18 @@
"id": "beab1a29-b79d-4a99-b5b1-0870c2d772b1",
"metadata": {},
"source": [
"### 1. Create Dataset\n",
"### 1. Create a LangSmith dataset\n",
"\n",
"Below, we use the LangSmith client to create a dataset from the agent runs you just logged above. You will use these later to measure performance for a new agent. This simply takes the inputs and outputs of the runs and saves them as examples in a dataset. A dataset is a collection of examples, which are nothing more than input-output pairs you can use as test cases for your application.\n",
"\n",
"Below, use the client to create a dataset from the Agent runs you just logged while debugging above. You will use these later to measure performance.\n",
"**Note: this is a simple, walkthrough example. In a real-world setting, you'd ideally first validate the outputs before adding them to a benchmark dataset to be used for evaluating other agents.**\n",
"\n",
"For more information on datasets, including how to create them from CSVs or other files or how to create them in the web app, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)."
"For more information on datasets, including how to create them from CSVs or other files or how to create them in the platform, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 16,
"id": "17580c4b-bd04-4dde-9d21-9d4edd25b00d",
"metadata": {
"tags": []
@ -269,16 +267,16 @@
"tags": []
},
"source": [
"### 2. Define the Agent or LLM to Test\n",
"### 2. Initialize a new agent to benchmark\n",
"\n",
"You can evaluate any LLM, chain, or agent. Since chains can have memory, we will pass in a `chain_factory` (aka a `constructor` ) function to initialize for each call.\n",
"\n",
"In this case, you will test an agent that uses OpenAI's function calling endpoints, but it can be any simple chain."
"In this case, we will test an agent that uses OpenAI's function calling endpoints."
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 17,
"id": "f42d8ecc-d46a-448b-a89c-04b0f6907f75",
"metadata": {
"tags": []
@ -310,10 +308,10 @@
"id": "9cb9ef53",
"metadata": {},
"source": [
"### 3. Configure Evaluation\n",
"### 3. Configure evaluation\n",
"\n",
"Manually comparing the results of chains in the UI is effective, but it can be time consuming.\n",
"It can be helpful to use automated metrics and ai-assisted feedback to evaluate your component's performance.\n",
"It can be helpful to use automated metrics and AI-assisted feedback to evaluate your component's performance.\n",
"\n",
"Below, we will create some pre-implemented run evaluators that do the following:\n",
"- Compare results against ground truth labels. (You used the debug outputs above for this)\n",
@ -326,7 +324,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 19,
"id": "a25dc281",
"metadata": {
"tags": []
@ -368,7 +366,7 @@
"tags": []
},
"source": [
"### 4. Run the Agent and Evaluators\n",
"### 4. Run the agent and evaluators\n",
"\n",
"Use the [arun_on_dataset](https://api.python.langchain.com/en/latest/smith/langchain.smith.evaluation.runner_utils.arun_on_dataset.html#langchain.smith.evaluation.runner_utils.arun_on_dataset) (or synchronous [run_on_dataset](https://api.python.langchain.com/en/latest/smith/langchain.smith.evaluation.runner_utils.run_on_dataset.html#langchain.smith.evaluation.runner_utils.run_on_dataset)) function to evaluate your model. This will:\n",
"1. Fetch example rows from the specified dataset\n",
@ -380,7 +378,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 20,
"id": "3733269b-8085-4644-9d5d-baedcff13a2f",
"metadata": {
"tags": []
@ -397,7 +395,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"Chain failed for example 890fac1b-9788-4545-a952-c8f569f21a13. Error: LLMMathChain._evaluate(\"\n",
"Chain failed for example 85f3a543-0429-48ae-be23-f48f0d903530. Error: LLMMathChain._evaluate(\"\n",
"age_of_Dua_Lipa_boyfriend ** 0.43\n",
"\") raised error: 'age_of_Dua_Lipa_boyfriend'. Please try again with a valid numerical expression\n"
]
@ -413,7 +411,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"Chain failed for example 614a5986-f9de-495e-adcf-a2a4bcfe68b6. Error: Too many arguments to single-input tool Calculator. Args: ['height ^ 0.13', {'height': 68}]\n"
"Chain failed for example 97d0d138-e9b3-4825-af2c-42789c66c0d4. Error: Too many arguments to single-input tool Calculator. Args: ['height ^ 0.13', {'height': 72}]\n"
]
},
{
@ -450,11 +448,11 @@
"tags": []
},
"source": [
"### Review the Test Results\n",
"### Review the test results\n",
"\n",
"You can review the test results tracing UI below by navigating to the \"Datasets & Testing\" page and selecting the **\"calculator-example-dataset-*\"** dataset and associated test project.\n",
"You can review the test results in the tracing UI by navigating to the \"Datasets & Testing\" page and selecting the **\"calculator-example-dataset-*\"** dataset, clicking on the `Test Runs` tab, then inspecting the runs in the corresponding project. \n",
"\n",
"This will show the new runs and the feedback logged from the selected evaluators."
"This will show the new runs and the feedback logged from the selected evaluators. Note that runs that error out will not have feedback."
]
},
{
@ -462,7 +460,7 @@
"id": "591c819e-9932-45cf-adab-63727dd49559",
"metadata": {},
"source": [
"## Exporting Datasets and Runs\n",
"## Exporting datasets and runs\n",
"\n",
"LangSmith lets you export data to common formats such as CSV or JSONL directly in the web app. You can also use the client to fetch runs for further analysis, to store in your own database, or to share with others. Let's fetch the run traces from the evaluation run."
]
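
For reference, the environment setup changed in the second hunk can be sketched as a small helper. This is only an illustration, not code from the notebook: the function names `configure_langsmith_env` and `missing_langsmith_vars` are hypothetical, and the sketch assumes the hosted endpoint and the variable names shown in the diff. `LANGCHAIN_PROJECT` is left out of the check because the updated text says runs fall back to the `default` project when it is unset.

```python
import os
from uuid import uuid4

# Variables the updated notebook text says must be set for tracing.
# LANGCHAIN_PROJECT is optional (runs go to the `default` project without it).
CHECKED_VARS = (
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
)


def configure_langsmith_env(api_key, project_prefix="Tracing Walkthrough"):
    """Hypothetical helper: set the tracing variables and return the project name."""
    # Same naming scheme as the notebook cell: prefix plus 8 hex characters.
    project = f"{project_prefix} - {uuid4().hex[0:8]}"
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_PROJECT"] = project
    os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
    os.environ["LANGCHAIN_API_KEY"] = api_key
    return project


def missing_langsmith_vars():
    """Hypothetical helper: list checked variables that are unset or empty."""
    return [name for name in CHECKED_VARS if not os.environ.get(name)]
```

Calling `missing_langsmith_vars()` before running the agent gives an early hint when the API key was left as the empty placeholder from the notebook cell.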
