"LangChain makes it easy to prototype LLM applications and Agents. Even so, delivering a high-quality product to production can be deceptively difficult. You will likely have to heavily customize your prompts, chains, and other components to create a high-quality product.\n",
"LangChain makes it easy to prototype LLM applications and Agents. However, delivering LLM applications to production can be deceptively difficult. You will likely have to heavily customize and iterate on your prompts, chains, and other components to create a high-quality product.\n",
"\n",
"To aid the development process, we've designed tracing and callbacks at the core of LangChain. In this notebook, you will get started prototyping and testing an example LLM agent.\n",
"To aid in this process, we've launched LangSmith, a unified platform for debugging, testing, and monitoring your LLM applications.\n",
"\n",
"When might this come in handy? You may find it useful when you want to:\n",
"\n",
"- Quickly debug a new chain, agent, or set of tools\n",
"- Visualize how components (chains, llms, retrievers, etc.) relate and are used\n",
"- Evaluate different prompts and LLMs for a single component\n",
"- Run a given chain several times over a dataset to ensure it consistently meets a quality bar.\n",
"- Run a given chain several times over a dataset to ensure it consistently meets a quality bar\n",
"- Capture usage traces and using LLMs or analytics pipelines to generate insights"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "138fbb8f-960d-4d26-9dd5-6d6acab3ee55",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"**Run the [local tracing server](https://docs.smith.langchain.com/docs/additional-resources/local_installation) OR [create a hosted LangSmith account](https://smith.langchain.com/) and connect with an API key.**\n",
"**Run LangSmith locally with docker OR [create a LangSmith account](https://smith.langchain.com/) and connect with an API key.**\n",
"\n",
"To run the local server, execute the following comand in your terminal:\n",
"Note that the hosted version of LangSmith is in gated beta; we're in the process of rolling it out to more users.\n",
"\n",
"To run LangSmith locally, execute the following comand in your terminal:\n",
"```\n",
"pip install --upgrade langsmith\n",
"langsmith start\n",
"```\n",
"\n",
"Now, let's get started debugging!"
"Now, let's get started!"
]
},
{
@ -47,7 +50,7 @@
"tags": []
},
"source": [
"## Debug your Chain \n",
"## Log Traces to LangSmith\n",
"\n",
"First, configure your environment variables to tell LangChain to log traces. This is done by setting the `LANGCHAIN_TRACING_V2` environment variable to true.\n",
"You can tell LangChain which project to log to by setting the `LANGCHAIN_PROJECT` environment variable. This will automatically create a debug project for you.\n",
"print(\"You can click the link below to view the UI\")\n",
"client"
"client = Client()"
]
},
{
@ -139,7 +123,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 21,
"id": "7c801853-8e96-404d-984c-51ace59cbbef",
"metadata": {
"tags": []
@ -158,7 +142,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 23,
"id": "19537902-b95c-4390-80a4-f6c9a937081e",
"metadata": {
"tags": []
@ -197,7 +181,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 9,
"id": "0405ff30-21fe-413d-85cf-9fa3c649efec",
"metadata": {
"tags": []
@ -214,37 +198,14 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9decb964-be07-4b6c-9802-9825c8be7b64",
"metadata": {},
"source": [
"Assuming you've successfully configured the server earlier, your agent traces should show up in your server's UI. You can check by clicking on the link below:"
"Assuming you've successfully configured the server earlier, your agent traces should show up in your web app.\n",
"\n",
"Navigate to the web app to see the results: [local app](http://localhost:80) or [hosted app](https://smith.langchain.com/)"
]
},
{
@ -252,7 +213,7 @@
"id": "6c43c311-4e09-4d57-9ef3-13afb96ff430",
"metadata": {},
"source": [
"## Test\n",
"## Evaluate a New Agent\n",
"\n",
"Once you've debugged a customized your LLM component, you will want to create tests and benchmark evaluations to measure its performance before putting it into a production environment.\n",
"\n",
@ -265,6 +226,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "beab1a29-b79d-4a99-b5b1-0870c2d772b1",
"metadata": {},
@ -273,12 +235,12 @@
"\n",
"Below, use the client to create a dataset from the Agent runs you just logged while debugging above. You will use these later to measure performance.\n",
"\n",
"For more information on datasets, including how to create them from CSVs or other files or how to create them in the web app, please refer to the [LangSmith documentation](https://docs.langchain.plus/docs)."
"For more information on datasets, including how to create them from CSVs or other files or how to create them in the web app, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)."
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 11,
"id": "17580c4b-bd04-4dde-9d21-9d4edd25b00d",
"metadata": {
"tags": []
@ -309,14 +271,14 @@
"source": [
"### 2. Define the Agent or LLM to Test\n",
"\n",
"You can evaluate any LLM or chain. Since chains can have memory, we will pass in a `chain_factory` (aka a `constructor` ) function to initialize for each call.\n",
"You can evaluate any LLM, chain, or agent. Since chains can have memory, we will pass in a `chain_factory` (aka a `constructor` ) function to initialize for each call.\n",
"\n",
"In this case, you will test an agent that uses OpenAI's function calling endpoints, but it can be any simple chain."
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 12,
"id": "f42d8ecc-d46a-448b-a89c-04b0f6907f75",
"metadata": {
"tags": []
@ -343,6 +305,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9cb9ef53",
"metadata": {},
@ -358,12 +321,12 @@
"- Evaluate 'aspects' of the agent's response in a reference-free manner using custom criteria\n",
"\n",
"For a longer discussion of how to select an appropriate evaluator for your use case and how to create your own\n",
"custom evaluators, please refer to the [LangSmith documentation](https://docs.langchain.plus/docs/).\n"
"custom evaluators, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/).\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 13,
"id": "a25dc281",
"metadata": {
"tags": []
@ -376,19 +339,29 @@
"evaluation_config = RunEvalConfig(\n",
" # Evaluators can either be an evaluator type (e.g., \"qa\", \"criteria\", \"embedding_distance\", etc.) or a configuration for that evaluator\n",
" evaluators=[\n",
" EvaluatorType.QA, # \"Correctness\" against a reference answer\n",
" # Measures whether a QA response is \"Correct\", based on a reference answer\n",
" # You can also select via the raw string \"qa\"\n",
" EvaluatorType.QA,\n",
" # Measure the embedding distance between the output and the reference answer\n",
" # Equivalent to: EvalConfig.EmbeddingDistance(embeddings=OpenAIEmbeddings())\n",
" EvaluatorType.EMBEDDING_DISTANCE,\n",
" RunEvalConfig.Criteria(\"helpfulness\"),\n",
" # Grade whether the output satisfies the stated criteria. You can select a default one such as \"helpfulness\" or provide your own.\n",
" # Both the Criteria and LabeledCriteria evaluators can be configured with a dictionary of custom criteria.\n",
" RunEvalConfig.Criteria(\n",
" {\n",
" \"fifth-grader-score\": \"Do you have to be smarter than a fifth grader to answer this question?\"\n",
" }\n",
" ),\n",
" ]\n",
" ],\n",
" # You can add custom StringEvaluator or RunEvaluator objects here as well, which will automatically be\n",
" # applied to each prediction. Check out the docs for examples.\n",
" custom_evaluators=[],\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "07885b10",
"metadata": {
@ -397,7 +370,7 @@
"source": [
"### 4. Run the Agent and Evaluators\n",
"\n",
"Use the `arun_on_dataset` (or synchronous `run_on_dataset`) function to evaluate your model. This will:\n",
"Use the [arun_on_dataset](https://api.python.langchain.com/en/latest/smith/langchain.smith.evaluation.runner_utils.arun_on_dataset.html#langchain.smith.evaluation.runner_utils.arun_on_dataset) (or synchronous [run_on_dataset](https://api.python.langchain.com/en/latest/smith/langchain.smith.evaluation.runner_utils.run_on_dataset.html#langchain.smith.evaluation.runner_utils.run_on_dataset)) function to evaluate your model. This will:\n",
"1. Fetch example rows from the specified dataset\n",
"2. Run your llm or chain on each example.\n",
"3. Apply evalutors to the resulting run traces and corresponding reference examples to generate automated feedback.\n",
@ -407,7 +380,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 14,
"id": "3733269b-8085-4644-9d5d-baedcff13a2f",
"metadata": {
"tags": []
@ -417,14 +390,14 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Processed examples: 2\r"
"Processed examples: 1\r"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Chain failed for example 4de88b85-928e-4711-8f11-98886295c8b3. Error: LLMMathChain._evaluate(\"\n",
"Chain failed for example 890fac1b-9788-4545-a952-c8f569f21a13. Error: LLMMathChain._evaluate(\"\n",
"age_of_Dua_Lipa_boyfriend ** 0.43\n",
"\") raised error: 'age_of_Dua_Lipa_boyfriend'. Please try again with a valid numerical expression\n"
]
@ -433,14 +406,14 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Processed examples: 3\r"
"Processed examples: 6\r"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Chain failed for example 7cacdf54-d1b8-4e6c-944e-c94578a2fe0d. Error: Too many arguments to single-input tool Calculator. Args: ['height ^ 0.13', {'height': 68}]\n"
"Chain failed for example 614a5986-f9de-495e-adcf-a2a4bcfe68b6. Error: Too many arguments to single-input tool Calculator. Args: ['height ^ 0.13', {'height': 68}]\n"
]
},
{
@ -470,74 +443,6 @@
"# These are logged as warnings here and captured as errors in the tracing UI."
"# For more information on additional configuration for the evaluation function:\n",
"\n",
"?arun_on_dataset"
]
},
{
"cell_type": "markdown",
"id": "cdacd159-eb4d-49e9-bb2a-c55322c40ed4",
@ -557,7 +462,7 @@
"id": "591c819e-9932-45cf-adab-63727dd49559",
"metadata": {},
"source": [
"## Exporting Runs\n",
"## Exporting Datasets and Runs\n",
"\n",
"LangSmith lets you export data to common formats such as CSV or JSONL directly in the web app. You can also use the client to fetch runs for further analysis, to store in your own database, or to share with others. Let's fetch the run traces from the evaluation run."
]
@ -615,6 +520,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2646f0fb-81d4-43ce-8a9b-54b8e19841e2",
"metadata": {
@ -627,8 +533,14 @@
"\n",
"This was a quick guide to get started, but there are many more ways to use LangSmith to speed up your developer flow and produce better results.\n",
"\n",
"For more information on how you can get the most out of LangSmith, check out [LangSmith documentation](https://docs.langchain.plus/docs/), and please reach out with questions, feature requests, or feedback at [support@langchain.dev](mailto:support@langchain.dev)."
"For more information on how you can get the most out of LangSmith, check out [LangSmith documentation](https://docs.smith.langchain.com/), and please reach out with questions, feature requests, or feedback at [support@langchain.dev](mailto:support@langchain.dev)."