docs: link to langsmith+langgraph docs (#21930)

pull/21848/head^2
Bagatur 4 months ago committed by GitHub
parent e8bdf245eb
commit 1418d3af00

@ -45,9 +45,6 @@ generate-files:
wget -q https://raw.githubusercontent.com/langchain-ai/langserve/main/README.md -O $(INTERMEDIATE_DIR)/langserve.md
$(PYTHON) scripts/resolve_local_links.py $(INTERMEDIATE_DIR)/langserve.md https://github.com/langchain-ai/langserve/tree/main/
wget -q https://raw.githubusercontent.com/langchain-ai/langgraph/main/README.md -O $(INTERMEDIATE_DIR)/langgraph.md
$(PYTHON) scripts/resolve_local_links.py $(INTERMEDIATE_DIR)/langgraph.md https://github.com/langchain-ai/langgraph/tree/main/
copy-infra:
mkdir -p $(OUTPUT_NEW_DIR)
cp -r src $(OUTPUT_NEW_DIR)

@ -33,7 +33,7 @@ Key partner packages are separated out (see below).
This contains all integrations for various components (LLMs, vectorstores, retrievers).
All dependencies in this package are optional to keep the package as lightweight as possible.
### [`langgraph`](/docs/langgraph)
### [`langgraph`](https://langchain-ai.github.io/langgraph)
`langgraph` is an extension of `langchain` aimed at
building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
@ -44,7 +44,7 @@ LangGraph exposes high level interfaces for creating common types of agents, as
A package to deploy LangChain chains as REST APIs. Makes it easy to get a production ready API up and running.
### [LangSmith](/docs/langsmith)
### [LangSmith](https://docs.smith.langchain.com)
A developer platform that lets you debug, test, evaluate, and monitor LLM applications.
@ -66,7 +66,7 @@ LCEL was designed from day 1 to **support putting prototypes in production, with
When you build your chains with LCEL you get the best possible time-to-first-token (time elapsed until the first chunk of output comes out). For some chains this means eg. we stream tokens straight from an LLM to a streaming output parser, and you get back parsed, incremental chunks of output at the same rate as the LLM provider outputs the raw tokens.
**Async support**
Any chain built with LCEL can be called both with the synchronous API (eg. in your Jupyter notebook while prototyping) as well as with the asynchronous API (eg. in a [LangServe](/docs/langsmith) server). This enables using the same code for prototypes and in production, with great performance, and the ability to handle many concurrent requests in the same server.
Any chain built with LCEL can be called both with the synchronous API (eg. in your Jupyter notebook while prototyping) as well as with the asynchronous API (eg. in a [LangServe](https://docs.smith.langchain.com) server). This enables using the same code for prototypes and in production, with great performance, and the ability to handle many concurrent requests in the same server.
**Optimized parallel execution**
Whenever your LCEL chains have steps that can be executed in parallel (eg if you fetch documents from multiple retrievers) we automatically do it, both in the sync and the async interfaces, for the smallest possible latency.
@ -80,9 +80,9 @@ For more complex chains its often very useful to access the results of interm
**Input and output schemas**
Input and output schemas give every LCEL chain Pydantic and JSONSchema schemas inferred from the structure of your chain. This can be used for validation of inputs and outputs, and is an integral part of LangServe.
[**Seamless LangSmith tracing**](/docs/langsmith)
[**Seamless LangSmith tracing**](https://docs.smith.langchain.com)
As your chains get more and more complex, it becomes increasingly important to understand what exactly is happening at every step.
With LCEL, **all** steps are automatically logged to [LangSmith](/docs/langsmith/) for maximum observability and debuggability.
With LCEL, **all** steps are automatically logged to [LangSmith](https://docs.smith.langchain.com/) for maximum observability and debuggability.
[**Seamless LangServe deployment**](/docs/langserve)
Any chain created with LCEL can be easily deployed using [LangServe](/docs/langserve).

@ -88,7 +88,7 @@ Concepts covered in `Integrations` should generally exist in `langchain_communit
### Guides and Ecosystem
The [Guides](/docs/tutorials) and [Ecosystem](/docs/langsmith/) sections should contain guides that address higher-level problems than the sections above.
The [Guides](/docs/tutorials) and [Ecosystem](https://docs.smith.langchain.com/) sections should contain guides that address higher-level problems than the sections above.
This includes, but is not limited to, considerations around productionization and development workflows.
These should contain mostly **How-to guides**, **Explanations**, and **Tutorials**.

@ -12,7 +12,7 @@
"\n",
"- Verbose Mode: This adds print statements for \"important\" events in your chain.\n",
"- Debug Mode: This adds logging statements for ALL events in your chain.\n",
"- LangSmith Tracing: This logs events to [LangSmith](/docs/langsmith/) to allow for visualization there.\n",
"- LangSmith Tracing: This logs events to [LangSmith](https://docs.smith.langchain.com/) to allow for visualization there.\n",
"\n",
"| | Verbose Mode | Debug Mode | LangSmith Tracing |\n",
"|------------------------|--------------|------------|-------------------|\n",

@ -167,7 +167,7 @@
"source": [
"Above, the `@chain` decorator is used to convert `custom_chain` into a runnable, which we invoke with the `.invoke()` method.\n",
"\n",
"If you are using tracing with [LangSmith](/docs/langsmith/), you should see a `custom_chain` trace in there, with the calls to OpenAI nested underneath.\n",
"If you are using tracing with [LangSmith](https://docs.smith.langchain.com/), you should see a `custom_chain` trace in there, with the calls to OpenAI nested underneath.\n",
"\n",
"## Automatic coercion in chains\n",
"\n",

@ -512,7 +512,7 @@
"id": "36f43b87-655c-4f64-aa7b-bd8c1955d8e5",
"metadata": {},
"source": [
"### [LangSmith](/docs/langsmith)\n",
"### [LangSmith](https://docs.smith.langchain.com)\n",
"\n",
"LangSmith is especially useful for something like message history injection, where it can be hard to otherwise understand what the inputs are to various parts of the chain.\n",
"\n",

@ -45,7 +45,7 @@
"id": "36a9c6fc-8264-462f-b8d7-9c7bbec22ef9",
"metadata": {},
"source": [
"If you'd like to trace your runs in [LangSmith](/docs/langsmith/) uncomment and set the following environment variables:"
"If you'd like to trace your runs in [LangSmith](https://docs.smith.langchain.com/) uncomment and set the following environment variables:"
]
},
{

@ -37,7 +37,7 @@
"id": "68107597-0c8c-4bb5-8c12-9992fabdf71a",
"metadata": {},
"source": [
"If you'd like to trace your runs in [LangSmith](/docs/langsmith/) uncomment and set the following environment variables:"
"If you'd like to trace your runs in [LangSmith](https://docs.smith.langchain.com/) uncomment and set the following environment variables:"
]
},
{
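
The two notebook hunks above tell the reader to set "the following environment variables" but the diff context does not show them. As a rough sketch only: the variable names and the endpoint URL below are the ones listed in the walkthrough notebook that this commit removes, while the helper names `tracing_env` and `missing_tracing_vars` are hypothetical and not part of LangChain or LangSmith.

```python
from typing import Dict, List, Mapping

# Variables the removed walkthrough describes as needed for tracing.
# LANGCHAIN_PROJECT is optional there (runs fall back to the "default" project).
REQUIRED = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_ENDPOINT", "LANGCHAIN_API_KEY")


def tracing_env(api_key: str, project: str = "default") -> Dict[str, str]:
    """Build the tracing environment variables (hypothetical helper)."""
    return {
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_API_KEY": api_key,
        "LANGCHAIN_PROJECT": project,
    }


def missing_tracing_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED if not env.get(name)]
```

Under these assumptions, calling `os.environ.update(tracing_env("<YOUR-API-KEY>"))` before building a chain would apply the settings, and `missing_tracing_vars(os.environ)` gives a quick check that nothing was left out.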

@ -72,7 +72,7 @@
"- **LangChain Libraries**: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of components, a basic run time for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.\n",
"- **[LangChain Templates](/docs/templates)**: A collection of easily deployable reference architectures for a wide variety of tasks.\n",
"- **[LangServe](/docs/langserve)**: A library for deploying LangChain chains as a REST API.\n",
"- **[LangSmith](/docs/langsmith)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.\n",
"- **[LangSmith](https://docs.smith.langchain.com)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.\n",
"\n",
"![Diagram outlining the hierarchical organization of the LangChain framework, displaying the interconnected parts across multiple layers.](https://python.langchain.com/assets/images/langchain_stack-f21828069f74484521f38199910007c1.svg)\n",
"\n",

@ -9,7 +9,7 @@ sidebar_class_name: hidden
LangChain simplifies every stage of the LLM application lifecycle:
- **Development**: Build your applications using LangChain's open-source [building blocks](/docs/concepts#langchain-expression-language) and [components](/docs/concepts). Hit the ground running using [third-party integrations](/docs/integrations/platforms/) and [Templates](/docs/templates).
- **Productionization**: Use [LangSmith](/docs/langsmith/) to inspect, monitor and evaluate your chains, so that you can continuously optimize and deploy with confidence.
- **Productionization**: Use [LangSmith](https://docs.smith.langchain.com/) to inspect, monitor and evaluate your chains, so that you can continuously optimize and deploy with confidence.
- **Deployment**: Turn any chain into an API with [LangServe](/docs/langserve).
import ThemedImage from '@theme/ThemedImage';
@ -30,9 +30,9 @@ Concretely, the framework consists of the following open-source libraries:
- **`langchain-community`**: Third party integrations.
- Partner packages (e.g. **`langchain-openai`**, **`langchain-anthropic`**, etc.): Some integrations have been further split into their own lightweight packages that only depend on **`langchain-core`**.
- **`langchain`**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
- **[langgraph](/docs/langgraph)**: Build robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
- **[langgraph](https://langchain-ai.github.io/langgraph)**: Build robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
- **[langserve](/docs/langserve)**: Deploy LangChain chains as REST APIs.
- **[LangSmith](/docs/langsmith)**: A developer platform that lets you debug, test, evaluate, and monitor LLM applications.
- **[LangSmith](https://docs.smith.langchain.com)**: A developer platform that lets you debug, test, evaluate, and monitor LLM applications.
:::note
@ -69,10 +69,10 @@ Head to the reference section for full documentation of all classes and methods
## Ecosystem
### [🦜🛠️ LangSmith](/docs/langsmith)
### [🦜🛠️ LangSmith](https://docs.smith.langchain.com)
Trace and evaluate your language model applications and intelligent agents to help you move from prototype to production.
### [🦜🕸️ LangGraph](/docs/langgraph)
### [🦜🕸️ LangGraph](https://langchain-ai.github.io/langgraph)
Build stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain primitives.
### [🦜🏓 LangServe](/docs/langserve)
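
The Markdown and notebook hunks above all apply the same substitution: in-repo links to `/docs/langsmith` and `/docs/langgraph` become links to the external documentation sites, with any trailing slash preserved. A minimal sketch of that rewrite, assuming the mapping read off the hunks (the function name `rewrite_links` and the regex are illustrative, not the script the author used):

```python
import re

# Assumed mapping, taken from the old/new link pairs in the hunks above.
LINK_TARGETS = {
    "/docs/langsmith": "https://docs.smith.langchain.com",
    "/docs/langgraph": "https://langchain-ai.github.io/langgraph",
}

# Matches a Markdown link target that is exactly one of the two paths,
# optionally followed by a single trailing slash.
_LINK_RE = re.compile(r"\]\((/docs/(?:langsmith|langgraph))(/?)\)")


def rewrite_links(text: str) -> str:
    """Point in-repo LangSmith/LangGraph links at the external docs."""

    def _swap(match: "re.Match[str]") -> str:
        # Keep the trailing slash (if any) so "/docs/langsmith/" maps to ".../".
        return "](" + LINK_TARGETS[match.group(1)] + match.group(2) + ")"

    return _LINK_RE.sub(_swap, text)
```

Links to other sections, such as `/docs/langserve`, and deeper paths under `/docs/langsmith/` are left untouched by this pattern.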

Binary file not shown. Size before: 865 KiB

Binary file not shown. Size before: 839 KiB

@ -1,717 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1a4596ea-a631-416d-a2a4-3577c140493d",
"metadata": {
"tags": []
},
"source": [
"# 🦜🛠️ LangSmith\n",
"\n",
"[LangSmith](https://smith.langchain.com) helps you trace and evaluate your language model applications and intelligent agents to help you\n",
"move from prototype to production.\n",
"\n",
"Check out the interactive walkthrough on this page to get started.\n",
"\n",
"For more information, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/).\n",
"\n",
"For tutorials and other end-to-end examples demonstrating ways to integrate LangSmith in your workflow,\n",
"check out the [LangSmith Cookbook](https://github.com/langchain-ai/langsmith-cookbook). Some of the guides therein include:\n",
"\n",
"- Leveraging user feedback in your JS application ([link](https://github.com/langchain-ai/langsmith-cookbook/blob/main/feedback-examples/nextjs/README.md)).\n",
"- Building an automated feedback pipeline ([link](https://github.com/langchain-ai/langsmith-cookbook/blob/main/feedback-examples/algorithmic-feedback/algorithmic_feedback.ipynb)).\n",
"- How to evaluate and audit your RAG workflows ([link](https://github.com/langchain-ai/langsmith-cookbook/tree/main/testing-examples/qa-correctness)).\n",
"- How to fine-tune an LLM on real usage data ([link](https://github.com/langchain-ai/langsmith-cookbook/blob/main/fine-tuning-examples/export-to-openai/fine-tuning-on-chat-runs.ipynb)).\n",
"- How to use the [LangChain Hub](https://smith.langchain.com/hub) to version your prompts ([link](https://github.com/langchain-ai/langsmith-cookbook/blob/main/hub-examples/retrieval-qa-chain/retrieval-qa.ipynb))\n",
"\n",
"\n",
"# LangSmith Walkthrough\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/langsmith/walkthrough.ipynb)\n",
"\n",
"LangChain makes it easy to prototype LLM applications and Agents. However, delivering LLM applications to production can be deceptively difficult. You will have to iterate on your prompts, chains, and other components to build a high-quality product.\n",
"\n",
"LangSmith makes it easy to debug, test, and continuously improve your LLM applications.\n",
"\n",
"When might this come in handy? You may find it useful when you want to:\n",
"\n",
"- Quickly debug a new chain, agent, or set of tools\n",
"- Create and manage datasets for fine-tuning, few-shot prompting, and evaluation\n",
"- Run regression tests on your application so you can develop with confidence\n",
"- Capture production analytics for product insights and continuous improvements"
]
},
{
"cell_type": "markdown",
"id": "138fbb8f-960d-4d26-9dd5-6d6acab3ee55",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"**[Create a LangSmith account](https://smith.langchain.com/) and create an API key (see bottom left corner). Familiarize yourself with the platform by looking through the [docs](https://docs.smith.langchain.com/)**\n",
"\n",
"Note LangSmith is in closed beta; we're in the process of rolling it out to more users. However, you can fill out the form on the website for expedited access.\n",
"\n",
"Now, let's get started!"
]
},
{
"cell_type": "markdown",
"id": "2d77d064-41b4-41fb-82e6-2d16461269ec",
"metadata": {
"tags": []
},
"source": [
"## Log runs to LangSmith\n",
"\n",
"First, configure your environment variables to tell LangChain to log traces. This is done by setting the `LANGCHAIN_TRACING_V2` environment variable to true.\n",
"You can tell LangChain which project to log to by setting the `LANGCHAIN_PROJECT` environment variable (if this isn't set, runs will be logged to the `default` project). This will automatically create the project for you if it doesn't exist. You must also set the `LANGCHAIN_ENDPOINT` and `LANGCHAIN_API_KEY` environment variables.\n",
"\n",
"For more information on other ways to set up tracing, please reference the [LangSmith documentation](https://docs.smith.langchain.com/docs/).\n",
"\n",
"**NOTE:** You can also use a context manager in Python to log traces:\n",
"\n",
"```python\n",
"from langchain_core.tracers.context import tracing_v2_enabled\n",
"\n",
"with tracing_v2_enabled(project_name=\"My Project\"):\n",
" agent.run(\"How many people live in canada as of 2023?\")\n",
"```\n",
"\n",
"However, in this example, we will use environment variables."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e4780363-f05a-4649-8b1a-9b449f960ce4",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain langsmith langchainhub\n",
"%pip install --upgrade --quiet langchain-openai tiktoken pandas duckduckgo-search"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "904db9a5-f387-4a57-914c-c8af8d39e249",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"from uuid import uuid4\n",
"\n",
"unique_id = uuid4().hex[0:8]\n",
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"os.environ[\"LANGCHAIN_PROJECT\"] = f\"Tracing Walkthrough - {unique_id}\"\n",
"os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
"os.environ[\"LANGCHAIN_API_KEY\"] = \"<YOUR-API-KEY>\" # Update to your API key\n",
"\n",
"# Used by the agent in this tutorial\n",
"os.environ[\"OPENAI_API_KEY\"] = \"<YOUR-OPENAI-API-KEY>\""
]
},
{
"cell_type": "markdown",
"id": "8ee7f34b-b65c-4e09-ad52-e3ace78d0221",
"metadata": {
"tags": []
},
"source": [
"Create the LangSmith client to interact with the API."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "510b5ca0",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langsmith import Client\n",
"\n",
"client = Client()"
]
},
{
"cell_type": "markdown",
"id": "ca27fa11-ddce-4af0-971e-c5c37d5b92ef",
"metadata": {},
"source": [
"Create a LangChain component and log runs to the platform. In this example, we will create a ReAct-style agent with access to a general search tool (DuckDuckGo). The agent's prompt can be viewed in the [Hub here](https://smith.langchain.com/hub/wfh/langsmith-agent-prompt)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a0fbfbba-3c82-4298-a312-9cec016d9d2e",
"metadata": {},
"outputs": [],
"source": [
"from langchain import hub\n",
"from langchain.agents import AgentExecutor\n",
"from langchain.agents.format_scratchpad.openai_tools import (\n",
" format_to_openai_tool_messages,\n",
")\n",
"from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser\n",
"from langchain_community.tools import DuckDuckGoSearchResults\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"# Fetches a pinned version (commit 5d466cbc) of this prompt\n",
"prompt = hub.pull(\"wfh/langsmith-agent-prompt:5d466cbc\")\n",
"\n",
"llm = ChatOpenAI(\n",
" model=\"gpt-3.5-turbo-16k\",\n",
" temperature=0,\n",
")\n",
"\n",
"tools = [\n",
" DuckDuckGoSearchResults(\n",
" name=\"duck_duck_go\"\n",
" ), # General internet search using DuckDuckGo\n",
"]\n",
"\n",
"llm_with_tools = llm.bind_tools(tools)\n",
"\n",
"runnable_agent = (\n",
" {\n",
" \"input\": lambda x: x[\"input\"],\n",
" \"agent_scratchpad\": lambda x: format_to_openai_tool_messages(\n",
" x[\"intermediate_steps\"]\n",
" ),\n",
" }\n",
" | prompt\n",
" | llm_with_tools\n",
" | OpenAIToolsAgentOutputParser()\n",
")\n",
"\n",
"agent_executor = AgentExecutor(\n",
" agent=runnable_agent, tools=tools, handle_parsing_errors=True\n",
")"
]
},
{
"cell_type": "markdown",
"id": "cab51e1e-8270-452c-ba22-22b5b5951899",
"metadata": {},
"source": [
"We are running the agent concurrently on multiple inputs to reduce latency. Runs get logged to LangSmith in the background so execution latency is unaffected."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "19537902-b95c-4390-80a4-f6c9a937081e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"inputs = [\n",
" \"What is LangChain?\",\n",
" \"What's LangSmith?\",\n",
" \"When was Llama-v2 released?\",\n",
" \"What is the langsmith cookbook?\",\n",
" \"When did langchain first announce the hub?\",\n",
"]\n",
"\n",
"results = agent_executor.batch([{\"input\": x} for x in inputs], return_exceptions=True)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9a6a764c-5d7a-4de7-a916-3ecc987d5bb6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'input': 'What is LangChain?',\n",
" 'output': 'I\\'m sorry, but I couldn\\'t find any information about \"LangChain\". Could you please provide more context or clarify your question?'},\n",
" {'input': \"What's LangSmith?\",\n",
" 'output': 'I\\'m sorry, but I couldn\\'t find any information about \"LangSmith\". It could be a company, a product, or a person. Can you provide more context or details about what you are referring to?'}]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"results[:2]"
]
},
{
"cell_type": "markdown",
"id": "9decb964-be07-4b6c-9802-9825c8be7b64",
"metadata": {},
"source": [
"Assuming you've successfully set up your environment, your agent traces should show up in the `Projects` section in the [app](https://smith.langchain.com/). Congrats!\n",
"\n",
"![Initial Runs](./img/log_traces.png)\n",
"\n",
"It looks like the agent isn't effectively using the tools though. Let's evaluate this so we have a baseline."
]
},
{
"cell_type": "markdown",
"id": "6c43c311-4e09-4d57-9ef3-13afb96ff430",
"metadata": {},
"source": [
"## Evaluate Agent\n",
"\n",
"In addition to logging runs, LangSmith also allows you to test and evaluate your LLM applications.\n",
"\n",
"In this section, you will leverage LangSmith to create a benchmark dataset and run AI-assisted evaluators on an agent. You will do so in a few steps:\n",
"\n",
"1. Create a dataset\n",
"2. Initialize a new agent to benchmark\n",
"3. Configure evaluators to grade an agent's output\n",
"4. Run the agent over the dataset and evaluate the results"
]
},
{
"cell_type": "markdown",
"id": "beab1a29-b79d-4a99-b5b1-0870c2d772b1",
"metadata": {},
"source": [
"### 1. Create a LangSmith dataset\n",
"\n",
"Below, we use the LangSmith client to create a dataset from the input questions above and a list of labels. You will use these later to measure performance for a new agent. A dataset is a collection of examples, which are nothing more than input-output pairs you can use as test cases for your application.\n",
"\n",
"For more information on datasets, including how to create them from CSVs or other files or how to create them in the platform, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "43fd40b2-3f02-4e51-9343-705aafe90a36",
"metadata": {},
"outputs": [],
"source": [
"outputs = [\n",
" \"LangChain is an open-source framework for building applications using large language models. It is also the name of the company building LangSmith.\",\n",
" \"LangSmith is a unified platform for debugging, testing, and monitoring language model applications and agents powered by LangChain\",\n",
" \"July 18, 2023\",\n",
" \"The langsmith cookbook is a github repository containing detailed examples of how to use LangSmith to debug, evaluate, and monitor large language model-powered applications.\",\n",
" \"September 5, 2023\",\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "17580c4b-bd04-4dde-9d21-9d4edd25b00d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"dataset_name = f\"agent-qa-{unique_id}\"\n",
"\n",
"dataset = client.create_dataset(\n",
" dataset_name,\n",
" description=\"An example dataset of questions over the LangSmith documentation.\",\n",
")\n",
"\n",
"client.create_examples(\n",
" inputs=[{\"input\": query} for query in inputs],\n",
" outputs=[{\"output\": answer} for answer in outputs],\n",
" dataset_id=dataset.id,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "8adfd29c-b258-49e5-94b4-74597a12ba16",
"metadata": {
"tags": []
},
"source": [
"### 2. Initialize a new agent to benchmark\n",
"\n",
"LangSmith lets you evaluate any LLM, chain, agent, or even a custom function. Conversational agents are stateful (they have memory); to ensure that this state isn't shared between dataset runs, we will pass in a `chain_factory` (aka a `constructor`) function to initialize for each call.\n",
"\n",
"In this case, we will test an agent that uses OpenAI's function calling endpoints."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "f42d8ecc-d46a-448b-a89c-04b0f6907f75",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain import hub\n",
"from langchain.agents import AgentExecutor, AgentType, initialize_agent, load_tools\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"\n",
"# Since chains can be stateful (e.g. they can have memory), we provide\n",
"# a way to initialize a new chain for each row in the dataset. This is done\n",
"# by passing in a factory function that returns a new chain for each row.\n",
"def create_agent(prompt, llm_with_tools):\n",
" runnable_agent = (\n",
" {\n",
" \"input\": lambda x: x[\"input\"],\n",
" \"agent_scratchpad\": lambda x: format_to_openai_tool_messages(\n",
" x[\"intermediate_steps\"]\n",
" ),\n",
" }\n",
" | prompt\n",
" | llm_with_tools\n",
" | OpenAIToolsAgentOutputParser()\n",
" )\n",
" return AgentExecutor(agent=runnable_agent, tools=tools, handle_parsing_errors=True)"
]
},
{
"cell_type": "markdown",
"id": "9cb9ef53",
"metadata": {},
"source": [
"### 3. Configure evaluation\n",
"\n",
"Manually comparing the results of chains in the UI is effective, but it can be time consuming.\n",
"It can be helpful to use automated metrics and AI-assisted feedback to evaluate your component's performance.\n",
"\n",
"Below, we will create a custom run evaluator that logs a heuristic evaluation.\n",
"\n",
"**Heuristic evaluators**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "331c3c53-949d-405e-8ba5-38bab1ce413b",
"metadata": {},
"outputs": [],
"source": [
"from langsmith.evaluation import EvaluationResult\n",
"from langsmith.schemas import Example, Run\n",
"\n",
"\n",
"def check_not_idk(run: Run, example: Example):\n",
" \"\"\"Illustration of a custom evaluator.\"\"\"\n",
" agent_response = run.outputs[\"output\"]\n",
" if \"don't know\" in agent_response or \"not sure\" in agent_response:\n",
" score = 0\n",
" else:\n",
" score = 1\n",
" # You can access the dataset labels in example.outputs[key]\n",
" # You can also access the model inputs in run.inputs[key]\n",
" return EvaluationResult(\n",
" key=\"not_uncertain\",\n",
" score=score,\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "1cc51d0a-4982-4ff9-89c1-b294d5cce8f6",
"metadata": {},
"source": [
"#### Batch Evaluators\n",
"\n",
"Some metrics are aggregated over a full \"test\" without being assigned to individual runs/examples. These could be as simple \n",
"as common classification metrics like Precision, Recall, or AUC, or it could be another custom aggregate metric.\n",
"\n",
"You can define any batch metric on a full test level by defining a function (or any callable) that accepts a list of Runs (system traces) and list of Examples (dataset records)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cada62c4-c237-4f85-aa33-789cbcd1e8c1",
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"\n",
"\n",
"def max_pred_length(runs: List[Run], examples: List[Example]):\n",
" predictions = [len(run.outputs[\"output\"]) for run in runs]\n",
" return EvaluationResult(key=\"max_pred_length\", score=max(predictions))"
]
},
{
"cell_type": "markdown",
"id": "ad9c4791-570b-4adf-a23f-d025ff383254",
"metadata": {},
"source": [
"Below, we will configure the evaluation with the custom evaluator from above, as well as some pre-implemented run evaluators that do the following:\n",
"- Compare results against ground truth labels.\n",
"- Measure semantic (dis)similarity using embedding distance\n",
"- Evaluate 'aspects' of the agent's response in a reference-free manner using custom criteria\n",
"\n",
"For a longer discussion of how to select an appropriate evaluator for your use case and how to create your own\n",
"custom evaluators, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a25dc281",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.evaluation import EvaluatorType\n",
"from langchain.smith import RunEvalConfig\n",
"\n",
"evaluation_config = RunEvalConfig(\n",
" # Evaluators can either be an evaluator type (e.g., \"qa\", \"criteria\", \"embedding_distance\", etc.) or a configuration for that evaluator\n",
" evaluators=[\n",
" check_not_idk,\n",
" # Measures whether a QA response is \"Correct\", based on a reference answer\n",
" # You can also select via the raw string \"qa\"\n",
" EvaluatorType.QA,\n",
" # Measure the embedding distance between the output and the reference answer\n",
" # Equivalent to: RunEvalConfig.EmbeddingDistance(embeddings=OpenAIEmbeddings())\n",
" EvaluatorType.EMBEDDING_DISTANCE,\n",
" # Grade whether the output satisfies the stated criteria.\n",
" # You can select a default one such as \"helpfulness\" or provide your own.\n",
" RunEvalConfig.LabeledCriteria(\"helpfulness\"),\n",
" # The LabeledScoreString evaluator outputs a score on a scale from 1-10.\n",
" # You can use default criteria or write your own rubric\n",
" RunEvalConfig.LabeledScoreString(\n",
" {\n",
" \"accuracy\": \"\"\"\n",
"Score 1: The answer is completely unrelated to the reference.\n",
"Score 3: The answer has minor relevance but does not align with the reference.\n",
"Score 5: The answer has moderate relevance but contains inaccuracies.\n",
"Score 7: The answer aligns with the reference but has minor errors or omissions.\n",
"Score 10: The answer is completely accurate and aligns perfectly with the reference.\"\"\"\n",
" },\n",
" normalize_by=10,\n",
" ),\n",
" ],\n",
" batch_evaluators=[max_pred_length],\n",
")"
]
},
{
"cell_type": "markdown",
"id": "07885b10",
"metadata": {
"tags": []
},
"source": [
"### 4. Run the agent and evaluators\n",
"\n",
"Use the [run_on_dataset](https://api.python.langchain.com/en/latest/smith/langchain.smith.evaluation.runner_utils.run_on_dataset.html#langchain.smith.evaluation.runner_utils.run_on_dataset) (or asynchronous [arun_on_dataset](https://api.python.langchain.com/en/latest/smith/langchain.smith.evaluation.runner_utils.arun_on_dataset.html#langchain.smith.evaluation.runner_utils.arun_on_dataset)) function to evaluate your model. This will:\n",
"1. Fetch example rows from the specified dataset.\n",
"2. Run your agent (or any custom function) on each example.\n",
"3. Apply evaluators to the resulting run traces and corresponding reference examples to generate automated feedback.\n",
"\n",
"The results will be visible in the LangSmith app."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af8c8469-d70d-46d9-8fcd-517a1ccc7c4b",
"metadata": {},
"outputs": [],
"source": [
"from langchain import hub\n",
"\n",
"# We will test this version of the prompt\n",
"prompt = hub.pull(\"wfh/langsmith-agent-prompt:798e7324\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3733269b-8085-4644-9d5d-baedcff13a2f",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import functools\n",
"\n",
"from langchain.smith import arun_on_dataset, run_on_dataset\n",
"\n",
"chain_results = run_on_dataset(\n",
" dataset_name=dataset_name,\n",
" llm_or_chain_factory=functools.partial(\n",
" create_agent, prompt=prompt, llm_with_tools=llm_with_tools\n",
" ),\n",
" evaluation=evaluation_config,\n",
" verbose=True,\n",
" client=client,\n",
" project_name=f\"tools-agent-test-798e7324-{unique_id}\",\n",
" # Project metadata communicates the experiment parameters,\n",
" # Useful for reviewing the test results\n",
" project_metadata={\n",
" \"env\": \"testing-notebook\",\n",
" \"model\": \"gpt-3.5-turbo-16k\",\n",
" \"prompt\": \"798e7324\",\n",
" },\n",
")\n",
"\n",
"# Sometimes, the agent will error due to parsing issues, incompatible tool inputs, etc.\n",
"# These are logged as warnings here and captured as errors in the tracing UI."
]
},
{
"cell_type": "markdown",
"id": "cdacd159-eb4d-49e9-bb2a-c55322c40ed4",
"metadata": {
"tags": []
},
"source": [
"### Review the test results\n",
"\n",
"You can review the test results in the tracing UI by clicking the URL in the output above, or by navigating to the \"Testing & Datasets\" page in LangSmith and opening the **\"agent-qa-{unique_id}\"** dataset.\n",
"\n",
"![test results](./img/test_results.png)\n",
"\n",
"This will show the new runs and the feedback logged from the selected evaluators. You can also explore a summary of the results in tabular format below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9da60638-5be8-4b5f-a721-2c6627aeaf0c",
"metadata": {},
"outputs": [],
"source": [
"chain_results.to_dataframe()"
]
},
{
"cell_type": "markdown",
"id": "13aad317-73ff-46a7-a5a0-60b5b5295f02",
"metadata": {},
"source": [
"### (Optional) Compare to another prompt\n",
"\n",
"Now that we have our test run results, we can make changes to our agent and benchmark them. Let's try this again with a different prompt and see the results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5eeb023f-ded2-4d0f-b910-2a57d9675853",
"metadata": {},
"outputs": [],
"source": [
"candidate_prompt = hub.pull(\"wfh/langsmith-agent-prompt:39f3bbd0\")\n",
"\n",
"chain_results = run_on_dataset(\n",
" dataset_name=dataset_name,\n",
" llm_or_chain_factory=functools.partial(\n",
" create_agent, prompt=candidate_prompt, llm_with_tools=llm_with_tools\n",
" ),\n",
" evaluation=evaluation_config,\n",
" verbose=True,\n",
" client=client,\n",
" project_name=f\"tools-agent-test-39f3bbd0-{unique_id}\",\n",
" project_metadata={\n",
" \"env\": \"testing-notebook\",\n",
" \"model\": \"gpt-3.5-turbo\",\n",
" \"prompt\": \"39f3bbd0\",\n",
" },\n",
")"
]
},
{
"cell_type": "markdown",
"id": "9fafd1dd-debf-4256-a609-a6b3a7c52c49",
"metadata": {},
"source": [
"## Exporting datasets and runs\n",
"\n",
"LangSmith lets you export data to common formats such as CSV or JSONL directly in the web app. You can also use the client to fetch runs for further analysis, to store in your own database, or to share with others. Let's fetch the run traces from the evaluation run.\n",
"\n",
"**Note: It may be a few moments before all the runs are accessible.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33bfefde-d1bb-4f50-9f7a-fd572ee76820",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"runs = client.list_runs(project_name=chain_results[\"project_name\"], execution_order=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f45de493-8f2a-43b9-a979-6fe694948532",
"metadata": {},
"outputs": [],
"source": [
"# The resulting tests are stored in a project. You can programmatically\n",
"# access important metadata from the test, such as the dataset version it was run on\n",
"# or your application's revision ID.\n",
"client.read_project(project_name=chain_results[\"project_name\"]).metadata"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6595c888-1f5c-4ae3-9390-0a559f5575d1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# After some time, the test metrics will be populated as well.\n",
"client.read_project(project_name=chain_results[\"project_name\"]).feedback_stats"
]
},
{
"cell_type": "markdown",
"id": "2646f0fb-81d4-43ce-8a9b-54b8e19841e2",
"metadata": {
"tags": []
},
"source": [
"## Conclusion\n",
"\n",
"Congratulations! You have successfully traced and evaluated an agent using LangSmith!\n",
"\n",
"This was a quick guide to get started, but there are many more ways to use LangSmith to speed up your developer flow and produce better results.\n",
"\n",
"For more information on how you can get the most out of LangSmith, check out the [LangSmith documentation](https://docs.smith.langchain.com/), and please reach out with questions, feature requests, or feedback at [support@langchain.dev](mailto:support@langchain.dev)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -23,7 +23,7 @@ The following features have been added during the development of 0.1.x:
## What's coming to LangChain?
- We've been working hard on [langgraph](https://python.langchain.com/docs/langgraph/). We will be building more capabilities on top of it and focusing on making it the go-to framework for agent architectures.
- We've been working hard on [langgraph](https://langchain-ai.github.io/langgraph/). We will be building more capabilities on top of it and focusing on making it the go-to framework for agent architectures.
- Vectorstores V2! We'll be revisiting our vectorstores abstractions to help improve usability and reliability.
- Better documentation and versioned docs!
- We're planning a breaking release (0.3.0) sometime between July and September to [upgrade to full support of Pydantic 2](https://github.com/langchain-ai/langchain/discussions/19339), and will drop support for Pydantic 1 (including objects originating from the `v1` namespace of Pydantic 2).

@ -55,11 +55,15 @@ module.exports = {
collapsible: false,
items: [
{
type: "doc",
label: "🦜🛠️ LangSmith",
id: "langsmith/index",
type: "link",
href: "https://docs.smith.langchain.com/",
label: "🦜🛠️ LangSmith"
},
{
type: "link",
href: "https://langchain-ai.github.io/langgraph/",
label: "🦜🕸️ LangGraph"
},
"langgraph",
"langserve",
],
},

@ -9,6 +9,14 @@
}
],
"redirects": [
{
"source": "/v0.2/docs/langsmith(/?)",
"destination": "https://docs.smith.langchain.com/"
},
{
"source": "/v0.2/docs/langgraph(/?)",
"destination": "https://langchain-ai.github.io/langgraph"
},
{
"source": "/",
"destination": "/v0.2/docs/introduction/"
