mirror of
https://github.com/hwchase17/langchain
synced 2024-11-08 07:10:35 +00:00
community: Overhaul MLflow Integration documentation (#23067)
This commit is contained in:
parent
e62f8f143f
commit
0e916d0d55
@ -7,29 +7,19 @@
|
||||
"source": [
|
||||
"# MLflow\n",
|
||||
"\n",
|
||||
">[MLflow](https://www.mlflow.org/docs/latest/what-is-mlflow) is a versatile, expandable, open-source platform for managing workflows and artifacts across the machine learning lifecycle. It has built-in integrations with many popular ML libraries, but can be used with any library, algorithm, or deployment tool. It is designed to be extensible, so you can write plugins to support new workflows, libraries, and tools.\n",
|
||||
">[MLflow](https://mlflow.org/) is a versatile, open-source platform for managing workflows and artifacts across the machine learning lifecycle. It has built-in integrations with many popular ML libraries, but can be used with any library, algorithm, or deployment tool. It is designed to be extensible, so you can write plugins to support new workflows, libraries, and tools.\n",
|
||||
"\n",
|
||||
"This notebook goes over how to track your LangChain experiments into your `MLflow Server`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ea73efae-7182-4a89-a492-c865b1fcf981",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## External examples"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "97361a84-4e8f-45ba-b291-814cf73cd8f2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`MLflow` provides [several examples](https://github.com/mlflow/mlflow/tree/master/examples/langchain) for the `LangChain` integration:\n",
|
||||
"- [simple_chain](https://github.com/mlflow/mlflow/blob/master/examples/langchain/simple_chain.py)\n",
|
||||
"- [simple_agent](https://github.com/mlflow/mlflow/blob/master/examples/langchain/simple_agent.py)\n",
|
||||
"- [retriever_chain](https://github.com/mlflow/mlflow/blob/master/examples/langchain/retriever_chain.py)\n",
|
||||
"- [retrieval_qa_chain](https://github.com/mlflow/mlflow/blob/master/examples/langchain/retrieval_qa_chain.py)\n"
|
||||
"In the context of LangChain integration, MLflow provides the following capabilities:\n",
|
||||
"\n",
|
||||
"- **Experiment Tracking**: MLflow tracks and stores artifacts from your LangChain experiments, including models, code, prompts, metrics, and more.\n",
|
||||
"- **Dependency Management**: MLflow automatically records model dependencies, ensuring consistency between development and production environments.\n",
|
||||
"- **Model Evaluation** MLflow offers native capabilities for evaluating LangChain applications.\n",
|
||||
"- **Tracing**: MLflow allows you to visually trace data flows through your LangChain chain, agent, retriever, or other components.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"**Note**: The tracing capability is only available in MLflow versions 2.14.0 and later.\n",
|
||||
"\n",
|
||||
"This notebook demonstrates how to track your LangChain experiments using MLflow. For more information about this feature and to explore tutorials and examples of using LangChain with MLflow, please refer to the [MLflow documentation for LangChain integration](https://mlflow.org/docs/latest/llms/langchain/index.html)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -37,7 +27,37 @@
|
||||
"id": "e0cbd74b-1542-45a4-a72b-b2eedeffd2e0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example"
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"Install MLflow Python package:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7fb27b941602401d91542211134fc71a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install google-search-results num"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "42406548",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install mlflow -qU"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8e626bb4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This example utilizes the OpenAI LLM. Feel free to skip the command below and proceed with a different LLM if desired."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -47,142 +67,535 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install --upgrade --quiet azureml-mlflow\n",
|
||||
"%pip install --upgrade --quiet pandas\n",
|
||||
"%pip install --upgrade --quiet textstat\n",
|
||||
"%pip install --upgrade --quiet spacy\n",
|
||||
"%pip install --upgrade --quiet langchain-openai\n",
|
||||
"%pip install --upgrade --quiet google-search-results\n",
|
||||
"!python -m spacy download en_core_web_sm"
|
||||
"%pip install langchain-openai -qU"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "bf8e1f5c",
|
||||
"execution_count": 6,
|
||||
"id": "7e87b21d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"# Set MLflow tracking URI if you have MLflow Tracking Server running\n",
|
||||
"os.environ[\"MLFLOW_TRACKING_URI\"] = \"\"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"\"\n",
|
||||
"os.environ[\"SERPAPI_API_KEY\"] = \"\""
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "84616d96",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To begin, let's create a dedicated MLflow experiment in order track our model and artifacts. While you can opt to skip this step and use the default experiment, we strongly recommend organizing your runs and artifacts into separate experiments to avoid clutter and maintain a clean, structured workflow."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "fd49fd45",
|
||||
"id": "155d2a6f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.callbacks import MlflowCallbackHandler\n",
|
||||
"from langchain_openai import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "578cac8c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\"\"\"Main function.\n",
|
||||
"import mlflow\n",
|
||||
"\n",
|
||||
"This function is used to try the callback handler.\n",
|
||||
"Scenarios:\n",
|
||||
"1. OpenAI LLM\n",
|
||||
"2. Chain with multiple SubChains on multiple generations\n",
|
||||
"3. Agent with Tools\n",
|
||||
"\"\"\"\n",
|
||||
"mlflow_callback = MlflowCallbackHandler()\n",
|
||||
"llm = OpenAI(\n",
|
||||
" model_name=\"gpt-3.5-turbo\", temperature=0, callbacks=[mlflow_callback], verbose=True\n",
|
||||
"mlflow.set_experiment(\"LangChain MLflow Integration\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "48accc76",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Overview\n",
|
||||
"\n",
|
||||
"Integrate MLflow with your LangChain Application using one of the following methods:\n",
|
||||
"\n",
|
||||
"1. **Autologging**: Enable seamless tracking with the `mlflow.langchain.autolog()` command, our recommended first option for leveraging the LangChain MLflow integration.\n",
|
||||
"2. **Manual Logging**: Use MLflow APIs to log LangChain chains and agents, providing fine-grained control over what to track in your experiment.\n",
|
||||
"3. **Custom Callbacks**: Pass MLflow callbacks manually when invoking chains, allowing for semi-automated customization of your workload, such as tracking specific invocations."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c3f10055",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Scenario 1: MLFlow Autologging"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "71118a27",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To get started with autologging, simply call `mlflow.langchain.autolog()`. In this example, we set the `log_models` parameter to `True`, which allows the chain definition and its dependency libraries to be recorded as an MLflow model, providing a comprehensive tracking experience."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 39,
|
||||
"id": "5b08145f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import mlflow\n",
|
||||
"\n",
|
||||
"mlflow.langchain.autolog(\n",
|
||||
" # These are optional configurations to control what information should be logged automatically (default: False)\n",
|
||||
" # For the full list of the arguments, refer to https://mlflow.org/docs/latest/llms/langchain/autologging.html#id1\n",
|
||||
" log_models=True,\n",
|
||||
" log_input_examples=True,\n",
|
||||
" log_model_signatures=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9b20acae",
|
||||
"cell_type": "markdown",
|
||||
"id": "f0570c18",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# SCENARIO 1 - LLM\n",
|
||||
"llm_result = llm.generate([\"Tell me a joke\"])\n",
|
||||
"\n",
|
||||
"mlflow_callback.flush_tracker(llm)"
|
||||
"### Define a Chain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8b872046",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain_core.prompts import PromptTemplate"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 40,
|
||||
"id": "1b2627ef",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# SCENARIO 2 - Chain\n",
|
||||
"template = \"\"\"You are a playwright. Given the title of play, it is your job to write a synopsis for that title.\n",
|
||||
"Title: {title}\n",
|
||||
"Playwright: This is a synopsis for the above play:\"\"\"\n",
|
||||
"prompt_template = PromptTemplate(input_variables=[\"title\"], template=template)\n",
|
||||
"synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callbacks=[mlflow_callback])\n",
|
||||
"from langchain.schema.output_parser import StrOutputParser\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"test_prompts = [\n",
|
||||
" {\n",
|
||||
" \"title\": \"documentary about good video games that push the boundary of game design\"\n",
|
||||
" },\n",
|
||||
"llm = ChatOpenAI(model_name=\"gpt-4o\", temperature=0)\n",
|
||||
"prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\n",
|
||||
" \"system\",\n",
|
||||
" \"You are a helpful assistant that translates {input_language} to {output_language}.\",\n",
|
||||
" ),\n",
|
||||
" (\"human\", \"{input}\"),\n",
|
||||
" ]\n",
|
||||
"synopsis_chain.apply(test_prompts)\n",
|
||||
"mlflow_callback.flush_tracker(synopsis_chain)"
|
||||
")\n",
|
||||
"parser = StrOutputParser()\n",
|
||||
"\n",
|
||||
"chain = prompt | llm | parser"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a5b38bae",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Invoke the Chain\n",
|
||||
"\n",
|
||||
"Note that this step may take a few seconds longer than usual, as MLflow runs several background tasks in the background to log models, traces, and artifacts to the tracking server."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 41,
|
||||
"id": "a1df4bc8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Ich liebe das Programmieren.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 41,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"test_input = {\n",
|
||||
" \"input_language\": \"English\",\n",
|
||||
" \"output_language\": \"German\",\n",
|
||||
" \"input\": \"I love programming.\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"chain.invoke(test_input)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5173cdd4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Take a moment to explore the MLflow Tracking UI, where you can gain a deeper understanding of what information are being logged.\n",
|
||||
"* **Traces** - Navigate to the \"Traces\" tab in the experiment and click the request ID link of the first row. The displayed trace tree visualizes the call stack of your chain invocation, providing you with a deep insight into how each component is executed within the chain.\n",
|
||||
"* **MLflow Model** - As we set `log_model=True`, MLflow automatically creates an MLflow Run to track your chain definition. Navigate to the newest Run page and open the \"Artifacts\" tab, which lists file artifacts logged as an MLflow Model, including dependencies, input examples, model signatures, and more.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "36179573",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Invoke the Logged Chain\n",
|
||||
"\n",
|
||||
"Next, let's load the model back and verify that we can reproduce the same prediction, ensuring consistency and reliability.\n",
|
||||
"\n",
|
||||
"There are two ways to load the model\n",
|
||||
"1. `mlflow.langchain.load_model(MODEL_URI)` - This loads the model as the original LangChain object.\n",
|
||||
"2. `mlflow.pyfunc.load_model(MODEL_URI)` - This loads the model within the `PythonModel` wrapper and encapsulates the prediction logic with the `predict()` API, which contains additional logic such as schema enforcement."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 42,
|
||||
"id": "a8e39d72",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Ich liebe Programmieren.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 42,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Replace YOUR_RUN_ID with the Run ID displayed on the MLflow UI\n",
|
||||
"loaded_model = mlflow.langchain.load_model(\"runs:/{YOUR_RUN_ID}/model\")\n",
|
||||
"loaded_model.invoke(test_input)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 57,
|
||||
"id": "9619356d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['Ich liebe das Programmieren.']"
|
||||
]
|
||||
},
|
||||
"execution_count": 57,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pyfunc_model = mlflow.pyfunc.load_model(\"runs:/{YOUR_RUN_ID}/model\")\n",
|
||||
"pyfunc_model.predict(test_input)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "eb23a78c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Configure Autologging\n",
|
||||
"\n",
|
||||
"The `mlflow.langchain.autolog()` function offers several parameters that allow for fine-grained control over the artifacts logged to MLflow. For a comprehensive list of available configurations, please refer to the latest [MLflow LangChain Autologging Documentation](https://mlflow.org/docs/latest/llms/langchain/autologging.html)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1bf6bb02",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Scenario 2: Manually Logging an Agent from Code"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6e447a02",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"#### Prerequisites\n",
|
||||
"\n",
|
||||
"This example uses `SerpAPI`, a search engine API, as a tool for the agent to retrieve Google Search results. LangChain is natively integrated with `SerpAPI`, allowing you to configure the tool for your agent with just one line of code.\n",
|
||||
"\n",
|
||||
"To get started:\n",
|
||||
"\n",
|
||||
"* Install the required Python package via pip: `pip install google-search-results numexpr`.\n",
|
||||
"* Create an account at [SerpAPI's Official Website](https://serpapi.com/) and retrieve an API key.\n",
|
||||
"* Set the API key in the environment variable: `os.environ[\"SERPAPI_API_KEY\"] = \"YOUR_API_KEY\"`\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d0c914e3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Define an Agent\n",
|
||||
"\n",
|
||||
"In this example, we will log the agent definition **as code**, rather than directly feeding the Python object and saving it in a serialized format. This approach offers several benefits:\n",
|
||||
"\n",
|
||||
"1. **No serialization required**: By saving the model as code, we avoid the need for serialization, which can be problematic when working with components that don't natively support it. This approach also eliminates the risk of incompatibility issues when deserializing the model in a different environment.\n",
|
||||
"2. **Better transparency**: By inspecting the saved code file, you can gain valuable insights into what the model does. This is in contrast to serialized formats like pickle, where the model's behavior remains opaque until it's loaded back, potentially exposing security risks such as remote code execution.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9190a609",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, create a separate `.py` file that defines the agent instance.\n",
|
||||
"\n",
|
||||
"In the interest of time, you can run the following cell to generate a Python file `agent.py`, which contains the agent definition code. In actual dev scenario, you would define it in another notebook or hand-crafted python script."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 64,
|
||||
"id": "62b20e17",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"script_content = \"\"\"\n",
|
||||
"from langchain.agents import AgentType, initialize_agent, load_tools\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"import mlflow\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(model_name=\"gpt-4o\", temperature=0)\n",
|
||||
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm)\n",
|
||||
"agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)\n",
|
||||
"\n",
|
||||
"# IMPORTANT: call set_model() to register the instance to be logged.\n",
|
||||
"mlflow.models.set_model(agent)\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"with open(\"agent.py\", \"w\") as f:\n",
|
||||
" f.write(script_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "82a21f06",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Log the Agent\n",
|
||||
"\n",
|
||||
"Return to the original notebook and run the following cell to log the agent you've defined in the `agent.py` file.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 51,
|
||||
"id": "cd5b8bcc",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"The agent is successfully logged to MLflow!\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"question = \"How long would it take to drive to the Moon with F1 racing cars?\"\n",
|
||||
"\n",
|
||||
"with mlflow.start_run(run_name=\"search-math-agent\") as run:\n",
|
||||
" info = mlflow.langchain.log_model(\n",
|
||||
" lc_model=\"agent.py\", # Specify the relative code path to the agent definition\n",
|
||||
" artifact_path=\"model\",\n",
|
||||
" input_example=question,\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"print(\"The agent is successfully logged to MLflow!\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b4687052",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, open the MLflow UI and navigate to the \"Artifacts\" tab in the Run detail page. You should see that the `agent.py` file has been successfully logged, along with other model artifacts, such as dependencies, input examples, and more."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9011db62",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Invoke the Logged Agent\n",
|
||||
"\n",
|
||||
"Now load the agent back and invoke it. There are two ways to load the model"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 53,
|
||||
"id": "b634b69d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Downloading artifacts: 100%|██████████| 10/10 [00:00<00:00, 331.57it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['It would take approximately 1194.5 hours to drive to the Moon with an F1 racing car.']"
|
||||
]
|
||||
},
|
||||
"execution_count": 53,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Let's turn on the autologging with default configuration, so we can see the trace for the agent invocation.\n",
|
||||
"mlflow.langchain.autolog()\n",
|
||||
"\n",
|
||||
"# Load the model back\n",
|
||||
"agent = mlflow.pyfunc.load_model(info.model_uri)\n",
|
||||
"\n",
|
||||
"# Invoke\n",
|
||||
"agent.predict(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "30bf6133",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Navigate to the **\"Traces\"** tab in the experiment and click the request ID link of the first row. The trace visualizes how the agent operate multiple tasks within the single prediction call:\n",
|
||||
"1. Determine what subtasks are required to answer the questions.\n",
|
||||
"2. Search for the speed of an F1 racing car.\n",
|
||||
"3. Search for the distance from Earth to Moon.\n",
|
||||
"4. Compute the division using LLM."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cbd10f34",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Scenario 3. Using MLflow Callbacks\n",
|
||||
"\n",
|
||||
"**MLflow Callbacks** provide a semi-automated way to track your LangChain application in MLflow. There are two primary callbacks available:\n",
|
||||
"\n",
|
||||
"1. **`MlflowLangchainTracer`:** Primarily used for generating traces, available in `mlflow >= 2.14.0`.\n",
|
||||
"2. **`MLflowCallbackHandler`:** Logs metrics and artifacts to the MLflow tracking server."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d013d309",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### MlflowLangchainTracer\n",
|
||||
"\n",
|
||||
"When the chain or agent is invoked with the `MlflowLangchainTracer` callback, MLflow will automatically generate a trace for the call stack and log it to the MLflow tracking server. The outcome is exactly same as `mlflow.langchain.autolog()`, but this is particularly useful when you want to only trace specific invocation. Autologging is applied to all invocation in the same notebook/script, on the other hand."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e002823a",
|
||||
"metadata": {
|
||||
"id": "_jN73xcPVEpI"
|
||||
},
|
||||
"id": "46d48044",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import AgentType, initialize_agent, load_tools"
|
||||
"from mlflow.langchain.langchain_tracer import MlflowLangchainTracer\n",
|
||||
"\n",
|
||||
"mlflow_tracer = MlflowLangchainTracer()\n",
|
||||
"\n",
|
||||
"# This call generates a trace\n",
|
||||
"chain.invoke(test_input, config={\"callbacks\": [mlflow_tracer]})\n",
|
||||
"\n",
|
||||
"# This call does not generate a trace\n",
|
||||
"chain.invoke(test_input)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "acb6692c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Where to Pass the Callback\n",
|
||||
" LangChain supports two ways of passing callback instances: (1) Request time callbacks - pass them to the `invoke` method or bind with `with_config()` (2) Constructor callbacks - set them in the chain constructor. When using the `MlflowLangchainTracer` as a callback, you **must use request time callbacks**. Setting it in the constructor instead will only apply the callback to the top-level object, preventing it from being propagated to child components, resulting in incomplete traces. For more information on this behavior, please refer to [Callbacks Documentation](https://python.langchain.com/v0.2/docs/concepts/#callbacks) for more details.\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"# OK\n",
|
||||
"chain.invoke(test_input, config={\"callbacks\": [mlflow_tracer]})\n",
|
||||
"chain.with_config(callbacks=[mlflow_tracer])\n",
|
||||
"# NG\n",
|
||||
"chain = TheNameOfSomeChain(callbacks=[mlflow_tracer])\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d6a60ba7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Supported Methods\n",
|
||||
"\n",
|
||||
"`MlflowLangchainTracer` supports the following invocation methods from the [Runnable Interfaces](https://python.langchain.com/v0.1/docs/expression_language/interface/).\n",
|
||||
"- Standard interfaces: `invoke`, `stream`, `batch`\n",
|
||||
"- Async interfaces: `astream`, `ainvoke`, `abatch`, `astream_log`, `astream_events`\n",
|
||||
"\n",
|
||||
"Other methods are not guaranteed to be fully compatible."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a72e8854",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### MlflowCallbackHandler\n",
|
||||
"\n",
|
||||
"`MlflowCallbackHandler` is a callback handler that resides in the LangChain Community code base.\n",
|
||||
"\n",
|
||||
"This callback can be passed for chain/agent invocation, but it must be explicitly finished by calling the `flush_tracker()` method.\n",
|
||||
"\n",
|
||||
"When a chain is invoked with the callback, it performs the following actions:\n",
|
||||
"\n",
|
||||
"1. Creates a new MLflow Run or retrieves an active one if available within the active MLflow Experiment.\n",
|
||||
"2. Logs metrics such as the number of LLM calls, token usage, and other relevant metrics. If the chain/agent includes LLM call and you have `spacy` library installed, it logs text complexity metrics such as `flesch_kincaid_grade`.\n",
|
||||
"3. Logs internal steps as a JSON file (this is a legacy version of traces).\n",
|
||||
"4. Logs chain input and output as a Pandas Dataframe.\n",
|
||||
"5. Calls the `flush_tracker()` method with a chain/agent instance, logging the chain/agent as an MLflow Model.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "655bd47e",
|
||||
"metadata": {
|
||||
"id": "Gpq4rk6VT9cu"
|
||||
},
|
||||
"id": "b579aae1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# SCENARIO 3 - Agent with Tools\n",
|
||||
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm, callbacks=[mlflow_callback])\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools,\n",
|
||||
" llm,\n",
|
||||
" agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,\n",
|
||||
" callbacks=[mlflow_callback],\n",
|
||||
" verbose=True,\n",
|
||||
")\n",
|
||||
"agent.run(\n",
|
||||
" \"Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?\"\n",
|
||||
")\n",
|
||||
"mlflow_callback.flush_tracker(agent, finish=True)"
|
||||
"from langchain_community.callbacks import MlflowCallbackHandler\n",
|
||||
"\n",
|
||||
"mlflow_callback = MlflowCallbackHandler()\n",
|
||||
"\n",
|
||||
"chain.invoke(\"What is LangChain callback?\", config={\"callbacks\": [mlflow_callback]})\n",
|
||||
"\n",
|
||||
"mlflow_callback.flush_tracker()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "84924e35",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## References\n",
|
||||
"To learn more about the feature and visit tutorials and examples of using LangChain with MLflow, please refer to the [MLflow documentation for LangChain integration](https://mlflow.org/docs/latest/llms/langchain/index.html).\n",
|
||||
"\n",
|
||||
"`MLflow` also provides several [tutorials](https://mlflow.org/docs/latest/llms/langchain/index.html#getting-started-with-the-mlflow-langchain-flavor-tutorials-and-guides) and [examples](https://github.com/mlflow/mlflow/tree/master/examples/langchain) for the `LangChain` integration:\n",
|
||||
"- [Quick Start](https://mlflow.org/docs/latest/llms/langchain/notebooks/langchain-quickstart.html)\n",
|
||||
"- [RAG Tutorial](https://mlflow.org/docs/latest/llms/langchain/notebooks/langchain-retriever.html)\n",
|
||||
"- [Agent Example](https://github.com/mlflow/mlflow/blob/master/examples/langchain/simple_agent.py)"
|
||||
]
|
||||
}
|
||||
],
|
||||
@ -191,9 +604,9 @@
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"display_name": "tracing",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
"name": "tracing"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
@ -205,7 +618,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
"version": "3.12.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
Loading…
Reference in New Issue
Block a user