langchain/docs/docs/guides/fallbacks.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "19c9cbd6",
   "metadata": {},
   "source": [
    "# Fallbacks\n",
    "\n",
    "When working with language models, you may often encounter issues from the underlying APIs, whether these be rate limiting or downtime. Therefore, as you go to move your LLM applications into production it becomes more and more important to safeguard against these. That's why we've introduced the concept of fallbacks. \n",
    "\n",
    "A **fallback** is an alternative plan that may be used in an emergency.\n",
    "\n",
    "Crucially, fallbacks can be applied not only on the LLM level but on the whole runnable level. This is important because often times different models require different prompts. So if your call to OpenAI fails, you don't just want to send the same prompt to Anthropic - you probably want to use a different prompt template and send a different version there."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6bb9ba9",
   "metadata": {},
   "source": [
    "## Fallback for LLM API Errors\n",
    "\n",
    "This is maybe the most common use case for fallbacks. A request to an LLM API can fail for a variety of reasons - the API could be down, you could have hit rate limits, any number of things. Therefore, using fallbacks can help protect against these types of things.\n",
    "\n",
    "IMPORTANT: By default, a lot of the LLM wrappers catch errors and retry. You will most likely want to turn those off when working with fallbacks. Otherwise the first wrapper will keep on retrying and not failing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d3e893bf",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.chat_models import ChatAnthropic\n",
    "from langchain_openai import ChatOpenAI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4847c82d",
   "metadata": {},
   "source": [
    "First, let's mock out what happens if we hit a RateLimitError from OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "dfdd8bf5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from unittest.mock import patch\n",
    "\n",
    "import httpx\n",
    "from openai import RateLimitError\n",
    "\n",
    "request = httpx.Request(\"GET\", \"/\")\n",
    "response = httpx.Response(200, request=request)\n",
    "error = RateLimitError(\"rate limit\", response=response, body=\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "e6fdffc1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note that we set max_retries = 0 to avoid retrying on RateLimits, etc\n",
    "openai_llm = ChatOpenAI(max_retries=0)\n",
    "anthropic_llm = ChatAnthropic()\n",
    "llm = openai_llm.with_fallbacks([anthropic_llm])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "584461ab",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hit error\n"
     ]
    }
   ],
   "source": [
    "# Let's use just the OpenAI LLm first, to show that we run into an error\n",
    "with patch(\"openai.resources.chat.completions.Completions.create\", side_effect=error):\n",
    "    try:\n",
    "        print(openai_llm.invoke(\"Why did the chicken cross the road?\"))\n",
    "    except RateLimitError:\n",
    "        print(\"Hit error\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "4fc1e673",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "content=' I don\\'t actually know why the chicken crossed the road, but here are some possible humorous answers:\\n\\n- To get to the other side!\\n\\n- It was too chicken to just stand there. \\n\\n- It wanted a change of scenery.\\n\\n- It wanted to show the possum it could be done.\\n\\n- It was on its way to a poultry farmers\\' convention.\\n\\nThe joke plays on the double meaning of \"the other side\" - literally crossing the road to the other side, or the \"other side\" meaning the afterlife. So it\\'s an anti-joke, with a silly or unexpected pun as the answer.' additional_kwargs={} example=False\n"
     ]
    }
   ],
   "source": [
    "# Now let's try with fallbacks to Anthropic\n",
    "with patch(\"openai.resources.chat.completions.Completions.create\", side_effect=error):\n",
    "    try:\n",
    "        print(llm.invoke(\"Why did the chicken cross the road?\"))\n",
    "    except RateLimitError:\n",
    "        print(\"Hit error\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f00bea25",
   "metadata": {},
   "source": [
    "We can use our \"LLM with Fallbacks\" as we would a normal LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "4f8eaaa0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "content=\" I don't actually know why the kangaroo crossed the road, but I can take a guess! Here are some possible reasons:\\n\\n- To get to the other side (the classic joke answer!)\\n\\n- It was trying to find some food or water \\n\\n- It was trying to find a mate during mating season\\n\\n- It was fleeing from a predator or perceived threat\\n\\n- It was disoriented and crossed accidentally \\n\\n- It was following a herd of other kangaroos who were crossing\\n\\n- It wanted a change of scenery or environment \\n\\n- It was trying to reach a new habitat or territory\\n\\nThe real reason is unknown without more context, but hopefully one of those potential explanations does the joke justice! Let me know if you have any other animal jokes I can try to decipher.\" additional_kwargs={} example=False\n"
     ]
    }
   ],
   "source": [
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "\n",
    "prompt = ChatPromptTemplate.from_messages(\n",
    "    [\n",
    "        (\n",
    "            \"system\",\n",
    "            \"You're a nice assistant who always includes a compliment in your response\",\n",
    "        ),\n",
    "        (\"human\", \"Why did the {animal} cross the road\"),\n",
    "    ]\n",
    ")\n",
    "chain = prompt | llm\n",
    "with patch(\"openai.resources.chat.completions.Completions.create\", side_effect=error):\n",
    "    try:\n",
    "        print(chain.invoke({\"animal\": \"kangaroo\"}))\n",
    "    except RateLimitError:\n",
    "        print(\"Hit error\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d62241b",
   "metadata": {},
   "source": [
    "## Fallback for Sequences\n",
    "\n",
    "We can also create fallbacks for sequences, that are sequences themselves. Here we do that with two different models: ChatOpenAI and then normal OpenAI (which does not use a chat model). Because OpenAI is NOT a chat model, you likely want a different prompt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "6d0b8056",
   "metadata": {},
   "outputs": [],
   "source": [
    "# First let's create a chain with a ChatModel\n",
    "# We add in a string output parser here so the outputs between the two are the same type\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "\n",
    "chat_prompt = ChatPromptTemplate.from_messages(\n",
    "    [\n",
    "        (\n",
    "            \"system\",\n",
    "            \"You're a nice assistant who always includes a compliment in your response\",\n",
    "        ),\n",
    "        (\"human\", \"Why did the {animal} cross the road\"),\n",
    "    ]\n",
    ")\n",
    "# Here we're going to use a bad model name to easily create a chain that will error\n",
    "chat_model = ChatOpenAI(model_name=\"gpt-fake\")\n",
    "bad_chain = chat_prompt | chat_model | StrOutputParser()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "8d1fc2a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Now lets create a chain with the normal OpenAI model\n",
    "from langchain.prompts import PromptTemplate\n",
    "from langchain_openai import OpenAI\n",
    "\n",
    "prompt_template = \"\"\"Instructions: You should always include a compliment in your response.\n",
    "\n",
    "Question: Why did the {animal} cross the road?\"\"\"\n",
    "prompt = PromptTemplate.from_template(prompt_template)\n",
    "llm = OpenAI()\n",
    "good_chain = prompt | llm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "283bfa44",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\n\\nAnswer: The turtle crossed the road to get to the other side, and I have to say he had some impressive determination.'"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# We can now create a final chain which combines the two\n",
    "chain = bad_chain.with_fallbacks([good_chain])\n",
    "chain.invoke({\"animal\": \"turtle\"})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec4685b4",
   "metadata": {},
   "source": [
    "## Fallback for Long Inputs\n",
    "\n",
    "One of the big limiting factors of LLMs is their context window. Usually, you can count and track the length of prompts before sending them to an LLM, but in situations where that is hard/complicated, you can fallback to a model with a longer context length."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "564b84c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "short_llm = ChatOpenAI()\n",
    "long_llm = ChatOpenAI(model=\"gpt-3.5-turbo-16k\")\n",
    "llm = short_llm.with_fallbacks([long_llm])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "5e27a775",
   "metadata": {},
   "outputs": [],
   "source": [
    "inputs = \"What is the next number: \" + \", \".join([\"one\", \"two\"] * 3000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "0a502731",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This model's maximum context length is 4097 tokens. However, your messages resulted in 12012 tokens. Please reduce the length of the messages.\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    print(short_llm.invoke(inputs))\n",
    "except Exception as e:\n",
    "    print(e)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "d91ba5d7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "content='The next number in the sequence is two.' additional_kwargs={} example=False\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    print(llm.invoke(inputs))\n",
    "except Exception as e:\n",
    "    print(e)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a6735df",
   "metadata": {},
   "source": [
    "## Fallback to Better Model\n",
    "\n",
    "Often times we ask models to output format in a specific format (like JSON). Models like GPT-3.5 can do this okay, but sometimes struggle. This naturally points to fallbacks - we can try with GPT-3.5 (faster, cheaper), but then if parsing fails we can use GPT-4."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "867a3793",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.output_parsers import DatetimeOutputParser"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "id": "b8d9959d",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt = ChatPromptTemplate.from_template(\n",
    "    \"what time was {event} (in %Y-%m-%dT%H:%M:%S.%fZ format - only return this value)\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "id": "98087a76",
   "metadata": {},
   "outputs": [],
   "source": [
    "# In this case we are going to do the fallbacks on the LLM + output parser level\n",
    "# Because the error will get raised in the OutputParser\n",
    "openai_35 = ChatOpenAI() | DatetimeOutputParser()\n",
    "openai_4 = ChatOpenAI(model=\"gpt-4\") | DatetimeOutputParser()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "id": "17ec9e8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "only_35 = prompt | openai_35\n",
    "fallback_4 = prompt | openai_35.with_fallbacks([openai_4])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "id": "7e536f0b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Error: Could not parse datetime string: The Super Bowl in 1994 took place on January 30th at 3:30 PM local time. Converting this to the specified format (%Y-%m-%dT%H:%M:%S.%fZ) results in: 1994-01-30T15:30:00.000Z\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    print(only_35.invoke({\"event\": \"the superbowl in 1994\"}))\n",
    "except Exception as e:\n",
    "    print(f\"Error: {e}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "id": "01355c5e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1994-01-30 15:30:00\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    print(fallback_4.invoke({\"event\": \"the superbowl in 1994\"}))\n",
    "except Exception as e:\n",
    "    print(f\"Error: {e}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c537f9d0",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}