how to: Update streaming LLM information (#21381)

Update information in the streaming LLM how-to.

This mirrors the changes made in the chat model streaming how-to.
Eugene Yurtsev 3 weeks ago committed by GitHub
parent a27cab6af0
commit b47148bbed

@@ -7,18 +7,40 @@
"source": [
"# How to stream responses from an LLM\n",
"\n",
"All `LLM`s implement the `Runnable` interface, which comes with default implementations of all methods, ie. ainvoke, batch, abatch, stream, astream. This gives all `LLM`s basic support for streaming.\n",
"All `LLM`s implement the [Runnable interface](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable), which comes with **default** implementations of standard runnable methods (i.e. `ainvoke`, `batch`, `abatch`, `stream`, `astream`, `astream_events`).\n",
"\n",
"Streaming support defaults to returning an Iterator (or AsyncIterator in the case of async streaming) of a single value, the final result returned by the underlying `LLM` provider. This obviously doesn't give you token-by-token streaming, which requires native support from the `LLM` provider, but ensures your code that expects an iterator of tokens can work for any of our `LLM` integrations.\n",
"The **default** streaming implementations provide an`Iterator` (or `AsyncIterator` for asynchronous streaming) that yields a single value: the final output from the underlying chat model provider.\n",
"\n",
"See which [integrations support token-by-token streaming here](/docs/integrations/llms/)."
"The ability to stream the output token-by-token depends on whether the provider has implemented proper streaming support.\n",
"\n",
"See which [integrations support token-by-token streaming here](/docs/integrations/llms/).\n",
"\n",
"\n",
"\n",
":::{.callout-note}\n",
"\n",
"The **default** implementation does **not** provide support for token-by-token streaming, but it ensures that the model can be swapped in for any other model as it supports the same standard interface.\n",
"\n",
":::"
]
},
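{
"cell_type": "markdown",
"id": "7d5b1a3e-2c4f-4e6a-9b8d-0f1e2d3c4b5a",
"metadata": {},
"source": [
"For example, with a provider that has no native streaming, `stream` still works, but the iterator yields the entire completion as a single chunk. A minimal sketch (assuming `llm` is any `LLM` integration without token-by-token support):\n",
"\n",
"```python\n",
"chunks = list(llm.stream(\"Tell me a joke.\"))\n",
"# Without native streaming, the iterator yields exactly one chunk\n",
"# containing the full completion.\n",
"assert len(chunks) == 1\n",
"```"
]
},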
{
"cell_type": "markdown",
"id": "2f13124a-7f9d-404f-b7ac-70d8ea49ef8e",
"metadata": {},
"source": [
"## Sync stream\n",
"\n",
"Below we use a `|` to help visualize the delimiter between tokens."
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 1,
"id": "9baa0527-b97d-41d3-babd-472ec5e59e3e",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
@@ -26,47 +48,49 @@
"text": [
"\n",
"\n",
"Verse 1:\n",
"Bubbles dancing in my glass\n",
"Clear and crisp, it's such a blast\n",
"Refreshing taste, it's like a dream\n",
"Sparkling water, you make me beam\n",
"\n",
"Chorus:\n",
"Oh sparkling water, you're my delight\n",
"With every sip, you make me feel so right\n",
"You're like a party in my mouth\n",
"I can't get enough, I'm hooked no doubt\n",
"\n",
"Verse 2:\n",
"No sugar, no calories, just pure bliss\n",
"You're the perfect drink, I must confess\n",
"From lemon to lime, so many flavors to choose\n",
"Sparkling water, you never fail to amuse\n",
"\n",
"Chorus:\n",
"Oh sparkling water, you're my delight\n",
"With every sip, you make me feel so right\n",
"You're like a party in my mouth\n",
"I can't get enough, I'm hooked no doubt\n",
"\n",
"Bridge:\n",
"Some may say you're just plain water\n",
"But to me, you're so much more\n",
"You bring a sparkle to my day\n",
"In every single way\n",
"|Spark|ling| water|,| oh| so clear|\n",
"|Bubbles dancing|,| without| fear|\n",
"|Refreshing| taste|,| a| pure| delight|\n",
"|Spark|ling| water|,| my| thirst|'s| delight||"
]
}
],
"source": [
"from langchain_openai import OpenAI\n",
"\n",
"llm = OpenAI(model=\"gpt-3.5-turbo-instruct\", temperature=0, max_tokens=512)\n",
"for chunk in llm.stream(\"Write me a 1 verse song about sparkling water.\"):\n",
" print(chunk, end=\"|\", flush=True)"
]
},
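{
"cell_type": "markdown",
"id": "c3a1f0d2-8e9b-4f6c-a7d5-1b2c3d4e5f60",
"metadata": {},
"source": [
"Each chunk is a plain string, so the pieces can be concatenated to recover the full completion. A sketch using the same `llm` as above:\n",
"\n",
"```python\n",
"full = \"\"\n",
"for chunk in llm.stream(\"Write me a 1 verse song about sparkling water.\"):\n",
"    full += chunk  # str chunks concatenate directly\n",
"print(full)\n",
"```"
]
},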
{
"cell_type": "markdown",
"id": "596e477b-a41d-4ff5-9b9a-a7bfb53c3680",
"metadata": {},
"source": [
"## Async streaming\n",
"\n",
"Let's see how to stream in an async setting using `astream`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d81140f2-384b-4470-bf93-957013c6620b",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Chorus:\n",
"Oh sparkling water, you're my delight\n",
"With every sip, you make me feel so right\n",
"You're like a party in my mouth\n",
"I can't get enough, I'm hooked no doubt\n",
"\n",
"Outro:\n",
"So here's to you, my dear sparkling water\n",
"You'll always be my go-to drink forever\n",
"With your effervescence and refreshing taste\n",
"You'll always have a special place."
"|Spark|ling| water|,| oh| so clear|\n",
"|Bubbles dancing|,| without| fear|\n",
"|Refreshing| taste|,| a| pure| delight|\n",
"|Spark|ling| water|,| my| thirst|'s| delight||"
]
}
],
@@ -74,17 +98,52 @@
"from langchain_openai import OpenAI\n",
"\n",
"llm = OpenAI(model=\"gpt-3.5-turbo-instruct\", temperature=0, max_tokens=512)\n",
"for chunk in llm.stream(\"Write me a song about sparkling water.\"):\n",
" print(chunk, end=\"\", flush=True)"
"async for chunk in llm.astream(\"Write me a 1 verse song about sparkling water.\"):\n",
" print(chunk, end=\"|\", flush=True)"
]
},
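{
"cell_type": "markdown",
"id": "f4e5d6c7-b8a9-4021-9384-7566a1b2c3d4",
"metadata": {},
"source": [
"Top-level `async for` works here because Jupyter runs an event loop for you. In a standalone script you would drive the loop yourself. A sketch, assuming the same `llm` as above:\n",
"\n",
"```python\n",
"import asyncio\n",
"\n",
"\n",
"async def main() -> None:\n",
"    async for chunk in llm.astream(\"Write me a 1 verse song about sparkling water.\"):\n",
"        print(chunk, end=\"|\", flush=True)\n",
"\n",
"\n",
"asyncio.run(main())\n",
"```"
]
},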
{
"cell_type": "markdown",
"id": "9ab11306-b0db-4459-a9de-ecefb821c9b1",
"metadata": {
"tags": []
},
"source": [
"## Async event streaming\n",
"\n",
"\n",
"LLMs also support the standard [astream events](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.astream_events) method.\n",
"\n",
":::{.callout-tip}\n",
"\n",
"`astream_events` is most useful when implementing streaming in a larger LLM application that contains multiple steps (e.g., an application that involves an `agent`).\n",
":::"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d81140f2-384b-4470-bf93-957013c6620b",
"metadata": {},
"id": "399d74c7-4438-4093-ae05-47fed0255626",
"metadata": {
"tags": []
},
"outputs": [],
"source": []
"source": [
"from langchain_openai import OpenAI\n",
"\n",
"llm = OpenAI(model=\"gpt-3.5-turbo-instruct\", temperature=0, max_tokens=512)\n",
"\n",
"idx = 0\n",
"\n",
"async for event in llm.astream_events(\n",
" \"Write me a 1 verse song about goldfish on the moon\", version=\"v1\"\n",
"):\n",
" idx += 1\n",
" if idx >= 5: # Truncate the output\n",
" print(\"...Truncated\")\n",
" break\n",
" print(event)"
]
},
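{
"cell_type": "markdown",
"id": "a9b8c7d6-e5f4-4321-8765-4321fedcba98",
"metadata": {},
"source": [
"Each event is a dict whose `event` key names the event type, so you can filter for just the streamed chunks. A sketch: the exact payload under `data` is an assumption here, so inspect the events you actually receive:\n",
"\n",
"```python\n",
"async for event in llm.astream_events(\n",
"    \"Write me a 1 verse song about goldfish on the moon\", version=\"v1\"\n",
"):\n",
"    if event[\"event\"] == \"on_llm_stream\":\n",
"        # Streamed text is expected under data[\"chunk\"] (assumption).\n",
"        print(event[\"data\"][\"chunk\"], end=\"|\", flush=True)\n",
"```"
]
}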
],
"metadata": {
@@ -103,7 +162,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
"version": "3.11.4"
}
},
"nbformat": 4,
