"All `LLM`s implement the [Runnable interface](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable), which comes with **default** implementations of standard runnable methods (i.e. `ainvoke`, `batch`, `abatch`, `stream`, `astream`, `astream_events`).\n",
"\n",
"The **default** streaming implementations provide an `Iterator` (or `AsyncIterator` for asynchronous streaming) that yields a single value: the final output from the underlying `LLM` provider.\n",
"\n",
"The ability to stream the output token-by-token depends on whether the provider has implemented proper streaming support.\n",
"\n",
"See which [integrations support token-by-token streaming here](/docs/integrations/llms/).\n",
"\n",
":::{.callout-note}\n",
"\n",
"The **default** implementation does **not** provide support for token-by-token streaming, but it ensures that the model can be swapped in for any other model as it supports the same standard interface.\n",
"\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "2f13124a-7f9d-404f-b7ac-70d8ea49ef8e",
"metadata": {},
"source": [
"## Sync stream\n",
"\n",
"Below we use a `|` to help visualize the delimiter between tokens."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9baa0527-b97d-41d3-babd-472ec5e59e3e",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"Verse 1:\n",
"Bubbles dancing in my glass\n",
"Clear and crisp, it's such a blast\n",
"Refreshing taste, it's like a dream\n",
"Sparkling water, you make me beam\n",
"\n",
"Chorus:\n",
"Oh sparkling water, you're my delight\n",
"With every sip, you make me feel so right\n",
"You're like a party in my mouth\n",
"I can't get enough, I'm hooked no doubt\n",
"\n",
"Verse 2:\n",
"No sugar, no calories, just pure bliss\n",
"You're the perfect drink, I must confess\n",
"From lemon to lime, so many flavors to choose\n"
]
}
],
"source": [
"for chunk in llm.stream(\"Write me a 1 verse song about sparkling water.\"):\n",
"    print(chunk, end=\"|\", flush=True)"
]
},
{
"cell_type": "markdown",
"id": "9ab11306-b0db-4459-a9de-ecefb821c9b1",
"metadata": {
"tags": []
},
"source": [
"## Async event streaming\n",
"\n",
"LLMs also support the standard [astream events](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.astream_events) method.\n",
"\n",
":::{.callout-tip}\n",
"\n",
"`astream_events` is most useful when implementing streaming in a larger LLM application that contains multiple steps (e.g., an application that involves an `agent`).\n",