how to: Update streaming LLM information (#21381)

Update information in the streaming LLM how-to.

This mirrors the changes made in the chat model streaming how-to.
Eugene Yurtsev 3 weeks ago committed by GitHub
parent a27cab6af0
commit b47148bbed

@@ -7,18 +7,40 @@
"source": [
"# How to stream responses from an LLM\n",
"\n",
"All `LLM`s implement the `Runnable` interface, which comes with default implementations of all methods, ie. ainvoke, batch, abatch, stream, astream. This gives all `LLM`s basic support for streaming.\n",
"All `LLM`s implement the [Runnable interface](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable), which comes with **default** implementations of standard runnable methods (i.e. `ainvoke`, `batch`, `abatch`, `stream`, `astream`, `astream_events`).\n",
"\n",
"Streaming support defaults to returning an Iterator (or AsyncIterator in the case of async streaming) of a single value, the final result returned by the underlying `LLM` provider. This obviously doesn't give you token-by-token streaming, which requires native support from the `LLM` provider, but ensures your code that expects an iterator of tokens can work for any of our `LLM` integrations.\n",
"The **default** streaming implementations provide an`Iterator` (or `AsyncIterator` for asynchronous streaming) that yields a single value: the final output from the underlying chat model provider.\n",
"\n",
"See which [integrations support token-by-token streaming here](/docs/integrations/llms/)."
"The ability to stream the output token-by-token depends on whether the provider has implemented proper streaming support.\n",
"\n",
"See which [integrations support token-by-token streaming here](/docs/integrations/llms/).\n",
"\n",
"\n",
"\n",
":::{.callout-note}\n",
"\n",
"The **default** implementation does **not** provide support for token-by-token streaming, but it ensures that the model can be swapped in for any other model as it supports the same standard interface.\n",
"\n",
":::"
]
},
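{
"cell_type": "markdown",
"id": "7d5b1a3e-2c4f-4e6a-9b8d-0f1e2d3c4b5a",
"metadata": {},
"source": [
"For example, with a provider that has no native streaming, `stream` still works, but the iterator yields the entire completion as a single chunk. A minimal sketch (assuming `llm` is any `LLM` integration without token-by-token support):\n",
"\n",
"```python\n",
"chunks = list(llm.stream(\"Tell me a joke.\"))\n",
"# Without native streaming, the iterator yields exactly one chunk\n",
"# containing the full completion.\n",
"assert len(chunks) == 1\n",
"```"
]
},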
{
"cell_type": "markdown",
"id": "2f13124a-7f9d-404f-b7ac-70d8ea49ef8e",
"metadata": {},
"source": [
"## Sync stream\n",
"\n",
"Below we use a `|` to help visualize the delimiter between tokens."
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 1,
"id": "9baa0527-b97d-41d3-babd-472ec5e59e3e",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
@@ -26,47 +48,49 @@
"text": [
"\n",
"\n",
"Verse 1:\n",
"Bubbles dancing in my glass\n",
"Clear and crisp, it's such a blast\n",
"Refreshing taste, it's like a dream\n",
"Sparkling water, you make me beam\n",
"\n",
"Chorus:\n",
"Oh sparkling water, you're my delight\n",
"With every sip, you make me feel so right\n",
"You're like a party in my mouth\n",
"I can't get enough, I'm hooked no doubt\n",
"\n",
"Verse 2:\n",
"No sugar, no calories, just pure bliss\n",
"You're the perfect drink, I must confess\n",
"From lemon to lime, so many flavors to choose\n",
"Sparkling water, you never fail to amuse\n",
"\n",
"Chorus:\n",
"Oh sparkling water, you're my delight\n",
"With every sip, you make me feel so right\n",
"You're like a party in my mouth\n",
"I can't get enough, I'm hooked no doubt\n",
"\n",
"Bridge:\n",
"Some may say you're just plain water\n",
"But to me, you're so much more\n",
"You bring a sparkle to my day\n",
"In every single way\n",
"|Spark|ling| water|,| oh| so clear|\n",
"|Bubbles dancing|,| without| fear|\n",
"|Refreshing| taste|,| a| pure| delight|\n",
"|Spark|ling| water|,| my| thirst|'s| delight||"
]
}
],
"source": [
"from langchain_openai import OpenAI\n",
"\n",
"llm = OpenAI(model=\"gpt-3.5-turbo-instruct\", temperature=0, max_tokens=512)\n",
"for chunk in llm.stream(\"Write me a 1 verse song about sparkling water.\"):\n",
" print(chunk, end=\"|\", flush=True)"
]
},
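{
"cell_type": "markdown",
"id": "c3a1f0d2-8e9b-4f6c-a7d5-1b2c3d4e5f60",
"metadata": {},
"source": [
"Each chunk is a plain string, so the pieces can be concatenated to recover the full completion. A sketch using the same `llm` as above:\n",
"\n",
"```python\n",
"full = \"\"\n",
"for chunk in llm.stream(\"Write me a 1 verse song about sparkling water.\"):\n",
"    full += chunk  # str chunks concatenate directly\n",
"print(full)\n",
"```"
]
},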
{
"cell_type": "markdown",
"id": "596e477b-a41d-4ff5-9b9a-a7bfb53c3680",
"metadata": {},
"source": [
"## Async streaming\n",
"\n",
"Let's see how to stream in an async setting using `astream`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d81140f2-384b-4470-bf93-957013c6620b",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Chorus:\n",
"Oh sparkling water, you're my delight\n",
"With every sip, you make me feel so right\n",
"You're like a party in my mouth\n",
"I can't get enough, I'm hooked no doubt\n",
"\n",
"Outro:\n",
"So here's to you, my dear sparkling water\n",
"You'll always be my go-to drink forever\n",
"With your effervescence and refreshing taste\n",
"You'll always have a special place."
"|Spark|ling| water|,| oh| so clear|\n",
"|Bubbles dancing|,| without| fear|\n",
"|Refreshing| taste|,| a| pure| delight|\n",
"|Spark|ling| water|,| my| thirst|'s| delight||"
]
}
],
@@ -74,17 +98,52 @@
"from langchain_openai import OpenAI\n",
"\n",
"llm = OpenAI(model=\"gpt-3.5-turbo-instruct\", temperature=0, max_tokens=512)\n",
"for chunk in llm.stream(\"Write me a song about sparkling water.\"):\n",
" print(chunk, end=\"\", flush=True)"
"async for chunk in llm.astream(\"Write me a 1 verse song about sparkling water.\"):\n",
" print(chunk, end=\"|\", flush=True)"
]
},
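{
"cell_type": "markdown",
"id": "f4e5d6c7-b8a9-4021-9384-7566a1b2c3d4",
"metadata": {},
"source": [
"Top-level `async for` works here because Jupyter runs an event loop for you. In a standalone script you would drive the loop yourself. A sketch, assuming the same `llm` as above:\n",
"\n",
"```python\n",
"import asyncio\n",
"\n",
"\n",
"async def main() -> None:\n",
"    async for chunk in llm.astream(\"Write me a 1 verse song about sparkling water.\"):\n",
"        print(chunk, end=\"|\", flush=True)\n",
"\n",
"\n",
"asyncio.run(main())\n",
"```"
]
},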
{
"cell_type": "markdown",
"id": "9ab11306-b0db-4459-a9de-ecefb821c9b1",
"metadata": {
"tags": []
},
"source": [
"## Async event streaming\n",
"\n",
"\n",
"LLMs also support the standard [astream events](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.astream_events) method.\n",
"\n",
":::{.callout-tip}\n",
"\n",
"`astream_events` is most useful when implementing streaming in a larger LLM application that contains multiple steps (e.g., an application that involves an `agent`).\n",
":::"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d81140f2-384b-4470-bf93-957013c6620b",
"metadata": {},
"id": "399d74c7-4438-4093-ae05-47fed0255626",
"metadata": {
"tags": []
},
"outputs": [],
"source": []
"source": [
"from langchain_openai import OpenAI\n",
"\n",
"llm = OpenAI(model=\"gpt-3.5-turbo-instruct\", temperature=0, max_tokens=512)\n",
"\n",
"idx = 0\n",
"\n",
"async for event in llm.astream_events(\n",
" \"Write me a 1 verse song about goldfish on the moon\", version=\"v1\"\n",
"):\n",
" idx += 1\n",
" if idx >= 5: # Truncate the output\n",
" print(\"...Truncated\")\n",
" break\n",
" print(event)"
]
},
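{
"cell_type": "markdown",
"id": "a9b8c7d6-e5f4-4321-8765-4321fedcba98",
"metadata": {},
"source": [
"Each event is a dict whose `event` key names the event type, so you can filter for just the streamed chunks. A sketch: the exact payload under `data` is an assumption here, so inspect the events you actually receive:\n",
"\n",
"```python\n",
"async for event in llm.astream_events(\n",
"    \"Write me a 1 verse song about goldfish on the moon\", version=\"v1\"\n",
"):\n",
"    if event[\"event\"] == \"on_llm_stream\":\n",
"        # Streamed text is expected under data[\"chunk\"] (assumption).\n",
"        print(event[\"data\"][\"chunk\"], end=\"|\", flush=True)\n",
"```"
]
}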
],
"metadata": {
@@ -103,7 +162,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
"version": "3.11.4"
}
},
"nbformat": 4,
