- **Description:** Noticed that the Hugging Face Pipeline documentation
was a bit out of date.
Updated with information about passing in a pipeline directly
(consistent with docstring) and a recent contribution of mine on adding
support for multi-gpu specifications with Accelerate in
21eeba075c
"question = \"What is electroencephalography?\"\n",
"\n",
@ -98,6 +125,40 @@
"cell_type": "markdown",
"id": "dbbc3a37",
"metadata": {},
"source": [
"### GPU Inference\n",
"\n",
"When running on a machine with GPU, you can specify the `device=n` parameter to put the model on the specified device.\n",
"Defaults to `-1` for CPU inference.\n",
"\n",
"If you have multiple-GPUs and/or the model is too large for a single GPU, you can specify `device_map=\"auto\"`, which requires and uses the [Accelerate](https://huggingface.co/docs/accelerate/index) library to automatically determine how to load the model weights. \n",
"\n",
"*Note*: both `device` and `device_map` should not be specified together and can lead to unexpected behavior."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"gpu_llm = HuggingFacePipeline.from_model_id(\n",
" model_id=\"gpt2\",\n",
" task=\"text-generation\",\n",
" device=0, # replace with device_map=\"auto\" to use the accelerate library.\n",
" pipeline_kwargs={\"max_new_tokens\": 10},\n",
")\n",
"\n",
"gpu_chain = prompt | gpu_llm\n",
"\n",
"question = \"What is electroencephalography?\"\n",