langchain/docs/extras/integrations/llms/huggingface_pipelines.ipynb
Taqi Jaffri b7290f01d8
Batching for hf_pipeline (#10795)
The huggingface pipeline in langchain (used for locally hosted models)
does not support batching. If you send in a batch of prompts, it just
processes them serially using the base implementation of _generate:
https://github.com/docugami/langchain/blob/master/libs/langchain/langchain/llms/base.py#L1004C2-L1004C29

This PR adds support for batching in this pipeline, so that GPUs can be
fully saturated. I updated the accompanying notebook to show GPU batch
inference.

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
2023-09-25 18:23:11 +01:00

156 lines
4.2 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "959300d4",
"metadata": {},
"source": [
"# Hugging Face Local Pipelines\n",
"\n",
"Hugging Face models can be run locally through the `HuggingFacePipeline` class.\n",
"\n",
"The [Hugging Face Model Hub](https://huggingface.co/models) hosts over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together.\n",
"\n",
"These can be called from LangChain either through this local pipeline wrapper or by calling their hosted inference endpoints through the HuggingFaceHub class. For more information on the hosted pipelines, see the [HuggingFaceHub](huggingface_hub.html) notebook."
]
},
{
"cell_type": "markdown",
"id": "4c1b8450-5eaf-4d34-8341-2d785448a1ff",
"metadata": {
"tags": []
},
"source": [
"To use, you should have the ``transformers`` python [package installed](https://pypi.org/project/transformers/), as well as [pytorch](https://pytorch.org/get-started/locally/). You can also install `xformer` for a more memory-efficient attention implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d772b637-de00-4663-bd77-9bc96d798db2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install transformers --quiet"
]
},
{
"cell_type": "markdown",
"id": "91ad075f-71d5-4bc8-ab91-cc0ad5ef16bb",
"metadata": {},
"source": [
"### Load the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "165ae236-962a-4763-8052-c4836d78a5d2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.llms import HuggingFacePipeline\n",
"\n",
"llm = HuggingFacePipeline.from_model_id(\n",
" model_id=\"bigscience/bloom-1b7\",\n",
" task=\"text-generation\",\n",
" model_kwargs={\"temperature\": 0, \"max_length\": 64},\n",
")"
]
},
{
"cell_type": "markdown",
"id": "00104b27-0c15-4a97-b198-4512337ee211",
"metadata": {},
"source": [
"### Create Chain\n",
"\n",
"With the model loaded into memory, you can compose it with a prompt to\n",
"form a chain."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3acf0069",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\n",
"\n",
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"prompt = PromptTemplate.from_template(template)\n",
"\n",
"chain = prompt | llm\n",
"\n",
"question = \"What is electroencephalography?\"\n",
"\n",
"print(chain.invoke({\"question\": question}))"
]
},
{
"cell_type": "markdown",
"id": "dbbc3a37",
"metadata": {},
"source": [
"### Batch GPU Inference\n",
"\n",
"If running on a device with GPU, you can also run inference on the GPU in batch mode."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "097ba62f",
"metadata": {},
"outputs": [],
"source": [
"gpu_llm = HuggingFacePipeline.from_model_id(\n",
" model_id=\"bigscience/bloom-1b7\",\n",
" task=\"text-generation\",\n",
" device=0, # -1 for CPU\n",
" batch_size=2, # adjust as needed based on GPU map and model size.\n",
" model_kwargs={\"temperature\": 0, \"max_length\": 64},\n",
")\n",
"\n",
"gpu_chain = prompt | gpu_llm.bind(stop=[\"\\n\\n\"])\n",
"\n",
"questions = []\n",
"for i in range(4):\n",
" questions.append({\"question\": f\"What is the number {i} in french?\"})\n",
"\n",
"answers = gpu_chain.batch(questions)\n",
"for answer in answers:\n",
" print(answer)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}