New LLM integration: Ctranslate2 (#10400)

## Description:

I've integrated CTranslate2 with LangChain. CTranlate2 is a recently
popular library for efficient inference with Transformer models that
compares favorably to alternatives such as HF Text Generation Inference
and vLLM in
[benchmarks](https://hamel.dev/notes/llm/inference/03_inference.html).
pull/10412/head
eryk-dsai 12 months ago committed by GitHub
parent ddd07001f3
commit 675d57df50
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -0,0 +1,240 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CTranslate2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**CTranslate2** is a C++ and Python library for efficient inference with Transformer models.\n",
"\n",
"The project implements a custom runtime that applies many performance optimization techniques such as weights quantization, layers fusion, batch reordering, etc., to accelerate and reduce the memory usage of Transformer models on CPU and GPU.\n",
"\n",
"Full list of features and supported models is included in the [project's repository](https://opennmt.net/CTranslate2/guides/transformers.html). To start, please check out the official [quickstart guide](https://opennmt.net/CTranslate2/quickstart.html).\n",
"\n",
"To use, you should have `ctranslate2` python package installed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!pip install ctranslate2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To use a Hugging Face model with CTranslate2, it has to be first converted to CTranslate2 format using the `ct2-transformers-converter` command. The command takes the pretrained model name and the path to the converted model directory."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loading checkpoint shards: 100%|██████████████████| 2/2 [00:01<00:00, 1.81it/s]\n"
]
}
],
"source": [
"# converstion can take several minutes\n",
"!ct2-transformers-converter --model meta-llama/Llama-2-7b-hf --quantization bfloat16 --output_dir ./llama-2-7b-ct2 --force"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import CTranslate2\n",
"\n",
"llm = CTranslate2(\n",
" # output_dir from above:\n",
" model_path=\"./llama-2-7b-ct2\",\n",
" tokenizer_name=\"meta-llama/Llama-2-7b-hf\",\n",
" device=\"cuda\",\n",
" # device_index can be either single int or list or ints,\n",
" # indicating the ids of GPUs to use for inference:\n",
" device_index=[0,1], \n",
" compute_type=\"bfloat16\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Single call"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"He presented me with plausible evidence for the existence of unicorns: 1) they are mentioned in ancient texts; and, more importantly to him (and not so much as a matter that would convince most people), he had seen one.\n",
"I was skeptical but I didn't want my friend upset by his belief being dismissed outright without any consideration or argument on its behalf whatsoever - which is why we were having this conversation at all! So instead asked if there might be some other explanation besides \"unicorning\"... maybe it could have been an ostrich? Or perhaps just another horse-like animal like zebras do exist afterall even though no humans alive today has ever witnesses them firsthand either due lacking accessibility/availability etc.. But then again those animals aren t exactly known around here anyway…” And thus began our discussion about whether these creatures actually existed anywhere else outside Earth itself where only few scientists ventured before us nowadays because technology allows exploration beyond borders once thought impossible centuries ago when travel meant walking everywhere yourself until reaching destination point A->B via footsteps alone unless someone helped guide along way through woods full darkness nighttime hours\n"
]
}
],
"source": [
"print(\n",
" llm(\n",
" \"He presented me with plausible evidence for the existence of unicorns: \",\n",
" max_length=256,\n",
" sampling_topk=50,\n",
" sampling_temperature=0.2,\n",
" repetition_penalty=2,\n",
" cache_static_prompt=False,\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multiple calls:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"generations=[[Generation(text='The list of top romantic songs:\\n1. “I Will Always Love You” by Whitney Houston\\n2. “Cant Help Falling in Love” by Elvis Presley\\n3. “Unchained Melody” by The Righteous Brothers\\n4. “I Will Always Love You” by Dolly Parton\\n5. “I Will Always Love You” by Whitney Houston\\n6. “I Will Always Love You” by Dolly Parton\\n7. “I Will Always Love You” by The Beatles\\n8. “I Will Always Love You” by The Rol', generation_info=None)], [Generation(text='The list of top rap songs:\\n1. “Gods Plan” by Drake\\n2. “Rockstar” by Post Malone\\n3. “Bad and Boujee” by Migos\\n4. “Humble” by Kendrick Lamar\\n5. “Bodak Yellow” by Cardi B\\n6. “Im the One” by DJ Khaled\\n7. “Motorsport” by Migos\\n8. “No Limit” by G-Eazy\\n9. “Bounce Back” by Big Sean\\n10. “', generation_info=None)]] llm_output=None run=[RunInfo(run_id=UUID('628e0491-a310-4d12-81db-6f2c5309d5c2')), RunInfo(run_id=UUID('f88fdbcd-c1f6-4f13-b575-810b80ecbaaf'))]\n"
]
}
],
"source": [
"print(\n",
" llm.generate(\n",
" [\"The list of top romantic songs:\\n1.\", \"The list of top rap songs:\\n1.\"],\n",
" max_length=128\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Integrate the model in an LLMChain"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Who was the US president in the year the first Pokemon game was released?\n",
"\n",
"Let's think step by step. 1996 was the year the first Pokemon game was released.\n",
"\n",
"\\begin{blockquote}\n",
"\n",
"\\begin{itemize}\n",
" \\item 1996 was the year Bill Clinton was president.\n",
" \\item 1996 was the year the first Pokemon game was released.\n",
" \\item 1996 was the year the first Pokemon game was released.\n",
"\n",
"\\end{itemize}\n",
"\\end{blockquote}\n",
"\n",
"I'm not sure if this is a valid question, but I'm sure it's a fun one.\n",
"\n",
"Comment: I'm not sure if this is a valid question, but I'm sure it's a fun one.\n",
"\n",
"Comment: @JoeZ. I'm not sure if this is a valid question, but I'm sure it's a fun one.\n",
"\n",
"Comment: @JoeZ. I'm not sure if this is a valid question, but I'm sure it's a fun one.\n",
"\n",
"Comment: @JoeZ. I'm not sure if this is a valid question, but I'm sure it's a fun one.\n",
"\n",
"Comment: @JoeZ. I'm not sure if this is a valid question, but I'm sure it's a fun one.\n",
"\n",
"Comment: @JoeZ. I'm not sure if this is a valid question, but I'm sure it's a fun one.\n",
"\n",
"Comment: @JoeZ. I'm not sure if this is a valid question, but I'm sure it's a fun one.\n",
"\n",
"Comment: @JoeZ. I'm not sure if this is a valid question, but I'm sure it's a fun one.\n",
"\n",
"Comment: @JoeZ. I'm not sure if this is a valid question, but I'm sure it's a fun one.\n",
"\n",
"Comment: @JoeZ. I'm not sure if this is a valid question, but I'm sure it's a fun one.\n",
"\n",
"Comment: @JoeZ. I'm not sure if this is a valid question, but I'm sure it's a fun one.\n",
"\n"
]
}
],
"source": [
"from langchain import PromptTemplate, LLMChain\n",
"\n",
"template = \"\"\"{question}\n",
"\n",
"Let's think step by step. \"\"\"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
"\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"\n",
"question = \"Who was the US president in the year the first Pokemon game was released?\"\n",
"\n",
"print(llm_chain.run(question))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.12 ('langchain_venv': venv)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "d1d3a3c58a58885896c5459933a599607cdbb9917d7e1ad7516c8786c51f2dd2"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -37,6 +37,7 @@ from langchain.llms.chatglm import ChatGLM
from langchain.llms.clarifai import Clarifai
from langchain.llms.cohere import Cohere
from langchain.llms.ctransformers import CTransformers
from langchain.llms.ctranslate2 import CTranslate2
from langchain.llms.databricks import Databricks
from langchain.llms.deepinfra import DeepInfra
from langchain.llms.deepsparse import DeepSparse
@ -100,6 +101,7 @@ __all__ = [
"Beam",
"Bedrock",
"CTransformers",
"CTranslate2",
"CerebriumAI",
"ChatGLM",
"Clarifai",
@ -178,6 +180,7 @@ type_to_cls_dict: Dict[str, Type[BaseLLM]] = {
"clarifai": Clarifai,
"cohere": Cohere,
"ctransformers": CTransformers,
"ctranslate2": CTranslate2,
"databricks": Databricks,
"deepinfra": DeepInfra,
"deepsparse": DeepSparse,

@ -0,0 +1,128 @@
from typing import Any, Dict, List, Optional, Union
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import BaseLLM
from langchain.pydantic_v1 import Field, root_validator
from langchain.schema.output import Generation, LLMResult
class CTranslate2(BaseLLM):
"""CTranslate2 language model."""
model_path: str = ""
"""Path to the CTranslate2 model directory."""
tokenizer_name: str = ""
"""Name of the original Hugging Face model needed to load the proper tokenizer."""
device: str = "cpu"
"""Device to use (possible values are: cpu, cuda, auto)."""
device_index: Union[int, List[int]] = 0
"""Device IDs where to place this generator on."""
compute_type: Union[str, Dict[str, str]] = "default"
"""
Model computation type or a dictionary mapping a device name to the computation type
(possible values are: default, auto, int8, int8_float32, int8_float16,
int8_bfloat16, int16, float16, bfloat16, float32).
"""
max_length: int = 512
"""Maximum generation length."""
sampling_topk: int = 1
"""Randomly sample predictions from the top K candidates."""
sampling_topp: float = 1
"""Keep the most probable tokens whose cumulative probability exceeds this value."""
sampling_temperature: float = 1
"""Sampling temperature to generate more random samples."""
client: Any #: :meta private:
tokenizer: Any #: :meta private:
ctranslate2_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""
Holds any model parameters valid for `ctranslate2.Generator` call not
explicitly specified.
"""
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that python package exists in environment."""
try:
import ctranslate2
except ImportError:
raise ImportError(
"Could not import ctranslate2 python package. "
"Please install it with `pip install ctranslate2`."
)
try:
import transformers
except ImportError:
raise ImportError(
"Could not import transformers python package. "
"Please install it with `pip install transformers`."
)
values["client"] = ctranslate2.Generator(
model_path=values["model_path"],
device=values["device"],
device_index=values["device_index"],
compute_type=values["compute_type"],
**values["ctranslate2_kwargs"],
)
values["tokenizer"] = transformers.AutoTokenizer.from_pretrained(
values["tokenizer_name"]
)
return values
@property
def _default_params(self) -> Dict[str, Any]:
"""Get the default parameters."""
return {
"max_length": self.max_length,
"sampling_topk": self.sampling_topk,
"sampling_topp": self.sampling_topp,
"sampling_temperature": self.sampling_temperature,
}
def _generate(
self,
prompts: List[str],
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> LLMResult:
# build sampling parameters
params = {**self._default_params, **kwargs}
# call the model
encoded_prompts = self.tokenizer(prompts)["input_ids"]
tokenized_prompts = [
self.tokenizer.convert_ids_to_tokens(encoded_prompt)
for encoded_prompt in encoded_prompts
]
results = self.client.generate_batch(tokenized_prompts, **params)
sequences = [result.sequences_ids[0] for result in results]
decoded_sequences = [self.tokenizer.decode(seq) for seq in sequences]
generations = []
for text in decoded_sequences:
generations.append([Generation(text=text)])
return LLMResult(generations=generations)
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "ctranslate2"
Loading…
Cancel
Save