Add C Transformers for GGML Models (#5218)

# Add C Transformers for GGML Models
I created Python bindings for the GGML models:
https://github.com/marella/ctransformers

Currently it supports GPT-2, GPT-J, GPT-NeoX, LLaMA, MPT, etc. See
[Supported
Models](https://github.com/marella/ctransformers#supported-models).


It provides a unified interface for all models:

```python
from langchain.llms import CTransformers

llm = CTransformers(model='/path/to/ggml-gpt-2.bin', model_type='gpt2')

print(llm('AI is going to'))
```

It can be used with models hosted on the Hugging Face Hub:

```py
llm = CTransformers(model='marella/gpt-2-ggml')
```

It supports streaming:

```py
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = CTransformers(model='marella/gpt-2-ggml', callbacks=[StreamingStdOutCallbackHandler()])
```

Please see [README](https://github.com/marella/ctransformers#readme) for
more details.
---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
searx_updates
Ravindra Marella 12 months ago committed by GitHub
parent ca88b25da6
commit b3988621c5
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -0,0 +1,57 @@
# C Transformers
This page covers how to use the [C Transformers](https://github.com/marella/ctransformers) library within LangChain.
It is broken into two parts: installation and setup, and then references to specific C Transformers wrappers.
## Installation and Setup
- Install the Python package with `pip install ctransformers`
- Download a supported [GGML model](https://huggingface.co/TheBloke) (see [Supported Models](https://github.com/marella/ctransformers#supported-models))
## Wrappers
### LLM
There exists a CTransformers LLM wrapper, which you can access with:
```python
from langchain.llms import CTransformers
```
It provides a unified interface for all models:
```python
llm = CTransformers(model='/path/to/ggml-gpt-2.bin', model_type='gpt2')
print(llm('AI is going to'))
```
If you are getting `illegal instruction` error, try using `lib='avx'` or `lib='basic'`:
```py
llm = CTransformers(model='/path/to/ggml-gpt-2.bin', model_type='gpt2', lib='avx')
```
It can be used with models hosted on the Hugging Face Hub:
```py
llm = CTransformers(model='marella/gpt-2-ggml')
```
If a model repo has multiple model files (`.bin` files), specify a model file using:
```py
llm = CTransformers(model='marella/gpt-2-ggml', model_file='ggml-model.bin')
```
Additional parameters can be passed using the `config` parameter:
```py
config = {'max_new_tokens': 256, 'repetition_penalty': 1.1}
llm = CTransformers(model='marella/gpt-2-ggml', config=config)
```
See [Documentation](https://github.com/marella/ctransformers#config) for a list of available parameters.
For a more detailed walkthrough of this, see [this notebook](../modules/models/llms/integrations/ctransformers.ipynb).

@ -0,0 +1,125 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# C Transformers\n",
"\n",
"The [C Transformers](https://github.com/marella/ctransformers) library provides Python bindings for GGML models.\n",
"\n",
"This example goes over how to use LangChain to interact with `C Transformers` [models](https://github.com/marella/ctransformers#supported-models)."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"**Install**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install ctransformers"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"**Load Model**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import CTransformers\n",
"\n",
"llm = CTransformers(model='marella/gpt-2-ggml')"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"**Generate Text**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(llm('AI is going to'))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"**Streaming**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"\n",
"llm = CTransformers(model='marella/gpt-2-ggml', callbacks=[StreamingStdOutCallbackHandler()])\n",
"\n",
"response = llm('AI is going to')"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"**LLMChain**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain import PromptTemplate, LLMChain\n",
"\n",
"template = \"\"\"Question: {question}\n",
"\n",
"Answer:\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=['question'])\n",
"\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"\n",
"response = llm_chain.run('What is AI?')"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -10,6 +10,7 @@ from langchain.llms.base import BaseLLM
from langchain.llms.beam import Beam
from langchain.llms.cerebriumai import CerebriumAI
from langchain.llms.cohere import Cohere
from langchain.llms.ctransformers import CTransformers
from langchain.llms.deepinfra import DeepInfra
from langchain.llms.fake import FakeListLLM
from langchain.llms.forefrontai import ForefrontAI
@ -48,6 +49,7 @@ __all__ = [
"Beam",
"CerebriumAI",
"Cohere",
"CTransformers",
"DeepInfra",
"ForefrontAI",
"GooglePalm",
@ -92,6 +94,7 @@ type_to_cls_dict: Dict[str, Type[BaseLLM]] = {
"beam": Beam,
"cerebriumai": CerebriumAI,
"cohere": Cohere,
"ctransformers": CTransformers,
"deepinfra": DeepInfra,
"forefrontai": ForefrontAI,
"google_palm": GooglePalm,

@ -0,0 +1,104 @@
"""Wrapper around the C Transformers library."""
from typing import Any, Dict, Optional, Sequence
from pydantic import root_validator
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
class CTransformers(LLM):
"""Wrapper around the C Transformers LLM interface.
To use, you should have the ``ctransformers`` python package installed.
See https://github.com/marella/ctransformers
Example:
.. code-block:: python
from langchain.llms import CTransformers
llm = CTransformers(model="/path/to/ggml-gpt-2.bin", model_type="gpt2")
"""
client: Any #: :meta private:
model: str
"""The path to a model file or directory or the name of a Hugging Face Hub
model repo."""
model_type: Optional[str] = None
"""The model type."""
model_file: Optional[str] = None
"""The name of the model file in repo or directory."""
config: Optional[Dict[str, Any]] = None
"""The config parameters.
See https://github.com/marella/ctransformers#config"""
lib: Optional[str] = None
"""The path to a shared library or one of `avx2`, `avx`, `basic`."""
@property
def _identifying_params(self) -> Dict[str, Any]:
"""Get the identifying parameters."""
return {
"model": self.model,
"model_type": self.model_type,
"model_file": self.model_file,
"config": self.config,
}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "ctransformers"
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that ``ctransformers`` package is installed."""
try:
from ctransformers import AutoModelForCausalLM
except ImportError:
raise ImportError(
"Could not import `ctransformers` package. "
"Please install it with `pip install ctransformers`"
)
config = values["config"] or {}
values["client"] = AutoModelForCausalLM.from_pretrained(
values["model"],
model_type=values["model_type"],
model_file=values["model_file"],
lib=values["lib"],
**config,
)
return values
def _call(
self,
prompt: str,
stop: Optional[Sequence[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
) -> str:
"""Generate text from a prompt.
Args:
prompt: The prompt to generate text from.
stop: A list of sequences to stop generation when encountered.
Returns:
The generated text.
Example:
.. code-block:: python
response = llm("Tell me a joke.")
"""
text = []
_run_manager = run_manager or CallbackManagerForLLMRun.get_noop_manager()
for chunk in self.client(prompt, stop=stop, stream=True):
text.append(chunk)
_run_manager.on_llm_new_token(chunk, verbose=self.verbose)
return "".join(text)

@ -0,0 +1,21 @@
"""Test C Transformers wrapper."""
from langchain.llms import CTransformers
from tests.unit_tests.callbacks.fake_callback_handler import FakeCallbackHandler
def test_ctransformers_call() -> None:
"""Test valid call to C Transformers."""
config = {"max_new_tokens": 5}
callback_handler = FakeCallbackHandler()
llm = CTransformers(
model="marella/gpt-2-ggml",
config=config,
callbacks=[callback_handler],
)
output = llm("Say foo:")
assert isinstance(output, str)
assert len(output) > 1
assert 0 < callback_handler.llm_streams <= config["max_new_tokens"]
Loading…
Cancel
Save