Create `llama-cpu` variant for systems without GPU

Fixes #16
10 months ago · 00340c0504
parent 6562cf0e16
commit 00340c0504
2 changed files with 9 additions and 0 deletions
--- a/8
+++ b/8
@@ -106,6 +106,14 @@ RUN pip install git+https://github.com/sterlind/GPTQ-for-LLaMa.git@lora_4bit
 ENV EXTRA_LAUNCH_ARGS=""
 CMD ["python3", "/app/server.py", "--monkey-patch"]

+FROM base AS llama-cpu
+RUN echo "LLAMA-CPU" >> /variant.txt
+RUN apt-get install --no-install-recommends -y git python3-dev build-essential python3-pip
+RUN unset TORCH_CUDA_ARCH_LIST LLAMA_CUBLAS && \
+    pip uninstall -y llama_cpp_python_cuda llama-cpp-python && pip install llama-cpp-python --force-reinstall --upgrade
+ENV EXTRA_LAUNCH_ARGS=""
+CMD ["python3", "/app/server.py", "--monkey-patch"]
+
 FROM base AS default
 RUN echo "DEFAULT" >> /variant.txt
 ENV EXTRA_LAUNCH_ARGS=""
--- a/README.md
+++ b/README.md
@@ -23,6 +23,7 @@ Each variant has the 'extras' incuded in `default` but has some changes made as
 | `default-nightly` | Automated nightly build of the `default` variant. This image is built and pushed automatically - it is untested and may be unstable. *Suitable when more frequent updates are required and instability is not an issue.*  |
 | `triton` | Updated `GPTQ-for-llama` using the latest `triton` branch from `qwopqwop200/GPTQ-for-LLaMa`. Suitable for Linux only. *This version is accurate but a little slow.* |
 | `cuda` | Updated `GPTQ-for-llama` using the latest `cuda` branch from `qwopqwop200/GPTQ-for-LLaMa`. *This version is very slow!* |
+| `llama-cpu` | GPU support is REMOVED from `llama-cpp`. Suitable for systems without a CUDA-capable GPU. *Use this only when GPU acceleration is not available, as it is a slower way to run models!* |
 | `monkey-patch` | Use LoRAs in 4-Bit `GPTQ-for-llama` mode. ***DEPRECATION WARNING:** This version is outdated, but will remain for now.* |
 | `llama-cublas` | CUDA GPU offloading enabled for `llama-cpp`. Use by setting option `n-gpu-layers` > 0. ***DEPRECATION WARNING:** This capability has been rolled into the default. The variant will be removed if the upstream dependency does not conflict with `default`.* |
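
A note on `unset` in the `llama-cpu` stage: each Dockerfile `RUN` instruction starts a fresh shell, so an `unset` only affects commands chained within that same `RUN`. Variables set with `ENV` in an earlier stage are passed to every later `RUN` again. The sketch below illustrates this in plain shell, with `sh -c` standing in for a `RUN` step. This is an analogy rather than Docker itself, and the value `1` for `LLAMA_CUBLAS` is an assumption made for illustration.

```shell
#!/bin/sh
# Stand-in for an ENV inherited from an earlier build stage (value is illustrative).
export LLAMA_CUBLAS=1

# Case 1: `unset` runs in its own child shell, like a standalone RUN step.
sh -c 'unset LLAMA_CUBLAS'
# The next step starts another fresh shell and still sees the variable.
separate=$(sh -c 'echo "${LLAMA_CUBLAS:-unset}"')

# Case 2: `unset` is chained with the command that must not see the variable.
chained=$(sh -c 'unset LLAMA_CUBLAS && echo "${LLAMA_CUBLAS:-unset}"')

echo "separate steps: $separate"
echo "chained step:   $chained"
```

In the first case the variable survives into the next step. In the second it is gone for the chained command, which is the behaviour the `pip install llama-cpp-python` step relies on to build without CUDA flags.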