Implement llama-cpp with CUBLAS

Also switch to the devel image for deployment; it will be more flexible for variant builds.
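For reference, any of the variant stages in the Dockerfile below can be built directly with `docker build --target`; a minimal sketch (the image tag is illustrative, not part of this commit):

```
# Build the new llama-cublas variant; --target selects the Dockerfile stage
# (default, triton, cuda, or llama-cublas).
docker build --target llama-cublas -t text-generation-webui-docker:llama-cublas .
```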
commit dd8dab6fb9 (parent bd884491e4) by Atinoda on `pull/2/head`, 1 year ago

@@ -12,7 +12,6 @@ ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN pip3 install --upgrade pip setuptools && \
    pip3 install torch torchvision torchaudio
FROM env_base AS app_base
### DEVELOPERS/ADVANCED USERS ###
# Clone oobabooga/text-generation-webui
@@ -36,7 +35,7 @@ ARG TORCH_CUDA_ARCH_LIST="6.1;7.0;7.5;8.0;8.6+PTX"
RUN cd /app/repositories/GPTQ-for-LLaMa/ && python3 setup_cuda.py install
-FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04 AS base
+FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS base
# Runtime pre-reqs
RUN apt-get update && apt-get install --no-install-recommends -y \
python3-venv git
@@ -61,7 +60,6 @@ RUN chmod +x /scripts/docker-entrypoint.sh
ENTRYPOINT ["/scripts/docker-entrypoint.sh"]
# VARIANT BUILDS
FROM base AS cuda
RUN echo "CUDA" >> /variant.txt
@@ -73,7 +71,6 @@ RUN pip3 uninstall -y quant-cuda && \
ENV EXTRA_LAUNCH_ARGS=""
CMD ["python3", "/app/server.py"]
FROM base AS triton
RUN echo "TRITON" >> /variant.txt
RUN apt-get install --no-install-recommends -y git python3-dev build-essential python3-pip
@@ -84,6 +81,14 @@ RUN pip3 uninstall -y quant-cuda && \
ENV EXTRA_LAUNCH_ARGS=""
CMD ["python3", "/app/server.py"]
+FROM base AS llama-cublas
+RUN echo "LLAMA-CUBLAS" >> /variant.txt
+RUN apt-get install --no-install-recommends -y git python3-dev build-essential python3-pip
+ENV LLAMA_CUBLAS=1
+RUN pip uninstall -y llama-cpp-python && \
+    CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
+ENV EXTRA_LAUNCH_ARGS=""
+CMD ["python3", "/app/server.py"]
FROM base AS default
RUN echo "DEFAULT" >> /variant.txt

@@ -1,10 +1,10 @@
# Introduction
-This project dockerises the deployment of [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) and its variants. It provides a default configuration (corresponding to a vanilla deployment of the application) as well as pre-configured support for other set-ups (e.g., the more recent `triton` and `cuda` branches of GPTQ).
+This project dockerises the deployment of [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) and its variants. It provides a default configuration (corresponding to a vanilla deployment of the application) as well as pre-configured support for other set-ups (e.g., the latest `llama-cpp-python` with GPU offloading, and the more recent `triton` and `cuda` branches of GPTQ).
*The goal of this project is to be to [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) what [AbdBarho/stable-diffusion-webui-docker](https://github.com/AbdBarho/stable-diffusion-webui-docker) is to [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui).*
# Usage
-*This project currently supports Linux as the deployment platform.*
+*This project currently supports Linux as the deployment platform. It will probably also work under WSL2.*
## Pre-Requisites
- docker
@@ -22,8 +22,8 @@ Choose the desired variant by setting the build `target` in `docker-compose.yml`
| `default` | Minimal implementation of the default deployment from source. |
| `triton` | Updated GPTQ using the latest `triton` branch from `qwopqwop200/GPTQ-for-LLaMa`. Suitable for Linux only. |
| `cuda` | Updated GPTQ using the latest `cuda` branch from `qwopqwop200/GPTQ-for-LLaMa`. |
-*See: [oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md) for more information on variants.*
+| `llama-cublas` | CUDA GPU offloading enabled for llama-cpp. Enable it by setting the option `n-gpu-layers` > 0. |
+*See: [oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md) and [oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md) for more information on variants.*
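For illustration, the new variant could be launched with GPU offloading as follows (this assumes the entrypoint script appends `EXTRA_LAUNCH_ARGS` to the `server.py` launch, and that the webui listens on its default port 7860; the image tag is illustrative):

```
# Run the llama-cublas variant with GPU access, offloading 35 layers to the GPU
docker run --rm --gpus all -p 7860:7860 \
  -e EXTRA_LAUNCH_ARGS="--n-gpu-layers 35" \
  text-generation-webui-docker:llama-cublas
```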
### Build
Build the image:
