From 58b90f30b0a72e392a3428116560e2f2417e06eb Mon Sep 17 00:00:00 2001
From: ElliotKetchup <64539418+ElliotKetchup@users.noreply.github.com>
Date: Thu, 2 Nov 2023 00:32:02 +0100
Subject: [PATCH] Update llama.cpp integration (#11864)

---
 docs/docs/integrations/llms/llamacpp.ipynb | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/docs/docs/integrations/llms/llamacpp.ipynb b/docs/docs/integrations/llms/llamacpp.ipynb
index 668766f30a..370a6b1945 100644
--- a/docs/docs/integrations/llms/llamacpp.ipynb
+++ b/docs/docs/integrations/llms/llamacpp.ipynb
@@ -6,9 +6,9 @@
    "source": [
     "# Llama.cpp\n",
     "\n",
-    "[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp). \n",
+    "[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp).\n",
     "\n",
-    "It supports inference for [many LLMs](https://github.com/ggerganov/llama.cpp), which can be accessed on [HuggingFace](https://huggingface.co/TheBloke).\n",
+    "It supports inference for [many LLMs](https://github.com/ggerganov/llama.cpp#description), which can be accessed on [Hugging Face](https://huggingface.co/TheBloke).\n",
     "\n",
     "This notebook goes over how to run `llama-cpp-python` within LangChain.\n",
     "\n",
@@ -54,7 +54,7 @@
    "source": [
     "### Installation with OpenBLAS / cuBLAS / CLBlast\n",
     "\n",
-    "`lama.cpp` supports multiple BLAS backends for faster processing. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the desired BLAS backend ([source](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast)).\n",
+    "`llama.cpp` supports multiple BLAS backends for faster processing. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the desired BLAS backend ([source](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast)).\n",
     "\n",
     "Example installation with cuBLAS backend:"
    ]
@@ -177,7 +177,11 @@
     "\n",
     "You don't need an `API_TOKEN` as you will run the LLM locally.\n",
     "\n",
-    "It is worth understanding which models are suitable to be used on the desired machine."
+    "It is worth understanding which models are suitable to run on your machine.\n",
+    "\n",
+    "[TheBloke's](https://huggingface.co/TheBloke) Hugging Face models have a `Provided files` section that lists the RAM required to run models of different quantization sizes and methods (e.g. [Llama2-7B-Chat-GGUF](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF#provided-files)).\n",
+    "\n",
+    "This [GitHub issue](https://github.com/facebookresearch/llama/issues/425) is also a good reference for finding the right model for your machine."
    ]
   },
   {
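
For context, the workflow the patched notebook walks through looks roughly like the sketch below: install `llama-cpp-python` with a BLAS backend, pick a quantized GGUF file that fits your RAM, and load it through LangChain's `LlamaCpp` wrapper. The install command is the one documented in the llama-cpp-python README linked in the hunk above; the model path and parameter values are illustrative stand-ins, not content from this patch.

```python
# Minimal sketch of the pattern the notebook documents.
#
# Install llama-cpp-python with the cuBLAS backend first (shell command from
# the llama-cpp-python README):
#   CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python

from langchain.llms import LlamaCpp

llm = LlamaCpp(
    # Hypothetical local path to a quantized GGUF file; choose a quantization
    # whose RAM requirement (from the model card's "Provided files" table)
    # fits your machine.
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=40,  # layers to offload to the GPU (needs a cuBLAS build)
    n_batch=512,      # tokens evaluated in parallel per batch
    n_ctx=2048,       # context window size
    verbose=True,
)

print(llm("Q: Name the planets in the solar system. A:"))
```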
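On TheBloke's GGUF model cards, the `Provided files` table typically pairs each quantization variant (roughly Q2_K up to Q8_0) with its file size and a maximum-RAM estimate; that estimate is the number to compare against your machine's memory before downloading, which is the selection step the new paragraph in this patch points readers to.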