From 91a0817e39df297ee20ab1754f8d5bceb2429bf5 Mon Sep 17 00:00:00 2001
From: Dayou Liu
Date: Fri, 4 Aug 2023 17:19:43 -0400
Subject: [PATCH] docs: llamacpp minor fixes (#8738)

- Description: minor updates on llama cpp doc

---
 docs/extras/integrations/llms/llamacpp.ipynb | 41 +++++++++-----------
 1 file changed, 19 insertions(+), 22 deletions(-)

diff --git a/docs/extras/integrations/llms/llamacpp.ipynb b/docs/extras/integrations/llms/llamacpp.ipynb
index c7c3a46446..2ec6de39c9 100644
--- a/docs/extras/integrations/llms/llamacpp.ipynb
+++ b/docs/extras/integrations/llms/llamacpp.ipynb
@@ -4,12 +4,12 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Llama-cpp\n",
+    "# Llama.cpp\n",
     "\n",
-    "[llama-cpp](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp). \n",
+    "[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp). \n",
     "It supports [several LLMs](https://github.com/ggerganov/llama.cpp).\n",
     "\n",
-    "This notebook goes over how to run `llama-cpp` within LangChain."
+    "This notebook goes over how to run `llama-cpp-python` within LangChain."
    ]
   },
   {
@@ -18,7 +18,7 @@
    "source": [
     "## Installation\n",
     "\n",
-    "There is a bunch of options how to install the llama-cpp package: \n",
+    "There are different options on how to install the llama-cpp package: \n",
     "- only CPU usage\n",
     "- CPU + GPU (using one of many BLAS backends)\n",
     "- Metal GPU (MacOS with Apple Silicon Chip) \n",
@@ -61,7 +61,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "**IMPORTANT**: If you have already installed a cpu only version of the package, you need to reinstall it from scratch: consider the following command: "
+    "**IMPORTANT**: If you have already installed the CPU only version of the package, you need to reinstall it from scratch. Consider the following command: "
    ]
   },
   {
@@ -79,7 +79,7 @@
    "source": [
     "### Installation with Metal\n",
     "\n",
-    "`lama.cpp` supports Apple silicon first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the Metal support ([source](https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md)).\n",
+    "`llama.cpp` supports Apple silicon first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the Metal support ([source](https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md)).\n",
     "\n",
     "Example installation with Metal Support:"
    ]
   },
   {
@@ -143,7 +143,7 @@
     "\n",
     "#### Compiling and installing\n",
     "\n",
-    "In the same command prompt (anaconda prompt) you set the variables, you can cd into `llama-cpp-python` directory and run the following commands.\n",
+    "In the same command prompt (anaconda prompt) you set the variables, you can `cd` into `llama-cpp-python` directory and run the following commands.\n",
     "\n",
     "```\n",
     "python setup.py clean\n",
@@ -164,7 +164,9 @@
    "source": [
     "Make sure you are following all instructions to [install all necessary model files](https://github.com/ggerganov/llama.cpp).\n",
     "\n",
-    "You don't need an `API_TOKEN`!"
+    "You don't need an `API_TOKEN` as you will run the LLM locally.\n",
+    "\n",
+    "It is worth understanding which models are suitable to be used on the desired machine."
    ]
   },
   {
@@ -227,7 +229,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "`Llama-v2`"
+    "Example using a LLaMA 2 7B model"
    ]
   },
   {
@@ -304,7 +306,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "`Llama-v1`"
+    "Example using a LLaMA v1 model"
    ]
   },
   {
@@ -381,7 +383,7 @@
    "source": [
     "### GPU\n",
     "\n",
-    "If the installation with BLAS backend was correct, you will see an `BLAS = 1` indicator in model properties.\n",
+    "If the installation with BLAS backend was correct, you will see a `BLAS = 1` indicator in model properties.\n",
     "\n",
     "Two of the most important parameters for use with GPU are:\n",
     "\n",
@@ -473,22 +475,15 @@
     "llm_chain.run(question)"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "### Metal\n",
     "\n",
-    "If the installation with Metal was correct, you will see an `NEON = 1` indicator in model properties.\n",
+    "If the installation with Metal was correct, you will see a `NEON = 1` indicator in model properties.\n",
     "\n",
-    "Two of the most important parameters for use with GPU are:\n",
+    "Two of the most important GPU parameters are:\n",
     "\n",
     "- `n_gpu_layers` - determines how many layers of the model are offloaded to your Metal GPU, in the most case, set it to `1` is enough for Metal\n",
     "- `n_batch` - how many tokens are processed in parallel, default is 8, set to bigger number.\n",
@@ -522,7 +517,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The rest are almost same as GPU, the console log will show the following log to indicate the Metal was enable properly.\n",
+    "The console log will show the following log to indicate Metal was enable properly.\n",
     "\n",
     "```\n",
     "ggml_metal_init: allocating\n",
     "ggml_metal_init: using MPS\n",
     "...\n",
     "```\n",
     "\n",
-    "You also could check the `Activity Monitor` by watching the % GPU of the process, the % CPU will drop dramatically after turn on `n_gpu_layers=1`. Also for the first time call LLM, the performance might be slow due to the model compilation in Metal GPU."
+    "You also could check `Activity Monitor` by watching the GPU usage of the process, the CPU usage will drop dramatically after turn on `n_gpu_layers=1`. \n",
+    "\n",
+    "For the first call to the LLM, the performance may be slow due to the model compilation in Metal GPU."
    ]
   }
  ],
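
Taken together, the hunks above describe loading a local model through LangChain's `LlamaCpp` wrapper with `n_gpu_layers` and `n_batch` tuned for GPU or Metal offloading, then running it through a `PromptTemplate`/`LLMChain` pair (`llm_chain.run(question)`). A minimal sketch of that pattern follows; it assumes `llama-cpp-python` is already installed with a BLAS or Metal build, and the model path and question are placeholders rather than the notebook's own values.

```python
# Minimal sketch of the usage pattern the patched notebook documents.
# Assumptions (not part of the patch): llama-cpp-python is installed with a BLAS
# or Metal build, and ./models/llama-2-7b.ggmlv3.q4_0.bin is a placeholder path
# to locally downloaded weights.
from langchain.chains import LLMChain
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

llm = LlamaCpp(
    model_path="./models/llama-2-7b.ggmlv3.q4_0.bin",  # placeholder path
    n_gpu_layers=1,  # layers offloaded to the GPU; 1 is typically enough on Metal
    n_batch=512,     # tokens processed in parallel; raise from the default of 8
    verbose=True,    # keep load-time logs visible (where BLAS = 1 / NEON = 1 appears)
)

llm_chain = LLMChain(prompt=prompt, llm=llm)
print(llm_chain.run("Name one BLAS backend supported by llama.cpp."))
```

The same object drops into the notebook's chain unchanged across the CPU, BLAS, and Metal installs; only `n_gpu_layers` and `n_batch` need to vary.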