docs: llamacpp minor fixes (#8738)

- Description: minor updates on llama cpp doc
pull/8781/head
Dayou Liu 1 year ago committed by GitHub
parent f437311eef
commit 91a0817e39
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -4,12 +4,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Llama-cpp\n",
"# Llama.cpp\n",
"\n",
"[llama-cpp](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp). \n",
"[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp). \n",
"It supports [several LLMs](https://github.com/ggerganov/llama.cpp).\n",
"\n",
"This notebook goes over how to run `llama-cpp` within LangChain."
"This notebook goes over how to run `llama-cpp-python` within LangChain."
]
},
{
@ -18,7 +18,7 @@
"source": [
"## Installation\n",
"\n",
"There is a bunch of options how to install the llama-cpp package: \n",
"There are different options on how to install the llama-cpp package: \n",
"- only CPU usage\n",
"- CPU + GPU (using one of many BLAS backends)\n",
"- Metal GPU (MacOS with Apple Silicon Chip) \n",
@ -61,7 +61,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**IMPORTANT**: If you have already installed a cpu only version of the package, you need to reinstall it from scratch: consider the following command: "
"**IMPORTANT**: If you have already installed the CPU only version of the package, you need to reinstall it from scratch. Consider the following command: "
]
},
{
@ -79,7 +79,7 @@
"source": [
"### Installation with Metal\n",
"\n",
"`lama.cpp` supports Apple silicon first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the Metal support ([source](https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md)).\n",
"`llama.cpp` supports Apple silicon first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the Metal support ([source](https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md)).\n",
"\n",
"Example installation with Metal Support:"
]
@ -143,7 +143,7 @@
"\n",
"#### Compiling and installing\n",
"\n",
"In the same command prompt (anaconda prompt) you set the variables, you can cd into `llama-cpp-python` directory and run the following commands.\n",
"In the same command prompt (anaconda prompt) you set the variables, you can `cd` into `llama-cpp-python` directory and run the following commands.\n",
"\n",
"```\n",
"python setup.py clean\n",
@ -164,7 +164,9 @@
"source": [
"Make sure you are following all instructions to [install all necessary model files](https://github.com/ggerganov/llama.cpp).\n",
"\n",
"You don't need an `API_TOKEN`!"
"You don't need an `API_TOKEN` as you will run the LLM locally.\n",
"\n",
"It is worth understanding which models are suitable to be used on the desired machine."
]
},
{
@ -227,7 +229,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"`Llama-v2`"
"Example using a LLaMA 2 7B model"
]
},
{
@ -304,7 +306,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"`Llama-v1`"
"Example using a LLaMA v1 model"
]
},
{
@ -381,7 +383,7 @@
"source": [
"### GPU\n",
"\n",
"If the installation with BLAS backend was correct, you will see an `BLAS = 1` indicator in model properties.\n",
"If the installation with BLAS backend was correct, you will see a `BLAS = 1` indicator in model properties.\n",
"\n",
"Two of the most important parameters for use with GPU are:\n",
"\n",
@ -473,22 +475,15 @@
"llm_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Metal\n",
"\n",
"If the installation with Metal was correct, you will see an `NEON = 1` indicator in model properties.\n",
"If the installation with Metal was correct, you will see a `NEON = 1` indicator in model properties.\n",
"\n",
"Two of the most important parameters for use with GPU are:\n",
"Two of the most important GPU parameters are:\n",
"\n",
"- `n_gpu_layers` - determines how many layers of the model are offloaded to your Metal GPU, in the most case, set it to `1` is enough for Metal\n",
"- `n_batch` - how many tokens are processed in parallel, default is 8, set to bigger number.\n",
@ -522,7 +517,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The rest are almost same as GPU, the console log will show the following log to indicate the Metal was enable properly.\n",
"The console log will show the following log to indicate Metal was enable properly.\n",
"\n",
"```\n",
"ggml_metal_init: allocating\n",
@ -530,7 +525,9 @@
"...\n",
"```\n",
"\n",
"You also could check the `Activity Monitor` by watching the % GPU of the process, the % CPU will drop dramatically after turn on `n_gpu_layers=1`. Also for the first time call LLM, the performance might be slow due to the model compilation in Metal GPU."
"You also could check `Activity Monitor` by watching the GPU usage of the process, the CPU usage will drop dramatically after turn on `n_gpu_layers=1`. \n",
"\n",
"For the first call to the LLM, the performance may be slow due to the model compilation in Metal GPU."
]
}
],

Loading…
Cancel
Save