Update llamacpp demonstration notebook (#5344)

# Update llamacpp demonstration notebook

Add instructions for installing with a BLAS backend, and update the
model usage example.

Fixes #5071. Strictly speaking, it is prevention of similar issues in
the future rather than a fix, since there was no problem in the
framework functionality itself.

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

- @hwchase17 
- @agola11
Oleh Kuznetsov 12 months ago committed by GitHub
parent 44b48d9518
commit f6615cac41

@@ -1,6 +1,7 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -12,6 +13,20 @@
"This notebook goes over how to run `llama-cpp` within LangChain."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installation\n",
"\n",
"There is a banch of options how to install the llama-cpp package: \n",
"- only CPU usage\n",
"- CPU + GPU (using one of many BLAS backends)\n",
"\n",
"### CPU only installation"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -24,6 +39,53 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation with OpenBLAS / cuBLAS / CLBlast\n",
"\n",
"`lama.cpp` supports multiple BLAS backends for faster processing. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the desired BLAS backend ([source](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast)).\n",
"\n",
"Example installation with cuBLAS backend:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!CMAKE_ARGS=\"-DLLAMA_CUBLAS=on\" FORCE_CMAKE=1 pip install llama-cpp-python"
]
},
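{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"An analogous command should work for the other backends. As a sketch, an OpenBLAS installation might look like the cell below; the `LLAMA_OPENBLAS` flag is taken from the llama-cpp-python README linked above, so verify it against the current docs before use:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical OpenBLAS variant of the cuBLAS command above;\n",
"# check the llama-cpp-python README for the exact CMake flag.\n",
"!CMAKE_ARGS=\"-DLLAMA_OPENBLAS=on\" FORCE_CMAKE=1 pip install llama-cpp-python"
]
},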
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"**IMPORTANT**: If you have already installed a cpu only version of the package, you need to reinstall it from scratch: condiser the following command: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!CMAKE_ARGS=\"-DLLAMA_CUBLAS=on\" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -46,6 +108,14 @@
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"**Consider using a template that suits your model! Check the models page on HuggingFace etc. to get a correct prompting template.**"
]
},
{
"cell_type": "code",
"execution_count": 4,
@@ -56,14 +126,14 @@
"source": [
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"Answer: Let's work this out in a step by step way to be sure we have the right answer.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"metadata": {
"tags": []
},
@@ -71,17 +141,125 @@
"source": [
"# Callbacks support token-wise streaming\n",
"callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])\n",
"# Verbose is required to pass to the callback manager\n",
"# Verbose is required to pass to the callback manager"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### CPU"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# Make sure the model path is correct for your system!\n",
"llm = LlamaCpp(\n",
" model_path=\"./ggml-model-q4_0.bin\", \n",
" callback_manager=callback_manager, \n",
" verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"1. First, find out when Justin Bieber was born.\n",
"2. We know that Justin Bieber was born on March 1, 1994.\n",
"3. Next, we need to look up when the Super Bowl was played in that year.\n",
"4. The Super Bowl was played on January 28, 1995.\n",
"5. Finally, we can use this information to answer the question. The NFL team that won the Super Bowl in the year Justin Bieber was born is the San Francisco 49ers."
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"llama_print_timings: load time = 434.15 ms\n",
"llama_print_timings: sample time = 41.81 ms / 121 runs ( 0.35 ms per token)\n",
"llama_print_timings: prompt eval time = 2523.78 ms / 48 tokens ( 52.58 ms per token)\n",
"llama_print_timings: eval time = 23971.57 ms / 121 runs ( 198.11 ms per token)\n",
"llama_print_timings: total time = 28945.95 ms\n"
]
},
{
"data": {
"text/plain": [
"'\\n\\n1. First, find out when Justin Bieber was born.\\n2. We know that Justin Bieber was born on March 1, 1994.\\n3. Next, we need to look up when the Super Bowl was played in that year.\\n4. The Super Bowl was played on January 28, 1995.\\n5. Finally, we can use this information to answer the question. The NFL team that won the Super Bowl in the year Justin Bieber was born is the San Francisco 49ers.'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"question = \"What NFL team won the Super Bowl in the year Justin Bieber was born?\"\n",
"\n",
"llm_chain.run(question)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### GPU\n",
"\n",
"If the installation with BLAS backend was correct, you will see an `BLAS = 1` indicator in model properties.\n",
"\n",
"Two of the most important parameters for use with GPU are:\n",
"\n",
"- `n_gpu_layers` - determines how many layers of the model are offloaded to your GPU.\n",
"- `n_batch` - how many tokens are processed in parallel. \n",
"\n",
"Setting these parameters correctly will dramatically improve the evaluation speed (see [wrapper code](https://github.com/mmagnesium/langchain/blob/master/langchain/llms/llamacpp.py) for more details)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"n_gpu_layers = 40 # Change this value based on your model and your GPU VRAM pool.\n",
"n_batch = 512 # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.\n",
"\n",
"# Make sure the model path is correct for your system!\n",
"llm = LlamaCpp(\n",
" model_path=\"./ggml-model-q4_0.bin\", callback_manager=callback_manager, verbose=True\n",
" model_path=\"./ggml-model-q4_0.bin\",\n",
" n_gpu_layers=n_gpu_layers, n_batch=n_batch,\n",
" callback_manager=callback_manager, \n",
" verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
@@ -90,23 +268,48 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" First we need to identify what year Justin Beiber was born in. A quick google search reveals that he was born on March 1st, 1994. Now we know when the Super Bowl was played in, so we can look up which NFL team won it. The NFL Superbowl of the year 1994 was won by the San Francisco 49ers against the San Diego Chargers."
" We are looking for an NFL team that won the Super Bowl when Justin Bieber (born March 1, 1994) was born. \n",
"\n",
"First, let's look up which year is closest to when Justin Bieber was born:\n",
"\n",
"* The year before he was born: 1993\n",
"* The year of his birth: 1994\n",
"* The year after he was born: 1995\n",
"\n",
"We want to know what NFL team won the Super Bowl in the year that is closest to when Justin Bieber was born. Therefore, we should look up the NFL team that won the Super Bowl in either 1993 or 1994.\n",
"\n",
"Now let's find out which NFL team did win the Super Bowl in either of those years:\n",
"\n",
"* In 1993, the San Francisco 49ers won the Super Bowl against the Dallas Cowboys by a score of 20-16.\n",
"* In 1994, the San Francisco 49ers won the Super Bowl again, this time against the San Diego Chargers by a score of 49-26.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"llama_print_timings: load time = 238.10 ms\n",
"llama_print_timings: sample time = 84.23 ms / 256 runs ( 0.33 ms per token)\n",
"llama_print_timings: prompt eval time = 238.04 ms / 49 tokens ( 4.86 ms per token)\n",
"llama_print_timings: eval time = 10391.96 ms / 255 runs ( 40.75 ms per token)\n",
"llama_print_timings: total time = 15664.80 ms\n"
]
},
{
"data": {
"text/plain": [
"' First we need to identify what year Justin Beiber was born in. A quick google search reveals that he was born on March 1st, 1994. Now we know when the Super Bowl was played in, so we can look up which NFL team won it. The NFL Superbowl of the year 1994 was won by the San Francisco 49ers against the San Diego Chargers.'"
"\" We are looking for an NFL team that won the Super Bowl when Justin Bieber (born March 1, 1994) was born. \\n\\nFirst, let's look up which year is closest to when Justin Bieber was born:\\n\\n* The year before he was born: 1993\\n* The year of his birth: 1994\\n* The year after he was born: 1995\\n\\nWe want to know what NFL team won the Super Bowl in the year that is closest to when Justin Bieber was born. Therefore, we should look up the NFL team that won the Super Bowl in either 1993 or 1994.\\n\\nNow let's find out which NFL team did win the Super Bowl in either of those years:\\n\\n* In 1993, the San Francisco 49ers won the Super Bowl against the Dallas Cowboys by a score of 20-16.\\n* In 1994, the San Francisco 49ers won the Super Bowl again, this time against the San Diego Chargers by a score of 49-26.\\n\""
]
},
"execution_count": 6,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -116,6 +319,13 @@
"\n",
"llm_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -134,7 +344,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
"version": "3.10.9"
}
},
"nbformat": 4,
