From 58b90f30b0a72e392a3428116560e2f2417e06eb Mon Sep 17 00:00:00 2001
From: ElliotKetchup <64539418+ElliotKetchup@users.noreply.github.com>
Date: Thu, 2 Nov 2023 00:32:02 +0100
Subject: [PATCH] Update llama.cpp integration (#11864)

---
 docs/docs/integrations/llms/llamacpp.ipynb | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/docs/docs/integrations/llms/llamacpp.ipynb b/docs/docs/integrations/llms/llamacpp.ipynb
index 668766f30a..370a6b1945 100644
--- a/docs/docs/integrations/llms/llamacpp.ipynb
+++ b/docs/docs/integrations/llms/llamacpp.ipynb
@@ -6,9 +6,9 @@
    "source": [
     "# Llama.cpp\n",
     "\n",
-    "[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp). \n",
+    "[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp).\n",
     "\n",
-    "It supports inference for [many LLMs](https://github.com/ggerganov/llama.cpp), which can be accessed on [HuggingFace](https://huggingface.co/TheBloke).\n",
+    "It supports inference for [many LLMs](https://github.com/ggerganov/llama.cpp#description), which can be accessed on [Hugging Face](https://huggingface.co/TheBloke).\n",
     "\n",
     "This notebook goes over how to run `llama-cpp-python` within LangChain.\n",
     "\n",
@@ -54,7 +54,7 @@
    "source": [
     "### Installation with OpenBLAS / cuBLAS / CLBlast\n",
     "\n",
-    "`lama.cpp` supports multiple BLAS backends for faster processing. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the desired BLAS backend ([source](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast)).\n",
+    "`llama.cpp` supports multiple BLAS backends for faster processing. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the desired BLAS backend ([source](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast)).\n",
     "\n",
     "Example installation with cuBLAS backend:"
    ]
@@ -177,7 +177,11 @@
     "\n",
     "You don't need an `API_TOKEN` as you will run the LLM locally.\n",
     "\n",
-    "It is worth understanding which models are suitable to be used on the desired machine."
+    "It is worth understanding which models are suitable to run on your machine.\n",
+    "\n",
+    "[TheBloke's](https://huggingface.co/TheBloke) Hugging Face models have a `Provided files` section that lists the RAM required to run models of different quantization sizes and methods (e.g. [Llama2-7B-Chat-GGUF](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF#provided-files)).\n",
+    "\n",
+    "This [GitHub issue](https://github.com/facebookresearch/llama/issues/425) is also a good reference for finding the right model for your machine."
    ]
   },
   {
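
For context, the workflow the patched notebook walks through looks roughly like the sketch below: install `llama-cpp-python` with a BLAS backend, pick a quantized GGUF file that fits your RAM, and load it through LangChain's `LlamaCpp` wrapper. The install command is the one documented in the llama-cpp-python README linked in the hunk above; the model path and parameter values are illustrative stand-ins, not content from this patch.

```python
# Minimal sketch of the pattern the notebook documents.
#
# Install llama-cpp-python with the cuBLAS backend first (shell command from
# the llama-cpp-python README):
#   CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python

from langchain.llms import LlamaCpp

llm = LlamaCpp(
    # Hypothetical local path to a quantized GGUF file; choose a quantization
    # whose RAM requirement (from the model card's "Provided files" table)
    # fits your machine.
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=40,  # layers to offload to the GPU (needs a cuBLAS build)
    n_batch=512,      # tokens evaluated in parallel per batch
    n_ctx=2048,       # context window size
    verbose=True,
)

print(llm("Q: Name the planets in the solar system. A:"))
```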
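On TheBloke's GGUF model cards, the `Provided files` table typically pairs each quantization variant (roughly Q2_K up to Q8_0) with its file size and a maximum-RAM estimate; that estimate is the number to compare against your machine's memory before downloading, which is the selection step the new paragraph in this patch points readers to.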