From 279156e116a0bc72e172e5dc8050547ef6adbd8a Mon Sep 17 00:00:00 2001
From: Richard Guo <richardg7890@gmail.com>
Date: Thu, 18 May 2023 16:07:57 -0400
Subject: [PATCH] GPT4All Updated Docs and FAQ (#632)

* working on docs

* more doc organization

* faq

* some reformatting
---
 gpt4all-bindings/python/docs/gpt4all_faq.md   | 39 +++++++++++++++++++
 .../python/docs/gpt4all_python.md             | 37 ++++++++++++++++--
 gpt4all-bindings/python/docs/index.md         | 27 +++----------
 gpt4all-bindings/python/mkdocs.yml            |  5 ++-
 4 files changed, 81 insertions(+), 27 deletions(-)
 create mode 100644 gpt4all-bindings/python/docs/gpt4all_faq.md

diff --git a/gpt4all-bindings/python/docs/gpt4all_faq.md b/gpt4all-bindings/python/docs/gpt4all_faq.md
new file mode 100644
index 00000000..967cec42
--- /dev/null
+++ b/gpt4all-bindings/python/docs/gpt4all_faq.md
@@ -0,0 +1,39 @@
+# GPT4All FAQ
+
+## What models are supported by the GPT4All ecosystem?
+
+Three model architectures are currently supported:
+
+1. GPTJ - Based on the GPT-J architecture, with examples found [here](https://huggingface.co/EleutherAI/gpt-j-6b)
+2. LLAMA - Based on the LLAMA architecture, with examples found [here](https://huggingface.co/models?sort=downloads&search=llama)
+3. MPT - Based on Mosaic ML's MPT architecture, with examples found [here](https://huggingface.co/mosaicml/mpt-7b)
+
+## Why so many different architectures? What differentiates them?
+
+One of the major differences is the license. Currently, the LLAMA based models are subject to a non-commercial license, whereas the GPTJ and MPT base models allow commercial usage. Early in the recent explosion of activity around open source local models, the LLAMA models were generally seen as performing better, but that is changing quickly. New models are released every week, sometimes every day, and some of the GPTJ and MPT models are competitive with LLAMA in performance and quality. The MPT models also include architectural innovations that could lead to further performance and quality gains.
+
+## How does GPT4All make these models available for CPU inference?
+
+By leveraging the ggml library written by Georgi Gerganov and a growing community of developers. There are currently multiple versions of this library. The original GitHub repo can be found [here](https://github.com/ggerganov/ggml), but the developer of the library has also created a LLAMA based version [here](https://github.com/ggerganov/llama.cpp). Currently, this backend uses the latter as a submodule.
+
+## Does that mean GPT4All is compatible with all llama.cpp models and vice versa?
+
+Unfortunately, no, for three reasons:
+
+1. The upstream [llama.cpp](https://github.com/ggerganov/llama.cpp) project recently introduced [a compatibility-breaking](https://github.com/ggerganov/llama.cpp/commit/b9fd7eee57df101d4a3e3eabc9fd6c2cb13c9ca1) re-quantization method. This change renders all previous models (including the ones that GPT4All uses) inoperative with versions of llama.cpp that include it.
+2. The GPT4All backend has the llama.cpp submodule specifically pinned to a version prior to this breaking change.
+3. The GPT4All backend currently supports MPT based models as an added feature. Neither llama.cpp nor the original ggml repo supports this architecture as of this writing; however, efforts are underway to make MPT available in the ggml repo, which you can follow [here](https://github.com/ggerganov/ggml/pull/145).
+
+## What is being done to make them more compatible?
+
+A few things. First, we are maintaining compatibility with our current model zoo by pinning the submodule. However, we are also exploring how to update to newer versions of llama.cpp without breaking our current models. This might involve an additional magic header check, or it could involve keeping the currently pinned submodule and adding a second submodule with later changes, differentiating the two with namespaces or in some other manner. Investigations continue.
+
+## What about GPU inference?
+
+Newer versions of llama.cpp have added some support for inference on NVIDIA GPUs. We're investigating how to incorporate this into our downloadable installers.
+
+## Ok, so bottom line... how do I make my model on Hugging Face compatible with the GPT4All ecosystem right now?
+
+1. Check that the Hugging Face model uses one of our three supported architectures.
+2. If it does, use the conversion script inside our pinned llama.cpp submodule for GPTJ and LLAMA based models.
+3. If your model is an MPT model, use the conversion script located directly in this backend directory under the scripts subdirectory.
\ No newline at end of file
diff --git a/gpt4all-bindings/python/docs/gpt4all_python.md b/gpt4all-bindings/python/docs/gpt4all_python.md
index 0f1a9162..bd3f08d0 100644
--- a/gpt4all-bindings/python/docs/gpt4all_python.md
+++ b/gpt4all-bindings/python/docs/gpt4all_python.md
@@ -1,6 +1,37 @@
 # GPT4All Python API
-The `GPT4All` provides a universal API to call all GPT4All models and 
-introduces additional helpful functionality such as downloading models.
+The `GPT4All` package provides Python bindings and an API to our C/C++ model backend libraries.
+The source code, README, and local build instructions can be found [here](https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings/python).
+
+
+## Quickstart
+
+```bash
+pip install gpt4all
+```
+
+In Python, run the following commands to retrieve a GPT4All model and generate a response
+to a prompt.
+
+**Download Note:**
+By default, models are stored in `~/.cache/gpt4all/` (you can change this with `model_path`). If the file already exists, model download will be skipped.
+
+```python
+import gpt4all
+gptj = gpt4all.GPT4All("ggml-gpt4all-j-v1.3-groovy")
+messages = [{"role": "user", "content": "Name 3 colors"}]
+gptj.chat_completion(messages)
+```
+
+## Give it a try!
+[Google Colab Tutorial](https://colab.research.google.com/drive/1QRFHV5lj1Kb7_tGZZGZ-E6BfX6izpeMI?usp=sharing)
+
+## Supported Models
+The Python bindings support the following ggml architectures: `gptj`, `llama`, `mpt`. See the API reference for more details.
+
+## Best Practices
+
+There are two methods to interface with the underlying language model: `chat_completion()` and `generate()`. `chat_completion()` formats a user-provided list of message dictionaries into a prompt template (see the API documentation for more details and options). This usually produces much better results and is the approach we recommend. You may also prompt the model with `generate()`, which passes the raw input string directly to the model.
+
+## API Reference
 
 ::: gpt4all.gpt4all.GPT4All
-
diff --git a/gpt4all-bindings/python/docs/index.md b/gpt4all-bindings/python/docs/index.md
index cfd70ad2..c4469b42 100644
--- a/gpt4all-bindings/python/docs/index.md
+++ b/gpt4all-bindings/python/docs/index.md
@@ -1,32 +1,15 @@
-# GPT4All with Python
+# GPT4All
 
-In this package, we introduce Python bindings built around GPT4All's C/C++ model backends.
+GPT4All is an ecosystem to train and deploy **powerful** and **customized** large language models that run locally on consumer-grade CPUs.
 
-## Quickstart
 
-```bash
-pip install gpt4all
-```
+## Models
 
-In Python, run the following commands to retrieve a GPT4All model and generate a response
-to a prompt.
+A GPT4All model is a 3GB - 8GB file that you can download and plug into the GPT4All open-source ecosystem software. **Nomic AI** supports and maintains this software ecosystem to enforce quality and security, and spearheads the effort to allow any person or enterprise to easily train and deploy their own on-edge large language models.
 
-**Download Note:**
-By default, models are stored in `~/.cache/gpt4all/` (you can change this with `model_path`). If the file already exists, model download will be skipped.
-
-```python
-import gpt4all
-gptj = gpt4all.GPT4All("ggml-gpt4all-j-v1.3-groovy")
-messages = [{"role": "user", "content": "Name 3 colors"}]
-gptj.chat_completion(messages)
-```
-
-## Give it a try!
-[Google Colab Tutorial](https://colab.research.google.com/drive/1QRFHV5lj1Kb7_tGZZGZ-E6BfX6izpeMI?usp=sharing)
+See the FAQ for frequently asked questions about GPT4All model backends.
 
 
 ## Best Practices
 GPT4All models are designed to run locally on your own CPU. Large prompts may require longer computation time and
 result in worse performance. Giving an instruction to the model will typically produce the best results.
-
-There are two methods to interface with the underlying language model, `chat_completion()` and `generate()`. Chat completion formats a user-provided message dictionary into a prompt template (see API documentation for more details and options). This will usually produce much better results and is the approach we recommend. You may also prompt the model with `generate()` which will just pass the raw input string to the model. 
\ No newline at end of file
diff --git a/gpt4all-bindings/python/mkdocs.yml b/gpt4all-bindings/python/mkdocs.yml
index 5763665f..29377da9 100644
--- a/gpt4all-bindings/python/mkdocs.yml
+++ b/gpt4all-bindings/python/mkdocs.yml
@@ -10,10 +10,11 @@ use_directory_urls: false
 nav:
     - 'index.md'
     - 'gpt4all_chat.md'
+    - 'gpt4all_python.md'
     - 'Tutorials':
       - 'gpt4all_modal.md'
-    - 'API Reference':
-      - 'gpt4all_python.md'
+    - 'Wiki':
+      - 'gpt4all_faq.md'
 
 theme:
   name: material