From 08badb61942fe37bb2c64223d315630ebb27c4b1 Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Wed, 29 Mar 2023 17:13:55 -0400 Subject: [PATCH 01/45] Update README.md --- README.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/README.md b/README.md index 770a2806..68046118 100644 --- a/README.md +++ b/README.md @@ -28,6 +28,14 @@ Clone this repository down and place the quantized model in the `chat` directory To compile for custom hardware, see our fork of the [Alpaca C++](https://github.com/zanussbaum/gpt4all.cpp) repo. +----------- + +[Secret Unfiltered Checkpoint](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) + +This model had all refusal to answer responses removed from training. Try it with: +- `cd chat;./gpt4all-lora-quantized-OSX-m1 -m gpt4all-lora-unfiltered-quantized.bin` + +----------- Note: the full model on GPU (16GB of RAM required) performs much better in our qualitative evaluations. # Reproducibility From e7a73a1642b2ff1a2a3332f37b25828a2243b412 Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Wed, 29 Mar 2023 17:18:21 -0400 Subject: [PATCH 02/45] Update README.md --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 68046118..30cd7349 100644 --- a/README.md +++ b/README.md @@ -172,3 +172,6 @@ If you utilize this reposistory, models or data in a downstream project, please ### Alternative Download Locations #### gpt4all-lora-quantized.bin Backup Torrent Link magnet:?xt=urn:btih:1F11A9691EE06C18F0040E359361DCA0479BCB5A&dn=gpt4all-lora-quantized.bin&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce + +### Unfiltered +https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent From bbf589d06770d24ec195e78baf5893556df7a811 Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Wed, 29 Mar 2023 17:18:46 -0400 Subject: [PATCH 03/45] Update README.md --- 
README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 30cd7349..f21c788b 100644 --- a/README.md +++ b/README.md @@ -173,5 +173,5 @@ If you utilize this reposistory, models or data in a downstream project, please #### gpt4all-lora-quantized.bin Backup Torrent Link magnet:?xt=urn:btih:1F11A9691EE06C18F0040E359361DCA0479BCB5A&dn=gpt4all-lora-quantized.bin&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce -### Unfiltered +#### Unfiltered Checkpoint Torrent Link https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent From 614d4fab184c95c279cce2f66adafb30475efbe9 Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Wed, 29 Mar 2023 17:22:05 -0400 Subject: [PATCH 04/45] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index f21c788b..708f7616 100644 --- a/README.md +++ b/README.md @@ -170,8 +170,8 @@ If you utilize this reposistory, models or data in a downstream project, please ``` ### Alternative Download Locations -#### gpt4all-lora-quantized.bin Backup Torrent Link +#### gpt4all-lora-quantized.bin Torrent Link magnet:?xt=urn:btih:1F11A9691EE06C18F0040E359361DCA0479BCB5A&dn=gpt4all-lora-quantized.bin&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce -#### Unfiltered Checkpoint Torrent Link +#### gpt4all-lora-unfiltered-quantized.bin Torrent Link https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent From 248f4fc324af8a5a8b6f91e4cc9a1558e10f974f Mon Sep 17 00:00:00 2001 From: Brandon Duderstadt Date: Wed, 29 Mar 2023 22:36:43 -0400 Subject: [PATCH 05/45] Update README.md --- README.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/README.md b/README.md index 708f7616..a8a13c43 100644 --- a/README.md +++ b/README.md @@ -155,7 
+155,23 @@ python generate.py --config configs/generate/generate.yaml --prompt "Write a scr ### What is a three word topic describing the following keywords: baseball, football, soccer: >Sports, athletics, games +### GPU Interface +There are two ways to get up and running with this model on GPU. +1. clone the nomic client [repo](https://github.com/nomic-ai/nomic) and run `pip install .[GPT4All]` in the home dir. +2. run `pip install nomic` and install the additional deps from the wheels built [here](https://github.com/nomic-ai/nomic/tree/main/bin) +Once this is done, you can run the model on GPU with a script like the following: +``` +from nomic import GPT4AllGPU +m = GPT4AllGPU(LLAMA_PATH) +config = {'num_beams': 2, + 'min_new_tokens': 10, + 'max_length': 100, + 'repetition_penalty': 2.0} +out = m.generate('write me a story about a lonely computer', config) +print(out) +``` +You can pass any of the [huggingface generation config params](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig) in the config. If you utilize this reposistory, models or data in a downstream project, please consider citing it with: ``` From 4d282b00ad349e8822cb515ac13360cc5ad4faa6 Mon Sep 17 00:00:00 2001 From: Feldwor Date: Thu, 30 Mar 2023 16:56:12 +0300 Subject: [PATCH 06/45] Update README.md - Move Torrent/Magnet links to save space in the readme file. --- README.md | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/README.md b/README.md index a8a13c43..2bc31acd 100644 --- a/README.md +++ b/README.md @@ -16,7 +16,7 @@ Run on M1 Mac (not sped up!) # Try it yourself -Download the CPU quantized gpt4all model checkpoint: [gpt4all-lora-quantized.bin](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin). 
+Download the CPU quantized gpt4all model checkpoint: [gpt4all-lora-quantized.bin](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) - [[Torrent-Magnet]](https://tinyurl.com/gpt4all-lora-quantized) Clone this repository down and place the quantized model in the `chat` directory and start chatting by running: @@ -30,7 +30,7 @@ To compile for custom hardware, see our fork of the [Alpaca C++](https://github. ----------- -[Secret Unfiltered Checkpoint](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) +[Secret Unfiltered Checkpoint](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) - [[Torrent]](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent) This model had all refusal to answer responses removed from training. Try it with: - `cd chat;./gpt4all-lora-quantized-OSX-m1 -m gpt4all-lora-unfiltered-quantized.bin` @@ -184,10 +184,3 @@ If you utilize this reposistory, models or data in a downstream project, please howpublished = {\url{https://github.com/nomic-ai/gpt4all}}, } ``` - -### Alternative Download Locations -#### gpt4all-lora-quantized.bin Torrent Link -magnet:?xt=urn:btih:1F11A9691EE06C18F0040E359361DCA0479BCB5A&dn=gpt4all-lora-quantized.bin&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce - -#### gpt4all-lora-unfiltered-quantized.bin Torrent Link -https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent From 6ce06359e928edae8ecd9515c14d800b058ce2c4 Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Thu, 30 Mar 2023 10:30:50 -0400 Subject: [PATCH 07/45] Updated training data link --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a8a13c43..951fce63 100644 --- a/README.md +++ b/README.md @@ -45,7 +45,7 @@ Trained LoRa Weights: - 
gpt4all-lora-epoch-2 (three full epochs of training) https://huggingface.co/nomic-ai/gpt4all-lora-epoch-2 Raw Data: -- [Training Data Without P3](https://s3.amazonaws.com/static.nomic.ai/gpt4all/2022_03_27/gpt4all_curated_data_without_p3_2022_03_27.tar.gz) +- [Training Data Without P3](https://huggingface.co/datasets/nomic-ai/gpt4all_prompt_generations) - Explorer: https://atlas.nomic.ai/map/gpt4all_data_clean_without_p3 - [Full Dataset with P3](https://s3.amazonaws.com/static.nomic.ai/gpt4all/2022_03_27/gpt4all_curated_data_full_2022_03_27.tar.gz) - Explorer: https://atlas.nomic.ai/map/gpt4all_data_clean From 0536aa9c54be51c3f43a44833584471ed4fbbf06 Mon Sep 17 00:00:00 2001 From: Feldwor Date: Thu, 30 Mar 2023 17:32:17 +0300 Subject: [PATCH 08/45] Update README.md - Improve the Try it yourself section. --- README.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/README.md b/README.md index 2bc31acd..7dea0f12 100644 --- a/README.md +++ b/README.md @@ -16,17 +16,17 @@ Run on M1 Mac (not sped up!) # Try it yourself -Download the CPU quantized gpt4all model checkpoint: [gpt4all-lora-quantized.bin](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) - [[Torrent-Magnet]](https://tinyurl.com/gpt4all-lora-quantized) +Here's how to get started with the CPU quantized gpt4all model checkpoint: +1. Download the `gpt4all-lora-quantized.bin` file from [Direct Link](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) or [[Torrent-Magnet]](https://tinyurl.com/gpt4all-lora-quantized). +2. Clone this repository, navigate to `chat`, and place the downloaded file there. +3. 
Run the appropriate command for your OS: + - M1 Mac/OSX: `cd chat;./gpt4all-lora-quantized-OSX-m1` + - Linux: `cd chat;./gpt4all-lora-quantized-linux-x86` + - Windows (PowerShell): `cd chat;./gpt4all-lora-quantized-win64.exe` + - Intel Mac/OSX: `cd chat;./gpt4all-lora-quantized-OSX-intel` -Clone this repository down and place the quantized model in the `chat` directory and start chatting by running: - -- `cd chat;./gpt4all-lora-quantized-OSX-m1` on M1 Mac/OSX -- `cd chat;./gpt4all-lora-quantized-linux-x86` on Linux -- `cd chat;./gpt4all-lora-quantized-win64.exe` on Windows (PowerShell) -- `cd chat;./gpt4all-lora-quantized-OSX-intel` on Intel Mac/OSX - -To compile for custom hardware, see our fork of the [Alpaca C++](https://github.com/zanussbaum/gpt4all.cpp) repo. +For custom hardware compilation, see our [Alpaca C++](https://github.com/zanussbaum/gpt4all.cpp) repository. ----------- From 0a552594243f0523b536bd854c53af7a4ab8c69b Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Thu, 30 Mar 2023 10:32:52 -0400 Subject: [PATCH 09/45] Torrent Magnet Link Update --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 9a0dc33b..8cc4515f 100644 --- a/README.md +++ b/README.md @@ -16,7 +16,7 @@ Run on M1 Mac (not sped up!) 
# Try it yourself -Download the CPU quantized gpt4all model checkpoint: [gpt4all-lora-quantized.bin](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) - [[Torrent-Magnet]](https://tinyurl.com/gpt4all-lora-quantized) +Download the CPU quantized gpt4all model checkpoint: [gpt4all-lora-quantized.bin](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) - [[Torrent-Magnet]](magnet:?xt=urn:btih:EE5150157050CB5D1979669A1EA14FC2C4C3692E&dn=gpt4all-lora-quantized.bin&tr=udp%3a%2f%2ftracker.openbittorrent.com%3a80%2fannounce&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce) Clone this repository down and place the quantized model in the `chat` directory and start chatting by running: From 741e52a886a71bef9585044db41786d87bb07459 Mon Sep 17 00:00:00 2001 From: Feldwor Date: Thu, 30 Mar 2023 17:40:43 +0300 Subject: [PATCH 10/45] Update README.md - Fix GitHub Markdown does not recognize Torrent Magnets. --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 8cc4515f..9a0dc33b 100644 --- a/README.md +++ b/README.md @@ -16,7 +16,7 @@ Run on M1 Mac (not sped up!) 
# Try it yourself -Download the CPU quantized gpt4all model checkpoint: [gpt4all-lora-quantized.bin](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) - [[Torrent-Magnet]](magnet:?xt=urn:btih:EE5150157050CB5D1979669A1EA14FC2C4C3692E&dn=gpt4all-lora-quantized.bin&tr=udp%3a%2f%2ftracker.openbittorrent.com%3a80%2fannounce&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce) +Download the CPU quantized gpt4all model checkpoint: [gpt4all-lora-quantized.bin](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) - [[Torrent-Magnet]](https://tinyurl.com/gpt4all-lora-quantized) Clone this repository down and place the quantized model in the `chat` directory and start chatting by running: From 2ad7cf7ba6e30b050288a03c8dbe464ad2c5182f Mon Sep 17 00:00:00 2001 From: bstadt Date: Thu, 30 Mar 2023 11:10:07 -0400 Subject: [PATCH 11/45] added roadmap --- README.md | 70 +++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 53 insertions(+), 17 deletions(-) diff --git a/README.md b/README.md index 8cc4515f..52a68912 100644 --- a/README.md +++ b/README.md @@ -38,6 +38,58 @@ This model had all refusal to answer responses removed from training. Try it wit ----------- Note: the full model on GPU (16GB of RAM required) performs much better in our qualitative evaluations. +# Python Client +## CPU Interface +To get running using the python client with the CPU interface, first install the [nomic client](https://github.com/nomic-ai/nomic) using `pip install nomic` +Then, you can use the following script to interact with GPU4All: +``` +from nomic import GPT4All +m = GPT4All() +m.connect() +m.prompt('write me a story about a lonely computer') +``` + +## GPU Interface +There are two ways to get up and running with this model on GPU. +The setup here is slightly more involved than the CPU model. +1. clone the nomic client [repo](https://github.com/nomic-ai/nomic) and run `pip install .[GPT4All]` in the home dir. +2. 
run `pip install nomic` and install the additional deps from the wheels built [here](https://github.com/nomic-ai/nomic/tree/main/bin) + +Once this is done, you can run the model on GPU with a script like the following: +``` +from nomic import GPT4AllGPU +m = GPT4AllGPU(LLAMA_PATH) +config = {'num_beams': 2, + 'min_new_tokens': 10, + 'max_length': 100, + 'repetition_penalty': 2.0} +out = m.generate('write me a story about a lonely computer', config) +print(out) +``` +Where LLAMA_PATH is the path to a Huggingface Automodel compliant LLAMA model. +Nomic is unable to distribute this file at this time. +We are working on a GPT4All that does not have this limitation right now. + +You can pass any of the [huggingface generation config params](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig) in the config. + +# Roadmap +## Short Term + - (IN PROGRESS) Train a GPT4All model based on GPTJ to alleviate llama distribution issues. + - (IN PROGRESS) Create improved CPU and GPU interfaces for this model. + - (NOT STARTED) Integrate llama.cpp bindings + - (NOT STARTED) Create a good conversational chat interface for the model. + - (NOT STARTED) Allow users to opt in and submit their chats for subsequent training runs + +## Medium Term + - (NOT STARTED) Integrate GPT4All with [Atlas](https://atlas.nomic.ai) to allow for document retrieval. + - BLOCKED by GPT4All based on GPTJ + - (NOT STARTED) Integrate GPT4All with Langchain. + - (NOT STARTED) Build easy custom training scripts to allow users to fine tune models. + +## Long Term + - (NOT STARTED) Allow anyone to curate training data for subsequent GPT4All releases using Atlas. + - (IN PROGRESS) Democratize AI. 
+ # Reproducibility Trained LoRa Weights: @@ -155,23 +207,7 @@ python generate.py --config configs/generate/generate.yaml --prompt "Write a scr ### What is a three word topic describing the following keywords: baseball, football, soccer: >Sports, athletics, games -### GPU Interface -There are two ways to get up and running with this model on GPU. -1. clone the nomic client [repo](https://github.com/nomic-ai/nomic) and run `pip install .[GPT4All]` in the home dir. -2. run `pip install nomic` and install the additional deps from the wheels built [here](https://github.com/nomic-ai/nomic/tree/main/bin) - -Once this is done, you can run the model on GPU with a script like the following: -``` -from nomic import GPT4AllGPU -m = GPT4AllGPU(LLAMA_PATH) -config = {'num_beams': 2, - 'min_new_tokens': 10, - 'max_length': 100, - 'repetition_penalty': 2.0} -out = m.generate('write me a story about a lonely computer', config) -print(out) -``` -You can pass any of the [huggingface generation config params](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig) in the config. +## Citation If you utilize this reposistory, models or data in a downstream project, please consider citing it with: ``` From 8ac7c1a9fe623af38413519c06f05f46fb3358dd Mon Sep 17 00:00:00 2001 From: Ikko Eltociear Ashimine Date: Fri, 31 Mar 2023 00:53:53 +0900 Subject: [PATCH 12/45] Fix typo in TRAINING_LOG.md Conditonal -> Conditional --- TRAINING_LOG.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/TRAINING_LOG.md b/TRAINING_LOG.md index 31b9bb21..50469645 100644 --- a/TRAINING_LOG.md +++ b/TRAINING_LOG.md @@ -160,7 +160,7 @@ We realized that we had two bugs however: - We accidentally duplicated data and effectively trained for 2 epochs instead of 1 - We added an eos token to every sequence, even those that we truncated (e.g. long code that exceeds the 1024). 
-## Conditonal EOS and 1 Epoch +## Conditional EOS and 1 Epoch Using the same parameters, we then trained a model using a "conditional" eos token where we only add an `eos` when the inputs are less than the maximum sequence length for one epoch. From 8c9c02e42b9d297ca4f35e251f087d7a51a8c0c5 Mon Sep 17 00:00:00 2001 From: bstadt Date: Thu, 30 Mar 2023 12:32:14 -0400 Subject: [PATCH 13/45] updated roadmap --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 52a68912..fd9cfdc0 100644 --- a/README.md +++ b/README.md @@ -84,7 +84,7 @@ You can pass any of the [huggingface generation config params](https://huggingfa - (NOT STARTED) Integrate GPT4All with [Atlas](https://atlas.nomic.ai) to allow for document retrieval. - BLOCKED by GPT4All based on GPTJ - (NOT STARTED) Integrate GPT4All with Langchain. - - (NOT STARTED) Build easy custom training scripts to allow users to fine tune models. + - (IN PROGRESS) Build easy custom training scripts to allow users to fine tune models. ## Long Term - (NOT STARTED) Allow anyone to curate training data for subsequent GPT4All releases using Atlas. 
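The "conditional EOS" rule that PATCH 12's TRAINING_LOG.md excerpt describes — only append an `eos` token when the tokenized input is shorter than the maximum sequence length, so truncated sequences never get one — can be sketched as follows. This is an illustrative re-statement, not code from this repository; the function name and the token ids in the comments are made up for the example.

```python
def add_conditional_eos(token_ids, max_length, eos_id):
    """Append eos_id only to sequences that were not truncated.

    A sequence shorter than max_length still has room, so it receives the
    eos token; a sequence at max_length was truncated and is left as-is.
    """
    if len(token_ids) < max_length:
        return token_ids + [eos_id]
    return token_ids
```

Under this rule a short prompt gains a terminating token while a clipped 1024-token code sample does not, which avoids teaching the model that truncation points are natural endings.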
From 40ea0a74d01d78e8f30efe7e66ceee4a9416fb5d Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Thu, 30 Mar 2023 12:54:28 -0400 Subject: [PATCH 14/45] Huggingface Datasets link --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 351a8de9..d33c9c96 100644 --- a/README.md +++ b/README.md @@ -99,7 +99,7 @@ Trained LoRa Weights: Raw Data: - [Training Data Without P3](https://huggingface.co/datasets/nomic-ai/gpt4all_prompt_generations) - Explorer: https://atlas.nomic.ai/map/gpt4all_data_clean_without_p3 -- [Full Dataset with P3](https://s3.amazonaws.com/static.nomic.ai/gpt4all/2022_03_27/gpt4all_curated_data_full_2022_03_27.tar.gz) +- [Full Dataset with P3](https://huggingface.co/datasets/nomic-ai/gpt4all_prompt_generations_with_p3) - Explorer: https://atlas.nomic.ai/map/gpt4all_data_clean We are not distributing a LLaMa 7B checkpoint. From de0f8602ca7799783ef25be1d7851ad3e41edd2b Mon Sep 17 00:00:00 2001 From: Benjamin Schmidt Date: Thu, 30 Mar 2023 13:46:03 -0400 Subject: [PATCH 15/45] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index d33c9c96..b0c53fc3 100644 --- a/README.md +++ b/README.md @@ -43,7 +43,7 @@ Note: the full model on GPU (16GB of RAM required) performs much better in our q To get running using the python client with the CPU interface, first install the [nomic client](https://github.com/nomic-ai/nomic) using `pip install nomic` Then, you can use the following script to interact with GPU4All: ``` -from nomic import GPT4All +from nomic.gpt4all import GPT4All m = GPT4All() m.connect() m.prompt('write me a story about a lonely computer') @@ -57,7 +57,7 @@ The setup here is slightly more involved than the CPU model. 
Once this is done, you can run the model on GPU with a script like the following: ``` -from nomic import GPT4AllGPU +from nomic.gpt4all import GPT4AllGPU m = GPT4AllGPU(LLAMA_PATH) config = {'num_beams': 2, 'min_new_tokens': 10, From 377a09fdedfe3fbedac45ab0223ab7280cc26847 Mon Sep 17 00:00:00 2001 From: Benjamin Schmidt Date: Thu, 30 Mar 2023 13:47:04 -0400 Subject: [PATCH 16/45] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index b0c53fc3..01de1072 100644 --- a/README.md +++ b/README.md @@ -45,7 +45,7 @@ Then, you can use the following script to interact with GPU4All: ``` from nomic.gpt4all import GPT4All m = GPT4All() -m.connect() +m.open() m.prompt('write me a story about a lonely computer') ``` From 5c9b1817899af2ad4932dfe4e4fddbde0e2797e8 Mon Sep 17 00:00:00 2001 From: Ettore Di Giacinto Date: Thu, 30 Mar 2023 21:51:40 +0200 Subject: [PATCH 17/45] Fix typo --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 01de1072..fbcd5b76 100644 --- a/README.md +++ b/README.md @@ -41,7 +41,7 @@ Note: the full model on GPU (16GB of RAM required) performs much better in our q # Python Client ## CPU Interface To get running using the python client with the CPU interface, first install the [nomic client](https://github.com/nomic-ai/nomic) using `pip install nomic` -Then, you can use the following script to interact with GPU4All: +Then, you can use the following script to interact with GPT4All: ``` from nomic.gpt4all import GPT4All m = GPT4All() From 495effae7ba7c3b0b93af5ab445fb9f50e115173 Mon Sep 17 00:00:00 2001 From: Yuvanesh-ux <68208096+Yuvanesh-ux@users.noreply.github.com> Date: Thu, 30 Mar 2023 17:53:24 -0400 Subject: [PATCH 18/45] Update README.md --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index fbcd5b76..e17c65fe 100644 --- a/README.md +++ b/README.md @@ -138,6 +138,10 @@ accelerate launch 
--dynamo_backend=inductor --num_processes=8 --num_machines=1 - python generate.py --config configs/generate/generate.yaml --prompt "Write a script to reverse a string in Python" ``` +## Need Help? + +Join the Discord and ask for help in `#gpt4all-help` + # Sample Generations ### Provide instructions for the given exercise. Leg Raises From 632c44b606a9b8d2580a84120f251bfa91fad6f7 Mon Sep 17 00:00:00 2001 From: Sajjad Date: Fri, 31 Mar 2023 02:48:14 -0500 Subject: [PATCH 19/45] Update README.md unfiltered.bin Instructions Added terminal commands to run gpt4all-lora-unfiltered-quantized.bin on Mac, Windows, Linux, Intel OS --- README.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index e17c65fe..15a4baaf 100644 --- a/README.md +++ b/README.md @@ -33,8 +33,11 @@ For custom hardware compilation, see our [Alpaca C++](https://github.com/zanussb [Secret Unfiltered Checkpoint](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) - [[Torrent]](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent) This model had all refusal to answer responses removed from training. Try it with: -- `cd chat;./gpt4all-lora-quantized-OSX-m1 -m gpt4all-lora-unfiltered-quantized.bin` - +- `` +- M1 Mac/OSX: `cd chat;./gpt4all-lora-quantized-OSX-m1 -m gpt4all-lora-unfiltered-quantized.bin` +- Linux: `cd chat;./gpt4all-lora-quantized-linux-x86 -m gpt4all-lora-unfiltered-quantized.bin` +- Windows (PowerShell): `cd chat;./gpt4all-lora-quantized-win64.exe -m gpt4all-lora-unfiltered-quantized.bin` +- Intel Mac/OSX: `cd chat;./gpt4all-lora-quantized-OSX-intel -m gpt4all-lora-unfiltered-quantized.bin` ----------- Note: the full model on GPU (16GB of RAM required) performs much better in our qualitative evaluations. 
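The per-OS commands that PATCH 19 adds all follow one pattern: change into `chat`, run the platform binary, and pass the model file with `-m`. A minimal sketch of that pattern, purely illustrative (the helper function and dictionary are not part of the patch series; only the binary and model filenames come from the README):

```python
# Platform binaries named in the README's "Try it yourself" section.
BINARIES = {
    "M1 Mac/OSX": "gpt4all-lora-quantized-OSX-m1",
    "Linux": "gpt4all-lora-quantized-linux-x86",
    "Windows (PowerShell)": "gpt4all-lora-quantized-win64.exe",
    "Intel Mac/OSX": "gpt4all-lora-quantized-OSX-intel",
}

def chat_command(os_name, model="gpt4all-lora-unfiltered-quantized.bin"):
    """Assemble the shell one-liner for a given platform and model file."""
    return f"cd chat;./{BINARIES[os_name]} -m {model}"
```

For example, `chat_command("Linux")` reproduces the Linux line from the patch, and swapping the `model` argument yields the equivalent command for the filtered checkpoint.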
From 64c346fb08cf2c617ceacd0d96468cae5bc9969a Mon Sep 17 00:00:00 2001 From: Sajjad Date: Fri, 31 Mar 2023 02:50:02 -0500 Subject: [PATCH 20/45] Update README.md removed extra line: `` --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 15a4baaf..a7c635a6 100644 --- a/README.md +++ b/README.md @@ -33,7 +33,6 @@ For custom hardware compilation, see our [Alpaca C++](https://github.com/zanussb [Secret Unfiltered Checkpoint](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) - [[Torrent]](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent) This model had all refusal to answer responses removed from training. Try it with: -- `` - M1 Mac/OSX: `cd chat;./gpt4all-lora-quantized-OSX-m1 -m gpt4all-lora-unfiltered-quantized.bin` - Linux: `cd chat;./gpt4all-lora-quantized-linux-x86 -m gpt4all-lora-unfiltered-quantized.bin` - Windows (PowerShell): `cd chat;./gpt4all-lora-quantized-win64.exe -m gpt4all-lora-unfiltered-quantized.bin` From 6a9b3fc3f7aabd1cce0fb569e612fe739da8b046 Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Fri, 31 Mar 2023 12:29:38 -0400 Subject: [PATCH 21/45] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e17c65fe..02f55f92 100644 --- a/README.md +++ b/README.md @@ -26,7 +26,7 @@ Here's how to get started with the CPU quantized gpt4all model checkpoint: - Windows (PowerShell): `cd chat;./gpt4all-lora-quantized-win64.exe` - Intel Mac/OSX: `cd chat;./gpt4all-lora-quantized-OSX-intel` -For custom hardware compilation, see our [Alpaca C++](https://github.com/zanussbaum/gpt4all.cpp) repository. +For custom hardware compilation, see our [llama.cpp](https://github.com/zanussbaum/gpt4all.cpp) fork. 
----------- From e07985e83ca2d3e4c9fc28669cd41d3055f9eae3 Mon Sep 17 00:00:00 2001 From: ParisNeo Date: Sat, 1 Apr 2023 01:16:16 +0200 Subject: [PATCH 22/45] Added vscode files to gitignore --- .gitignore | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/.gitignore b/.gitignore index 8addd972..02ba78ce 100644 --- a/.gitignore +++ b/.gitignore @@ -161,4 +161,8 @@ cython_debug/ # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore # and can be added to the global gitignore or merged into this file. For a more nuclear # option (not recommended) you can uncomment the following to ignore the entire idea folder. -#.idea/ \ No newline at end of file +#.idea/ + + +# vs code +.vscode \ No newline at end of file From bc99eabfa1c5d1db4fa9f5e21256eb5d2871c67f Mon Sep 17 00:00:00 2001 From: ParisNeo Date: Sat, 1 Apr 2023 01:35:50 +0200 Subject: [PATCH 23/45] added *.bin to the gitignore --- .gitignore | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/.gitignore b/.gitignore index 02ba78ce..14e10a78 100644 --- a/.gitignore +++ b/.gitignore @@ -165,4 +165,5 @@ cython_debug/ # vs code -.vscode \ No newline at end of file +.vscode +*.bin \ No newline at end of file From c0b3de38140e62c0d6f7a7236de45ea1dd9d3c66 Mon Sep 17 00:00:00 2001 From: HiraduNakamura <127570430+HiraduNakamura@users.noreply.github.com> Date: Fri, 31 Mar 2023 20:26:09 -0400 Subject: [PATCH 24/45] Made capitalization consistent --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 02f55f92..d62c875e 100644 --- a/README.md +++ b/README.md @@ -16,7 +16,7 @@ Run on M1 Mac (not sped up!) # Try it yourself -Here's how to get started with the CPU quantized gpt4all model checkpoint: +Here's how to get started with the CPU quantized GPT4All model checkpoint: 1. 
Download the `gpt4all-lora-quantized.bin` file from [Direct Link](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) or [[Torrent-Magnet]](https://tinyurl.com/gpt4all-lora-quantized). 2. Clone this repository, navigate to `chat`, and place the downloaded file there. From 2dfef9741a99405290d5f7fb596fe2b032e7345c Mon Sep 17 00:00:00 2001 From: gourcetools <120996278+gourcetools@users.noreply.github.com> Date: Sat, 1 Apr 2023 17:30:40 +0200 Subject: [PATCH 25/45] Create launcher.sh The script detects the user's operating system, lists available .bin files and prompts the user to select a .bin file to run. Ensuring a more user-friendly experience. --- launcher.sh | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 88 insertions(+) create mode 100644 launcher.sh diff --git a/launcher.sh b/launcher.sh new file mode 100644 index 00000000..ed7b99cd --- /dev/null +++ b/launcher.sh @@ -0,0 +1,88 @@ +#!/bin/bash + +# Display header +echo "==========================================================" +echo " ██████ ██████ ████████ ██ ██ █████ ██ ██ " +echo "██ ██ ██ ██ ██ ██ ██ ██ ██ ██ " +echo "██ ███ ██████ ██ ███████ ███████ ██ ██ " +echo "██ ██ ██ ██ ██ ██ ██ ██ ██ " +echo " ██████ ██ ██ ██ ██ ██ ███████ ███████ " +echo " └─> https://github.com/nomic-ai/gpt4all" + +# Function to detect macOS architecture and set the binary filename +detect_mac_arch() { + local mac_arch + mac_arch=$(uname -m) + case "$mac_arch" in + arm64) + os_type="M1 Mac/OSX" + binary_filename="gpt4all-lora-quantized-OSX-m1" + ;; + x86_64) + os_type="Intel Mac/OSX" + binary_filename="gpt4all-lora-quantized-OSX-intel" + ;; + *) + echo "Unknown macOS architecture" + exit 1 + ;; + esac +} + +# Detect operating system and set the binary filename +case "$(uname -s)" in + Darwin*) + detect_mac_arch + ;; + Linux*) + if grep -q Microsoft /proc/version; then + os_type="Windows (WSL)" + binary_filename="gpt4all-lora-quantized-win64.exe" + else + os_type="Linux" + 
binary_filename="gpt4all-lora-quantized-linux-x86" + fi + ;; + CYGWIN*|MINGW32*|MSYS*|MINGW*) + os_type="Windows (Cygwin/MSYS/MINGW)" + binary_filename="gpt4all-lora-quantized-win64.exe" + ;; + *) + echo "Unknown operating system" + exit 1 + ;; +esac +echo "================================" +echo "== You are using $os_type." + + +# Change to the chat directory +cd chat + +# List .bin files and prompt user to select one +bin_files=(*.bin) +echo "== Available .bin files:" +for i in "${!bin_files[@]}"; do + echo " [$((i+1))] ${bin_files[i]}" +done + +# Function to get user input and validate it +get_valid_user_input() { + local input_valid=false + + while ! $input_valid; do + echo "==> Please enter a number:" + read -r user_selection + if [[ $user_selection =~ ^[0-9]+$ ]] && (( user_selection >= 1 && user_selection <= ${#bin_files[@]} )); then + input_valid=true + else + echo "Invalid input. Please enter a number between 1 and ${#bin_files[@]}." + fi + done +} + +get_valid_user_input +selected_bin_file="${bin_files[$((user_selection-1))]}" + +# Run the selected .bin file with the appropriate command +./"$binary_filename" -m "$selected_bin_file" From 157bb8f60218e05cb06a682fe3257f28aeef97a5 Mon Sep 17 00:00:00 2001 From: Wayner Barrios Date: Sat, 1 Apr 2023 23:52:25 -0400 Subject: [PATCH 26/45] DatasetDict to dataset object. --- data.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/data.py b/data.py index 0e356f7d..ff79924c 100644 --- a/data.py +++ b/data.py @@ -68,7 +68,7 @@ def load_data(config, tokenizer): dataset = load_dataset("json", data_files=files, split="train") else: - dataset = load_dataset(dataset_path) + dataset = load_dataset(dataset_path,split='train') dataset = dataset.train_test_split(test_size=.05, seed=config["seed"]) From 0e74c707945a163c054be56ce371bbeecc566c35 Mon Sep 17 00:00:00 2001 From: Chris Messina Date: Sat, 1 Apr 2023 21:24:19 -0700 Subject: [PATCH 27/45] Update README.md Type and formatting improvements. 
--- README.md | 66 ++++++++++++++++++++++++++++--------------------------- 1 file changed, 34 insertions(+), 32 deletions(-) diff --git a/README.md b/README.md index 02f55f92..bf3b8971 100644 --- a/README.md +++ b/README.md @@ -168,7 +168,8 @@ Join the Discord and ask for help in ### Come up with an interesting idea for a new movie plot. Your plot should be described with a title and a summary. ->1. The title of your movie plot should be "The Last Stand". 2. The summary of your movie plot should be: A group of mercenaries are hired to protect a small town from a dangerous criminal who is on the run. However, the town's sheriff is not willing to let them take over and decides to take matters into his own hands. +> 1. The title of your movie plot should be "The Last Stand". +> 2. The summary of your movie plot should be: A group of mercenaries are hired to protect a small town from a dangerous criminal who is on the run. However, the town's sheriff is not willing to let them take over and decides to take matters into his own hands. ### Reverse a string in python. > In Python, you can reverse a list or tuple by using the reversed() function on it. Here's an example of how to use this method with strings: @@ -178,42 +179,43 @@ Join the Discord and ask for help in > print(reversed_str) # Output will be reversed version of the inputted string. ### List 10 dogs. ->1. Labrador Retriever ->2. Golden Retriever ->3. Beagle ->4. Poodle ->5. German Shepherd ->6. Bulldog ->7. Rottweiler ->8. Boxer ->9. Shih Tzu ->10. Pug - -### Write me a poem about the fall of Julius Ceasar into a ceasar salad in iambic pentameter. ->The fall of Julius Ceasar into a ceasar salad ->In iambic pentameter, ->The mighty Roman emperor ->Fell into a salad of lettuce and croutons, ->His empire crumbling around him, ->As he was devoured by the hungry diners. ->The once mighty emperor ->Was now just a salad topping, ->His legacy reduced to a mere garnish, ->As he was consumed by the masses. 
->The fall of Julius Ceasar into a ceasar salad, ->In iambic pentameter, ->Was a tragic end to a great man, ->Who had once ruled the world with an iron fist. ->But now, he was just a salad topping, ->His legacy reduced to a mere garnish, ->As he was consumed by the masses. +> 1. Labrador Retriever +> 2. Golden Retriever +> 3. Beagle +> 4. Poodle +> 5. German Shepherd +> 6. Bulldog +> 7. Rottweiler +> 8. Boxer +> 9. Shih Tzu +> 10. Pug + +### Write me a poem about the fall of Julius Caesar into a caesar salad in iambic pentameter. + +> The fall of Julius Caesar into a caesar salad
+> In iambic pentameter,
+> The mighty Roman emperor
+> Fell into a salad of lettuce and croutons,
+> His empire crumbling around him,
+> As he was devoured by the hungry diners.
+> The once mighty emperor
+> Was now just a salad topping,
+> His legacy reduced to a mere garnish,
+> As he was consumed by the masses.
+> The fall of Julius Caesar into a caesar salad,
+> In iambic pentameter,
+> Was a tragic end to a great man,
+> Who had once ruled the world with an iron fist.
+> But now, he was just a salad topping,
+> His legacy reduced to a mere garnish,
+> As he was consumed by the masses. ### What is a three word topic describing the following keywords: baseball, football, soccer: ->Sports, athletics, games +> Sports, athletics, games ## Citation -If you utilize this reposistory, models or data in a downstream project, please consider citing it with: +If you utilize this repository, models or data in a downstream project, please consider citing it with: ``` @misc{gpt4all, author = {Yuvanesh Anand and Zach Nussbaum and Brandon Duderstadt and Benjamin Schmidt and Andriy Mulyar}, From 4af3bccff176b1cf914fc50dfd202080d0ad1ac4 Mon Sep 17 00:00:00 2001 From: Jo Liss Date: Sun, 2 Apr 2023 19:19:02 +0300 Subject: [PATCH 28/45] Fix `git submodule` instructions --- README.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 02f55f92..434ac75d 100644 --- a/README.md +++ b/README.md @@ -110,9 +110,10 @@ You can reproduce our trained model by doing the following: Clone the repo -`git clone --recurse-submodules https://github.com/nomic-ai/gpt4all.git` - -`git submodule configure && git submodule update` +``` +git clone --recurse-submodules https://github.com/nomic-ai/gpt4all.git +git submodule update --init +``` Setup the environment From a782025ab604a93efad46f8693695329c774bb30 Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Mon, 3 Apr 2023 01:50:43 -0400 Subject: [PATCH 29/45] Updated Python Bindings --- README.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/README.md b/README.md index 02f55f92..e3301f7a 100644 --- a/README.md +++ b/README.md @@ -4,6 +4,12 @@

:green_book: Technical Report

+ + +

+:snake: Official Python Bindings +

+

Discord

@@ -40,6 +46,9 @@ Note: the full model on GPU (16GB of RAM required) performs much better in our q # Python Client ## CPU Interface +To run GPT4all in python, see the new [official Python bindings](https://github.com/nomic-ai/pyllamacpp). + +The old bindings are still available but now deprecated. They will not work in a notebook environment. To get running using the python client with the CPU interface, first install the [nomic client](https://github.com/nomic-ai/nomic) using `pip install nomic` Then, you can use the following script to interact with GPT4All: ``` From 4802a72f52036d05eb13fbf1431dc514eed9cdb2 Mon Sep 17 00:00:00 2001 From: Malik M Alnakhaleh Date: Mon, 3 Apr 2023 20:09:51 -0400 Subject: [PATCH 30/45] Update README.md Fixing punctuation and capitalization to maintain consistency within the README file. --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index e3301f7a..bda9c4f6 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@

GPT4All

-

Demo, data and code to train an assistant-style large language model with ~800k GPT-3.5-Turbo Generations based on LLaMa

+

Demo, data, and code to train an assistant-style large language model with ~800k GPT-3.5-Turbo Generations based on LLaMa

:green_book: Technical Report @@ -46,7 +46,7 @@ Note: the full model on GPU (16GB of RAM required) performs much better in our q # Python Client ## CPU Interface -To run GPT4all in python, see the new [official Python bindings](https://github.com/nomic-ai/pyllamacpp). +To run GPT4All in python, see the new [official Python bindings](https://github.com/nomic-ai/pyllamacpp). The old bindings are still available but now deprecated. They will not work in a notebook environment. To get running using the python client with the CPU interface, first install the [nomic client](https://github.com/nomic-ai/nomic) using `pip install nomic` From 467dc5bc7ed7993a875a0f49f35cb22db9c4a815 Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Tue, 4 Apr 2023 23:23:34 -0400 Subject: [PATCH 31/45] Discord Link --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 8b2e6327..cf63a86d 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@

-Discord +Discord

From d617c54a5eb31a004acc56591a495f674de8db79 Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Wed, 5 Apr 2023 12:48:54 -0400 Subject: [PATCH 32/45] GPT4All Compatibility Ecosystem --- README.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/README.md b/README.md index cf63a86d..6bc2207e 100644 --- a/README.md +++ b/README.md @@ -81,6 +81,17 @@ We are working on a GPT4All that does not have this limitation right now. You can pass any of the [huggingface generation config params](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig) in the config. +# GPT4All Compatibility Ecosystem +Edge models in the GPT4All Ecosystem. Please PR as the [community grows](https://huggingface.co/models?sort=modified&search=4bit). +Feel free to convert this to a more structured table. + +- [gpt4all](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) +- [gpt4all-unfiltered](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) +- [ggml-vicuna-7b-4bit](https://huggingface.co/eachadea/ggml-vicuna-7b-4bit) +- [vicuna-13b-GPTQ-4bit-128g](https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g) +- [LLaMa-Storytelling-4Bit](https://huggingface.co/GamerUntouch/LLaMa-Storytelling-4Bit) + + # Roadmap ## Short Term - (IN PROGRESS) Train a GPT4All model based on GPTJ to alleviate llama distribution issues. From 873a5588256a1767e3a629a67bc706beeb1f2aff Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Wed, 5 Apr 2023 13:03:17 -0400 Subject: [PATCH 33/45] Typescript bindings link --- README.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/README.md b/README.md index 6bc2207e..05cfcb3c 100644 --- a/README.md +++ b/README.md @@ -10,12 +10,17 @@ :snake: Official Python Bindings

+

+:computer: Official Typescript Bindings +

+

Discord

+ ![gpt4all-lora-demo](https://user-images.githubusercontent.com/13879686/228352356-de66ca7a-df70-474e-b929-2e3656165051.gif) Run on M1 Mac (not sped up!) From f90831180e95f057706f493de7a7f9a674b08689 Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Wed, 5 Apr 2023 13:15:23 -0400 Subject: [PATCH 34/45] Added MD5 signatures to ecosystem links. --- README.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 05cfcb3c..6a58fe9a 100644 --- a/README.md +++ b/README.md @@ -40,6 +40,7 @@ Here's how to get started with the CPU quantized GPT4All model checkpoint: For custom hardware compilation, see our [llama.cpp](https://github.com/zanussbaum/gpt4all.cpp) fork. ----------- +Find all compatible models in the GPT4All Ecosystem section. [Secret Unfiltered Checkpoint](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) - [[Torrent]](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent) @@ -90,8 +91,9 @@ You can pass any of the [huggingface generation config params](https://huggingfa Edge models in the GPT4All Ecosystem. Please PR as the [community grows](https://huggingface.co/models?sort=modified&search=4bit). Feel free to convert this to a more structured table. 
-- [gpt4all](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) -- [gpt4all-unfiltered](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) +- [gpt4all](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin.md5)] + - [gpt4all-ggml-converted](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin.md5)] +- [gpt4all-unfiltered](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.md5)] - [ggml-vicuna-7b-4bit](https://huggingface.co/eachadea/ggml-vicuna-7b-4bit) - [vicuna-13b-GPTQ-4bit-128g](https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g) - [LLaMa-Storytelling-4Bit](https://huggingface.co/GamerUntouch/LLaMa-Storytelling-4Bit) From 5977a650e6c3f3aa475e6c307d1b0af75f87a678 Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Wed, 5 Apr 2023 13:24:47 -0400 Subject: [PATCH 35/45] Typescript and Langchain bindings --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 6a58fe9a..14cd8af1 100644 --- a/README.md +++ b/README.md @@ -5,13 +5,12 @@ :green_book: Technical Report

-

:snake: Official Python Bindings

-:computer: Official Typescript Bindings +:computer: Official Typescript Bindings --- Official Langchain Backend 🦜️🔗

@@ -21,6 +20,7 @@ + ![gpt4all-lora-demo](https://user-images.githubusercontent.com/13879686/228352356-de66ca7a-df70-474e-b929-2e3656165051.gif) Run on M1 Mac (not sped up!) From 828a3a67fa1871365e88ebe3e20c866c0e8784ce Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Wed, 5 Apr 2023 14:10:00 -0400 Subject: [PATCH 36/45] Formatting Update --- README.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 14cd8af1..9077fd5b 100644 --- a/README.md +++ b/README.md @@ -10,9 +10,14 @@

-:computer: Official Typescript Bindings --- Official Langchain Backend 🦜️🔗 +:computer: Official Typescript Bindings

+

+🦜️🔗 Official Langchain Backend +

+ +

Discord

From 5fa260e18a94e74046bf132782f1bb6a9ec52cfb Mon Sep 17 00:00:00 2001 From: Ben Schmidt Date: Thu, 6 Apr 2023 11:28:59 -0400 Subject: [PATCH 37/45] Add MIT license. --- LICENSE.txt | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) create mode 100644 LICENSE.txt diff --git a/LICENSE.txt b/LICENSE.txt new file mode 100644 index 00000000..51aef442 --- /dev/null +++ b/LICENSE.txt @@ -0,0 +1,19 @@ +Copyright (c) 2023 Nomic, Inc. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. From 2052f459ed796060b380382cbded8123b8ccea08 Mon Sep 17 00:00:00 2001 From: Dillon Erb <585865+dte@users.noreply.github.com> Date: Thu, 6 Apr 2023 18:11:05 -0400 Subject: [PATCH 38/45] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 31b943b8..cd7c43ee 100644 --- a/README.md +++ b/README.md @@ -98,7 +98,7 @@ You can pass any of the [huggingface generation config params](https://huggingfa Edge models in the GPT4All Ecosystem. 
Please PR as the [community grows](https://huggingface.co/models?sort=modified&search=4bit). Feel free to convert this to a more structured table. -- [gpt4all](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin.md5)] +- [gpt4all](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin.md5)] - [gpt4all-ggml-converted](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin.md5)] - [gpt4all-unfiltered](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.md5)] - [ggml-vicuna-7b-4bit](https://huggingface.co/eachadea/ggml-vicuna-7b-4bit) From b8292dd7d0042cf54634ef63dd2495aa950ac214 Mon Sep 17 00:00:00 2001 From: MalikMAlna Date: Thu, 6 Apr 2023 19:56:49 -0400 Subject: [PATCH 39/45] Slight cleanup of superfluous comment and space after comma --- data.py | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/data.py b/data.py index ff79924c..72dc4574 100644 --- a/data.py +++ b/data.py @@ -57,7 +57,6 @@ def load_data(config, tokenizer): dataset_path = config["dataset_path"] if os.path.exists(dataset_path): - # check if path is a directory if os.path.isdir(dataset_path): files = glob.glob(os.path.join(dataset_path, "*_clean.jsonl")) else: @@ -68,7 +67,7 @@ def load_data(config, tokenizer): dataset = load_dataset("json", data_files=files, split="train") else: - dataset = load_dataset(dataset_path,split='train') + dataset = load_dataset(dataset_path, split='train') dataset = dataset.train_test_split(test_size=.05, seed=config["seed"]) 
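Editor's note on the `data.py` patches above: the `split="train"` argument matters because `load_dataset(path)` with no split returns a `DatasetDict`, which (unlike a plain `Dataset`) has no `train_test_split` method. The seeded 95/5 split that follows behaves like this stdlib sketch — an illustration of the semantics, not the 🤗 `datasets` implementation:

```python
import random

def train_test_split(rows, test_size=0.05, seed=0):
    """Deterministic shuffle-then-slice split, mirroring the call in data.py."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)   # fixed seed -> reproducible split
    n_test = int(len(rows) * test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return {"train": [rows[i] for i in train_idx],
            "test": [rows[i] for i in test_idx]}

rows = [{"prompt": f"p{i}"} for i in range(100)]
split = train_test_split(rows, test_size=0.05, seed=42)
print(len(split["train"]), len(split["test"]))  # -> 95 5
```

Re-running with the same seed yields the identical partition, which is why the config's `seed` is threaded through to the split.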
From 334f36d8440b815afe5265b250c5f117f2b2c10d Mon Sep 17 00:00:00 2001 From: MalikMAlna Date: Thu, 6 Apr 2023 19:57:46 -0400 Subject: [PATCH 40/45] Slight cleanup of superfluous comment and space after commas --- data.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/data.py b/data.py index 72dc4574..4457d93e 100644 --- a/data.py +++ b/data.py @@ -86,7 +86,7 @@ def load_data(config, tokenizer): **kwargs ) val_dataset = val_dataset.map( - lambda ele: tokenize_inputs(config, tokenizer, ele), + lambda ele: tokenize_inputs(config, tokenizer, ele), batched=True, remove_columns=["source", "prompt"], **kwargs From 17fb6f668a2a9536c100016e6fe81fe375a6f13b Mon Sep 17 00:00:00 2001 From: MalikMAlna Date: Thu, 6 Apr 2023 20:07:08 -0400 Subject: [PATCH 41/45] Changing single to double quotes for quote consistency --- data.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/data.py b/data.py index 4457d93e..e5a7fb14 100644 --- a/data.py +++ b/data.py @@ -31,7 +31,7 @@ def tokenize_inputs(config, tokenizer, examples): # add target tokens, remove bos input_ids[i, newline_plus_inputs: newline_plus_inputs + len(target_tokens)] = target_tokens - # add eos token, enforce stopping if we don't truncate + # add eos token, enforce stopping if we don't truncate # we don't want long code to stop generating if truncated during training if newline_plus_inputs + len(target_tokens) < max_length: input_ids[i, newline_plus_inputs + len(target_tokens)] = tokenizer.eos_token_id @@ -67,7 +67,7 @@ def load_data(config, tokenizer): dataset = load_dataset("json", data_files=files, split="train") else: - dataset = load_dataset(dataset_path, split='train') + dataset = load_dataset(dataset_path, split="train") dataset = dataset.train_test_split(test_size=.05, seed=config["seed"]) From 1195d09fba7dd51c14e15b7dfea6227ca75739e9 Mon Sep 17 00:00:00 2001 From: MalikMAlna Date: Thu, 6 Apr 2023 20:20:18 -0400 Subject: [PATCH 42/45] Rephrasing comment for clarity --- 
data.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/data.py b/data.py index e5a7fb14..a83ed3d6 100644 --- a/data.py +++ b/data.py @@ -31,7 +31,7 @@ def tokenize_inputs(config, tokenizer, examples): # add target tokens, remove bos input_ids[i, newline_plus_inputs: newline_plus_inputs + len(target_tokens)] = target_tokens - # add eos token, enforce stopping if we don't truncate + # add eos token; ensure generation stops if inputs aren't truncated # we don't want long code to stop generating if truncated during training if newline_plus_inputs + len(target_tokens) < max_length: input_ids[i, newline_plus_inputs + len(target_tokens)] = tokenizer.eos_token_id From 98da69119bd70ee4d1868b3ca4304f9ed0426866 Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Fri, 7 Apr 2023 10:47:15 -0400 Subject: [PATCH 43/45] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 31b943b8..d231236a 100644 --- a/README.md +++ b/README.md @@ -104,6 +104,7 @@ Feel free to convert this to a more structured table. - [ggml-vicuna-7b-4bit](https://huggingface.co/eachadea/ggml-vicuna-7b-4bit) - [vicuna-13b-GPTQ-4bit-128g](https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g) - [LLaMa-Storytelling-4Bit](https://huggingface.co/GamerUntouch/LLaMa-Storytelling-4Bit) +- [Alpaca Native 4bit](https://huggingface.co/Sosaka/Alpaca-native-4bit-ggml/tree/main) # Roadmap From aa39e808335587176d2b98fb0584f4bc31c83b98 Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Fri, 7 Apr 2023 10:50:02 -0400 Subject: [PATCH 44/45] Correct MD5 Hash --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index d231236a..b3545fe4 100644 --- a/README.md +++ b/README.md @@ -98,7 +98,7 @@ You can pass any of the [huggingface generation config params](https://huggingfa Edge models in the GPT4All Ecosystem. Please PR as the [community grows](https://huggingface.co/models?sort=modified&search=4bit). 
Feel free to convert this to a more structured table. -- [gpt4all](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin.md5)] +- [gpt4all](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin.md5)] - [gpt4all-ggml-converted](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin.md5)] - [gpt4all-unfiltered](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.md5)] - [ggml-vicuna-7b-4bit](https://huggingface.co/eachadea/ggml-vicuna-7b-4bit) From a5110c81d984ded511f23063a353af5ffad6b813 Mon Sep 17 00:00:00 2001 From: Andriy Mulyar Date: Fri, 7 Apr 2023 13:53:47 -0400 Subject: [PATCH 45/45] Updated roadmap and links. --- README.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index b3545fe4..821eb0ff 100644 --- a/README.md +++ b/README.md @@ -13,6 +13,10 @@ :computer: Official Typescript Bindings

+

+:speech_balloon: Official Chat Interface +

+

🦜️🔗 Official Langchain Backend

@@ -111,9 +115,9 @@ Feel free to convert this to a more structured table. ## Short Term - (IN PROGRESS) Train a GPT4All model based on GPTJ to alleviate llama distribution issues. - (IN PROGRESS) Create improved CPU and GPU interfaces for this model. - - (NOT STARTED) Integrate llama.cpp bindings - - (NOT STARTED) Create a good conversational chat interface for the model. - - (NOT STARTED) Allow users to opt in and submit their chats for subsequent training runs + - (Done) [Integrate llama.cpp bindings](https://github.com/nomic-ai/pyllamacpp) + - (Done) [Create a good conversational chat interface for the model.](https://github.com/nomic-ai/gpt4all-ui) + - (Done) [Allow users to opt in and submit their chats for subsequent training runs](https://github.com/nomic-ai/gpt4all-ui) ## Medium Term - (NOT STARTED) Integrate GPT4All with [Atlas](https://atlas.nomic.ai) to allow for document retrieval.
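Editor's note on PATCH 42: the rephrased comment in `tokenize_inputs` describes writing target tokens into a fixed-length buffer and appending an EOS id only when the example was not truncated, so generation learns to stop on complete examples while truncated long code gets no premature EOS. A stripped-down sketch of that placement logic, using toy token ids rather than the real tokenizer:

```python
EOS_ID = 2   # hypothetical eos_token_id
PAD_ID = 0   # hypothetical padding id

def place_targets(input_ids, start, target_tokens, max_length):
    """Write target tokens at `start`; append EOS only if they fit untruncated."""
    end = start + len(target_tokens)
    # clip the write to the buffer, like the slice assignment in data.py
    input_ids[start:min(end, max_length)] = target_tokens[:max_length - start]
    if end < max_length:            # room left: example wasn't truncated
        input_ids[end] = EOS_ID     # enforce stopping at the real end
    return input_ids

buf = place_targets([PAD_ID] * 8, 3, [11, 12], max_length=8)
print(buf)  # -> [0, 0, 0, 11, 12, 2, 0, 0]
```

When `start + len(target_tokens)` reaches `max_length`, the EOS branch is skipped entirely — the truncated sequence ends on a target token, matching the intent of the comment.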