From 08badb61942fe37bb2c64223d315630ebb27c4b1 Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Wed, 29 Mar 2023 17:13:55 -0400
Subject: [PATCH 01/45] Update README.md
---
README.md | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/README.md b/README.md
index 770a2806..68046118 100644
--- a/README.md
+++ b/README.md
@@ -28,6 +28,14 @@ Clone this repository down and place the quantized model in the `chat` directory
To compile for custom hardware, see our fork of the [Alpaca C++](https://github.com/zanussbaum/gpt4all.cpp) repo.
+-----------
+
+[Secret Unfiltered Checkpoint](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin)
+
+This model had all refusal to answer responses removed from training. Try it with:
+- `cd chat;./gpt4all-lora-quantized-OSX-m1 -m gpt4all-lora-unfiltered-quantized.bin`
+
+-----------
Note: the full model on GPU (16GB of RAM required) performs much better in our qualitative evaluations.
# Reproducibility
From e7a73a1642b2ff1a2a3332f37b25828a2243b412 Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Wed, 29 Mar 2023 17:18:21 -0400
Subject: [PATCH 02/45] Update README.md
---
README.md | 3 +++
1 file changed, 3 insertions(+)
diff --git a/README.md b/README.md
index 68046118..30cd7349 100644
--- a/README.md
+++ b/README.md
@@ -172,3 +172,6 @@ If you utilize this repository, models or data in a downstream project, please
### Alternative Download Locations
#### gpt4all-lora-quantized.bin Backup Torrent Link
magnet:?xt=urn:btih:1F11A9691EE06C18F0040E359361DCA0479BCB5A&dn=gpt4all-lora-quantized.bin&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce
+
+### Unfiltered
+https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent
From bbf589d06770d24ec195e78baf5893556df7a811 Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Wed, 29 Mar 2023 17:18:46 -0400
Subject: [PATCH 03/45] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 30cd7349..f21c788b 100644
--- a/README.md
+++ b/README.md
@@ -173,5 +173,5 @@ If you utilize this repository, models or data in a downstream project, please
#### gpt4all-lora-quantized.bin Backup Torrent Link
magnet:?xt=urn:btih:1F11A9691EE06C18F0040E359361DCA0479BCB5A&dn=gpt4all-lora-quantized.bin&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce
-### Unfiltered
+#### Unfiltered Checkpoint Torrent Link
https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent
From 614d4fab184c95c279cce2f66adafb30475efbe9 Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Wed, 29 Mar 2023 17:22:05 -0400
Subject: [PATCH 04/45] Update README.md
---
README.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index f21c788b..708f7616 100644
--- a/README.md
+++ b/README.md
@@ -170,8 +170,8 @@ If you utilize this repository, models or data in a downstream project, please
```
### Alternative Download Locations
-#### gpt4all-lora-quantized.bin Backup Torrent Link
+#### gpt4all-lora-quantized.bin Torrent Link
magnet:?xt=urn:btih:1F11A9691EE06C18F0040E359361DCA0479BCB5A&dn=gpt4all-lora-quantized.bin&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce
-#### Unfiltered Checkpoint Torrent Link
+#### gpt4all-lora-unfiltered-quantized.bin Torrent Link
https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent
From 248f4fc324af8a5a8b6f91e4cc9a1558e10f974f Mon Sep 17 00:00:00 2001
From: Brandon Duderstadt
Date: Wed, 29 Mar 2023 22:36:43 -0400
Subject: [PATCH 05/45] Update README.md
---
README.md | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/README.md b/README.md
index 708f7616..a8a13c43 100644
--- a/README.md
+++ b/README.md
@@ -155,7 +155,23 @@ python generate.py --config configs/generate/generate.yaml --prompt "Write a scr
### What is a three word topic describing the following keywords: baseball, football, soccer:
>Sports, athletics, games
+### GPU Interface
+There are two ways to get up and running with this model on GPU.
+1. clone the nomic client [repo](https://github.com/nomic-ai/nomic) and run `pip install .[GPT4All]` in the home dir.
+2. run `pip install nomic` and install the additional deps from the wheels built [here](https://github.com/nomic-ai/nomic/tree/main/bin)
+Once this is done, you can run the model on GPU with a script like the following:
+```
+from nomic import GPT4AllGPU
+m = GPT4AllGPU(LLAMA_PATH)
+config = {'num_beams': 2,
+ 'min_new_tokens': 10,
+ 'max_length': 100,
+ 'repetition_penalty': 2.0}
+out = m.generate('write me a story about a lonely computer', config)
+print(out)
+```
+You can pass any of the [huggingface generation config params](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig) in the config.
If you utilize this repository, models or data in a downstream project, please consider citing it with:
```
From 4d282b00ad349e8822cb515ac13360cc5ad4faa6 Mon Sep 17 00:00:00 2001
From: Feldwor
Date: Thu, 30 Mar 2023 16:56:12 +0300
Subject: [PATCH 06/45] Update README.md - Move Torrent/Magnet links to save
space in the readme file.
---
README.md | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/README.md b/README.md
index a8a13c43..2bc31acd 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,7 @@ Run on M1 Mac (not sped up!)
# Try it yourself
-Download the CPU quantized gpt4all model checkpoint: [gpt4all-lora-quantized.bin](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin).
+Download the CPU quantized gpt4all model checkpoint: [gpt4all-lora-quantized.bin](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) - [[Torrent-Magnet]](https://tinyurl.com/gpt4all-lora-quantized)
Clone this repository down and place the quantized model in the `chat` directory and start chatting by running:
@@ -30,7 +30,7 @@ To compile for custom hardware, see our fork of the [Alpaca C++](https://github.
-----------
-[Secret Unfiltered Checkpoint](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin)
+[Secret Unfiltered Checkpoint](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) - [[Torrent]](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent)
This model had all refusal to answer responses removed from training. Try it with:
- `cd chat;./gpt4all-lora-quantized-OSX-m1 -m gpt4all-lora-unfiltered-quantized.bin`
@@ -184,10 +184,3 @@ If you utilize this reposistory, models or data in a downstream project, please
howpublished = {\url{https://github.com/nomic-ai/gpt4all}},
}
```
-
-### Alternative Download Locations
-#### gpt4all-lora-quantized.bin Torrent Link
-magnet:?xt=urn:btih:1F11A9691EE06C18F0040E359361DCA0479BCB5A&dn=gpt4all-lora-quantized.bin&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce
-
-#### gpt4all-lora-unfiltered-quantized.bin Torrent Link
-https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent
From 6ce06359e928edae8ecd9515c14d800b058ce2c4 Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Thu, 30 Mar 2023 10:30:50 -0400
Subject: [PATCH 07/45] Updated training data link
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index a8a13c43..951fce63 100644
--- a/README.md
+++ b/README.md
@@ -45,7 +45,7 @@ Trained LoRa Weights:
- gpt4all-lora-epoch-2 (three full epochs of training) https://huggingface.co/nomic-ai/gpt4all-lora-epoch-2
Raw Data:
-- [Training Data Without P3](https://s3.amazonaws.com/static.nomic.ai/gpt4all/2022_03_27/gpt4all_curated_data_without_p3_2022_03_27.tar.gz)
+- [Training Data Without P3](https://huggingface.co/datasets/nomic-ai/gpt4all_prompt_generations)
- Explorer: https://atlas.nomic.ai/map/gpt4all_data_clean_without_p3
- [Full Dataset with P3](https://s3.amazonaws.com/static.nomic.ai/gpt4all/2022_03_27/gpt4all_curated_data_full_2022_03_27.tar.gz)
- Explorer: https://atlas.nomic.ai/map/gpt4all_data_clean
From 0536aa9c54be51c3f43a44833584471ed4fbbf06 Mon Sep 17 00:00:00 2001
From: Feldwor
Date: Thu, 30 Mar 2023 17:32:17 +0300
Subject: [PATCH 08/45] Update README.md - Improve the Try it yourself section.
---
README.md | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/README.md b/README.md
index 2bc31acd..7dea0f12 100644
--- a/README.md
+++ b/README.md
@@ -16,17 +16,17 @@ Run on M1 Mac (not sped up!)
# Try it yourself
-Download the CPU quantized gpt4all model checkpoint: [gpt4all-lora-quantized.bin](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) - [[Torrent-Magnet]](https://tinyurl.com/gpt4all-lora-quantized)
+Here's how to get started with the CPU quantized gpt4all model checkpoint:
+1. Download the `gpt4all-lora-quantized.bin` file from [Direct Link](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) or [[Torrent-Magnet]](https://tinyurl.com/gpt4all-lora-quantized).
+2. Clone this repository, navigate to `chat`, and place the downloaded file there.
+3. Run the appropriate command for your OS:
+ - M1 Mac/OSX: `cd chat;./gpt4all-lora-quantized-OSX-m1`
+ - Linux: `cd chat;./gpt4all-lora-quantized-linux-x86`
+ - Windows (PowerShell): `cd chat;./gpt4all-lora-quantized-win64.exe`
+ - Intel Mac/OSX: `cd chat;./gpt4all-lora-quantized-OSX-intel`
-Clone this repository down and place the quantized model in the `chat` directory and start chatting by running:
-
-- `cd chat;./gpt4all-lora-quantized-OSX-m1` on M1 Mac/OSX
-- `cd chat;./gpt4all-lora-quantized-linux-x86` on Linux
-- `cd chat;./gpt4all-lora-quantized-win64.exe` on Windows (PowerShell)
-- `cd chat;./gpt4all-lora-quantized-OSX-intel` on Intel Mac/OSX
-
-To compile for custom hardware, see our fork of the [Alpaca C++](https://github.com/zanussbaum/gpt4all.cpp) repo.
+For custom hardware compilation, see our [Alpaca C++](https://github.com/zanussbaum/gpt4all.cpp) repository.
-----------
From 0a552594243f0523b536bd854c53af7a4ab8c69b Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Thu, 30 Mar 2023 10:32:52 -0400
Subject: [PATCH 09/45] Torrent Magnet Link Update
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 9a0dc33b..8cc4515f 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,7 @@ Run on M1 Mac (not sped up!)
# Try it yourself
-Download the CPU quantized gpt4all model checkpoint: [gpt4all-lora-quantized.bin](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) - [[Torrent-Magnet]](https://tinyurl.com/gpt4all-lora-quantized)
+Download the CPU quantized gpt4all model checkpoint: [gpt4all-lora-quantized.bin](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) - [[Torrent-Magnet]](magnet:?xt=urn:btih:EE5150157050CB5D1979669A1EA14FC2C4C3692E&dn=gpt4all-lora-quantized.bin&tr=udp%3a%2f%2ftracker.openbittorrent.com%3a80%2fannounce&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce)
Clone this repository down and place the quantized model in the `chat` directory and start chatting by running:
From 741e52a886a71bef9585044db41786d87bb07459 Mon Sep 17 00:00:00 2001
From: Feldwor
Date: Thu, 30 Mar 2023 17:40:43 +0300
Subject: [PATCH 10/45] Update README.md - Fix: GitHub Markdown does not
 recognize torrent magnet links.
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 8cc4515f..9a0dc33b 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,7 @@ Run on M1 Mac (not sped up!)
# Try it yourself
-Download the CPU quantized gpt4all model checkpoint: [gpt4all-lora-quantized.bin](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) - [[Torrent-Magnet]](magnet:?xt=urn:btih:EE5150157050CB5D1979669A1EA14FC2C4C3692E&dn=gpt4all-lora-quantized.bin&tr=udp%3a%2f%2ftracker.openbittorrent.com%3a80%2fannounce&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce)
+Download the CPU quantized gpt4all model checkpoint: [gpt4all-lora-quantized.bin](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) - [[Torrent-Magnet]](https://tinyurl.com/gpt4all-lora-quantized)
Clone this repository down and place the quantized model in the `chat` directory and start chatting by running:
From 2ad7cf7ba6e30b050288a03c8dbe464ad2c5182f Mon Sep 17 00:00:00 2001
From: bstadt
Date: Thu, 30 Mar 2023 11:10:07 -0400
Subject: [PATCH 11/45] added roadmap
---
README.md | 70 +++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 53 insertions(+), 17 deletions(-)
diff --git a/README.md b/README.md
index 8cc4515f..52a68912 100644
--- a/README.md
+++ b/README.md
@@ -38,6 +38,58 @@ This model had all refusal to answer responses removed from training. Try it wit
-----------
Note: the full model on GPU (16GB of RAM required) performs much better in our qualitative evaluations.
+# Python Client
+## CPU Interface
+To get running using the python client with the CPU interface, first install the [nomic client](https://github.com/nomic-ai/nomic) using `pip install nomic`
+Then, you can use the following script to interact with GPU4All:
+```
+from nomic import GPT4All
+m = GPT4All()
+m.connect()
+m.prompt('write me a story about a lonely computer')
+```
+
+## GPU Interface
+There are two ways to get up and running with this model on GPU.
+The setup here is slightly more involved than the CPU model.
+1. clone the nomic client [repo](https://github.com/nomic-ai/nomic) and run `pip install .[GPT4All]` in the home dir.
+2. run `pip install nomic` and install the additional deps from the wheels built [here](https://github.com/nomic-ai/nomic/tree/main/bin)
+
+Once this is done, you can run the model on GPU with a script like the following:
+```
+from nomic import GPT4AllGPU
+m = GPT4AllGPU(LLAMA_PATH)
+config = {'num_beams': 2,
+ 'min_new_tokens': 10,
+ 'max_length': 100,
+ 'repetition_penalty': 2.0}
+out = m.generate('write me a story about a lonely computer', config)
+print(out)
+```
+Where LLAMA_PATH is the path to a Hugging Face AutoModel-compliant LLaMA model.
+Nomic is unable to distribute this file at this time.
+We are working on a GPT4All model that does not have this limitation.
+
+You can pass any of the [huggingface generation config params](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig) in the config.
+
+# Roadmap
+## Short Term
+ - (IN PROGRESS) Train a GPT4All model based on GPTJ to alleviate llama distribution issues.
+ - (IN PROGRESS) Create improved CPU and GPU interfaces for this model.
+ - (NOT STARTED) Integrate llama.cpp bindings
+ - (NOT STARTED) Create a good conversational chat interface for the model.
+ - (NOT STARTED) Allow users to opt in and submit their chats for subsequent training runs
+
+## Medium Term
+ - (NOT STARTED) Integrate GPT4All with [Atlas](https://atlas.nomic.ai) to allow for document retrieval.
+ - BLOCKED by GPT4All based on GPTJ
+ - (NOT STARTED) Integrate GPT4All with Langchain.
+ - (NOT STARTED) Build easy custom training scripts to allow users to fine tune models.
+
+## Long Term
+ - (NOT STARTED) Allow anyone to curate training data for subsequent GPT4All releases using Atlas.
+ - (IN PROGRESS) Democratize AI.
+
# Reproducibility
Trained LoRa Weights:
@@ -155,23 +207,7 @@ python generate.py --config configs/generate/generate.yaml --prompt "Write a scr
### What is a three word topic describing the following keywords: baseball, football, soccer:
>Sports, athletics, games
-### GPU Interface
-There are two ways to get up and running with this model on GPU.
-1. clone the nomic client [repo](https://github.com/nomic-ai/nomic) and run `pip install .[GPT4All]` in the home dir.
-2. run `pip install nomic` and install the additional deps from the wheels built [here](https://github.com/nomic-ai/nomic/tree/main/bin)
-
-Once this is done, you can run the model on GPU with a script like the following:
-```
-from nomic import GPT4AllGPU
-m = GPT4AllGPU(LLAMA_PATH)
-config = {'num_beams': 2,
- 'min_new_tokens': 10,
- 'max_length': 100,
- 'repetition_penalty': 2.0}
-out = m.generate('write me a story about a lonely computer', config)
-print(out)
-```
-You can pass any of the [huggingface generation config params](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig) in the config.
+## Citation
If you utilize this repository, models or data in a downstream project, please consider citing it with:
```
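The GPU snippet in the patch above passes its `config` dict through to Hugging Face generation, where user-supplied keys override the model's defaults. A minimal stdlib sketch of those override semantics (the default values below are illustrative assumptions, not transformers' actual defaults):

```python
# Stdlib sketch of the override semantics used by the GPU example above:
# user-supplied generation parameters replace defaults before generate() runs.
# The default values here are illustrative assumptions, not transformers' real defaults.
DEFAULT_GENERATION_CONFIG = {
    "num_beams": 1,
    "min_new_tokens": 1,
    "max_length": 20,
    "repetition_penalty": 1.0,
}

def merge_generation_config(user_config):
    """Return the defaults with any user-supplied keys overriding them."""
    merged = dict(DEFAULT_GENERATION_CONFIG)
    merged.update(user_config)
    return merged

config = {
    "num_beams": 2,
    "min_new_tokens": 10,
    "max_length": 100,
    "repetition_penalty": 2.0,
}
print(merge_generation_config(config))
```

Any key the user omits keeps its default, which is why the README's four-key dict is enough to steer generation.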
From 8ac7c1a9fe623af38413519c06f05f46fb3358dd Mon Sep 17 00:00:00 2001
From: Ikko Eltociear Ashimine
Date: Fri, 31 Mar 2023 00:53:53 +0900
Subject: [PATCH 12/45] Fix typo in TRAINING_LOG.md
Conditonal -> Conditional
---
TRAINING_LOG.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/TRAINING_LOG.md b/TRAINING_LOG.md
index 31b9bb21..50469645 100644
--- a/TRAINING_LOG.md
+++ b/TRAINING_LOG.md
@@ -160,7 +160,7 @@ We realized that we had two bugs however:
- We accidentally duplicated data and effectively trained for 2 epochs instead of 1
- We added an eos token to every sequence, even those that we truncated (e.g. long code that exceeds the 1024).
-## Conditonal EOS and 1 Epoch
+## Conditional EOS and 1 Epoch
Using the same parameters, we then trained a model using a "conditional" eos token where we only add an `eos` when the inputs are less than the maximum sequence length for one epoch.
From 8c9c02e42b9d297ca4f35e251f087d7a51a8c0c5 Mon Sep 17 00:00:00 2001
From: bstadt
Date: Thu, 30 Mar 2023 12:32:14 -0400
Subject: [PATCH 13/45] updated roadmap
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 52a68912..fd9cfdc0 100644
--- a/README.md
+++ b/README.md
@@ -84,7 +84,7 @@ You can pass any of the [huggingface generation config params](https://huggingfa
- (NOT STARTED) Integrate GPT4All with [Atlas](https://atlas.nomic.ai) to allow for document retrieval.
- BLOCKED by GPT4All based on GPTJ
- (NOT STARTED) Integrate GPT4All with Langchain.
- - (NOT STARTED) Build easy custom training scripts to allow users to fine tune models.
+ - (IN PROGRESS) Build easy custom training scripts to allow users to fine tune models.
## Long Term
- (NOT STARTED) Allow anyone to curate training data for subsequent GPT4All releases using Atlas.
From 40ea0a74d01d78e8f30efe7e66ceee4a9416fb5d Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Thu, 30 Mar 2023 12:54:28 -0400
Subject: [PATCH 14/45] Huggingface Datasets link
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 351a8de9..d33c9c96 100644
--- a/README.md
+++ b/README.md
@@ -99,7 +99,7 @@ Trained LoRa Weights:
Raw Data:
- [Training Data Without P3](https://huggingface.co/datasets/nomic-ai/gpt4all_prompt_generations)
- Explorer: https://atlas.nomic.ai/map/gpt4all_data_clean_without_p3
-- [Full Dataset with P3](https://s3.amazonaws.com/static.nomic.ai/gpt4all/2022_03_27/gpt4all_curated_data_full_2022_03_27.tar.gz)
+- [Full Dataset with P3](https://huggingface.co/datasets/nomic-ai/gpt4all_prompt_generations_with_p3)
- Explorer: https://atlas.nomic.ai/map/gpt4all_data_clean
We are not distributing a LLaMa 7B checkpoint.
From de0f8602ca7799783ef25be1d7851ad3e41edd2b Mon Sep 17 00:00:00 2001
From: Benjamin Schmidt
Date: Thu, 30 Mar 2023 13:46:03 -0400
Subject: [PATCH 15/45] Update README.md
---
README.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index d33c9c96..b0c53fc3 100644
--- a/README.md
+++ b/README.md
@@ -43,7 +43,7 @@ Note: the full model on GPU (16GB of RAM required) performs much better in our q
To get running using the python client with the CPU interface, first install the [nomic client](https://github.com/nomic-ai/nomic) using `pip install nomic`
Then, you can use the following script to interact with GPU4All:
```
-from nomic import GPT4All
+from nomic.gpt4all import GPT4All
m = GPT4All()
m.connect()
m.prompt('write me a story about a lonely computer')
@@ -57,7 +57,7 @@ The setup here is slightly more involved than the CPU model.
Once this is done, you can run the model on GPU with a script like the following:
```
-from nomic import GPT4AllGPU
+from nomic.gpt4all import GPT4AllGPU
m = GPT4AllGPU(LLAMA_PATH)
config = {'num_beams': 2,
'min_new_tokens': 10,
From 377a09fdedfe3fbedac45ab0223ab7280cc26847 Mon Sep 17 00:00:00 2001
From: Benjamin Schmidt
Date: Thu, 30 Mar 2023 13:47:04 -0400
Subject: [PATCH 16/45] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index b0c53fc3..01de1072 100644
--- a/README.md
+++ b/README.md
@@ -45,7 +45,7 @@ Then, you can use the following script to interact with GPU4All:
```
from nomic.gpt4all import GPT4All
m = GPT4All()
-m.connect()
+m.open()
m.prompt('write me a story about a lonely computer')
```
From 5c9b1817899af2ad4932dfe4e4fddbde0e2797e8 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 30 Mar 2023 21:51:40 +0200
Subject: [PATCH 17/45] Fix typo
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 01de1072..fbcd5b76 100644
--- a/README.md
+++ b/README.md
@@ -41,7 +41,7 @@ Note: the full model on GPU (16GB of RAM required) performs much better in our q
# Python Client
## CPU Interface
To get running using the python client with the CPU interface, first install the [nomic client](https://github.com/nomic-ai/nomic) using `pip install nomic`
-Then, you can use the following script to interact with GPU4All:
+Then, you can use the following script to interact with GPT4All:
```
from nomic.gpt4all import GPT4All
m = GPT4All()
From 495effae7ba7c3b0b93af5ab445fb9f50e115173 Mon Sep 17 00:00:00 2001
From: Yuvanesh-ux <68208096+Yuvanesh-ux@users.noreply.github.com>
Date: Thu, 30 Mar 2023 17:53:24 -0400
Subject: [PATCH 18/45] Update README.md
---
README.md | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/README.md b/README.md
index fbcd5b76..e17c65fe 100644
--- a/README.md
+++ b/README.md
@@ -138,6 +138,10 @@ accelerate launch --dynamo_backend=inductor --num_processes=8 --num_machines=1 -
python generate.py --config configs/generate/generate.yaml --prompt "Write a script to reverse a string in Python"
```
+## Need Help?
+
+Join the Discord and ask for help in `#gpt4all-help`
+
# Sample Generations
### Provide instructions for the given exercise. Leg Raises
From 632c44b606a9b8d2580a84120f251bfa91fad6f7 Mon Sep 17 00:00:00 2001
From: Sajjad
Date: Fri, 31 Mar 2023 02:48:14 -0500
Subject: [PATCH 19/45] Update README.md unfiltered.bin Instructions
Added terminal commands to run gpt4all-lora-unfiltered-quantized.bin on M1 Mac, Windows, Linux, and Intel Mac.
---
README.md | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index e17c65fe..15a4baaf 100644
--- a/README.md
+++ b/README.md
@@ -33,8 +33,11 @@ For custom hardware compilation, see our [Alpaca C++](https://github.com/zanussb
[Secret Unfiltered Checkpoint](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) - [[Torrent]](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent)
This model had all refusal to answer responses removed from training. Try it with:
-- `cd chat;./gpt4all-lora-quantized-OSX-m1 -m gpt4all-lora-unfiltered-quantized.bin`
-
+- ``
+- M1 Mac/OSX: `cd chat;./gpt4all-lora-quantized-OSX-m1 -m gpt4all-lora-unfiltered-quantized.bin`
+- Linux: `cd chat;./gpt4all-lora-quantized-linux-x86 -m gpt4all-lora-unfiltered-quantized.bin`
+- Windows (PowerShell): `cd chat;./gpt4all-lora-quantized-win64.exe -m gpt4all-lora-unfiltered-quantized.bin`
+- Intel Mac/OSX: `cd chat;./gpt4all-lora-quantized-OSX-intel -m gpt4all-lora-unfiltered-quantized.bin`
-----------
Note: the full model on GPU (16GB of RAM required) performs much better in our qualitative evaluations.
From 64c346fb08cf2c617ceacd0d96468cae5bc9969a Mon Sep 17 00:00:00 2001
From: Sajjad
Date: Fri, 31 Mar 2023 02:50:02 -0500
Subject: [PATCH 20/45] Update README.md
removed extra line: ``
---
README.md | 1 -
1 file changed, 1 deletion(-)
diff --git a/README.md b/README.md
index 15a4baaf..a7c635a6 100644
--- a/README.md
+++ b/README.md
@@ -33,7 +33,6 @@ For custom hardware compilation, see our [Alpaca C++](https://github.com/zanussb
[Secret Unfiltered Checkpoint](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) - [[Torrent]](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent)
This model had all refusal to answer responses removed from training. Try it with:
-- ``
- M1 Mac/OSX: `cd chat;./gpt4all-lora-quantized-OSX-m1 -m gpt4all-lora-unfiltered-quantized.bin`
- Linux: `cd chat;./gpt4all-lora-quantized-linux-x86 -m gpt4all-lora-unfiltered-quantized.bin`
- Windows (PowerShell): `cd chat;./gpt4all-lora-quantized-win64.exe -m gpt4all-lora-unfiltered-quantized.bin`
From 6a9b3fc3f7aabd1cce0fb569e612fe739da8b046 Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Fri, 31 Mar 2023 12:29:38 -0400
Subject: [PATCH 21/45] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index e17c65fe..02f55f92 100644
--- a/README.md
+++ b/README.md
@@ -26,7 +26,7 @@ Here's how to get started with the CPU quantized gpt4all model checkpoint:
- Windows (PowerShell): `cd chat;./gpt4all-lora-quantized-win64.exe`
- Intel Mac/OSX: `cd chat;./gpt4all-lora-quantized-OSX-intel`
-For custom hardware compilation, see our [Alpaca C++](https://github.com/zanussbaum/gpt4all.cpp) repository.
+For custom hardware compilation, see our [llama.cpp](https://github.com/zanussbaum/gpt4all.cpp) fork.
-----------
From e07985e83ca2d3e4c9fc28669cd41d3055f9eae3 Mon Sep 17 00:00:00 2001
From: ParisNeo
Date: Sat, 1 Apr 2023 01:16:16 +0200
Subject: [PATCH 22/45] Added vscode files to gitignore
---
.gitignore | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/.gitignore b/.gitignore
index 8addd972..02ba78ce 100644
--- a/.gitignore
+++ b/.gitignore
@@ -161,4 +161,8 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
\ No newline at end of file
+#.idea/
+
+
+# vs code
+.vscode
\ No newline at end of file
From bc99eabfa1c5d1db4fa9f5e21256eb5d2871c67f Mon Sep 17 00:00:00 2001
From: ParisNeo
Date: Sat, 1 Apr 2023 01:35:50 +0200
Subject: [PATCH 23/45] added *.bin to the gitignore
---
.gitignore | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/.gitignore b/.gitignore
index 02ba78ce..14e10a78 100644
--- a/.gitignore
+++ b/.gitignore
@@ -165,4 +165,5 @@ cython_debug/
# vs code
-.vscode
\ No newline at end of file
+.vscode
+*.bin
\ No newline at end of file
From c0b3de38140e62c0d6f7a7236de45ea1dd9d3c66 Mon Sep 17 00:00:00 2001
From: HiraduNakamura <127570430+HiraduNakamura@users.noreply.github.com>
Date: Fri, 31 Mar 2023 20:26:09 -0400
Subject: [PATCH 24/45] Made capitalization consistent
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 02f55f92..d62c875e 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,7 @@ Run on M1 Mac (not sped up!)
# Try it yourself
-Here's how to get started with the CPU quantized gpt4all model checkpoint:
+Here's how to get started with the CPU quantized GPT4All model checkpoint:
1. Download the `gpt4all-lora-quantized.bin` file from [Direct Link](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) or [[Torrent-Magnet]](https://tinyurl.com/gpt4all-lora-quantized).
2. Clone this repository, navigate to `chat`, and place the downloaded file there.
From 2dfef9741a99405290d5f7fb596fe2b032e7345c Mon Sep 17 00:00:00 2001
From: gourcetools <120996278+gourcetools@users.noreply.github.com>
Date: Sat, 1 Apr 2023 17:30:40 +0200
Subject: [PATCH 25/45] Create launcher.sh
The script detects the user's operating system, lists the available .bin files, and prompts the user to select one to run, ensuring a more user-friendly experience.
---
launcher.sh | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
create mode 100644 launcher.sh
diff --git a/launcher.sh b/launcher.sh
new file mode 100644
index 00000000..ed7b99cd
--- /dev/null
+++ b/launcher.sh
@@ -0,0 +1,88 @@
+#!/bin/bash
+
+# Display header
+echo "=========================================================="
+echo " ██████ ██████ ████████ ██ ██ █████ ██ ██ "
+echo "██ ██ ██ ██ ██ ██ ██ ██ ██ ██ "
+echo "██ ███ ██████ ██ ███████ ███████ ██ ██ "
+echo "██ ██ ██ ██ ██ ██ ██ ██ ██ "
+echo " ██████ ██ ██ ██ ██ ██ ███████ ███████ "
+echo " └─> https://github.com/nomic-ai/gpt4all"
+
+# Function to detect macOS architecture and set the binary filename
+detect_mac_arch() {
+ local mac_arch
+ mac_arch=$(uname -m)
+ case "$mac_arch" in
+ arm64)
+ os_type="M1 Mac/OSX"
+ binary_filename="gpt4all-lora-quantized-OSX-m1"
+ ;;
+ x86_64)
+ os_type="Intel Mac/OSX"
+ binary_filename="gpt4all-lora-quantized-OSX-intel"
+ ;;
+ *)
+ echo "Unknown macOS architecture"
+ exit 1
+ ;;
+ esac
+}
+
+# Detect operating system and set the binary filename
+case "$(uname -s)" in
+ Darwin*)
+ detect_mac_arch
+ ;;
+ Linux*)
+ if grep -q Microsoft /proc/version; then
+ os_type="Windows (WSL)"
+ binary_filename="gpt4all-lora-quantized-win64.exe"
+ else
+ os_type="Linux"
+ binary_filename="gpt4all-lora-quantized-linux-x86"
+ fi
+ ;;
+ CYGWIN*|MINGW32*|MSYS*|MINGW*)
+ os_type="Windows (Cygwin/MSYS/MINGW)"
+ binary_filename="gpt4all-lora-quantized-win64.exe"
+ ;;
+ *)
+ echo "Unknown operating system"
+ exit 1
+ ;;
+esac
+echo "================================"
+echo "== You are using $os_type."
+
+
+# Change to the chat directory
+cd chat
+
+# List .bin files and prompt user to select one
+bin_files=(*.bin)
+echo "== Available .bin files:"
+for i in "${!bin_files[@]}"; do
+ echo " [$((i+1))] ${bin_files[i]}"
+done
+
+# Function to get user input and validate it
+get_valid_user_input() {
+ local input_valid=false
+
+ while ! $input_valid; do
+ echo "==> Please enter a number:"
+ read -r user_selection
+ if [[ $user_selection =~ ^[0-9]+$ ]] && (( user_selection >= 1 && user_selection <= ${#bin_files[@]} )); then
+ input_valid=true
+ else
+ echo "Invalid input. Please enter a number between 1 and ${#bin_files[@]}."
+ fi
+ done
+}
+
+get_valid_user_input
+selected_bin_file="${bin_files[$((user_selection-1))]}"
+
+# Run the selected .bin file with the appropriate command
+./"$binary_filename" -m "$selected_bin_file"
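The launcher script's platform detection maps an (OS, architecture) pair to one of the four chat binaries. The same table can be cross-checked in Python; the binary names come from the script above, while the mapping function itself is an illustrative sketch:

```python
import platform

# Python cross-check of launcher.sh's OS detection; the binary filenames are
# taken from the script above, the mapping function is an illustrative sketch.
BINARIES = {
    ("Darwin", "arm64"): "gpt4all-lora-quantized-OSX-m1",
    ("Darwin", "x86_64"): "gpt4all-lora-quantized-OSX-intel",
    ("Linux", "x86_64"): "gpt4all-lora-quantized-linux-x86",
    ("Windows", "AMD64"): "gpt4all-lora-quantized-win64.exe",
}

def pick_binary(system=None, machine=None):
    """Return the chat binary for the given platform, defaulting to this host."""
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        return BINARIES[(system, machine)]
    except KeyError:
        raise KeyError(f"Unsupported platform: {system}/{machine}")

print(pick_binary("Darwin", "arm64"))  # gpt4all-lora-quantized-OSX-m1
```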
From 157bb8f60218e05cb06a682fe3257f28aeef97a5 Mon Sep 17 00:00:00 2001
From: Wayner Barrios
Date: Sat, 1 Apr 2023 23:52:25 -0400
Subject: [PATCH 26/45] Load a Dataset object instead of a DatasetDict.
---
data.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/data.py b/data.py
index 0e356f7d..ff79924c 100644
--- a/data.py
+++ b/data.py
@@ -68,7 +68,7 @@ def load_data(config, tokenizer):
dataset = load_dataset("json", data_files=files, split="train")
else:
- dataset = load_dataset(dataset_path)
+ dataset = load_dataset(dataset_path,split='train')
dataset = dataset.train_test_split(test_size=.05, seed=config["seed"])
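The `train_test_split(test_size=.05, seed=...)` call in the hunk above carves a deterministic 5% evaluation set out of the training split. A stdlib sketch of that behavior (this re-implementation is illustrative, not the `datasets` library's actual algorithm):

```python
import random

# Stdlib sketch of a seeded 95/5 split like datasets' train_test_split above;
# illustrative only, not the library's actual shuffling algorithm.
def train_test_split(rows, test_size=0.05, seed=0):
    """Shuffle deterministically with the given seed, then slice off the test fraction."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_size)
    return {"train": rows[n_test:], "test": rows[:n_test]}

split = train_test_split(range(100), test_size=0.05, seed=42)
print(len(split["train"]), len(split["test"]))  # 95 5
```

Fixing the seed makes the split reproducible across training runs, which matters for the repeatable evaluations described elsewhere in this repo.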
From 0e74c707945a163c054be56ce371bbeecc566c35 Mon Sep 17 00:00:00 2001
From: Chris Messina
Date: Sat, 1 Apr 2023 21:24:19 -0700
Subject: [PATCH 27/45] Update README.md
Typo and formatting improvements.
---
README.md | 66 ++++++++++++++++++++++++++++---------------------------
1 file changed, 34 insertions(+), 32 deletions(-)
diff --git a/README.md b/README.md
index 02f55f92..bf3b8971 100644
--- a/README.md
+++ b/README.md
@@ -168,7 +168,8 @@ Join the Discord and ask for help in
### Come up with an interesting idea for a new movie plot. Your plot should be described with a title and a summary.
->1. The title of your movie plot should be "The Last Stand". 2. The summary of your movie plot should be: A group of mercenaries are hired to protect a small town from a dangerous criminal who is on the run. However, the town's sheriff is not willing to let them take over and decides to take matters into his own hands.
+> 1. The title of your movie plot should be "The Last Stand".
+> 2. The summary of your movie plot should be: A group of mercenaries are hired to protect a small town from a dangerous criminal who is on the run. However, the town's sheriff is not willing to let them take over and decides to take matters into his own hands.
### Reverse a string in python.
> In Python, you can reverse a list or tuple by using the reversed() function on it. Here's an example of how to use this method with strings:
@@ -178,42 +179,43 @@ Join the Discord and ask for help in
> print(reversed_str) # Output will be reversed version of the inputted string.
### List 10 dogs.
->1. Labrador Retriever
->2. Golden Retriever
->3. Beagle
->4. Poodle
->5. German Shepherd
->6. Bulldog
->7. Rottweiler
->8. Boxer
->9. Shih Tzu
->10. Pug
-
-### Write me a poem about the fall of Julius Ceasar into a ceasar salad in iambic pentameter.
->The fall of Julius Ceasar into a ceasar salad
->In iambic pentameter,
->The mighty Roman emperor
->Fell into a salad of lettuce and croutons,
->His empire crumbling around him,
->As he was devoured by the hungry diners.
->The once mighty emperor
->Was now just a salad topping,
->His legacy reduced to a mere garnish,
->As he was consumed by the masses.
->The fall of Julius Ceasar into a ceasar salad,
->In iambic pentameter,
->Was a tragic end to a great man,
->Who had once ruled the world with an iron fist.
->But now, he was just a salad topping,
->His legacy reduced to a mere garnish,
->As he was consumed by the masses.
+> 1. Labrador Retriever
+> 2. Golden Retriever
+> 3. Beagle
+> 4. Poodle
+> 5. German Shepherd
+> 6. Bulldog
+> 7. Rottweiler
+> 8. Boxer
+> 9. Shih Tzu
+> 10. Pug
+
+### Write me a poem about the fall of Julius Caesar into a caesar salad in iambic pentameter.
+
+> The fall of Julius Caesar into a caesar salad
+> In iambic pentameter,
+> The mighty Roman emperor
+> Fell into a salad of lettuce and croutons,
+> His empire crumbling around him,
+> As he was devoured by the hungry diners.
+> The once mighty emperor
+> Was now just a salad topping,
+> His legacy reduced to a mere garnish,
+> As he was consumed by the masses.
+> The fall of Julius Caesar into a caesar salad,
+> In iambic pentameter,
+> Was a tragic end to a great man,
+> Who had once ruled the world with an iron fist.
+> But now, he was just a salad topping,
+> His legacy reduced to a mere garnish,
+> As he was consumed by the masses.
### What is a three word topic describing the following keywords: baseball, football, soccer:
->Sports, athletics, games
+> Sports, athletics, games
## Citation
-If you utilize this reposistory, models or data in a downstream project, please consider citing it with:
+If you utilize this repository, models or data in a downstream project, please consider citing it with:
```
@misc{gpt4all,
author = {Yuvanesh Anand and Zach Nussbaum and Brandon Duderstadt and Benjamin Schmidt and Andriy Mulyar},
From 4af3bccff176b1cf914fc50dfd202080d0ad1ac4 Mon Sep 17 00:00:00 2001
From: Jo Liss
Date: Sun, 2 Apr 2023 19:19:02 +0300
Subject: [PATCH 28/45] Fix `git submodule` instructions
---
README.md | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/README.md b/README.md
index 02f55f92..434ac75d 100644
--- a/README.md
+++ b/README.md
@@ -110,9 +110,10 @@ You can reproduce our trained model by doing the following:
Clone the repo
-`git clone --recurse-submodules https://github.com/nomic-ai/gpt4all.git`
-
-`git submodule configure && git submodule update`
+```
+git clone --recurse-submodules https://github.com/nomic-ai/gpt4all.git
+git submodule update --init
+```
Setup the environment
From a782025ab604a93efad46f8693695329c774bb30 Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Mon, 3 Apr 2023 01:50:43 -0400
Subject: [PATCH 29/45] Updated Python Bindings
---
README.md | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/README.md b/README.md
index 02f55f92..e3301f7a 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,12 @@
:green_book: Technical Report
+
+
+
+:snake: Official Python Bindings
+
+
Discord
@@ -40,6 +46,9 @@ Note: the full model on GPU (16GB of RAM required) performs much better in our q
# Python Client
## CPU Interface
+To run GPT4all in python, see the new [official Python bindings](https://github.com/nomic-ai/pyllamacpp).
+
+The old bindings are still available but now deprecated. They will not work in a notebook environment.
To get running using the python client with the CPU interface, first install the [nomic client](https://github.com/nomic-ai/nomic) using `pip install nomic`
Then, you can use the following script to interact with GPT4All:
```
From 4802a72f52036d05eb13fbf1431dc514eed9cdb2 Mon Sep 17 00:00:00 2001
From: Malik M Alnakhaleh
Date: Mon, 3 Apr 2023 20:09:51 -0400
Subject: [PATCH 30/45] Update README.md
Fixing punctuation and capitalization to maintain consistency within the README file.
---
README.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index e3301f7a..bda9c4f6 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
GPT4All
-Demo, data and code to train an assistant-style large language model with ~800k GPT-3.5-Turbo Generations based on LLaMa
+Demo, data, and code to train an assistant-style large language model with ~800k GPT-3.5-Turbo Generations based on LLaMa
:green_book: Technical Report
@@ -46,7 +46,7 @@ Note: the full model on GPU (16GB of RAM required) performs much better in our q
# Python Client
## CPU Interface
-To run GPT4all in python, see the new [official Python bindings](https://github.com/nomic-ai/pyllamacpp).
+To run GPT4All in python, see the new [official Python bindings](https://github.com/nomic-ai/pyllamacpp).
The old bindings are still available but now deprecated. They will not work in a notebook environment.
To get running using the python client with the CPU interface, first install the [nomic client](https://github.com/nomic-ai/nomic) using `pip install nomic`
From 467dc5bc7ed7993a875a0f49f35cb22db9c4a815 Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Tue, 4 Apr 2023 23:23:34 -0400
Subject: [PATCH 31/45] Discord Link
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 8b2e6327..cf63a86d 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@
-Discord
+Discord
From d617c54a5eb31a004acc56591a495f674de8db79 Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Wed, 5 Apr 2023 12:48:54 -0400
Subject: [PATCH 32/45] GPT4All Compatibility Ecosystem
---
README.md | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/README.md b/README.md
index cf63a86d..6bc2207e 100644
--- a/README.md
+++ b/README.md
@@ -81,6 +81,17 @@ We are working on a GPT4All that does not have this limitation right now.
You can pass any of the [huggingface generation config params](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig) in the config.
+# GPT4All Compatibility Ecosystem
+Edge models in the GPT4All Ecosystem. Please open a PR as the [community grows](https://huggingface.co/models?sort=modified&search=4bit).
+Feel free to convert this to a more structured table.
+
+- [gpt4all](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin)
+- [gpt4all-unfiltered](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin)
+- [ggml-vicuna-7b-4bit](https://huggingface.co/eachadea/ggml-vicuna-7b-4bit)
+- [vicuna-13b-GPTQ-4bit-128g](https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g)
+- [LLaMa-Storytelling-4Bit](https://huggingface.co/GamerUntouch/LLaMa-Storytelling-4Bit)
+
+
# Roadmap
## Short Term
- (IN PROGRESS) Train a GPT4All model based on GPTJ to alleviate llama distribution issues.
From 873a5588256a1767e3a629a67bc706beeb1f2aff Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Wed, 5 Apr 2023 13:03:17 -0400
Subject: [PATCH 33/45] Typescript bindings link
---
README.md | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/README.md b/README.md
index 6bc2207e..05cfcb3c 100644
--- a/README.md
+++ b/README.md
@@ -10,12 +10,17 @@
:snake: Official Python Bindings
+
+:computer: Official Typescript Bindings
+
+
Discord
+
![gpt4all-lora-demo](https://user-images.githubusercontent.com/13879686/228352356-de66ca7a-df70-474e-b929-2e3656165051.gif)
Run on M1 Mac (not sped up!)
From f90831180e95f057706f493de7a7f9a674b08689 Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Wed, 5 Apr 2023 13:15:23 -0400
Subject: [PATCH 34/45] Added MD5 signatures to ecosystem links.
---
README.md | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index 05cfcb3c..6a58fe9a 100644
--- a/README.md
+++ b/README.md
@@ -40,6 +40,7 @@ Here's how to get started with the CPU quantized GPT4All model checkpoint:
For custom hardware compilation, see our [llama.cpp](https://github.com/zanussbaum/gpt4all.cpp) fork.
-----------
+Find all compatible models in the GPT4All Ecosystem section.
[Secret Unfiltered Checkpoint](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) - [[Torrent]](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.torrent)
@@ -90,8 +91,9 @@ You can pass any of the [huggingface generation config params](https://huggingfa
Edge models in the GPT4All Ecosystem. Please PR as the [community grows](https://huggingface.co/models?sort=modified&search=4bit).
Feel free to convert this to a more structured table.
-- [gpt4all](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin)
-- [gpt4all-unfiltered](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin)
+- [gpt4all](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin.md5)]
+ - [gpt4all-ggml-converted](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin.md5)]
+- [gpt4all-unfiltered](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.md5)]
- [ggml-vicuna-7b-4bit](https://huggingface.co/eachadea/ggml-vicuna-7b-4bit)
- [vicuna-13b-GPTQ-4bit-128g](https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g)
- [LLaMa-Storytelling-4Bit](https://huggingface.co/GamerUntouch/LLaMa-Storytelling-4Bit)
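The MD5 signature links added above can be used to verify a downloaded checkpoint. A minimal stdlib sketch (the filenames in the commented usage are examples, assuming the `.md5` file's first whitespace-separated field is the hex digest):

```python
import hashlib

def md5_hex(path, chunk_size=1 << 20):
    """Stream a file through MD5 in chunks so multi-GB .bin files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical usage against a published signature file:
# expected = open("gpt4all-lora-quantized.bin.md5").read().split()[0]
# assert md5_hex("gpt4all-lora-quantized.bin") == expected
```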
From 5977a650e6c3f3aa475e6c307d1b0af75f87a678 Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Wed, 5 Apr 2023 13:24:47 -0400
Subject: [PATCH 35/45] Typescript and Langchain bindings
---
README.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index 6a58fe9a..14cd8af1 100644
--- a/README.md
+++ b/README.md
@@ -5,13 +5,12 @@
:green_book: Technical Report
-
:snake: Official Python Bindings
-:computer: Official Typescript Bindings
+:computer: Official Typescript Bindings --- Official Langchain Backend 🦜️🔗
@@ -21,6 +20,7 @@
+
![gpt4all-lora-demo](https://user-images.githubusercontent.com/13879686/228352356-de66ca7a-df70-474e-b929-2e3656165051.gif)
Run on M1 Mac (not sped up!)
From 828a3a67fa1871365e88ebe3e20c866c0e8784ce Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Wed, 5 Apr 2023 14:10:00 -0400
Subject: [PATCH 36/45] Formatting Update
---
README.md | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 14cd8af1..9077fd5b 100644
--- a/README.md
+++ b/README.md
@@ -10,9 +10,14 @@
-:computer: Official Typescript Bindings --- Official Langchain Backend 🦜️🔗
+:computer: Official Typescript Bindings
+
+🦜️🔗 Official Langchain Backend
+
+
+
Discord
From 5fa260e18a94e74046bf132782f1bb6a9ec52cfb Mon Sep 17 00:00:00 2001
From: Ben Schmidt
Date: Thu, 6 Apr 2023 11:28:59 -0400
Subject: [PATCH 37/45] Add MIT license.
---
LICENSE.txt | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
create mode 100644 LICENSE.txt
diff --git a/LICENSE.txt b/LICENSE.txt
new file mode 100644
index 00000000..51aef442
--- /dev/null
+++ b/LICENSE.txt
@@ -0,0 +1,19 @@
+Copyright (c) 2023 Nomic, Inc.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
From 2052f459ed796060b380382cbded8123b8ccea08 Mon Sep 17 00:00:00 2001
From: Dillon Erb <585865+dte@users.noreply.github.com>
Date: Thu, 6 Apr 2023 18:11:05 -0400
Subject: [PATCH 38/45] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 31b943b8..cd7c43ee 100644
--- a/README.md
+++ b/README.md
@@ -98,7 +98,7 @@ You can pass any of the [huggingface generation config params](https://huggingfa
Edge models in the GPT4All Ecosystem. Please PR as the [community grows](https://huggingface.co/models?sort=modified&search=4bit).
Feel free to convert this to a more structured table.
-- [gpt4all](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin.md5)]
+- [gpt4all](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin.md5)]
- [gpt4all-ggml-converted](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin.md5)]
- [gpt4all-unfiltered](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.md5)]
- [ggml-vicuna-7b-4bit](https://huggingface.co/eachadea/ggml-vicuna-7b-4bit)
From b8292dd7d0042cf54634ef63dd2495aa950ac214 Mon Sep 17 00:00:00 2001
From: MalikMAlna
Date: Thu, 6 Apr 2023 19:56:49 -0400
Subject: [PATCH 39/45] Slight cleanup of superfluous comment and space after
comma
---
data.py | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/data.py b/data.py
index ff79924c..72dc4574 100644
--- a/data.py
+++ b/data.py
@@ -57,7 +57,6 @@ def load_data(config, tokenizer):
dataset_path = config["dataset_path"]
if os.path.exists(dataset_path):
- # check if path is a directory
if os.path.isdir(dataset_path):
files = glob.glob(os.path.join(dataset_path, "*_clean.jsonl"))
else:
@@ -68,7 +67,7 @@ def load_data(config, tokenizer):
dataset = load_dataset("json", data_files=files, split="train")
else:
- dataset = load_dataset(dataset_path,split='train')
+ dataset = load_dataset(dataset_path, split='train')
dataset = dataset.train_test_split(test_size=.05, seed=config["seed"])
From 334f36d8440b815afe5265b250c5f117f2b2c10d Mon Sep 17 00:00:00 2001
From: MalikMAlna
Date: Thu, 6 Apr 2023 19:57:46 -0400
Subject: [PATCH 40/45] Slight cleanup of superfluous comment and space after
commas
---
data.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/data.py b/data.py
index 72dc4574..4457d93e 100644
--- a/data.py
+++ b/data.py
@@ -86,7 +86,7 @@ def load_data(config, tokenizer):
**kwargs
)
val_dataset = val_dataset.map(
- lambda ele: tokenize_inputs(config, tokenizer, ele),
+ lambda ele: tokenize_inputs(config, tokenizer, ele),
batched=True,
remove_columns=["source", "prompt"],
**kwargs
From 17fb6f668a2a9536c100016e6fe81fe375a6f13b Mon Sep 17 00:00:00 2001
From: MalikMAlna
Date: Thu, 6 Apr 2023 20:07:08 -0400
Subject: [PATCH 41/45] Changing single to double quotes for quote consistency
---
data.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/data.py b/data.py
index 4457d93e..e5a7fb14 100644
--- a/data.py
+++ b/data.py
@@ -31,7 +31,7 @@ def tokenize_inputs(config, tokenizer, examples):
# add target tokens, remove bos
input_ids[i, newline_plus_inputs: newline_plus_inputs + len(target_tokens)] = target_tokens
- # add eos token, enforce stopping if we don't truncate
+ # add eos token, enforce stopping if we don't truncate
# we don't want long code to stop generating if truncated during training
if newline_plus_inputs + len(target_tokens) < max_length:
input_ids[i, newline_plus_inputs + len(target_tokens)] = tokenizer.eos_token_id
@@ -67,7 +67,7 @@ def load_data(config, tokenizer):
dataset = load_dataset("json", data_files=files, split="train")
else:
- dataset = load_dataset(dataset_path, split='train')
+ dataset = load_dataset(dataset_path, split="train")
dataset = dataset.train_test_split(test_size=.05, seed=config["seed"])
From 1195d09fba7dd51c14e15b7dfea6227ca75739e9 Mon Sep 17 00:00:00 2001
From: MalikMAlna
Date: Thu, 6 Apr 2023 20:20:18 -0400
Subject: [PATCH 42/45] Rephrasing comment for clarity
---
data.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/data.py b/data.py
index e5a7fb14..a83ed3d6 100644
--- a/data.py
+++ b/data.py
@@ -31,7 +31,7 @@ def tokenize_inputs(config, tokenizer, examples):
# add target tokens, remove bos
input_ids[i, newline_plus_inputs: newline_plus_inputs + len(target_tokens)] = target_tokens
- # add eos token, enforce stopping if we don't truncate
+ # add eos token; ensure generation stops if inputs aren't truncated
# we don't want long code to stop generating if truncated during training
if newline_plus_inputs + len(target_tokens) < max_length:
input_ids[i, newline_plus_inputs + len(target_tokens)] = tokenizer.eos_token_id
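The reworded comment describes the eos rule: write the target tokens after the prompt, then append eos only when the sequence was not truncated, so long truncated code keeps generating. A minimal stand-in using a plain list in place of the real `input_ids` row (`place_eos` is a hypothetical helper, not a function in `data.py`):

```python
def place_eos(input_ids, newline_plus_inputs, target_tokens, max_length, eos_token_id):
    """Copy target tokens in after the prompt; add eos only if there is
    room left, i.e. the example was not truncated at max_length."""
    end = newline_plus_inputs + len(target_tokens)
    input_ids[newline_plus_inputs:end] = target_tokens
    if end < max_length:
        input_ids[end] = eos_token_id
    return input_ids

# Room left over -> eos (id 2) lands right after the target tokens:
row = place_eos([0] * 8, 3, [11, 12], max_length=8, eos_token_id=2)
print(row)  # [0, 0, 0, 11, 12, 2, 0, 0]
```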
From 98da69119bd70ee4d1868b3ca4304f9ed0426866 Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Fri, 7 Apr 2023 10:47:15 -0400
Subject: [PATCH 43/45] Update README.md
---
README.md | 1 +
1 file changed, 1 insertion(+)
diff --git a/README.md b/README.md
index 31b943b8..d231236a 100644
--- a/README.md
+++ b/README.md
@@ -104,6 +104,7 @@ Feel free to convert this to a more structured table.
- [ggml-vicuna-7b-4bit](https://huggingface.co/eachadea/ggml-vicuna-7b-4bit)
- [vicuna-13b-GPTQ-4bit-128g](https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g)
- [LLaMa-Storytelling-4Bit](https://huggingface.co/GamerUntouch/LLaMa-Storytelling-4Bit)
+- [Alpaca Native 4bit](https://huggingface.co/Sosaka/Alpaca-native-4bit-ggml/tree/main)
# Roadmap
From aa39e808335587176d2b98fb0584f4bc31c83b98 Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Fri, 7 Apr 2023 10:50:02 -0400
Subject: [PATCH 44/45] Correct MD5 Hash
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index d231236a..b3545fe4 100644
--- a/README.md
+++ b/README.md
@@ -98,7 +98,7 @@ You can pass any of the [huggingface generation config params](https://huggingfa
Edge models in the GPT4All Ecosystem. Please PR as the [community grows](https://huggingface.co/models?sort=modified&search=4bit).
Feel free to convert this to a more structured table.
-- [gpt4all](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin.md5)]
+- [gpt4all](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin.md5)]
- [gpt4all-ggml-converted](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin.md5)]
- [gpt4all-unfiltered](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin) [[MD5 Signature](https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-unfiltered-quantized.bin.md5)]
- [ggml-vicuna-7b-4bit](https://huggingface.co/eachadea/ggml-vicuna-7b-4bit)
From a5110c81d984ded511f23063a353af5ffad6b813 Mon Sep 17 00:00:00 2001
From: Andriy Mulyar
Date: Fri, 7 Apr 2023 13:53:47 -0400
Subject: [PATCH 45/45] Updated roadmap and links.
---
README.md | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/README.md b/README.md
index b3545fe4..821eb0ff 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,10 @@
:computer: Official Typescript Bindings
+
+:speech_balloon: Official Chat Interface
+
+
🦜️🔗 Official Langchain Backend
@@ -111,9 +115,9 @@ Feel free to convert this to a more structured table.
## Short Term
- (IN PROGRESS) Train a GPT4All model based on GPTJ to alleviate llama distribution issues.
- (IN PROGRESS) Create improved CPU and GPU interfaces for this model.
- - (NOT STARTED) Integrate llama.cpp bindings
- - (NOT STARTED) Create a good conversational chat interface for the model.
- - (NOT STARTED) Allow users to opt in and submit their chats for subsequent training runs
+ - (Done) [Integrate llama.cpp bindings](https://github.com/nomic-ai/pyllamacpp)
+ - (Done) [Create a good conversational chat interface for the model.](https://github.com/nomic-ai/gpt4all-ui)
+ - (Done) [Allow users to opt in and submit their chats for subsequent training runs](https://github.com/nomic-ai/gpt4all-ui)
## Medium Term
- (NOT STARTED) Integrate GPT4All with [Atlas](https://atlas.nomic.ai) to allow for document retrieval.