mirror of https://github.com/nomic-ai/gpt4all synced 2024-11-08 07:10:32 +00:00

GPT4All

Demo, data and code to train an assistant-style large language model with ~800k GPT-3.5-Turbo Generations based on LLaMa

📗 Technical Report

Discord

Run on M1 Mac (not sped up!)

Try it yourself

Download the CPU quantized gpt4all model checkpoint: gpt4all-lora-quantized.bin.

Clone this repository, place the quantized model in the chat directory, and start chatting by running the command for your platform:

M1 Mac/OSX: cd chat; ./gpt4all-lora-quantized-OSX-m1
Linux: cd chat; ./gpt4all-lora-quantized-linux-x86
Windows (PowerShell): cd chat; ./gpt4all-lora-quantized-win64.exe
Intel Mac/OSX: cd chat; ./gpt4all-lora-quantized-OSX-intel
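
The platform-to-binary choice above can be sketched as a small lookup. This is only an illustration: the binary names come from the list above, while the `(system, machine)` keys are an assumption about what `platform.system()` and `platform.machine()` usually report on each host, and `pick_chat_binary` is a hypothetical helper, not part of this repository.

```python
import platform

# Binary names are taken from the list above. The keys are ASSUMED typical
# values of platform.system() / platform.machine(); a host may report others
# (for example, an x86_64 Python running under Rosetta on an M1 Mac).
BINARIES = {
    ("Darwin", "arm64"): "gpt4all-lora-quantized-OSX-m1",
    ("Darwin", "x86_64"): "gpt4all-lora-quantized-OSX-intel",
    ("Linux", "x86_64"): "gpt4all-lora-quantized-linux-x86",
    ("Windows", "AMD64"): "gpt4all-lora-quantized-win64.exe",
}


def pick_chat_binary(system=None, machine=None):
    """Return the prebuilt chat binary name for a host, or None if unlisted."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return BINARIES.get((system, machine))


if __name__ == "__main__":
    name = pick_chat_binary()
    print(name or "No prebuilt binary listed for this host.")
```

If the lookup returns nothing, the prebuilt binaries do not cover the host and compiling for custom hardware is the remaining option.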

To compile for custom hardware, see our fork of the Alpaca C++ repo.

Note: the full model on GPU (16GB of RAM required) performs much better in our qualitative evaluations.

Reproducibility

Trained LoRA weights:

gpt4all-lora (four full epochs of training): https://huggingface.co/nomic-ai/gpt4all-lora
gpt4all-lora-epoch-2 (three full epochs of training): https://huggingface.co/nomic-ai/gpt4all-lora-epoch-2

Raw Data:

Training Data Without P3
- Explorer: https://atlas.nomic.ai/map/gpt4all_data_clean_without_p3
Full Dataset with P3
- Explorer: https://atlas.nomic.ai/map/gpt4all_data_clean

We are not distributing a LLaMa 7B checkpoint.

You can reproduce our trained model by doing the following:

Setup

Clone the repo

git clone --recurse-submodules https://github.com/nomic-ai/gpt4all.git

git submodule update --init --recursive

Set up the environment

python -m pip install -r requirements.txt

cd transformers
pip install -e . 

cd ../peft
pip install -e .
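
After the installs above, a quick import check can confirm that the two submodule packages are visible to the interpreter. This is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name, and the check only verifies that a package can be located, not that its version matches the pinned submodule commit.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # transformers and peft are the two submodules installed in editable mode.
    missing = missing_packages(["transformers", "peft"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("transformers and peft are importable.")
```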

Training

accelerate launch --dynamo_backend=inductor --num_processes=8 --num_machines=1 --machine_rank=0 --deepspeed_multinode_launcher standard --mixed_precision=bf16 --use_deepspeed --deepspeed_config_file=configs/deepspeed/ds_config.json train.py --config configs/train/finetune-7b.yaml
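
For scripted runs, the launch command above can be assembled as an argument list so that the process count or machine rank can be varied without editing a long string. The sketch below is an assumption-laden illustration: every flag is copied verbatim from the command above, while `build_launch_command` and its parameter names are hypothetical and not part of this repository.

```python
import shlex


def build_launch_command(num_processes=8, num_machines=1, machine_rank=0,
                         train_config="configs/train/finetune-7b.yaml"):
    """Assemble the accelerate launch command shown above as a list of args."""
    return [
        "accelerate", "launch",
        "--dynamo_backend=inductor",
        f"--num_processes={num_processes}",
        f"--num_machines={num_machines}",
        f"--machine_rank={machine_rank}",
        "--deepspeed_multinode_launcher", "standard",
        "--mixed_precision=bf16",
        "--use_deepspeed",
        "--deepspeed_config_file=configs/deepspeed/ds_config.json",
        "train.py", "--config", train_config,
    ]


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run
    # from the repository root to actually start training.
    print(shlex.join(build_launch_command()))
```

With the defaults, the printed string matches the command above; the list form avoids shell quoting issues when it is handed to `subprocess.run`.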

Generate

python generate.py --config configs/generate/generate.yaml --prompt "Write a script to reverse a string in Python"

Sample Generations

Provide instructions for the given exercise. Leg Raises

Leg Raises

Stand with your feet shoulder-width apart and your knees slightly bent.

Raise your arms straight out in front of you.

Slowly bend your knees and raise your heels off the ground.

Hold this position for a count of 3, then slowly lower your heels back to the ground.

Repeat the exercise 10 times.

Do not let your knees go beyond your toes.

Keep your back straight and your abdominal muscles engaged.

Do not let your hips sag or your back arch.

Keep your arms straight and your elbows locked.

Do not let your head drop or your shoulders rise.

Breathe normally throughout the exercise.

Rest for 30 seconds before repeating the exercise.

Gradually increase the number of repetitions as you become stronger.

A color description has been provided. Find the CSS code associated with that color. A light red color with a medium light shade of pink

The CSS code associated with the color provided is: #FF6347

Come up with an interesting idea for a new movie plot. Your plot should be described with a title and a summary.

1. The title of your movie plot should be "The Last Stand".
2. The summary of your movie plot should be: A group of mercenaries are hired to protect a small town from a dangerous criminal who is on the run. However, the town's sheriff is not willing to let them take over and decides to take matters into his own hands.

Reverse a string in python.

In Python, you can reverse a list or tuple by using the reversed() function on it. Here's an example of how to use this method with strings:
my_string = "Hello World" # Define your original string here
reversed_str = my_string[::-1]   # Use a slice and negative index for reverse order
print(reversed_str)     # Output will be reversed version of the inputted string.

List 10 dogs.

Labrador Retriever

Golden Retriever

Beagle

Poodle

German Shepherd

Bulldog

Rottweiler

Boxer

Shih Tzu

Pug

Write me a poem about the fall of Julius Ceasar into a ceasar salad in iambic pentameter.

The fall of Julius Ceasar into a ceasar salad In iambic pentameter, The mighty Roman emperor Fell into a salad of lettuce and croutons, His empire crumbling around him, As he was devoured by the hungry diners. The once mighty emperor Was now just a salad topping, His legacy reduced to a mere garnish, As he was consumed by the masses. The fall of Julius Ceasar into a ceasar salad, In iambic pentameter, Was a tragic end to a great man, Who had once ruled the world with an iron fist. But now, he was just a salad topping, His legacy reduced to a mere garnish, As he was consumed by the masses.

What is a three word topic describing the following keywords: baseball, football, soccer:

Sports, athletics, games

If you use this repository, models, or data in a downstream project, please consider citing it with:

@misc{gpt4all,
  author = {Yuvanesh Anand and Zach Nussbaum and Brandon Duderstadt and Benjamin Schmidt and Andriy Mulyar},
  title = {GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nomic-ai/gpt4all}},
}

Alternative Download Locations

gpt4all-lora-quantized.bin Backup Torrent Link

magnet:?xt=urn:btih:1F11A9691EE06C18F0040E359361DCA0479BCB5A&dn=gpt4all-lora-quantized.bin&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce
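
The magnet link above packs the torrent info hash (`xt`), the display name (`dn`), and the tracker URLs (`tr`) into a query string. As a rough sketch, it can be unpacked with the standard library to check that a client received the expected name and trackers; `parse_magnet` is a hypothetical helper, and the parsing assumes the simple `urn:btih:<hash>` form used here.

```python
from urllib.parse import parse_qs, urlsplit

# The magnet link from this section, split across lines for readability.
MAGNET = (
    "magnet:?xt=urn:btih:1F11A9691EE06C18F0040E359361DCA0479BCB5A"
    "&dn=gpt4all-lora-quantized.bin"
    "&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce"
    "&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce"
)


def parse_magnet(uri):
    """Extract the info hash, display name, and trackers from a magnet URI."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI")
    query = parse_qs(parts.query)  # also decodes the %-escaped tracker URLs
    return {
        "info_hash": query["xt"][0].rsplit(":", 1)[-1],
        "name": query["dn"][0],
        "trackers": query.get("tr", []),
    }


if __name__ == "__main__":
    info = parse_magnet(MAGNET)
    print(info["name"], info["info_hash"])
    for tracker in info["trackers"]:
        print("tracker:", tracker)
```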