<p align="center">
<img src="" width="400"><br>
Decentralized platform for running 100B+ language models<br><br>
<a href="">
<img src="">
<a href="">
<img src="">
## Key features
- Run inference or fine-tune [BLOOM-176B]( by joining compute resources with people all over the Internet. No need to have high-end GPUs.
- One inference step takes ≈ 1 sec — much faster than possible with offloading. Enough for chatbots and other interactive apps.
- Employ any fine-tuning and sampling methods by accessing the model's hidden states and changing its control flow, something you can't do with proprietary APIs.
<p align="center">
<b><a href="">[Read paper]</a></b> | <b><a href="">[View website]</a></b>
## How does it work?
<p align="center">
<img src="" width="800">
### 🚧 This project is in active development
Be careful: some features may not work, interfaces may change, and we have no detailed docs yet (see [roadmap](
A stable version of the code and a public swarm open to everyone will be released in November 2022. You can [subscribe]( to be emailed when it happens or fill in [this form]( to help the public launch by donating GPU time. In the meantime, you can launch and use your own private swarm.
## Code examples
Solving a sequence classification task via soft prompt tuning of BLOOM-176B:
```python
# Initialize distributed BLOOM with soft prompts
model = AutoModelForPromptTuning.from_pretrained(
# Define optimizer for prompts and linear head
optimizer = torch.optim.AdamW(model.parameters())

for input_ids, labels in data_loader:
    # Forward pass with local and remote layers
    outputs = model.forward(input_ids)
    loss = cross_entropy(outputs.logits, labels)

    # Distributed backward w.r.t. local params
    loss.backward()  # Compute model.prompts.grad
    optimizer.step()  # Update local params only
    optimizer.zero_grad()  # Reset gradients before the next batch
```
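To make the idea behind this loop concrete, here is a toy, pure-Python sketch of soft prompt tuning. It is not the project's code: the "model" is a single frozen weight applied to the mean of a sequence of scalar "embeddings", and every name in it (`frozen_weight`, `forward`, `train_step`) is hypothetical. It only illustrates that the trainable prompt values are prepended to the input and are the only parameters that change during training.

```python
# Toy illustration of soft prompt tuning with a frozen model.
# All names and numbers are hypothetical; this is not the BLOOM or project API.

frozen_weight = 2.0  # stands in for the frozen (remote) transformer layers


def forward(prompts, inputs):
    """Prepend the trainable prompts to the inputs, then apply the frozen model."""
    sequence = prompts + inputs
    return frozen_weight * sum(sequence) / len(sequence)


def train_step(prompts, inputs, target, lr=0.1):
    """Take one gradient step on the prompts only; return (new_prompts, loss)."""
    output = forward(prompts, inputs)
    loss = (output - target) ** 2
    # d(loss)/d(prompt_i) = 2 * (output - target) * frozen_weight / sequence_length
    grad = 2.0 * (output - target) * frozen_weight / (len(prompts) + len(inputs))
    new_prompts = [p - lr * grad for p in prompts]
    return new_prompts, loss


prompts = [0.0, 0.0]        # trainable soft prompt (the only local parameters)
inputs = [1.0, -1.0, 0.5]   # fixed input "embeddings"
target = 3.0

losses = []
for _ in range(200):
    prompts, loss = train_step(prompts, inputs, target)
    losses.append(loss)

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.2e}")
```

The frozen weight is never written to, which mirrors the "update local params only" comment in the loop above: gradients flow through the frozen part, but only the prompt values are adjusted.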
## Installation
```bash
conda install -y -c conda-forge cudatoolkit-dev==11.3.1 cudatoolkit==11.3.1 cudnn==
pip install torch==1.12.0+cu113 -f
pip install -r requirements.txt
pip install -i bitsandbytes-cuda113
```
### Basic functionality
All tests are run on localhost.
First, run one or more servers like this:
```bash
# minimalistic server with non-trained bloom blocks
python -m cli.run_server --converted_model_name_or_path bigscience/test-bloomd-6b3 \
  --block_indices 3:5 --torch_dtype float32 --identity_path ./ --host_maddrs /ip4/
# when running multiple servers:
# - give each server a unique --identity_path (or remove the --identity_path arg when debugging)
# - if running multiple servers on the same machine, give each a unique port (last integer in --host_maddrs, 0 means random port)
# - when running over the internet, change --host_maddrs according to
# - each server except the first should have --initial_peers pointing to one of the pre-existing servers
```
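The notes on running several servers can be turned into a small helper that builds one command line per server. This is only a sketch: it reuses the flags shown above, while the identity file names, the `127.0.0.1` address, the port numbers, and the `server_command` function itself are hypothetical placeholders. The address passed to `--initial_peers` has to be copied from the output of a server that is already running.

```python
# Sketch: build command lines for several servers on one machine.
# The flags come from the example above; the identity file names, IP address,
# and port numbers are hypothetical placeholders.
import shlex


def server_command(index, port, initial_peer=None):
    """Return the argument list for the index-th local server."""
    argv = [
        "python", "-m", "cli.run_server",
        "--converted_model_name_or_path", "bigscience/test-bloomd-6b3",
        "--block_indices", "3:5",
        "--torch_dtype", "float32",
        # a unique identity file per server (hypothetical naming scheme)
        "--identity_path", f"./server{index}.id",
        # the last integer of the multiaddr is the port; 0 means a random port
        "--host_maddrs", f"/ip4/127.0.0.1/tcp/{port}",
    ]
    if initial_peer is not None:
        # every server except the first joins through an existing server
        argv += ["--initial_peers", initial_peer]
    return argv


first = server_command(1, 31337)
second = server_command(2, 31338, initial_peer="COPY_ADDRESS_PRINTED_BY_FIRST_SERVER")

print(shlex.join(first))
print(shlex.join(second))
```

Keeping the arguments as a list rather than a string means the same value can be passed directly to `subprocess.run` without further quoting.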
Then open a Python notebook or console and run:
```python
import torch
import hivemind
from src import DistributedBloomConfig, get_remote_module

dht = hivemind.DHT(
    initial_peers=[TODO_COPY_FULL_ADDRESS_FROM_ANY_OF_THE_SERVERS],  # e.g. /ip4/
    client_mode=True, start=True,
)
config = DistributedBloomConfig.from_pretrained("bigscience/test-bloom-6b3")
layer3, layer4 = get_remote_module(dht, ['bigscience/test-bloomd-6b3.3', 'bigscience/test-bloomd-6b3.4'], config)
assert layer3 is not None and layer4 is not None, "one or both layers were not found in DHT"

# test forward/backward, two blocks
outputs = layer4(layer3(torch.randn(1, 64, 4096)))
loss = (outputs * torch.randn_like(outputs)).norm()
loss.backward()

# test inference, one block
with layer3.inference_session(max_length=10) as sess:
    for i in range(10):
        res = sess.step(torch.ones(1, 1, 4096))
```
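The server example is started with `--block_indices 3:5`, and the client asks for the blocks ending in `.3` and `.4`. The sketch below spells out that correspondence, assuming (the text does not confirm this) that the range is half-open like a Python slice and that a block's name is the converted model name followed by `.` and the block index. `block_uids` is a hypothetical helper, not part of the project.

```python
def block_uids(model_name, block_indices):
    """Expand a "start:end" range into per-block names (end is exclusive)."""
    start, end = (int(part) for part in block_indices.split(":"))
    return [f"{model_name}.{i}" for i in range(start, end)]


uids = block_uids("bigscience/test-bloomd-6b3", "3:5")
print(uids)  # ['bigscience/test-bloomd-6b3.3', 'bigscience/test-bloomd-6b3.4']
```

The resulting list can be passed to `get_remote_module(dht, uids, config)` in place of the hand-written names.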
### Convert regular BLOOM into distributed
```bash
# convert model from HF hub to a distributed format (can take hours depending on your connection!)
python -m cli.convert_model --model bigscience/bloom-6b3 \
  --output_path ./converted_model --output_repo bigscience/test-bloomd-6b3 \
  --use_auth_token $MY_WRITE_TOKEN  # ^-- todo replace output repo with something you have access to
```
### Test local vs remote block (allclose)
To test distributed inference, run one or more servers, then open a new shell and run pytest with environment variables:
```bash
# shell A: serve model
python -m cli.run_server --converted_model_name_or_path bigscience/test-bloomd-6b3 \
  --torch_dtype float32 --identity_path ./ --host_maddrs /ip4/

# shell B:
export MODEL_NAME="bigscience/test-bloomd-6b3"

# test individual random blocks for exact match
pytest tests/

# test the full model
pytest tests/
```
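The "allclose" in the heading refers to comparing local and remote outputs within a numerical tolerance rather than bit for bit. As a minimal sketch of that kind of check, here is a pure-Python version of the usual criterion |a - b| <= atol + rtol * |b|. The function and the default tolerances are illustrative assumptions; the tests themselves may use different tolerances or a library routine such as `torch.allclose`.

```python
def allclose(a, b, rtol=1e-5, atol=1e-8):
    """True if every pair of elements satisfies |x - y| <= atol + rtol * |y|."""
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


local_out = [0.125, -1.5, 3.0]
remote_out = [0.125, -1.5, 3.0 + 1e-7]  # tiny numerical drift

print(allclose(local_out, remote_out))          # True
print(allclose(local_out, [0.125, -1.5, 3.1]))  # False
```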
<p align="center">
This project is a part of the <a href="">BigScience</a> research workshop.
<p align="center">
<img src="" width="150">