
# bloom-demo
Early dev prototype for decentralized bloom. Not for public eyes **yet**.

```python
if you.read(this) and you.name not in '@timdettmers @borzunov @mryab @greenfatguy'.split():
  you.go("away")
```


### install


```bash
conda create -y --name bloom-demo python=3.8.12 pip
conda activate bloom-demo

conda install -y -c conda-forge cudatoolkit-dev==11.3.1 cudatoolkit==11.3.1 cudnn==8.2.1.32
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
pip install accelerate==0.10.0 huggingface-hub==0.7.0 hivemind==1.1.0
pip install bitsandbytes-cuda113==0.26.0
pip install https://github.com/huggingface/transformers/archive/6589e510fa4e6c442059de2fab84752535de9b23.zip
```
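To confirm that an environment matches the pins above, you can use a small stdlib-only check. This is a minimal sketch built on `importlib.metadata` (Python 3.8+). `PINNED` and `check_pins` are hypothetical names, not part of this repo. `transformers` is left out because it is installed from a commit archive, and the conda packages (`cudatoolkit`, `cudnn`) are not visible to `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip commands above (pip distribution names).
PINNED = {
    "torch": "1.11.0+cu113",
    "torchvision": "0.12.0+cu113",
    "accelerate": "0.10.0",
    "huggingface-hub": "0.7.0",
    "hivemind": "1.1.0",
    "bitsandbytes-cuda113": "0.26.0",
}


def _installed_version(name, lookup):
    # Distribution names may be recorded with "-" or "_"; try both spellings.
    for candidate in (name, name.replace("-", "_")):
        try:
            return lookup(candidate)
        except PackageNotFoundError:
            pass
    return None


def check_pins(pins=PINNED, lookup=version):
    """Return {package: status}, where status is "ok", "missing" or "found <version>"."""
    report = {}
    for name, expected in pins.items():
        installed = _installed_version(name, lookup)
        if installed is None:
            report[name] = "missing"
        elif installed == expected:
            report[name] = "ok"
        else:
            report[name] = f"found {installed}"
    return report


if __name__ == "__main__":
    for name, status in check_pins().items():
        print(f"{name:24} {status}")
```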


### run local inference
No networking whatsoever; used to verify architecture optimizations.

```bash
# run one bloom block for a few steps -- on a local machine
python -m cli.inference_one_block --config cli/config.json  # see other args
```
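To compare an optimization against a baseline, time the same step before and after the change. The sketch below is a generic, stdlib-only timing helper that assumes you can wrap one forward step in a zero-argument callable. `time_steps` is a hypothetical helper and does not reproduce what `cli.inference_one_block` does internally.

```python
import statistics
import time


def time_steps(step, n_steps=10, n_warmup=2):
    """Call `step()` n_warmup + n_steps times; return timing stats for the last n_steps."""
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    for _ in range(n_warmup):
        step()  # warm-up calls (lazy init, caches) are not timed
    durations = []
    for _ in range(n_steps):
        start = time.perf_counter()
        step()
        durations.append(time.perf_counter() - start)
    return {
        "steps": n_steps,
        "mean_s": statistics.mean(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Stand-in workload; replace it with a callable that runs one block forward.
    print(time_steps(lambda: sum(i * i for i in range(100_000))))
```

On GPU, CUDA kernels run asynchronously. If the step uses torch on CUDA, call `torch.cuda.synchronize()` at the end of the callable so the measured time includes the kernel execution.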

### run distributed inference / training

First, run one or more servers like this:
```bash
# minimalistic server with non-trained bloom blocks
python -m cli.run_server --prefix bloom6b3 --converted_model_name_or_path bigscience/test-bloomd-6b3 \
  --block_indices 3:5 --torch_dtype float32 --identity_path ./server1.id --host_maddrs /ip4/127.0.0.1/tcp/31337
# when running multiple servers:
# - give each server a unique --identity_path (or remove the --identity_path arg when debugging)
# - if running multiple servers on the same machine, give each a unique port (last integer in --host_maddrs, 0 means random port)
# - when running over the internet, change --host_maddrs according to https://learning-at-home.readthedocs.io/en/latest/user/dht.html#running-across-the-internet
# - each server except the first should have --initial_peers pointing to one of the pre-existing servers
```
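The notes above can be turned into a small command generator for several servers on one machine. Each server gets its own `--identity_path` and port, and every server after the first gets `--initial_peers`. This is a sketch, not a repo utility. `server_commands`, `MODEL`, `PREFIX`, the port-increment scheme, and the `/p2p/` check are assumptions for illustration. The check relies on the full server address typically ending in `/p2p/<peer id>`, so that address is only known after the first server has started.

```python
import shlex

MODEL = "bigscience/test-bloomd-6b3"  # converted model used throughout this README
PREFIX = "bloom6b3"


def server_commands(block_ranges, first_peer_addr=None, base_port=31337, host="127.0.0.1"):
    """Build one `cli.run_server` command per (start, end) block range.

    Server i gets ./server{i+1}.id and port base_port + i; servers after the
    first are pointed at `first_peer_addr` via --initial_peers.
    """
    if len(block_ranges) > 1 and (not first_peer_addr or "/p2p/" not in first_peer_addr):
        raise ValueError("pass the first server's full address, including /p2p/<peer id>")
    commands = []
    for i, (start, end) in enumerate(block_ranges):
        args = [
            "python", "-m", "cli.run_server",
            "--prefix", PREFIX,
            "--converted_model_name_or_path", MODEL,
            "--block_indices", f"{start}:{end}",
            "--torch_dtype", "float32",
            "--identity_path", f"./server{i + 1}.id",
            "--host_maddrs", f"/ip4/{host}/tcp/{base_port + i}",
        ]
        if i > 0:
            args += ["--initial_peers", first_peer_addr]
        commands.append(shlex.join(args))  # shlex.join needs Python 3.8+
    return commands
```

Start the first server with `server_commands([(3, 5)])[0]`, copy the full address it prints, then generate the rest, for example `server_commands([(3, 5), (5, 7)], first_peer_addr=...)[1:]`. The `5:7` range here is only an illustration.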

Then open a python notebook or console and run:
```python
import torch
import hivemind
from src import get_remote_module


dht = hivemind.DHT(
    initial_peers=["TODO_COPY_FULL_ADDRESS_FROM_ANY_OF_THE_SERVERS"],  # e.g. /ip4/127.0.0.1/...
    client_mode=True, start=True,
)

layer3, layer4 = get_remote_module(dht, ['bloom6b3.3', 'bloom6b3.4'])
assert layer3 is not None and layer4 is not None, "one or both layers were not found in DHT"
# test forward/backward, two blocks
outputs, = layer4(*layer3(torch.randn(1, 64, 4096)))
loss = (outputs * torch.randn_like(outputs)).norm()
loss.backward()

# test inference, one block
with layer3.begin_inference_session() as sess:
    for i in range(10):
        res = sess.step(torch.ones(1, 1, 4096))
```
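The UIDs passed to `get_remote_module` come from the server's `--prefix` and `--block_indices`. With `--prefix bloom6b3 --block_indices 3:5`, the server hosts `bloom6b3.3` and `bloom6b3.4`, and the end index is excluded. The sketch below expands a range into UIDs. It assumes the `<prefix>.<index>` naming and half-open range inferred from the examples in this README. `block_uids` is a hypothetical helper, not part of `src`.

```python
def block_uids(prefix, block_indices):
    """Expand a --prefix and a half-open --block_indices range like "3:5" into block UIDs."""
    start, end = (int(part) for part in block_indices.split(":"))
    if not 0 <= start < end:
        raise ValueError(f"expected start:end with 0 <= start < end, got {block_indices!r}")
    return [f"{prefix}.{i}" for i in range(start, end)]


print(block_uids("bloom6b3", "3:5"))  # ['bloom6b3.3', 'bloom6b3.4']
```

Under these assumptions, `get_remote_module(dht, block_uids("bloom6b3", "3:5"))` requests the same two blocks as the explicit list above.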


### convert regular bloom to distributed
```bash

# convert model from HF hub to a distributed format (can take hours depending on your connection!)
MY_WRITE_TOKEN=TODO_WRITE_TOKEN_FROM_https://huggingface.co/settings/tokens
# TODO: replace --output_repo with a repo you have write access to
python -m cli.convert_model --model bigscience/bloom-6b3  \
  --output_path ./converted_model --output_repo bigscience/test-bloomd-6b3 \
  --use_auth_token "$MY_WRITE_TOKEN"
```


### test local vs remote block (allclose)

To test distributed inference, run one or more servers, then open a new shell and run pytest with environment variables:
```bash
# shell A: serve blocks 3 and 4
python -m cli.run_server --prefix bloom6b3 --converted_model_name_or_path bigscience/test-bloomd-6b3 \
  --block_indices 3:5 --torch_dtype float32 --identity_path ./server1.id --host_maddrs /ip4/127.0.0.1/tcp/31337

# shell B: connect to the swarm and test individual blocks for exact match
export PYTHONPATH=. INITIAL_PEERS="/ip4/TODO_COPY_INITIAL_PEERS_FROM_SERVER_OUTPUT"
BLOCK_UID=bloom6b3.3 pytest tests/test_block_exact_match.py
BLOCK_UID=bloom6b3.4 pytest tests/test_block_exact_match.py

# the test below will fail because the server only hosts blocks [3:5)
# BLOCK_UID=bloom6b3.7 pytest tests/test_block_exact_match.py
```
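The test compares a remote block's outputs with a local copy of the same block. A `torch.allclose`-style check accepts two values `a` and `e` when `|a - e| <= atol + rtol * |e|`. Equal values count as close, and NaN is never close. The defaults below (`rtol=1e-5`, `atol=1e-8`) are the `torch.allclose` defaults. This pure-Python sketch on flat lists only spells out the rule. The tolerances actually used by `tests/test_block_exact_match.py` are not shown here and may differ.

```python
import math


def allclose(actual, expected, rtol=1e-5, atol=1e-8):
    """True if every pair satisfies |a - e| <= atol + rtol * |e| (torch.allclose-style, NaN != NaN)."""
    if len(actual) != len(expected):
        raise ValueError("inputs must have the same length")
    for a, e in zip(actual, expected):
        if a == e:  # exact match, also covers equal infinities
            continue
        diff = abs(a - e)
        if not (math.isfinite(diff) and diff <= atol + rtol * abs(e)):
            return False
    return True


def max_abs_diff(actual, expected):
    """Largest elementwise |a - e|; useful to report how far off a failing block is."""
    return max((abs(a - e) for a, e in zip(actual, expected)), default=0.0)
```

With real tensors you would call `torch.allclose(actual, expected, rtol=..., atol=...)` directly. Reporting the largest absolute difference alongside a failure makes it easier to tell a real mismatch from float32 noise.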