- run_server now accepts the model name as either a positional or a keyword argument
- changed names in README to account for interface updates
- moved model conversion from README to a separate wiki page
- updated requirements.txt
## Installation
🚧 **Note:** These are short instructions for running a private swarm with a test 6B version of BLOOM. We will replace them with instructions involving the full 176B BLOOM and more detailed explanations soon (in a day or two).
Here's how to install the dependencies with conda:
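The exact package versions below are only a sketch; adjust the CUDA toolkit version to match your GPU driver:
```bash
# create and activate a fresh environment (optional)
conda create -n petals python=3.8 -y
conda activate petals

# install CUDA-enabled PyTorch from the pytorch channel (versions are illustrative)
conda install -y pytorch cudatoolkit=11.3 -c pytorch

# install the remaining project dependencies
pip install -r requirements.txt
```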
These commands use Anaconda to install CUDA-enabled PyTorch.
If you don't have Anaconda, you can get it from [here](https://www.anaconda.com/products/distribution).
If you don't want Anaconda, you can install PyTorch [any other way](https://pytorch.org/get-started/locally/).
If you want to run models with 8-bit weights, please install **PyTorch with CUDA 11** or newer for compatibility with [bitsandbytes](https://github.com/timDettmers/bitsandbytes).
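For example, you can check which CUDA version your PyTorch build was compiled against like this:
```bash
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```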
__OS support:__ currently, PETALS only supports Linux operating systems. On Windows 11, you can run PETALS with GPU enabled inside WSL2 ([read more](https://learn.microsoft.com/en-us/windows/ai/directml/gpu-cuda-in-wsl)).
For macOS, you can *probably* run everything normally if you manage to install dependencies, but we do not guarantee this.
### Getting Started
This is a toy example that runs on a local machine, without a GPU and with a tiny model.
For more detailed instructions with larger models, see ["Launch your own swarm"](https://github.com/bigscience-workshop/petals/wiki/Launch-your-own-swarm).
First, run a couple of servers, each in a separate shell. The first server can be started roughly like this (the entry point and flags below are shown as an example; pass `--help` for the full list):
```bash
python -m cli.run_server bloom-testing/test-bloomd-560m-main --num_blocks 8 --torch_dtype float32 \
  --host_maddrs /ip4/127.0.0.1/tcp/31337   # use port 31337, local connections only
```
### Basic functionality
This server will host 8 (out of 24) layers for [this tiny BLOOM model](https://huggingface.co/bloom-testing/test-bloomd-560m-main) that was converted for PETALS.
To run a different model, please see [this wiki page](https://github.com/bigscience-workshop/petals/wiki/Run-a-custom-model-with-PETALS).
All tests are run on localhost.
Once the server has started, it will print out a ton of information, including an (important) line like this:
```bash
Mon Day 01:23:45.678 [INFO] Running DHT node on ['/ip4/127.0.0.1/tcp/31337/p2p/ALongStringOfCharacters'], initial peers = []
```
You can use this address (`/ip4/whatever/else`) to connect additional servers. Open another terminal and run one or more servers like this:
```bash
# minimalistic server with non-trained bloom blocks
# - give each server a unique --identity_path (or remove the --identity_path arg when debugging)
# - if running multiple servers on the same machine, give each a unique port (last integer in --host_maddrs, 0 means a random port)
# - when running over the internet, change --host_maddrs according to https://learning-at-home.readthedocs.io/en/latest/user/dht.html#running-across-the-internet
# - each server except the first should have --initial_peers pointing to one of the pre-existing servers
python -m cli.run_server bloom-testing/test-bloomd-560m-main --num_blocks 8 --torch_dtype float32 \
  --host_maddrs /ip4/127.0.0.1/tcp/0 \
  --initial_peers /ip4/127.0.0.1/tcp/31337/p2p/ALongStringOfCharacters   # replace with the address printed by the first server
```
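Once the swarm is up, a client can connect to it and use the model much like a regular Hugging Face model. The sketch below is a minimal example: the import path (`src.client.remote_model`) and some keyword arguments are assumptions based on this repository's layout, and the peer address must be copied from the first server's log.

```python
import torch
from transformers import BloomTokenizerFast
from src.client.remote_model import DistributedBloomForCausalLM  # assumed import path

# use the full address printed by the first server, including the /p2p/... part
INITIAL_PEERS = ["/ip4/127.0.0.1/tcp/31337/p2p/ALongStringOfCharacters"]

tokenizer = BloomTokenizerFast.from_pretrained("bloom-testing/test-bloomd-560m-main")
model = DistributedBloomForCausalLM.from_pretrained(
    "bloom-testing/test-bloomd-560m-main",
    initial_peers=INITIAL_PEERS,
    torch_dtype=torch.float32,  # CPU-friendly dtype for this toy example
)

# greedy generation through the distributed blocks
inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
print(tokenizer.decode(model.generate(inputs, max_new_tokens=5)[0]))

# gradients flow back through the remote blocks, so local parameters
# (e.g. prompts or adapters) can be fine-tuned with the usual HF-style loss
loss = model(input_ids=inputs, labels=inputs).loss
loss.backward()
```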
Of course, this is a simplified code snippet. For actual training, see our example on "deep" prompt-tuning here: [examples/prompt-tuning-personachat.ipynb](./examples/prompt-tuning-personachat.ipynb).
### Test local vs remote block (allclose)
Here's a [more advanced tutorial](https://github.com/bigscience-workshop/petals/wiki/Launch-your-own-swarm) that covers 8-bit quantization and best practices for running PETALS.
To test distributed inference, run one or more servers, then open a new shell and run pytest with environment variables:
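The environment variable names below are assumptions modeled on the CI configuration linked below; replace the peer address and reference model with your own values:

```bash
export MODEL_NAME=bloom-testing/test-bloomd-560m-main
export INITIAL_PEERS=/ip4/127.0.0.1/tcp/31337/p2p/ALongStringOfCharacters  # copy from the server log
export REF_NAME=bigscience/bloom-560m  # reference model for comparing local vs remote outputs
pytest tests -v
```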
The automated tests use a more complex server configuration that can be found [here](https://github.com/bigscience-workshop/petals/blob/main/.github/workflows/run-tests.yaml).
We use [black](https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html) and [isort](https://pycqa.github.io/isort/) for all pull requests.
Before committing your code, simply run `black . && isort .` and you will be fine.