Commit Graph

7 Commits (041ad2089108449db2201f6860c4d8f916525e1a)

Author SHA1 Message Date
Alexander Borzunov 041ad20891
Check reachability automatically and give advice how to fix it (#155)
1. If we connect to the **public swarm**, the server now **automatically checks its DHT's reachability** from the outside world using API at http://health.petals.ml This is important to disallow unreachable servers to proceed (they create issues for the clients, such as repetitive retries).

    If http://health.petals.ml is down, the server proceeds without the check (so we don't depend on it). However, if health.petals.ml is up and explicitly tells us that we are unrechable, the server shows the reason of that and how to solve it.

    The check may be disabled with the `--skip_reachability_check` option (though I can't imagine cases where someone needs to use it).

2. Added `--port` and `--public_ip` as **simplified convenience options** for users not familiar with `--host_maddrs` and `--announce_maddrs`.
2 years ago
Alexander Borzunov 701ec7e53e
Clean up disk space (#152) 2 years ago
Alexander Borzunov 66f1799d32
Set default --step_timeout to 5 min (#133) 2 years ago
Alexander Borzunov 9dbf5e2e6f
Set dht.num_workers = n_layer, update_period = 150, expiration = 300 (#125) 2 years ago
Alexander Borzunov f72c220404
Suppress quantization warning and fix dtype defaults in compute benchmark (#117) 2 years ago
Alexander Borzunov 643a054170
Make server use smart defaults (#115)
Summary:

```python
parser.add_argument('--attn_cache_size', type=str, default=None,
                    help='The size of GPU memory allocated for storing past attention keys/values between inference steps. '
                         'Examples: 500MB, 1.2GB, 1073741824 (bytes). Note that 1KB != 1KiB here. '
                         'Default: 0.5GiB * num_blocks * hidden_size / 14336. '
                         'The latter is the hidden size of the bigscience/bloom-petals model.')

parser.add_argument('--request_timeout', type=float, required=False, default=3 * 60,
                    help='Timeout (in seconds) for the whole rpc_forward/rpc_backward/rpc_forward_stream/rpc_backward_stream request')
parser.add_argument('--session_timeout', type=float, required=False, default=30 * 60,
                    help='Timeout (in seconds) for the whole inference session')
parser.add_argument('--step_timeout', type=float, required=False, default=60,
                    help="Timeout (in seconds) for waiting the next step's inputs inside an inference session")

parser.add_argument('--load_in_8bit', type=bool, default=None,
                    help="Convert the loaded model into mixed-8bit quantized model. Default: True if GPU is available")
```

Co-authored-by: justheuristic <justheuristic@gmail.com>
2 years ago
Alexander Borzunov 7bd5916744
Make Petals a pip-installable package (attempt 2) (#102)
1. Petals can be now installed using `pip install git+https://github.com/bigscience-workshop/petals`
    - In case if you already cloned the repo, you can do `pip install .` or `pip install .[dev]`
2. Moved `src` => `src/petals`
    - Replaced `from src.smth import smth` with `from petals.smth import smth`
3. Moved `cli` => `src/petals/cli`
    - Replaced `python -m cli.run_smth` with `python -m petals.cli.run_smth` (all utilities are now available right after pip installation)
4. Moved the `requirements*.txt` contents to `setup.cfg` (`requirements.txt` for packages is not supported well by modern packaging utils)
5. Increased the package version from `0.2` to `1.0alpha1`
2 years ago