This PR makes both clients and servers work on macOS. Specifically, it:
- Follows https://github.com/learning-at-home/hivemind/pull/586 to run a macOS-compatible `p2pd` binary (both x86-64 and ARM64 are supported)
- Fixes forking issues and tests on macOS, Python 3.10+
- Introduces basic support for serving model blocks on Apple M1/M2 GPUs (torch.mps)
- Increases max number of open files by default (it's not enough on Linux and is really small on macOS)
As our Discord community growths, we found it difficult to look for open and resolved issues in **#running-a-client** and **#running-a-server** channels, as well as navigate through interleaving conversations happening there. That's why we recreated these channels as Discord forums, where different discussions are separated into different posts.
* Hide GeneratorExit in _iterate_inference_steps()
* Update README.md about `--public_name`
* Use .from_pretrained(..., use_auth_token=token) instead of token=token
until it's fully supported across HF libs
* Use default network speed 25 Mbit/s
* Apply relay penalty in max-throughput routing
* Replace RPS with "tokens/sec per block" in logs
* Increase default expiration
Since `petals.ml` DNS record is still unavailable, we're switching everything to https://petals.dev
Co-authored-by: Aleksandr Borzunov <hxrussia@gmail.com>
This PR:
1. **Extracts `SequenceManagerConfig` and `SequenceManagerState` subclasses.**
The config is provided by caller and never changed from inside `RemoteSequenceManager`. The state is a part of the `RemoteSequenceManager`'s state shared between the main manager and its slices. We fix some slicing bugs along the way.
2. **Removes `dht_prefix` and `p2p` arguments, makes `dht` argument optional.**
`dht_prefix` can always be overridden using `config.dht_prefix`. `p2p` actually needed only under the hood of `RemoteSequenceManager`, so it can extract it by itself without exposing this low-level class to callers. If strictly necessary, a caller can provide `p2p` as a part of `SequenceManagerState`. `dht` is also needed only by `RemoteSequenceManager`, so we can make it optional in the parent classes and create it automatically when it's not provided.
3. **Simplifies retry logic.**
Previously, we could have "nested" retry loops: one in `._update()`, another in inference/forward/backward steps. The loop in `._update()` could introduce issues to concurrent inference/forward/backward calls, since it blocks the entire class if its delay period becomes too high. Now this logic is simplified: `._update()` performs only one attempt to fetch the DHT info, any retries are triggered by the inference/forward/backward steps.
4. **Removes deprecated `RemoteTransformerBlock`.**
`RemoteTransformerBlock` was deprecated a long time ago, before Petals 1.0.0. Its removal is long due.
5. **Removes `dht_utils.get_remote_module()`, `dht_utils.get_remote_sequence()`.**
This functions duplicate the functionality of the `RemoteSequential` constructor.
6. (minor) **Removes `RemoteSequential.is_subsequence` flag.**
This flag worked incorrectly and was never used. I am removing it for the sake of simplicity.
1. Added `from petals.client import *` to `petals/__init__.py`, so you can write just that:
```python
from petals import DistributedBloomForCausalLM
```
I didn't do the same with server, since its classes are supposed to by used by `petals.cli.run_server`, not end-users. Though it's still possible to do `from petals.server.smth import smth` if necessary.
2. Fixed one more logging issue: log lines from hivemind were shown twice due to a bug in #156.
3. Removed unused `runtime.py`, since the server actually uses `hivemind.moe.Runtime`, and `runtime.py` has no significant changes comparing to it.