12 Commits (main)

Author SHA1 Message Date
Alexander Borzunov 5ce4f1a159
Store (start_block, end_block) in each DHT record for reliability (#510)
This PR fixes gaps in the DHT server info caused by unavailable DHT keys. Now, one DHT key is enough to get info about all blocks hosted by a server - so we'll see info until all keys are unavailable.

Also, this PR refactors `petals.client.routing` and `petals.server.block_selection` modules to use the common `compute_spans()` function (defined in `petals.utils.dht`) and `RemoteSpanInfo` class (defined in `petals.data_structures`).
8 months ago
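
Why a single key is enough can be illustrated with a minimal sketch. This is not the actual `compute_spans()` implementation: `SpanInfo`, `spans_from_records`, and the record layout (a peer ID mapped to a `(start_block, end_block)` pair) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SpanInfo:
    """Hypothetical stand-in for RemoteSpanInfo: blocks [start, end) on one peer."""
    peer_id: str
    start: int
    end: int


# Assumed layout: one record per block key; None means the key is unavailable.
Record = Optional[Dict[str, Tuple[int, int]]]


def spans_from_records(records: List[Record]) -> Dict[str, SpanInfo]:
    """Recover each peer's full span from whichever block records are readable."""
    spans: Dict[str, SpanInfo] = {}
    for record in records:
        if record is None:
            continue  # an unreachable key no longer creates a gap
        for peer_id, (start, end) in record.items():
            # Every record carries the whole span, so the first one seen suffices.
            spans.setdefault(peer_id, SpanInfo(peer_id, start, end))
    return spans


# A peer hosts blocks 0..3, but only the key for block 2 is reachable.
example_records: List[Record] = [None, None, {"peerA": (0, 4)}, None]
example_spans = spans_from_records(example_records)
```

In this example the full span of `peerA` is still recovered even though three of its four keys are missing.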
Alexander Borzunov 6ef6bf5fa2
Create model index in DHT (#491)
This PR creates an index of models hosted in the swarm. The index shows which custom models users run and lets https://health.petals.dev display them as "not officially supported" models.
8 months ago
Alexander Borzunov 063e94b4c8
Move SequenceManagerConfig -> ClientConfig, petals.dht_utils -> petals.utils.dht (#463)
9 months ago
Alexander Borzunov 057a2fb5de
Support Llama 2 (#379)
10 months ago
Alexander Borzunov 11f0d992d7
Report inference, forward, and network RPS separately (#358)
Inference RPS may be very different from forward RPS. E.g., currently bnb uses a completely different algorithm for NF4 inference. We report detailed RPS info that can then be used for shortest-path routing for inference.
10 months ago
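
One way to picture why separate rates matter is the sketch below. The `ServerRates` class, its field names, and the rule that the usable rate is the minimum of the compute rate and the network rate are assumptions for illustration, not the format Petals actually reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerRates:
    """Hypothetical record holding the three rates a server could report."""
    inference_rps: float  # requests/s for step-by-step inference
    forward_rps: float    # requests/s for batched forward passes
    network_rps: float    # requests/s the network link can carry

    def effective_rps(self, mode: str) -> float:
        """Assumed rule: a request is limited by the slower of compute and network."""
        if mode == "inference":
            compute = self.inference_rps
        elif mode == "forward":
            compute = self.forward_rps
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        return min(compute, self.network_rps)


# Made-up numbers: inference is compute-bound, forward is network-bound.
rates = ServerRates(inference_rps=50.0, forward_rps=800.0, network_rps=300.0)
inference_rate = rates.effective_rps("inference")
forward_rate = rates.effective_rps("forward")
```

With a single combined number, a router could not tell that this server is a poor choice for inference while remaining a reasonable one for forward passes.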
Alexander Borzunov 81c4a45ca2
Make a server ping next servers (#356)
This PR makes a server ping potential next servers in a chain and report the RTTs to DHT. This will be used for shortest-path routing.
10 months ago
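
How reported RTTs could feed shortest-path routing can be sketched as follows. This is a simplified Dijkstra search over hypothetical inputs, not the routing code used by Petals: the function name `fastest_chain`, the `spans` and `rtt_ms` layouts, and the `default_rtt_ms` fallback are all assumptions, and compute time is ignored.

```python
import heapq
from typing import Dict, List, Tuple


def fastest_chain(
    n_blocks: int,
    spans: Dict[str, Tuple[int, int]],
    rtt_ms: Dict[Tuple[str, str], float],
    default_rtt_ms: float = 1000.0,
) -> Tuple[float, List[str]]:
    """Find the chain of peers covering blocks [0, n_blocks) with the lowest total RTT.

    spans maps a peer to the blocks [start, end) it hosts; rtt_ms maps
    (source, destination) to a measured round-trip time in milliseconds.
    """
    # Each queue entry: (total RTT so far, blocks already covered, last peer, path).
    queue: List[Tuple[float, int, str, List[str]]] = [(0.0, 0, "client", [])]
    best: Dict[Tuple[int, str], float] = {}
    while queue:
        cost, done, last, path = heapq.heappop(queue)
        if done == n_blocks:
            return cost, path
        if best.get((done, last), float("inf")) <= cost:
            continue  # this state was already reached at least as cheaply
        best[(done, last)] = cost
        for peer, (start, end) in spans.items():
            if start <= done < end:  # peer can continue from the next needed block
                hop = rtt_ms.get((last, peer), default_rtt_ms)
                heapq.heappush(queue, (cost + hop, end, peer, path + [peer]))
    return float("inf"), []


# Made-up swarm: A hosts the first half, B and C both host the second half.
spans = {"A": (0, 2), "B": (2, 4), "C": (2, 4)}
rtt_ms = {("client", "A"): 50.0, ("A", "B"): 200.0, ("A", "C"): 30.0}
total_rtt, chain = fastest_chain(4, spans, rtt_ms)
```

Here the search prefers the chain through `C` because the RTT that `A` measured to `C` is much lower than the one to `B`.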
Alexander Borzunov 2c8959e713
Share more info about a server in DHT (#355)
10 months ago
Artem Chumachenko b9f0a5467f
Support peft LoRA adapters (#335)
Implement an option to deploy PEFT adapters to a server. Clients can set active_adapter=... to use these adapters.

---------

Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
Co-authored-by: justheuristic <justheuristic@gmail.com>
10 months ago
Alexander Borzunov 6137b1b4b0
Replace .make_sequence(..., mode="random") with mode="max_throughput" (#313)
We need to sample the next server using its throughput as the weight to actually achieve max throughput for fine-tuning.

As an example, imagine a situation where we have 3 servers with throughputs [1000, 500, 1] hosting the same blocks, then compare the uniform and weighted sampling strategies.
1 year ago
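
The comparison suggested in this commit message can be worked through under a simple model. The model is an assumption, not taken from the PR: a large batch of requests is split among the servers according to the sampling probabilities, the servers work in parallel, and the batch finishes when the slowest server is done. The helper names are hypothetical.

```python
import random
from fractions import Fraction
from typing import List, Sequence


def aggregate_throughput(throughputs: Sequence[int], probs: Sequence[Fraction]) -> Fraction:
    """Overall requests/s when server i receives a share probs[i] of all requests.

    Server i needs (share_i / throughput_i) seconds per request in the batch,
    so the overall rate is limited by the server with the largest such ratio.
    """
    return min(Fraction(t) / p for t, p in zip(throughputs, probs) if p > 0)


throughputs = [1000, 500, 1]

# Uniform sampling: every server gets a third of the requests.
uniform = [Fraction(1, 3)] * len(throughputs)
# Weighted sampling: shares proportional to throughput.
weighted = [Fraction(t, sum(throughputs)) for t in throughputs]

uniform_rate = aggregate_throughput(throughputs, uniform)
weighted_rate = aggregate_throughput(throughputs, weighted)


def sample_server(throughputs: Sequence[int], rng: random.Random) -> int:
    """Pick a server index with probability proportional to its throughput."""
    return rng.choices(range(len(throughputs)), weights=throughputs, k=1)[0]


picks: List[int] = [sample_server(throughputs, random.Random(seed)) for seed in range(5)]
```

Under this model, uniform sampling is bottlenecked by the slowest server and reaches only 3 requests per second in total, while throughput-weighted sampling keeps every server equally busy and reaches the sum of the throughputs, 1501.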
justheuristic c4938bc23e
Merge inference pools into one to increase inference speed (#225)
It turns out that using a separate pool for each block led to a significant slowdown; see #224 for details.
1 year ago
justheuristic ae9e71fe8e
Add local tensor-parallel fwd/bwd (#143)
This pull request adds an option to run Petals server on multiple local GPUs. It uses https://github.com/BlackSamorez/tensor_parallel

- 8bit approximation error same as in main (mean~=2% q0.9~=5%)
    - TP=1, 2, 3 (see screenshots above)
- forward, grad w.r.t. input and inference exact match with main with TP=1
- >=80% GPU utilization with 3x 1080 Ti, batch = 8 tokens
- throughput measured with and without TP
- TP on 1080Tis has near-linear speedup comparable to the benchmarks (see first message)


Co-authored-by: Iaroslav Lisniak <yalisnyak@nes.ru>
Co-authored-by: Andrei Panferov <andrei@blacksamorez.ru>
Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>
1 year ago
Alexander Borzunov 7bd5916744
Make Petals a pip-installable package (attempt 2) (#102)
1. Petals can now be installed using `pip install git+https://github.com/bigscience-workshop/petals`
    - If you have already cloned the repo, you can run `pip install .` or `pip install .[dev]`
2. Moved `src` => `src/petals`
    - Replaced `from src.smth import smth` with `from petals.smth import smth`
3. Moved `cli` => `src/petals/cli`
    - Replaced `python -m cli.run_smth` with `python -m petals.cli.run_smth` (all utilities are now available right after pip installation)
4. Moved the `requirements*.txt` contents to `setup.cfg` (`requirements.txt` for packages is not supported well by modern packaging utils)
5. Increased the package version from `0.2` to `1.0alpha1`
1 year ago