Commit Graph

495 Commits

Author SHA1 Message Date
Alexander Borzunov
3c523ab0d2
Fix TP crashing when hypo_ids are used (#249) 2023-02-02 22:04:19 +03:00
justheuristic
b8a6788490
Fix examples/sst, add cls_model embeddings (#248) 2023-02-02 00:32:27 +06:00
justheuristic
8766a14d28
Minor changes to examples/prompt-tuning notebooks (#247)
Minor code changes required to run the notebook in a clean python environment
2023-02-01 14:10:45 +03:00
Alexander Borzunov
5367523df8
Fix typo in prompt-tuning-sst2.ipynb (#245) 2023-01-31 19:06:51 +06:00
Alexander Borzunov
b03efb1ef5
Bump version to 1.1.2 (#244) 2023-01-31 02:17:38 +06:00
Alexander Borzunov
5d7395e1b5
Prompt-tuning notebooks: suggest to use a smaller model for faster prototyping (#234) 2023-01-24 10:01:31 +04:00
Artem Chumachenko
d4c687daca
Fix dtype error in fine-tuning notebooks (#231) 2023-01-23 05:09:14 +04:00
Muhtasham Oblokulov
0ebf6de117
Add citation to readme (#219)
Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>
2023-01-21 07:05:41 +04:00
justheuristic
c4938bc23e
Merge inference pools into one to increase inference speed (#225)
It turns out that using a separate pool for each block led to a significant slowdown; see #224 for details.
2023-01-19 19:38:21 +04:00
Shuchang Zhou
3189b395f0
Fix a typo in error message (#227)
By the code context, it can be inferred that do_sample==False when control reaches this point.
2023-01-19 18:38:43 +04:00
Alexander Borzunov
fa5ac6e3b4
Mention BLOOMZ in readme (#221) 2023-01-18 03:23:21 +04:00
Alexander Borzunov
e651d73f11
Add one more link to the "Getting started" tutorial (#218)
Some people miss the "Try now in Colab" link or don't understand that it leads to the comprehensive tutorial, so I added one more explicit link.
2023-01-16 04:35:06 +04:00
Alexander Borzunov
af3da5bb04
Choose --num_blocks automatically for all models (#217) 2023-01-16 01:53:09 +04:00
Alexander Borzunov
cea83d3356
Bump version to 1.1.1 (#214) 2023-01-14 00:34:46 +04:00
Alexander Borzunov
702bb5a2c2
CI: Update deprecated actions, don't measure network RPS (#215)
* CI: Switch to actions/cache@v3 (v2 is deprecated)
* Don't run measure_network_rps() in tests since it doesn't work well in CI
2023-01-13 20:16:31 +04:00
Alexander Borzunov
825f5dbf2d
CI: Convert model only when convert_model.py or setup.cfg change (#213)
This cuts the test running time in half, unless convert_model.py or setup.cfg is changed.
2023-01-13 19:53:57 +04:00
Alexander Borzunov
5ff250bee9
Improve errors in case of missing blocks, suggest to join your own server (#212) 2023-01-13 17:53:00 +04:00
Alexander Borzunov
6ba63c6cc8
Fix output shape when resuming generation (#211)
Before this PR, `model.generate()` returned one excess token when resuming generation with an existing session (the last token of the previous session, `session.last_token_id`). This behavior is unexpected and inconvenient for downstream apps, so this PR changes it before it's too late.
2023-01-13 16:27:10 +04:00
Alexander Borzunov
cc5e5d32c0
Don't switch blocks if it makes swarm disjoint (#210)
Even if the swarm seems to have at least 2 servers for each block, turning off one of the servers could break it. That's because once a server is turned off, others may move to a better position, creating significant downtime on their way. This PR prohibits switching blocks if it would make the swarm disjoint along the way.
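
A minimal sketch of the kind of check described above, not the code from this PR. It assumes each server holds one contiguous, half-open range of blocks, and it only tests the immediate effect of a single server going offline (the actual change also has to account for other servers moving afterwards). The names `covers_all_blocks` and `can_switch_blocks` are hypothetical.

```python
from typing import Dict, Tuple

Span = Tuple[int, int]  # half-open range of block indices: [start, end)


def covers_all_blocks(spans: Dict[str, Span], num_blocks: int) -> bool:
    """Return True if every block index in [0, num_blocks) is held by some server."""
    covered = [False] * num_blocks
    for start, end in spans.values():
        for block in range(max(start, 0), min(end, num_blocks)):
            covered[block] = True
    return all(covered)


def can_switch_blocks(spans: Dict[str, Span], server: str, num_blocks: int) -> bool:
    """Return True if the swarm still covers every block while `server` is offline."""
    # Hypothetical simplification: drop the server and re-check coverage.
    remaining = {peer: span for peer, span in spans.items() if peer != server}
    return covers_all_blocks(remaining, num_blocks)


swarm = {"a": (0, 4), "b": (2, 6), "c": (4, 8)}
print(can_switch_blocks(swarm, "b", num_blocks=8))  # "a" and "c" still cover 0..7
print(can_switch_blocks(swarm, "a", num_blocks=8))  # blocks 0 and 1 would be lost
```

Under these assumptions, a server would only be allowed to move when the first call's condition holds for it.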
2023-01-13 08:45:53 +04:00
Alexander Borzunov
6b12b0d050
Report server version and dht.client_mode in rpc_info(), check for updates on startup (#209)
This PR:

1. Shows the current Petals version and checks for updates on startup.
2. Reports the current version and DHT mode in `rpc_info()`, so it can be shown on http://health.petals.ml or used on clients for efficient routing.
2023-01-13 07:46:10 +04:00
justheuristic
771ca590e7
Add service checking direct reachability from peers (#195)
Servers joining from behind NATs/firewalls usually take several minutes to join a libp2p relay before they become accessible from the outside Internet. Moreover, requests to such servers are slower and more likely to fail (e.g., if the server switches relays at that moment). If such servers host certain DHT keys, the swarm may occasionally lose read/write access to these keys, which results in:

- Clients being unable to find any servers hosting a certain block.
- All servers starting rebalancing to the same place to close the alleged "gap" in the swarm.

This PR modifies servers so that DHT keys are only hosted on **directly reachable** servers (the ones that aren't behind a NAT/firewall). This way, the DHT becomes more stable and works faster. Of course, the servers behind NATs/firewalls still accept requests for running inference/forward/backward for the blocks they hold (it's more acceptable for these kinds of requests to be slower or to fail).

Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>
2023-01-13 03:05:39 +04:00
justheuristic
5f58f00649
Return available cache size in rpc_info() (#191)
This PR makes servers return their free cache size (in tokens * layers, to make it compression-agnostic).

To be used when calling make_sequence(optimize="inference")
2023-01-12 06:49:41 +03:00
Alexander Borzunov
37373a66c3
Update Anaconda installation commands (#205) 2023-01-11 21:45:54 +00:00
justheuristic
012f840f7e
Use length-weighted sampling in routing for inference (#204)
This pull-request implements a simple (1) greedy (2) latency-agnostic routing optimization that should speed up both our use cases.

Why this exists: our effort to merge full routing (ping-aware, throughput-aware, Dijkstra) is in a sorry state between several branches; merging it into main would take many days.
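
A rough sketch of what greedy, latency-agnostic, length-weighted routing could look like; it illustrates the idea and is not the code merged in this PR. It assumes each server advertises one contiguous half-open range of blocks, and the `make_route` name and `(server_id, start, end)` tuple layout are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Span = Tuple[str, int, int]  # (server_id, start, end), blocks in [start, end)


def make_route(
    spans: List[Span], num_blocks: int, rng: Optional[random.Random] = None
) -> List[Span]:
    """Greedily build a chain of servers that covers blocks 0..num_blocks-1."""
    rng = rng or random.Random()
    route: List[Span] = []
    current = 0
    while current < num_blocks:
        # Servers that hold the block we need next.
        candidates = [span for span in spans if span[1] <= current < span[2]]
        if not candidates:
            raise RuntimeError(f"No server holds block {current}")
        # Weight each candidate by how many blocks it can serve from here on,
        # so longer remaining spans (fewer hops) are picked more often.
        weights = [end - current for _, _, end in candidates]
        server, _, end = rng.choices(candidates, weights=weights, k=1)[0]
        end = min(end, num_blocks)
        route.append((server, current, end))
        current = end
    return route


servers = [("a", 0, 4), ("b", 0, 8), ("c", 4, 8)]
print(make_route(servers, num_blocks=8, rng=random.Random(0)))
```

In this example, server "b" is twice as likely as "a" to be chosen for the first hop because it can serve 8 blocks from block 0 rather than 4; no ping or throughput information is used.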

Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
2023-01-11 23:26:09 +03:00
Alexander Borzunov
42d1bbb568
Fix --no_auto_relay help (#199) 2023-01-11 22:27:14 +04:00
justheuristic
c2cb6d19ae
Increase tolerances in test_tp_block (#196)
deflapify tests
2023-01-11 17:54:24 +03:00
Alexander Borzunov
b4f3224cda
Make client ignore blacklist if all servers holding a block are blacklisted (#197)
If all servers holding a certain block are blacklisted, we should display errors from them instead of raising `No peers holding blocks`.

Indeed, if the error is client-caused, the client should learn its reason from the latest error messages. In turn, if the error is server/network-caused and we only have a few servers, we'd better know the error instead of banning all the servers and making the user think that no servers are available.
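
The fallback described above can be sketched as follows. This is an illustration under simple assumptions (servers identified by strings, the blacklist kept as a set), and `usable_servers` is a hypothetical helper, not a function from the Petals codebase.

```python
from typing import Iterable, List, Set


def usable_servers(candidates: Iterable[str], blacklist: Set[str]) -> List[str]:
    """Prefer non-blacklisted servers, but never return an empty list
    when at least one server holds the block."""
    candidates = list(candidates)
    allowed = [peer for peer in candidates if peer not in blacklist]
    # If every holder of the block is blacklisted, ignore the blacklist so the
    # caller contacts a server and surfaces its actual error to the user.
    return allowed if allowed else candidates


print(usable_servers(["s1", "s2"], blacklist={"s1"}))        # ['s2']
print(usable_servers(["s1", "s2"], blacklist={"s1", "s2"}))  # ['s1', 's2']
```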
2023-01-11 16:50:24 +04:00
Alexander Borzunov
127cf66bee
Ignore network RPS if we failed to measure it (#198) 2023-01-11 16:37:30 +04:00
Alexander Borzunov
487411e87e
Fix fine-tuning notebooks intros (#194)
The notebook intros were outdated and mentioned the 6B model, while the actual code already runs the 176B model. This led to confusion among our users in Discord.
2023-01-11 02:28:49 +04:00
Alexander Borzunov
82c9f93ce6
Bump version to 1.1.0 (#190) 2023-01-10 15:47:58 +04:00
Alexander Borzunov
a617ce3cfa
Fix psutil-related AccessDenied crash, disable --load_in_8bit by default in case of TP (#188)
* Don't count open fds since it leads to AccessDenied crashes on some machines
* Use --load_in_8bit=False by default in case of tensor parallelism
* Install petals from PyPI in fine-tuning tutorials
2023-01-10 13:04:52 +04:00
Egiazarian Vage
93bed7da5a
Support libp2p relays for NAT traversal (#186)
- Added relay options to servers
- Enabled relay options by default
- Changed hivemind version to 1.1.5
- Moved reachability check to be performed after blocks are loaded

Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>
2023-01-09 20:41:23 +04:00
Alexander Borzunov
16b69d6050
Fix GiBs in the "insufficient disk space" message (#187) 2023-01-09 09:54:44 +04:00
Alexander Borzunov
391c855208
Add readme subsections (#185) 2023-01-08 09:59:07 +04:00
Alexander Borzunov
f344c7801b
Add link to health.petals.ml to readme (#184) 2023-01-08 08:41:47 +04:00
Alexander Borzunov
27406a9377
Add more links to BLOOM to readme (#183) 2023-01-07 10:48:51 +04:00
Alexander Borzunov
0f6464103d
Remove protobuf from requirements (#182)
A correct protobuf version should be already installed by hivemind.

This also resolves version conflict on Colab, where protobuf versions required by Petals were different from the ones required by pre-installed tensorflow and tensorboard packages.

Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
2023-01-07 01:55:40 +04:00
Alexander Borzunov
712f5a330f
Remove backup bootstrap peer 2023-01-06 10:51:12 +04:00
justheuristic
d1fa5eb260
hotfix: add initial peer that did not crash :) (#181)
add hotfix initial peer (@borzunov's peers are down)
2023-01-06 04:57:49 +03:00
Alexander Borzunov
6dd9a938bd
Import bitsandbytes only if it's going to be used (#180) 2023-01-05 10:34:52 +04:00
Alexander Borzunov
e27706358c
Use slightly less memory in .generate() (#177) 2023-01-05 09:34:03 +04:00
Alexander Borzunov
55698381d0
Disable chunked_forward() on AVX512 CPUs (#179) 2023-01-04 23:28:16 +04:00
Alexander Borzunov
6948a0c5ee
Allow to disable chunked forward (#176) 2023-01-04 06:42:03 +04:00
Alexander Borzunov
356e099c3d
Make Docker command more visible (#175) 2023-01-04 02:18:45 +04:00
justheuristic
ae9e71fe8e
Add local tensor-parallel fwd/bwd (#143)
This pull request adds an option to run Petals server on multiple local GPUs. It uses https://github.com/BlackSamorez/tensor_parallel

- 8bit approximation error same as in main (mean~=2% q0.9~=5%)
    - TP=1, 2, 3 (see screenshots above)
- forward, grad w.r.t. input and inference exact match with main with TP=1
- `>=`80% GPU utilization with 3x 1080ti, batch = 8 tokens
- throughput measured with and without TP
- TP on 1080Tis has near-linear speedup comparable to the benchmarks (see first message)


Co-authored-by: Iaroslav Lisniak <yalisnyak@nes.ru>
Co-authored-by: Andrei Panferov <andrei@blacksamorez.ru>
Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>
2023-01-03 18:35:51 +03:00
Alexander Borzunov
779959bc70
Add link to PyPI (#173) 2022-12-31 02:51:52 +04:00
Alexander Borzunov
cdc3b6a25a
Add PyPI badge, update instructions and links in readme (#172) 2022-12-31 02:22:40 +04:00
Aleksandr Borzunov
ff8ade8d3b
Bump version to 1.0.0 2022-12-30 21:52:57 +00:00
justheuristic
4014442a0f
Fix instruction for developers (#170) 2022-12-30 23:42:07 +04:00
Alexander Borzunov
26e6120288
Fix code example in readme (#169)
Makes it closer to runnable code, except for imports and defining tokenizer & data loader.
2022-12-30 19:50:38 +04:00