petals

Commit Graph

Author	SHA1	Message	Date
Alexander Borzunov	0a313bf6c5	Update hivemind to 1.1.8, enable efficient bfloat16 encoding (#311) This PR: 1. Updates hivemind to 1.1.8 (includes https://github.com/learning-at-home/hivemind/pull/565) 2. Enables efficient bfloat16 serialization by default (`USE_LEGACY_BFLOAT16 = False`) 3. Removes logging code that was included in hivemind by https://github.com/learning-at-home/hivemind/pull/542	1 year ago
Alexander Borzunov	454c193863	Fix OOMs happening with accelerate >= 0.16.0 (#310) - After #285, `load_pretrained_block()` uses `accelerate.utils.set_module_tensor_to_device()` - In accelerate>=0.16.0, it saves the tensor in the dtype previously used by the model instead of the dtype of the weights (https://github.com/huggingface/accelerate/pull/920) - Because of that, blocks and attention caches used float32, which caused OOMs - This PR makes `load_pretrained_block()` respect `torch_dtype` (default: `"auto"`, which means reading `torch_dtype` from `config.json`)	1 year ago
Alexander Borzunov	98be9ffe4c	Relax the rest of Hugging Face dependencies (#305)	1 year ago
Alexander Borzunov	35662b4a16	Require bitsandbytes == 0.38.0.post2, hivemind == 1.1.7 (#302) In particular, this PR fixes 8-bit support on nvidia16 GPUs (such as 1660) by including https://github.com/TimDettmers/bitsandbytes/pull/292. This support was requested multiple times on Discord.	1 year ago
Alexander Borzunov	2116df08bc	Fix deps, enable 8-bit by default for TP (#298) This PR fixes issues introduced in #290: - hivemind bfloat16 codec crashed on dummy tensors (with 0 elements), see https://github.com/learning-at-home/hivemind/pull/560 (this PR makes Petals depend on the latest hivemind version from the repo, it's temporary) - transformers version check did not match the version allowed in `setup.cfg` Also: - This PR enables 8-bit by default for TP. Even though TP in 8-bit may be slower, we currently prefer to host more blocks to increase the network's stability.	1 year ago
justheuristic	987f4d2b2f	Update bitsandbytes, hivemind, transformers (#290) - new bitsandbytes supports newer and older GPUs - new hivemind supports a better bfloat16 codec Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>	1 year ago
Alexander Borzunov	a7d3d02194	Fix invalid author email in setup.cfg (#287)	1 year ago
Alexander Borzunov	6ba63c6cc8	Fix output shape when resuming generation (#211) Before this PR, `model.generate()` returned one excess token when resuming generation with an existing session (the last token of the previous session, `session.last_token_id`). This behavior is unexpected and inconvenient for downstream apps, so this PR changes it before it's too late.	1 year ago
Alexander Borzunov	6b12b0d050	Report server version and dht.client_mode in rpc_info(), check for updates on startup (#209) This PR: 1. Shows the current Petals version and checks for updates on startup. 2. Reports the current version and DHT mode in `rpc_info()`, so it can be shown on http://health.petals.ml or used on clients for efficient routing.	1 year ago
Alexander Borzunov	82c9f93ce6	Bump version to 1.1.0 (#190)	1 year ago
Egiazarian Vage	93bed7da5a	Support libp2p relays for NAT traversal (#186) - Added relay options to servers - Enabled relay options by default - Changed hivemind version to 1.1.5 - Moved reachability check to be performed after blocks are loaded Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>	1 year ago
Alexander Borzunov	0f6464103d	Remove protobuf from requirements (#182) A correct protobuf version should already be installed by hivemind. This also resolves a version conflict on Colab, where the protobuf versions required by Petals were different from the ones required by the pre-installed tensorflow and tensorboard packages. Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>	1 year ago
Alexander Borzunov	55698381d0	Disable chunked_forward() on AVX512 CPUs (#179)	1 year ago
justheuristic	ae9e71fe8e	Add local tensor-parallel fwd/bwd (#143) This pull request adds an option to run a Petals server on multiple local GPUs. It uses https://github.com/BlackSamorez/tensor_parallel - 8-bit approximation error is the same as in main (mean~=2% q0.9~=5%) - TP=1, 2, 3 (see screenshots above) - forward, grad w.r.t. input, and inference exactly match main with TP=1 - `>=`80% GPU utilization with 3x 1080ti, batch = 8 tokens - throughput measured with and without TP - TP on 1080Tis has near-linear speedup comparable to the benchmarks (see first message) Co-authored-by: Iaroslav Lisniak <yalisnyak@nes.ru> Co-authored-by: Andrei Panferov <andrei@blacksamorez.ru> Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>	1 year ago
Aleksandr Borzunov	ff8ade8d3b	Bump version to 1.0.0	1 year ago
justheuristic	91898c3c90	Switch to speedtest-cli (#157) This pull request removes the custom speed_test code in favour of the speedtest-cli module. This is necessary to ensure that random warnings / print-outs do not mess with our outputs. Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>	2 years ago
justheuristic	b04982c1a2	Bump transformers to 4.25.1 (#151) - latest accelerate, transformers, huggingface_hub - rearrange attention caches to support https://github.com/huggingface/transformers/pull/18344 - remove unused code - fix edge case where session crashes when receiving seq length 0 - assert transformer version when importing WrappedBloomBlock Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com> Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>	2 years ago
Alexander Borzunov	b8e1c1b7f5	Revert to hivemind==1.1.3 for stability (#129)	2 years ago
Alexander Borzunov	893987ebf8	Require hivemind==1.1.4 with p2pd v0.3.13 (#121)	2 years ago
Alexander Borzunov	7bd5916744	Make Petals a pip-installable package (attempt 2) (#102) 1. Petals can now be installed using `pip install git+https://github.com/bigscience-workshop/petals` - If you have already cloned the repo, you can do `pip install .` or `pip install .[dev]` 2. Moved `src` => `src/petals` - Replaced `from src.smth import smth` with `from petals.smth import smth` 3. Moved `cli` => `src/petals/cli` - Replaced `python -m cli.run_smth` with `python -m petals.cli.run_smth` (all utilities are now available right after pip installation) 4. Moved the `requirements*.txt` contents to `setup.cfg` (`requirements.txt` for packages is not well supported by modern packaging utils) 5. Increased the package version from `0.2` to `1.0alpha1`	2 years ago

20 Commits (675bacb592bac7145d38ded2ea746da2b9b6c391)