petals

Commit Graph

Author	SHA1	Message	Date
Alexander Borzunov	1ba721d51e	Merge branch 'main' into amd-gpus	9 months ago
Alexander Borzunov	2a150770a4	Prefer longer servers for fine-tuning, exclude unreachable (#448) We choose longer servers to minimize the number of hops but leave some randomization to distribute the load. We also exclude servers known to be unreachable.	9 months ago
Alexander Borzunov	00d48dcbe1	Override float32 in config to bfloat16 (#431)	9 months ago
justheuristic	ac9b546706	[Refactor] extract block forward, backward and inference into a separate file (#435) This PR does not change any functionality. It merely moves stuff around. List of changes: handler.py/_rpc_forward became block_methods/rpc_forward; handler.py/_rpc_backward became block_methods/rpc_backward; the math bits of rpc_inference were extracted into block_methods/iterate_rpc_inference. Co-authored-by: Your Name <you@example.com> Co-authored-by: artek0chumak <artek.chumak@gmail.com> Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>	9 months ago
Alexander Borzunov	593d980ad8	Use bitsandbytes 0.41.1 (#442)	9 months ago
Alexander Borzunov	32fbab5192	Remove deprecated comment in fine-tuning notebook (#443)	9 months ago
Aleksandr Borzunov	fe3b8d6e66	Append .amd to reported version	9 months ago
Alexander Borzunov	b58141ef66	Remove distracting links from readme (#441)	9 months ago
Alexander Borzunov	679397df0c	Update Discord links from channels to forums (#440) As our Discord community grew, we found it difficult to find open and resolved issues in the #running-a-client and #running-a-server channels and to navigate the interleaved conversations happening there. That's why we recreated these channels as Discord forums, where different discussions are separated into different posts.	9 months ago
Vadim Peretokin	d0b5af34cd	Fix typo and make blocks message more informative (#437) The message really doesn't tell me much as a user, since I never touched update_period to begin with: ``` Aug 06 09:43:07.287 [WARN] [petals.server.server.run:701] Declaring blocs to DHT takes more than --update_period, consider increasing it ``` Made it better and more informative.	9 months ago
Aleksandr Borzunov	b6e31c6d0f	Fix "import peft" in tests	9 months ago
Aleksandr Borzunov	d8298faa00	Remove --adapters from tests	9 months ago
Aleksandr Borzunov	753f8df594	Don't use NF4 default	9 months ago
Aleksandr Borzunov	203a1b3a24	Use bitsandbytes-rocm	9 months ago
Aleksandr Borzunov	6b38bc89ef	Remove peft dependency for AMD GPUs	9 months ago
Alexander Borzunov	a1f7791d5e	Fix petals.utils.ping for servers with client-mode DHT (#430) Fix #429.	10 months ago
Alexander Borzunov	351e96bc46	Penalize servers that use relays during rebalancing (#428) Servers accessible only via relays may introduce issues if they are the only type of servers holding certain blocks. Specifically, a connection to such servers may be unstable or opened after a certain delay. This PR lowers their self-reported throughput, so that the rebalancing algorithm prefers directly reachable servers for hosting each block.	10 months ago
Alexander Borzunov	6a1b8a6a90	Add Stable Beluga 2 to readme (#424)	10 months ago
Alexander Borzunov	44fefa5e54	Add connect_timeout (#423)	10 months ago
Alexander Borzunov	cdc0f70653	Add Discord badge and more Discord links to readme (#422)	10 months ago
Guocheng	8072cd9d1b	Fix stale link (#418)	10 months ago
Alexander Borzunov	f3fafd14a4	Bump version to 2.0.1 (#411)	10 months ago
Alexander Borzunov	fd19c21859	Update --update_period and --expiration defaults (#410)	10 months ago
Alexander Borzunov	ffb20b585c	Update commands for hosting Llama 2 in readme (#409)	10 months ago
Alexander Borzunov	48c6b6d963	Update README.md (#407)	10 months ago
Alexander Borzunov	c153cba1fa	Add Llama 2, WSL instructions to readme (#406)	10 months ago
justheuristic	5af04524dd	Split long sequences into chunks (#403) This PR is designed to avoid OOMs that occur when processing long sequences because of the huge attention logit matrices. Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>	10 months ago
Alexander Borzunov	30b94ef18b	If speedtest fails, assume network speed of 100 Mbit/s (#404) The value is chosen as a safe value below the average shown at https://health.petals.dev/. Note that if a server uses relays, the effective throughput will be further divided by 2 (see #399).	10 months ago
Alexander Borzunov	8666653cf5	Fix routing through relay, default network RPS, --token, logging, readme (#399) * Hide GeneratorExit in _iterate_inference_steps() * Update README.md about `--public_name` * Use .from_pretrained(..., use_auth_token=token) instead of token=token until it's fully supported across HF libs * Use default network speed 25 Mbit/s * Apply relay penalty in max-throughput routing * Replace RPS with "tokens/sec per block" in logs * Increase default expiration	10 months ago
Alexander Borzunov	eb0664b993	Support Python 3.11 (#393)	10 months ago
Alexander Borzunov	6e4ebb94d2	Fix deadlocks in MemoryCache (#396) - Fix deadlocks in MemoryCache - Set default --alloc_timeout to 1 until the MemoryCache update	10 months ago
Alexander Borzunov	b6b3ae964f	Fix --attn_cache_tokens default (#392)	10 months ago
Alexander Borzunov	d49d9ad0cf	Bump version to 2.0.0.post3 (#391)	10 months ago
justheuristic	e51e84631d	Update to petals.dev (#390) Since `petals.ml` DNS record is still unavailable, we're switching everything to https://petals.dev Co-authored-by: Aleksandr Borzunov <hxrussia@gmail.com>	10 months ago
Aleksandr Borzunov	ddcda02b06	Hardcode IPs until DNS issues get resolved	10 months ago
Alexander Borzunov	b1ff8bdd6c	Bump version to 2.0.0.post1 (#384)	10 months ago
Alexander Borzunov	e9a20e7e53	Require accelerate>=0.20.3 as transformers do (#383)	10 months ago
Alexander Borzunov	057a2fb5de	Support Llama 2 (#379)	10 months ago
Alexander Borzunov	3218534745	Fix --token arg (#378)	10 months ago
justheuristic	398a384075	Inherit bitsandbytes compute dtype correctly (override peft quirk) (#377)	10 months ago
justheuristic	5a8de2f1f8	Fix handler memory leak, get rid of mp.Manager (#373) This PR removes the memory leak from somewhere within handler.py that has something to do with mp.SyncManager.	10 months ago
Alexander Borzunov	895327a0ae	Fix readme code example, require Python < 3.11 until supported (#374) * Fix readme code example * Require Python < 3.11 until it's supported	10 months ago
Alexander Borzunov	c735dd7ba3	Update transformers to 4.31.0 and peft to 0.4.0 (#371)	10 months ago
justheuristic	1ab35c2826	Typo in inference_session.py	10 months ago
Alexander Borzunov	a6fdfc0556	Fix AssertionError on rebalancing (#370)	10 months ago
Alexander Borzunov	f97582fb5f	Require transformers < 4.31.0 until we're compatible (#369)	10 months ago
Alexander Borzunov	3b300c32e4	Update readme to show new models (#365)	10 months ago
Alexander Borzunov	62d9ed5ce7	Implement shortest-path routing for inference (#362) This PR: 1. Adds shortest-path routing for inference. We build a graph with client-server and server-server latencies and compute costs, as well as empirically measured overheads. For client-server latencies, we ping possible first and last servers in a sequence in `SequenceManager.update()`. We penalize servers that may not have enough cache for our request. This uses info added to DHT in #355, #356, #358. 2. Makes a server ping neighboring servers in addition to next ones. This gives an opportunity to switch servers even before we use all of a server's blocks (e.g., because a neighboring server is faster). This feature is not enabled though, since it increases the graph size for N servers to O(N^2), but we may enable it if needed. 3. Fixes a `SequenceManager` bug with the first `update()`. Previously, this update was likely to produce incorrect information and cause `MissingBlocksErrors` until the next update happened.	10 months ago
Ikko Eltociear Ashimine	fd30f7ce10	Fix typo in generation_algorithms.py (#364)	10 months ago
Alexander Borzunov	11f0d992d7	Report inference, forward, and network RPS separately (#358) Inference RPS may be very different from forward RPS. E.g., currently bnb uses a completely different algorithm for NF4 inference. We report detailed RPS info that can then be used for shortest-path routing for inference.	10 months ago
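
The shortest-path routing idea described in commit 62d9ed5ce7 can be illustrated with a small sketch. This is not the Petals implementation: the `Server` dataclass, the `find_route` function, and every latency and compute number below are hypothetical, and the sketch folds all overheads into one per-hop latency and one per-block compute time. It treats "blocks `[0, i)` are processed" as a graph node, adds an edge for every span of blocks a server could run, and applies Dijkstra's algorithm.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    # All fields are hypothetical, for illustration only.
    name: str
    start: int  # first hosted block (inclusive)
    end: int  # last hosted block (exclusive)
    latency: float  # assumed cost of one hop to this server, seconds
    time_per_block: float  # assumed compute time per block, seconds


def find_route(servers, num_blocks):
    """Return (total_cost, [(server_name, start, end), ...]) covering all blocks."""
    best = {0: 0.0}  # cheapest known cost to have processed blocks [0, pos)
    prev = {}  # pos -> (previous pos, server used for the last span)
    queue = [(0.0, 0)]
    while queue:
        cost, pos = heapq.heappop(queue)
        if pos == num_blocks:
            break  # the first time the target is popped, its cost is final
        if cost > best.get(pos, float("inf")):
            continue  # stale queue entry
        for s in servers:
            if not (s.start <= pos < s.end):
                continue
            # The server may run any non-empty prefix of its remaining blocks.
            for nxt in range(pos + 1, min(s.end, num_blocks) + 1):
                new_cost = cost + s.latency + s.time_per_block * (nxt - pos)
                if new_cost < best.get(nxt, float("inf")):
                    best[nxt] = new_cost
                    prev[nxt] = (pos, s.name)
                    heapq.heappush(queue, (new_cost, nxt))
    if num_blocks not in best:
        raise ValueError("no chain of servers covers all blocks")
    # Walk the predecessor links back from the last block to rebuild the route.
    route, pos = [], num_blocks
    while pos != 0:
        start, name = prev[pos]
        route.append((name, start, pos))
        pos = start
    return best[num_blocks], route[::-1]
```

Calling `find_route(servers, num_blocks)` with a list of `Server` objects returns the total estimated cost and the chain of `(server_name, start, end)` spans. Because every hop adds a fixed latency, the sketch naturally favors fewer, longer spans when compute speeds are similar.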

463 Commits (amd-gpus)