Mirror of https://github.com/bigscience-workshop/petals (synced 2024-10-31 09:20:41 +00:00)
Commit 11f0d992d7: Inference RPS may differ substantially from forward RPS. For example, bnb (bitsandbytes) currently uses a completely different algorithm for NF4 inference. We report detailed RPS info that can then be used for shortest-path routing during inference.

| Name |
|---|
| .. |
| scripts |
| conftest.py |
| test_aux_functions.py |
| test_block_exact_match.py |
| test_chained_calls.py |
| test_dtype.py |
| test_full_model.py |
| test_peft.py |
| test_priority_pool.py |
| test_remote_sequential.py |
| test_sequence_manager.py |
| test_server_stats.py |
| test_tensor_parallel.py |
| test_utils.py |
| test.id |
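
The commit message above mentions using per-server inference RPS for shortest-path routing. As a rough illustration only, the sketch below runs Dijkstra's algorithm over block boundaries, assuming each server hosts a contiguous span of blocks and that processing one block costs `1 / inference_rps` seconds plus a fixed per-hop overhead. This is not the routing code from the repository: `ServerSpan`, `shortest_route`, `hop_cost`, and the cost model itself are hypothetical names and assumptions chosen for the example.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpan:
    """Hypothetical record: a server hosting blocks [start, end)."""
    name: str
    start: int
    end: int
    inference_rps: float  # reported inference requests per second


def shortest_route(spans, num_blocks, hop_cost):
    """Return the cheapest chain of (server, start, end) covering blocks
    0..num_blocks, or None if no chain exists.

    Assumed cost model: hop_cost per server visited, plus
    1 / inference_rps for every block that server processes.
    """
    best = {0: 0.0}          # cheapest known cost to reach each block boundary
    prev = {}                # boundary -> (previous boundary, server name)
    heap = [(0.0, 0)]
    while heap:
        cost, block = heapq.heappop(heap)
        if block == num_blocks:
            break
        if cost > best.get(block, float("inf")):
            continue  # stale heap entry
        for span in spans:
            if not (span.start <= block < span.end):
                continue
            # A server may be used for any prefix of its remaining blocks.
            for nxt in range(block + 1, min(span.end, num_blocks) + 1):
                new_cost = cost + hop_cost + (nxt - block) / span.inference_rps
                if new_cost < best.get(nxt, float("inf")):
                    best[nxt] = new_cost
                    prev[nxt] = (block, span.name)
                    heapq.heappush(heap, (new_cost, nxt))
    if num_blocks not in best:
        return None
    # Walk back from the final boundary to recover the chain of servers.
    route, block = [], num_blocks
    while block != 0:
        start, name = prev[block]
        route.append((name, start, block))
        block = start
    return route[::-1]


# Made-up example: one slow server holding all 4 blocks, two fast ones
# holding 2 blocks each.
spans = [
    ServerSpan("A", 0, 4, inference_rps=10.0),
    ServerSpan("B", 0, 2, inference_rps=100.0),
    ServerSpan("C", 2, 4, inference_rps=100.0),
]
fast_route = shortest_route(spans, num_blocks=4, hop_cost=0.001)
slow_network_route = shortest_route(spans, num_blocks=4, hop_cost=1.0)
```

With a small per-hop overhead the two fast servers win despite the extra hop; when each hop is expensive, the single slow server becomes the cheaper choice. That trade-off is why a throughput figure specific to inference, rather than one measured on forward passes, matters for the edge weights.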