ae9e71fe8e
This pull request adds an option to run Petals server on multiple local GPUs. It uses https://github.com/BlackSamorez/tensor_parallel

- 8bit approximation error same as in main (mean~=2% q0.9~=5%)
- TP=1, 2, 3 (see screenshots above)
- forward, grad w.r.t. input and inference exact match with main with TP=1
- `>=`80% GPU utilization with 3x 1080ti, batch = 8 tokens
- throughput measured with and without TP
- TP on 1080Tis has near-linear speedup comparable to the benchmarks (see first message)

Co-authored-by: Iaroslav Lisniak <yalisnyak@nes.ru>
Co-authored-by: Andrei Panferov <andrei@blacksamorez.ru>
Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>

1 year ago

Name | Last updated | |
---|---|---|
scripts | 2 years ago | |
conftest.py | 2 years ago | |
test.id | 2 years ago | |
test_aux_functions.py | 1 year ago | |
test_block_exact_match.py | 1 year ago | |
test_chained_calls.py | 2 years ago | |
test_full_model.py | 2 years ago | |
test_linear8bitlt.py | 2 years ago | |
test_priority_pool.py | 2 years ago | |
test_remote_sequential.py | 1 year ago | |
test_sequence_manager.py | 2 years ago | |
test_tensor_parallel.py | 1 year ago | |
test_utils.py | 2 years ago |