petals/tests
Alexander Borzunov 056f22515a
Prioritize short inference, unmerge pools for long inference (#458)
Right now, long inference requests may occupy the Runtime for a few seconds without yielding it to short (most latency-sensitive) requests. This PR fixes that by disallowing the merged pool for long requests and prioritizing the short ones.
2023-08-11 09:24:33 +04:00
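The scheduling idea in the commit message above (serve short requests ahead of long ones) can be illustrated with a small priority queue. This is a minimal sketch and not the Petals implementation: the class `PrioritizedQueue`, the helper `drain`, and the threshold `SHORT_REQUEST_MAX_TOKENS` are hypothetical names chosen for illustration, and it does not model the merged or unmerged pools themselves.

```python
import heapq
import itertools

# Hypothetical threshold: requests at or below this many tokens count as "short".
SHORT_REQUEST_MAX_TOKENS = 128


class PrioritizedQueue:
    """Toy queue that serves short requests before long ones (FIFO within each class)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, name, num_tokens):
        # Priority 0 (short) sorts ahead of priority 1 (long).
        priority = 0 if num_tokens <= SHORT_REQUEST_MAX_TOKENS else 1
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_request(self):
        # Pop the request with the smallest (priority, arrival) pair, or None if empty.
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name


def drain(queue):
    # Collect request names in the order the queue would serve them.
    order = []
    while (name := queue.next_request()) is not None:
        order.append(name)
    return order
```

With this sketch, a short request submitted after a long one is still served first, while requests of the same class keep their arrival order.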
bootstrap.id Test Llama, rebalancing, throughput eval, and all CLI scripts (#452) 2023-08-08 19:10:27 +04:00
conftest.py Fix logging: do not duplicate lines, enable colors in Colab (#156) 2022-12-15 09:12:18 +04:00
server2.id Test Llama, rebalancing, throughput eval, and all CLI scripts (#452) 2023-08-08 19:10:27 +04:00
test_aux_functions.py Test Llama, rebalancing, throughput eval, and all CLI scripts (#452) 2023-08-08 19:10:27 +04:00
test_block_exact_match.py Prioritize short inference, unmerge pools for long inference (#458) 2023-08-11 09:24:33 +04:00
test_chained_calls.py Test Llama, rebalancing, throughput eval, and all CLI scripts (#452) 2023-08-08 19:10:27 +04:00
test_dtype.py Add LLaMA support (#323) 2023-06-23 15:46:10 +04:00
test_full_model.py Test Llama, rebalancing, throughput eval, and all CLI scripts (#452) 2023-08-08 19:10:27 +04:00
test_peft.py Support peft LoRA adapters (#335) 2023-07-12 15:22:28 +03:00
test_priority_pool.py Fix issues related to petals as a module (#159) 2022-12-16 09:09:06 +04:00
test_remote_sequential.py Prioritize short inference, unmerge pools for long inference (#458) 2023-08-11 09:24:33 +04:00
test_sequence_manager.py Test Llama, rebalancing, throughput eval, and all CLI scripts (#452) 2023-08-08 19:10:27 +04:00
test_server_stats.py Test Llama, rebalancing, throughput eval, and all CLI scripts (#452) 2023-08-08 19:10:27 +04:00
test_tensor_parallel.py Test Llama, rebalancing, throughput eval, and all CLI scripts (#452) 2023-08-08 19:10:27 +04:00
test_utils.py Support peft LoRA adapters (#335) 2023-07-12 15:22:28 +03:00