petals



Alexander Borzunov 056f22515a Prioritize short inference, unmerge pools for long inference (#458) Right now, long inference requests may occupy Runtime for a few seconds without yielding it to short (the most latency-sensitive) requests. This PR fixes it by disallowing the merged pool for long requests and prioritizing the short ones.		10 months ago
bootstrap.id	Test Llama, rebalancing, throughput eval, and all CLI scripts (#452)	10 months ago
conftest.py	Fix logging: do not duplicate lines, enable colors in Colab (#156)	1 year ago
server2.id	Test Llama, rebalancing, throughput eval, and all CLI scripts (#452)	10 months ago
test_aux_functions.py	Test Llama, rebalancing, throughput eval, and all CLI scripts (#452)	10 months ago
test_block_exact_match.py	Prioritize short inference, unmerge pools for long inference (#458)	10 months ago
test_chained_calls.py	Test Llama, rebalancing, throughput eval, and all CLI scripts (#452)	10 months ago
test_dtype.py	Add LLaMA support (#323)	11 months ago
test_full_model.py	Test Llama, rebalancing, throughput eval, and all CLI scripts (#452)	10 months ago
test_peft.py	Support peft LoRA adapters (#335)	11 months ago
test_priority_pool.py	Fix issues related to `petals` as a module (#159)	1 year ago
test_remote_sequential.py	Prioritize short inference, unmerge pools for long inference (#458)	10 months ago
test_sequence_manager.py	Test Llama, rebalancing, throughput eval, and all CLI scripts (#452)	10 months ago
test_server_stats.py	Test Llama, rebalancing, throughput eval, and all CLI scripts (#452)	10 months ago
test_tensor_parallel.py	Test Llama, rebalancing, throughput eval, and all CLI scripts (#452)	10 months ago
test_utils.py	Support peft LoRA adapters (#335)	11 months ago
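The latest commit (#458) describes prioritizing short inference requests so that long ones cannot monopolize the Runtime. The toy scheduler below is a minimal sketch of that idea under stated assumptions. It is not petals' actual implementation: the `PrioritizedQueue` class, the `submit`/`next_task` methods, and the `SHORT_REQUEST_MAX_TOKENS` threshold are all hypothetical. It only models the prioritization half of the change, not the unmerging of pools.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical threshold: requests with at most this many tokens count as "short".
SHORT_REQUEST_MAX_TOKENS = 128


@dataclass(order=True)
class _QueuedTask:
    priority: float  # lower value is served first
    seq: int         # insertion counter, keeps FIFO order among equal priorities
    name: str = field(compare=False)


class PrioritizedQueue:
    """Hypothetical toy scheduler: short requests are always served before long ones."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, name, num_tokens):
        # Assumed scheme: short requests get priority 0, long requests priority 1.
        priority = 0.0 if num_tokens <= SHORT_REQUEST_MAX_TOKENS else 1.0
        heapq.heappush(self._heap, _QueuedTask(priority, next(self._counter), name))

    def next_task(self):
        # Pop the highest-priority (lowest value) task, or None if the queue is empty.
        return heapq.heappop(self._heap).name if self._heap else None


if __name__ == "__main__":
    q = PrioritizedQueue()
    q.submit("long-1", 2048)
    q.submit("short-1", 16)
    q.submit("short-2", 32)
    print([q.next_task() for _ in range(3)])  # ['short-1', 'short-2', 'long-1']
```

Even though `long-1` arrived first, both short requests are dequeued ahead of it, while equal-priority requests keep their arrival order thanks to the `seq` counter.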