petals

Alexander Borzunov 056f22515a Prioritize short inference, unmerge pools for long inference (#458) Right now, long inference requests may occupy the Runtime for a few seconds without yielding it to short (the most latency-sensitive) requests. This PR fixes that by disallowing the merged pool for long requests and prioritizing the short ones.		10 months ago
__init__.py	Make Petals a pip-installable package (attempt 2) (#102)	2 years ago
backend.py	Split long sequences into chunks (#403)	10 months ago
block_functions.py	Prioritize short inference, unmerge pools for long inference (#458)	10 months ago
block_selection.py	Use get_logger(__name__) instead of get_logger(__file__) (#265)	1 year ago
block_utils.py	Override float32 in config to bfloat16 (#431)	10 months ago
from_pretrained.py	Fix routing through relay, default network RPS, --token, logging, readme (#399)	10 months ago
handler.py	Prioritize short inference, unmerge pools for long inference (#458)	10 months ago
memory_cache.py	Fix deadlocks in MemoryCache (#396)	10 months ago
reachability.py	Update to petals.dev (#390)	10 months ago
server.py	Prioritize short inference, unmerge pools for long inference (#458)	10 months ago
task_pool.py	Use get_logger(__name__) instead of get_logger(__file__) (#265)	1 year ago
task_prioritizer.py	Prioritize short inference, unmerge pools for long inference (#458)	10 months ago
throughput.py	Fix missing torch.cuda.synchronize for computing throughput (#456)	10 months ago
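
The prioritization idea described in the latest commit message (#458) can be illustrated with a small priority queue. This is only a sketch under stated assumptions, not the code in task_prioritizer.py: the names `prioritize` and `PriorityScheduler`, the priority values, and the `MAX_SHORT_TOKENS` cutoff are all hypothetical, and the pool-unmerging half of the change is not modeled here.

```python
import heapq
import itertools

# Hypothetical priority levels (lower value = served first).
SHORT_INFERENCE, LONG_INFERENCE, OTHER = 1.0, 2.0, 3.0
MAX_SHORT_TOKENS = 128  # assumed cutoff between "short" and "long" requests


def prioritize(kind: str, n_tokens: int) -> float:
    """Return a priority for a task; short inference is the most urgent."""
    if kind == "inference":
        return SHORT_INFERENCE if n_tokens <= MAX_SHORT_TOKENS else LONG_INFERENCE
    return OTHER  # e.g. forward/backward passes


class PriorityScheduler:
    """Toy scheduler: pops the most urgent task, FIFO among equal priorities."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps submission order

    def submit(self, name: str, kind: str, n_tokens: int) -> None:
        priority = prioritize(kind, n_tokens)
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


scheduler = PriorityScheduler()
scheduler.submit("long-prompt", "inference", 2048)
scheduler.submit("training-step", "forward", 512)
scheduler.submit("chat-token-a", "inference", 1)
scheduler.submit("chat-token-b", "inference", 1)
order = [scheduler.pop() for _ in range(len(scheduler))]
print(order)
```

In this sketch the two single-token requests are served before the long prompt even though they were submitted later, which is the latency behavior the commit message aims for.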