petals/src/petals/client
Alexander Borzunov 351e96bc46
Penalize servers that use relays during rebalancing (#428)
Servers reachable only via relays can cause problems when they are the only servers holding certain blocks: connections to them may be unstable or may open only after a delay.

This PR reduces their self-reported throughput, so that the rebalancing algorithm prefers directly reachable servers for hosting each block.
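
The idea can be illustrated with a minimal sketch. It is not the code from this PR: `ServerInfo`, `reported_throughput`, `pick_host`, and the `RELAY_PENALTY` value of 0.5 are hypothetical names and numbers chosen only for illustration. The sketch assumes that a relay-only server multiplies its measured throughput by a penalty factor before reporting it, and that the balancer simply favors the highest reported value.

```python
from dataclasses import dataclass

# Hypothetical penalty factor; the actual value used by the PR is not stated here.
RELAY_PENALTY = 0.5


@dataclass
class ServerInfo:
    name: str
    throughput: float  # measured throughput (arbitrary units)
    uses_relay: bool   # True if the server is reachable only via a relay


def reported_throughput(server: ServerInfo, relay_penalty: float = RELAY_PENALTY) -> float:
    """Return the throughput a server would announce, scaled down if it relies on a relay."""
    if server.uses_relay:
        return server.throughput * relay_penalty
    return server.throughput


def pick_host(candidates: list[ServerInfo]) -> ServerInfo:
    """Pick the candidate with the highest reported (penalized) throughput."""
    return max(candidates, key=reported_throughput)


direct = ServerInfo("direct", throughput=100.0, uses_relay=False)
relayed = ServerInfo("relayed", throughput=150.0, uses_relay=True)
chosen = pick_host([direct, relayed])
```

With these made-up numbers, the relayed server is faster on paper, but after the penalty it reports less than the directly reachable one, so the direct server is chosen.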
10 months ago
routing                     Penalize servers that use relays during rebalancing (#428)  10 months ago
__init__.py                 Add LLaMA support (#323)                                    11 months ago
from_pretrained.py          Add LLaMA support (#323)                                    11 months ago
inference_session.py        Add connect_timeout (#423)                                  10 months ago
lm_head.py                  Fix llama's lm_head.weight.requires_grad (#330)             11 months ago
ptune.py                    Fix llama's lm_head.weight.requires_grad (#330)             11 months ago
remote_forward_backward.py  Add connect_timeout (#423)                                  10 months ago
remote_generation.py        Raise error for unexpected .generate() kwargs (#315)        1 year ago
remote_sequential.py        Support peft LoRA adapters (#335)                           11 months ago
sequential_autograd.py      Add connect_timeout (#423)                                  10 months ago