petals/src/petals/client/routing
justheuristic 012f840f7e
Use length-weighted sampling in routing for inference (#204)
This pull request implements a simple (1) greedy, (2) latency-agnostic routing optimization that should speed up both of our use cases.

Why this exists: our effort to merge full routing (ping-aware, throughput-aware, Dijkstra) is in a sorry state, split across several branches; merging it into main would take many days.

Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
1 year ago
..
__init__.py Optimize RemoteSequenceManager (#106) 2 years ago
sequence_info.py Fix logging: do not duplicate lines, enable colors in Colab (#156) 2 years ago
sequence_manager.py Use length-weighted sampling in routing for inference (#204) 1 year ago
spending_policy.py Optimize RemoteSequenceManager (#106) 2 years ago
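
The commit message above describes the routing change only in words, so here is a minimal sketch of what greedy, latency-agnostic, length-weighted sampling could look like. It is not the code in sequence_manager.py: the `Span` record, the `make_sequence` function, and the choice to weight each candidate by the number of remaining blocks it covers are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical record: a server (peer_id) holding blocks [start, end)."""
    peer_id: str
    start: int
    end: int


def make_sequence(
    spans: List[Span],
    start_index: int,
    end_index: int,
    rng: Optional[random.Random] = None,
) -> List[Span]:
    """Greedily build a chain of spans covering blocks [start_index, end_index).

    At each step, sample one server among those holding the current block,
    with probability proportional to how many of the remaining blocks it
    covers. Latency and throughput are ignored (assumption: this is what
    "latency-agnostic" means in the commit message).
    """
    rng = rng or random.Random()
    sequence: List[Span] = []
    current = start_index
    while current < end_index:
        # Servers that can serve the current block.
        candidates = [s for s in spans if s.start <= current < s.end]
        if not candidates:
            raise ValueError(f"no server holds block {current}")
        # Weight = number of useful blocks the server covers from here on.
        weights = [min(s.end, end_index) - current for s in candidates]
        chosen = rng.choices(candidates, weights=weights, k=1)[0]
        stop = min(chosen.end, end_index)
        sequence.append(Span(chosen.peer_id, current, stop))
        current = stop
    return sequence


if __name__ == "__main__":
    servers = [Span("a", 0, 4), Span("b", 0, 2), Span("c", 2, 6), Span("d", 4, 6)]
    for hop in make_sequence(servers, 0, 6):
        print(hop.peer_id, hop.start, hop.end)
```

Under these assumptions, servers that cover longer stretches are picked more often, which tends to shorten the chain (fewer hops) while still spreading load across shorter spans.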