Commit Graph

3 Commits (568f21dc3b8b32ee7d93fad9256fee4e71b9f268)

Author SHA1 Message Date
Artem Chumachenko 568f21dc3b
Add customizable input tensors (#445)
10 months ago
Alexander Borzunov 056f22515a
Prioritize short inference, unmerge pools for long inference (#458)
Right now, long inference requests may occupy the Runtime for a few seconds without yielding it to short requests, which are the most latency-sensitive ones. This PR fixes this by disallowing the merged pool for long requests and prioritizing the short ones.
10 months ago
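The scheduling idea behind #458 can be illustrated with a toy model. This is a minimal sketch, not the Petals implementation. `ToyRuntime`, `LONG_THRESHOLD`, and the rule that a long request is queued one block at a time (instead of as a single merged task over all blocks) are hypothetical stand-ins. They show how short requests can be served first and how a long request can be kept from holding the runtime for its whole span of blocks.

```python
import heapq
import itertools

# Hypothetical cutoff (tokens per step) separating "short" from "long" requests.
LONG_THRESHOLD = 1024

SHORT, LONG = 0, 1  # lower value = higher priority in the heap


class ToyRuntime:
    """Toy single-worker runtime: runs one queued task per step."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        # Heap entries: (priority, seq, request_id, first_block, end_block).
        self._queue = []
        self._seq = itertools.count()  # FIFO tie-breaker within a priority class
        self.log = []  # (request_id, first_block, end_block) in execution order

    def _push(self, priority, request_id, start, end):
        heapq.heappush(self._queue, (priority, next(self._seq), request_id, start, end))

    def submit(self, request_id, num_tokens):
        if num_tokens <= LONG_THRESHOLD:
            # Short request: one "merged" task covering all blocks, top priority.
            self._push(SHORT, request_id, 0, self.num_blocks)
        else:
            # Long request: only block 0 is queued; later blocks follow one by one.
            self._push(LONG, request_id, 0, 1)

    def step(self):
        priority, _, request_id, start, end = heapq.heappop(self._queue)
        self.log.append((request_id, start, end))
        if priority == LONG and end < self.num_blocks:
            # Re-queue the next block, so short tasks that arrived meanwhile run first.
            self._push(LONG, request_id, end, end + 1)

    def run(self):
        while self._queue:
            self.step()
```

For example, with three blocks, a short request that arrives after a long one has started is served before the long request's remaining blocks:

```python
rt = ToyRuntime(num_blocks=3)
rt.submit("long", num_tokens=4096)
rt.step()                      # long request processes block 0
rt.submit("short", num_tokens=16)
rt.run()
# rt.log == [("long", 0, 1), ("short", 0, 3), ("long", 1, 2), ("long", 2, 3)]
```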
justheuristic ac9b546706
[Refactor] extract block forward, backward and inference into a separate file (#435)
This PR does not change any functionality. It merely moves stuff around.
List of changes:

- handler.py/_rpc_forward became block_methods/rpc_forward
- handler.py/_rpc_backward became block_methods/rpc_backward
- the math bits of rpc_inference were extracted into block_methods/iterate_rpc_inference

---------

Co-authored-by: Your Name <you@example.com>
Co-authored-by: artek0chumak <artek.chumak@gmail.com>
Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
10 months ago