petals/src
justheuristic f3984b192a
Make attention cache wait until memory is freed (#53)
Previously, attempting to allocate from a MemoryCache that did not have enough free space would throw AllocationFailed.

This PR changes the behavior to the following (sketched in code after the list):
- by default, wait until memory is freed by other tenants, serving requests in FIFO order
- if the allocation cannot be served within the timeout, throw AllocationFailed
- if the requested size is too big to fit even in an empty cache, throw AllocationFailed immediately

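For illustration only, here is a minimal single-process sketch of the semantics above. The names (`SimpleMemoryCache`, `max_size_bytes`, the default `timeout`) are made up for this example and do not reflect the actual petals API; a `threading.Condition` is used purely to keep the sketch short, whereas the real cache coordinates across processes and, as noted below, does not rely on `mp.Condition`.

```python
import time
import threading
from collections import deque


class AllocationFailed(Exception):
    """Raised when a cache allocation request cannot be served."""


class SimpleMemoryCache:
    """Illustrative single-process sketch of the waiting behavior described above."""

    def __init__(self, max_size_bytes: int):
        self.max_size_bytes = max_size_bytes
        self._used_bytes = 0
        self._waiters = deque()  # allocation tickets, oldest first (FIFO)
        self._next_ticket = 0
        self._cond = threading.Condition()

    def allocate(self, size_bytes: int, timeout: float = 10.0) -> None:
        # A request larger than the entire cache can never succeed: fail immediately.
        if size_bytes > self.max_size_bytes:
            raise AllocationFailed(f"{size_bytes} bytes cannot fit even in an empty cache")

        deadline = time.monotonic() + timeout
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            self._waiters.append(ticket)
            try:
                # Wait until this request is first in line and enough memory is free.
                while (self._waiters[0] != ticket
                       or self._used_bytes + size_bytes > self.max_size_bytes):
                    remaining = deadline - time.monotonic()
                    if remaining <= 0 or not self._cond.wait(timeout=remaining):
                        raise AllocationFailed(
                            f"could not allocate {size_bytes} bytes within {timeout} s"
                        )
                self._used_bytes += size_bytes
            finally:
                self._waiters.remove(ticket)
                self._cond.notify_all()

    def free(self, size_bytes: int) -> None:
        # Freeing memory wakes up tenants waiting in allocate().
        with self._cond:
            self._used_bytes -= size_bytes
            self._cond.notify_all()
```
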
- [x] passes existing tests
- [x] passes manual load tests

P.S. If anyone wondered: using mp.Condition would not make the code simpler; its lock behavior is slightly different from what we need here.

Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>
Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
2022-09-07 02:14:34 +03:00
bloom [Fix] make distributed seq cls to not create the full bloom model (#49) 2022-08-28 20:20:51 +03:00
client Fix calling rpc_info multiple times (#60) 2022-09-07 01:41:23 +03:00
server Make attention cache wait until memory is freed (#53) 2022-09-07 02:14:34 +03:00
utils Convert actual model weights (#46) 2022-08-17 23:32:14 +03:00
__init__.py Measure and cache network & compute throughput (#21) 2022-07-13 05:46:26 +04:00
data_structures.py Implement RemoteSequential slicing and extra repr, add tests (#30) 2022-07-19 04:28:04 +03:00
dht_utils.py remove transformer block, implement as sequential of size 1 (#54) 2022-09-01 04:26:31 +03:00