petals/src
justheuristic f3984b192a
Make attention cache wait until memory is freed (#53)
Previously, attempting to allocate from a MemoryCache that did not have enough free space would immediately raise AllocationFailed.

This PR changes the behavior to the following (see the sketch after this list):
- by default, wait until memory is freed by other tenants (served in FIFO order)
- if the allocation cannot be satisfied within the timeout, raise AllocationFailed
- if the requested size is too large to fit even in an empty cache, raise AllocationFailed immediately
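For illustration, a minimal sketch of this policy is given below. It assumes a hypothetical `ToyMemoryCache` with `allocate`/`free` methods; it is not the actual `petals.server` MemoryCache, and strict FIFO ordering of waiters is omitted for brevity.

```python
from typing import Optional
import threading
import time


class AllocationFailed(Exception):
    """Raised when a request cannot be satisfied (hypothetical name mirroring the PR)."""


class ToyMemoryCache:
    """Toy cache illustrating the wait-until-freed allocation policy."""

    def __init__(self, max_size_bytes: int):
        self.max_size_bytes = max_size_bytes
        self._used = 0
        self._lock = threading.Lock()
        self._freed = threading.Event()  # set whenever some memory is released

    def allocate(self, size: int, timeout: Optional[float] = None) -> None:
        # Too big to fit even in an empty cache: fail immediately.
        if size > self.max_size_bytes:
            raise AllocationFailed(f"{size} bytes exceed cache capacity")
        deadline = None if timeout is None else time.monotonic() + timeout
        while True:
            with self._lock:
                if self._used + size <= self.max_size_bytes:
                    self._used += size
                    return
                # Not enough space right now: re-arm the event before releasing
                # the lock so a concurrent free() cannot be missed.
                self._freed.clear()
            remaining = None if deadline is None else deadline - time.monotonic()
            if remaining is not None and remaining <= 0:
                raise AllocationFailed("could not allocate within timeout")
            # Block until another tenant frees memory (or the timeout expires).
            self._freed.wait(timeout=remaining)

    def free(self, size: int) -> None:
        with self._lock:
            self._used -= size
            self._freed.set()  # wake up waiters so they can retry
```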

- [x] passes existing tests
- [x] passes manual load tests

P.S. In case anyone wondered: using mp.Condition would not make the code simpler; its locking behavior is slightly different from what we need here.

Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>
Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
| Name | Latest commit | Age |
|---|---|---|
| bloom | [Fix] make distributed seq cls to not create the full bloom model (#49) | 2 years ago |
| client | Fix calling rpc_info multiple times (#60) | 2 years ago |
| server | Make attention cache wait until memory is freed (#53) | 2 years ago |
| utils | Convert actual model weights (#46) | 2 years ago |
| __init__.py | Measure and cache network & compute throughput (#21) | 2 years ago |
| data_structures.py | Implement RemoteSequential slicing and extra repr, add tests (#30) | 2 years ago |
| dht_utils.py | remove transformer block, implement as sequential of size 1 (#54) | 2 years ago |