petals/src
justheuristic f3984b192a
Make attention cache wait until memory is freed (#53)
Previously, attempting to allocate from a MemoryCache that did not have enough free space would throw AllocationFailed.

This PR changes the behavior to the following (sketched in code after the list):
- by default, wait until memory is freed by other tenants, serving requests in FIFO order
- if the allocation cannot be served within the timeout, throw AllocationFailed
- if the requested size is too big to fit even in an empty cache, throw AllocationFailed immediately

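For illustration only, here is a minimal single-process sketch of the semantics above. The names (`SimpleMemoryCache`, `max_size_bytes`, the default `timeout`) are made up for this example and do not reflect the actual petals API; a `threading.Condition` is used purely to keep the sketch short, whereas the real cache coordinates across processes and, as noted below, does not rely on `mp.Condition`.

```python
import time
import threading
from collections import deque


class AllocationFailed(Exception):
    """Raised when a cache allocation request cannot be served."""


class SimpleMemoryCache:
    """Illustrative single-process sketch of the waiting behavior described above."""

    def __init__(self, max_size_bytes: int):
        self.max_size_bytes = max_size_bytes
        self._used_bytes = 0
        self._waiters = deque()  # allocation tickets, oldest first (FIFO)
        self._next_ticket = 0
        self._cond = threading.Condition()

    def allocate(self, size_bytes: int, timeout: float = 10.0) -> None:
        # A request larger than the entire cache can never succeed: fail immediately.
        if size_bytes > self.max_size_bytes:
            raise AllocationFailed(f"{size_bytes} bytes cannot fit even in an empty cache")

        deadline = time.monotonic() + timeout
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            self._waiters.append(ticket)
            try:
                # Wait until this request is first in line and enough memory is free.
                while (self._waiters[0] != ticket
                       or self._used_bytes + size_bytes > self.max_size_bytes):
                    remaining = deadline - time.monotonic()
                    if remaining <= 0 or not self._cond.wait(timeout=remaining):
                        raise AllocationFailed(
                            f"could not allocate {size_bytes} bytes within {timeout} s"
                        )
                self._used_bytes += size_bytes
            finally:
                self._waiters.remove(ticket)
                self._cond.notify_all()

    def free(self, size_bytes: int) -> None:
        # Freeing memory wakes up tenants waiting in allocate().
        with self._cond:
            self._used_bytes -= size_bytes
            self._cond.notify_all()
```
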
- [x] passes existing tests
- [x] passes manual load tests

P.S. If anyone wondered: using mp.Condition would not make the code simpler; its lock behavior is slightly different from what we need here.

Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>
Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
2022-09-07 02:14:34 +03:00
bloom [Fix] make distributed seq cls to not create the full bloom model (#49) 2022-08-28 20:20:51 +03:00
client Fix calling rpc_info multiple times (#60) 2022-09-07 01:41:23 +03:00
server Make attention cache wait until memory is freed (#53) 2022-09-07 02:14:34 +03:00
utils Convert actual model weights (#46) 2022-08-17 23:32:14 +03:00
__init__.py Measure and cache network & compute throughput (#21) 2022-07-13 05:46:26 +04:00
data_structures.py Implement RemoteSequential slicing and extra repr, add tests (#30) 2022-07-19 04:28:04 +03:00
dht_utils.py remove transformer block, implement as sequential of size 1 (#54) 2022-09-01 04:26:31 +03:00