petals/src/petals
justheuristic c08d09c4d3
Rewrite MemoryCache alloc_timeout logic (#434)
-    rpc_inference: the server now accepts an allocation timeout from the user; the default is no timeout
-    bugfix: the inference timeout is now measured from the moment the request is received
    -    previously, you had to wait for your own timeout plus the time it took to work through the queue (other users' timeouts)
    -    now, you get AllocationFailed if you had to wait for more than (timeout) seconds, regardless of other users
-    an inference request with no timeout now fails immediately if there is not enough memory available
-    the number of bytes per dtype is now determined correctly for int, bool & other types
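
The timeout behaviour described above can be illustrated with a minimal sketch. This is not the actual MemoryCache code from this commit: the class ToyMemoryCache, its allocate/free methods, and the byte-counting scheme are hypothetical and use only the standard library. Only the AllocationFailed name and the two rules (the deadline is fixed when the request arrives; no timeout means fail immediately) come from the commit message.

```python
import threading
import time
from typing import Optional


class AllocationFailed(Exception):
    """Raised when the requested memory could not be reserved in time."""


class ToyMemoryCache:
    """Hypothetical byte-budgeted cache, used only to illustrate the timeout rules."""

    def __init__(self, max_size_bytes: int):
        self.max_size_bytes = max_size_bytes
        self.used_bytes = 0
        self._cond = threading.Condition()

    def allocate(self, num_bytes: int, timeout: Optional[float] = None) -> None:
        # The deadline is fixed as soon as the request arrives, so any time
        # spent waiting behind other requests counts against this timeout.
        deadline = None if timeout is None else time.monotonic() + timeout
        with self._cond:
            while self.used_bytes + num_bytes > self.max_size_bytes:
                if deadline is None:
                    # No timeout: fail immediately instead of queueing.
                    raise AllocationFailed(f"not enough memory for {num_bytes} bytes")
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise AllocationFailed(f"could not allocate {num_bytes} bytes in time")
                # Wake up when memory is freed or when the deadline passes.
                self._cond.wait(timeout=remaining)
            self.used_bytes += num_bytes

    def free(self, num_bytes: int) -> None:
        with self._cond:
            self.used_bytes -= num_bytes
            # Let every waiting request re-check whether it fits now.
            self._cond.notify_all()
```

Because each caller computes its own deadline up front and only ever waits for the time remaining, one request's long timeout cannot extend how long another request waits.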


---------

Co-authored-by: Your Name <you@example.com>
Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>
Co-authored-by: Aleksandr Borzunov <hxrussia@gmail.com>
9 months ago
cli Rewrite MemoryCache alloc_timeout logic (#434) 9 months ago
client Make client compatible with transformers' GenerationMixin (#464) 10 months ago
models Make client compatible with transformers' GenerationMixin (#464) 10 months ago
server Rewrite MemoryCache alloc_timeout logic (#434) 9 months ago
utils Rewrite MemoryCache alloc_timeout logic (#434) 9 months ago
__init__.py Fix requiring transformers>=4.32.0 (#480) 9 months ago
constants.py Update to petals.dev (#390) 11 months ago
data_structures.py Move SequenceManagerConfig -> ClientConfig, petals.dht_utils -> petals.utils.dht (#463) 10 months ago
dht_utils.py Move SequenceManagerConfig -> ClientConfig, petals.dht_utils -> petals.utils.dht (#463) 10 months ago