c08d09c4d3 (9 months ago)

- rpc_inference: server will now accept allocation timeout from user, defaults to no timeout
- bugfix: inference timeout is now measured from the moment the request is received
  - previously, you would have to wait for your timeout plus the time it takes to sort through the queue (other users' timeout)
  - now, you get AllocationFailed if you had to wait for over (timeout) seconds - regardless of other users
- a request for inference with no timeout will now fail instantly if there is not enough memory available
- dtype number of bytes is now correctly determined for int, bool & other types

---------

Co-authored-by: Your Name <you@example.com>
Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>
Co-authored-by: Aleksandr Borzunov <hxrussia@gmail.com>

Name | Last updated | |
---|---|---|
.. | ||
bootstrap.id | 10 months ago | |
conftest.py | 1 year ago | |
server2.id | 10 months ago | |
test_aux_functions.py | 10 months ago | |
test_block_exact_match.py | 10 months ago | |
test_cache.py | 9 months ago | |
test_chained_calls.py | 10 months ago | |
test_dtype.py | 11 months ago | |
test_full_model.py | 10 months ago | |
test_peft.py | 11 months ago | |
test_priority_pool.py | 1 year ago | |
test_remote_sequential.py | 10 months ago | |
test_sequence_manager.py | 10 months ago | |
test_server_stats.py | 10 months ago | |
test_tensor_parallel.py | 10 months ago | |
test_utils.py | 11 months ago | |
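
The timeout semantics described in the commit message above can be illustrated with a small standalone sketch. It is not the project's code: `MemoryPool`, its `allocate`/`free` methods, and the byte-budget model are hypothetical names chosen for illustration, and only the `AllocationFailed` name comes from the commit message. The sketch assumes the deadline is fixed when the request arrives, so time spent waiting behind other requests counts against it, and that a request with no timeout fails immediately when memory is short.

```python
import threading
import time
from typing import Optional


class AllocationFailed(Exception):
    """Raised when memory cannot be reserved before the caller's deadline."""


class MemoryPool:
    """Hypothetical byte-budget pool (illustration only, not the real cache)."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._cond = threading.Condition()

    def allocate(self, nbytes: int, timeout: Optional[float] = None) -> None:
        # Fix the deadline at arrival time: waiting in the queue behind
        # other requests does not extend it.
        deadline = None if timeout is None else time.monotonic() + timeout
        with self._cond:
            while self.used_bytes + nbytes > self.max_bytes:
                if deadline is None:
                    # No timeout given: fail instantly instead of waiting.
                    raise AllocationFailed("not enough memory available")
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise AllocationFailed(f"waited over {timeout} seconds")
                # Sleep until memory is freed or the deadline passes.
                self._cond.wait(remaining)
            self.used_bytes += nbytes

    def free(self, nbytes: int) -> None:
        with self._cond:
            self.used_bytes -= nbytes
            # Wake all waiters so each can re-check the budget.
            self._cond.notify_all()
```

Because each waiter tracks its own deadline with `time.monotonic()`, one caller's long timeout cannot delay the failure another caller asked for, which is the behaviour the bugfix entry describes.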