Latest commit: d6f4f80f3f (2 weeks ago)

This PR fixes problems related to #569:

- block initialization
- throughput calculation and cache usage
- Mixtral in tests

Beam search is removed for Mixtral and Llama for now. Those models use DynamicCache, which requires a special function to change (see https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py#L161).

Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>

Name | Last updated | |
---|---|---|
bootstrap.id | 9 months ago | |
conftest.py | 1 year ago | |
server2.id | 9 months ago | |
test_aux_functions.py | 9 months ago | |
test_block_exact_match.py | 9 months ago | |
test_cache.py | 8 months ago | |
test_chained_calls.py | 2 weeks ago | |
test_dtype.py | 10 months ago | |
test_full_model.py | 2 weeks ago | |
test_optimized_layers.py | 2 weeks ago | |
test_peft.py | 10 months ago | |
test_priority_pool.py | 8 months ago | |
test_remote_sequential.py | 8 months ago | |
test_sequence_manager.py | 9 months ago | |
test_server_stats.py | 9 months ago | |
test_tensor_parallel.py | 9 months ago | |
test_utils.py | 10 months ago |