Latest commit d6f4f80f3f, 2 weeks ago:

This PR fixes problems related to #569:
- block initialization
- throughput calculation and cache usage
- Mixtral in tests

Beam search is removed for Mixtral and Llama for now. Those models use DynamicCache, which requires a special function to change (see https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py#L161).

Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>

File | Last updated | |
---|---|---|
.. | ||
check-style.yaml | 1 year ago | |
push-docker-image.yaml | 2 months ago | |
run-tests.yaml | 2 weeks ago |