This PR fixes problems related to #569:
- block initialization
- throughput calculation and cache usage
- mixtral in tests
Beam search is removed for Mixtral and Llama for now. Those models use DynamicCache, which requires a special function to reorder the cache along the beam dimension (see https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py#L161)
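For context, a minimal sketch of why the cache needs a reorder step (this is an illustration with toy data, not the transformers implementation; `reorder_cache` here is a hypothetical helper): after each beam-search step, hypothesis `i` may continue from a different surviving beam, so the per-layer key/value states must be reindexed to match.

```python
def reorder_cache(cache, beam_idx):
    """Reorder a toy per-layer cache along the beam dimension.

    `cache[layer][beam]` stands in for that beam's key/value state;
    `beam_idx[i]` says which old beam hypothesis `i` continues from.
    """
    return [[layer[i] for i in beam_idx] for layer in cache]

# Two layers, three beams; strings stand in for KV tensors.
cache = [["k0", "k1", "k2"], ["v0", "v1", "v2"]]
# After a step, beam 2 was duplicated and beam 0 survived.
print(reorder_cache(cache, [2, 2, 0]))  # [['k2', 'k2', 'k0'], ['v2', 'v2', 'v0']]
```

With the legacy tuple-of-tensors cache this reindexing happened inside the model's generation path, whereas DynamicCache holds the per-layer tensors itself, so it needs its own reorder function.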
---------
Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>