d6f4f80f3f: This PR fixes problems related to #569 (block initialization; throughput calculation and cache usage; Mixtral in tests). Beam search is removed for Mixtral and Llama for now: those models use DynamicCache, which requires a special function to change (see https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py#L161). Co-authored-by: Max Ryabinin <mryabinin0@gmail.com> | 2 weeks ago | |
---|---|---|
.. | ||
__init__.py | 9 months ago | |
asyncio.py | 1 year ago | |
auto_config.py | 8 months ago | |
convert_block.py | 9 months ago | |
cuda_graphs.py | 5 months ago | |
dht.py | 7 months ago | |
disk_cache.py | 7 months ago | |
hf_auth.py | 9 months ago | |
logging.py | 11 months ago | |
misc.py | 2 weeks ago | |
packaging.py | 9 months ago | |
peft.py | 2 weeks ago | |
ping.py | 9 months ago | |
random.py | 9 months ago | |
version.py | 10 months ago |