petals/src/petals/utils
Artem Chumachenko d6f4f80f3f
Fix Mixtral-related issues (#570)
This PR fixes problems related to #569:
- block initialization
- throughput calculation and cache usage
- mixtral in tests

Beam search is removed for Mixtral and Llama for now. Those models use DynamicCache, which requires a special function to change (see https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py#L161).

---------

Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
2 weeks ago
__init__.py Move SequenceManagerConfig -> ClientConfig, petals.dht_utils -> petals.utils.dht (#463) 9 months ago
asyncio.py Shield alloc & free from cancellation (#163) 1 year ago
auto_config.py Add Falcon support (#499) 8 months ago
convert_block.py Support Llama 2 (#379) 9 months ago
cuda_graphs.py Optimize LLaMA for inference (#513) 5 months ago
dht.py Store (start_block, end_block) in each DHT record for reliability (#510) 7 months ago
disk_cache.py Fix file locks in NFS-mounted directories (#517) 7 months ago
hf_auth.py Support Llama 2 (#379) 9 months ago
logging.py Remove unused imports and attributes (#324) 11 months ago
misc.py Fix Mixtral-related issues (#570) 2 weeks ago
packaging.py Add customizable input tensors (#445) 9 months ago
peft.py Fix Mixtral-related issues (#570) 2 weeks ago
ping.py Fix petals.utils.ping for servers with client-mode DHT (#430) 9 months ago
random.py Implement shortest-path routing for inference (#362) 9 months ago
version.py Add LLaMA support (#323) 10 months ago