petals

Artem Chumachenko	d6f4f80f3f	Fix Mixtral-related issues (#570)	2 weeks ago

This PR fixes problems related to #569:
- block initialization
- throughput calculation and cache usage
- mixtral in tests

Beam search is removed for Mixtral and Llama for now. Those models use DynamicCache, which requires special function to change (see https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py#L161).

Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>

__init__.py	Move SequenceManagerConfig -> ClientConfig, petals.dht_utils -> petals.utils.dht (#463)	9 months ago
asyncio.py	Shield alloc & free from cancellation (#163)	1 year ago
auto_config.py	Add Falcon support (#499)	8 months ago
convert_block.py	Support Llama 2 (#379)	9 months ago
cuda_graphs.py	Optimize LLaMA for inference (#513)	5 months ago
dht.py	Store (start_block, end_block) in each DHT record for reliability (#510)	7 months ago
disk_cache.py	Fix file locks in NFS-mounted directories (#517)	7 months ago
hf_auth.py	Support Llama 2 (#379)	9 months ago
logging.py	Remove unused imports and attributes (#324)	11 months ago
misc.py	Fix Mixtral-related issues (#570)	2 weeks ago
packaging.py	Add customizable input tensors (#445)	9 months ago
peft.py	Fix Mixtral-related issues (#570)	2 weeks ago
ping.py	Fix petals.utils.ping for servers with client-mode DHT (#430)	9 months ago
random.py	Implement shortest-path routing for inference (#362)	9 months ago
version.py	Add LLaMA support (#323)	10 months ago