petals/src/petals/client
Artem Chumachenko d6f4f80f3f
Fix Mixtral-related issues (#570)
This PR fixes problems related to #569:
- block initialization
- throughput calculation and cache usage
- mixtral in tests

Beam search is removed for Mixtral and Llama for now. Those models use DynamicCache, which requires a special function to modify the cache (see https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py#L161)
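To illustrate why this matters, here is a minimal sketch (not using transformers, with a hypothetical `reorder_cache` helper) of the cache reordering that beam search requires: after each decoding step, the surviving beams are a permutation of the previous ones, so every cached key/value entry must be re-indexed along the beam axis. With legacy tuple-of-tuples caches this is plain indexing; DynamicCache wraps the same operation in a dedicated method.

```python
def reorder_cache(cache, beam_idx):
    """Re-index a tuple-of-tuples cache along the beam axis.

    `cache` is a list of (key, value) pairs, one per layer; each entry
    is represented here as a plain list indexed by beam (a stand-in for
    a tensor's batch dimension).
    """
    return [
        ([key[i] for i in beam_idx], [value[i] for i in beam_idx])
        for key, value in cache
    ]

# Two layers, three beams; cache entries are tagged with their beam id.
cache = [
    (["k0", "k1", "k2"], ["v0", "v1", "v2"]),  # layer 0
    (["k0", "k1", "k2"], ["v0", "v1", "v2"]),  # layer 1
]
beam_idx = [2, 2, 0]  # beam 2 survived twice, beam 0 once, beam 1 was dropped
reordered = reorder_cache(cache, beam_idx)
print(reordered[0][0])  # layer-0 keys after reordering -> ['k2', 'k2', 'k0']
```

Until an equivalent hook exists for DynamicCache in the Petals client, generation for these models is limited to strategies that keep the beam order fixed (greedy and sampling).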

---------

Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
3 weeks ago
routing Store (start_block, end_block) in each DHT record for reliability (#510) 8 months ago
__init__.py Move SequenceManagerConfig -> ClientConfig, petals.dht_utils -> petals.utils.dht (#463) 9 months ago
config.py Improve default arguments for clients and servers (#530) 6 months ago
from_pretrained.py Improve default arguments for clients and servers (#530) 6 months ago
inference_session.py Bump transformers and accelerate versions (#554) 2 months ago
lm_head.py Improve default arguments for clients and servers (#530) 6 months ago
ptune.py Make client compatible with transformers' GenerationMixin (#464) 8 months ago
remote_forward_backward.py Move SequenceManagerConfig -> ClientConfig, petals.dht_utils -> petals.utils.dht (#463) 9 months ago
remote_generation.py Fix Mixtral-related issues (#570) 3 weeks ago
remote_sequential.py Make client compatible with transformers' GenerationMixin (#464) 8 months ago
sequential_autograd.py Make client compatible with transformers' GenerationMixin (#464) 8 months ago