37fdcb3fe0
Currently, each `TransformerBackend.inference_step` call looks for adapters and sets the correct adapter type for each block. This is not very expensive, but it can measurably affect inference time. This pull request switches adapters faster, with a single variable assignment instead of iterating over `block.modules()`.
11 months ago

| Name | Last updated |
|---|---|
| .. | |
| petals | 11 months ago |
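
The commit message above contrasts two ways of selecting an adapter: visiting every submodule of a block on each step, or changing one shared variable that the layers read when they run. The following is a minimal, framework-free sketch of the second idea, not the code from this pull request. `AdapterContext`, `using_adapter`, `ToyLayer`, and `run_block` are hypothetical names chosen for illustration, and the "adapter" here is just a scalar multiplier standing in for real adapter weights.

```python
from contextlib import contextmanager


class AdapterContext:
    """Hypothetical holder for the currently active adapter name."""

    active_adapter = None  # shared by every layer; None means "no adapter"


@contextmanager
def using_adapter(name):
    """Switch the active adapter with one assignment and restore it afterwards."""
    previous = AdapterContext.active_adapter
    AdapterContext.active_adapter = name
    try:
        yield
    finally:
        AdapterContext.active_adapter = previous


class ToyLayer:
    """Stand-in for an adapter-aware layer (a real one would be a torch module)."""

    def __init__(self, adapter_scales):
        self.adapter_scales = adapter_scales  # adapter name -> multiplier

    def forward(self, x):
        # The layer reads the shared variable at call time, so nothing has to
        # walk the module tree beforehand to tell it which adapter is active.
        scale = self.adapter_scales.get(AdapterContext.active_adapter, 1.0)
        return x * scale


# A "block" of three layers, each knowing two hypothetical adapters.
block = [ToyLayer({"a": 2.0, "b": 3.0}) for _ in range(3)]


def run_block(x):
    """Pass x through every layer of the toy block."""
    for layer in block:
        x = layer.forward(x)
    return x
```

With this layout, `with using_adapter("a"): run_block(x)` costs one assignment regardless of how many layers the block contains, whereas a per-step loop over all submodules grows with block size. One trade-off of a shared variable is that it is global state, so concurrent steps that need different adapters would have to be serialized or given separate contexts.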