mirror of
https://github.com/bigscience-workshop/petals
synced 2024-10-31 09:20:41 +00:00
21c3526ec1
**Why?** - We'd like to avoid excess threads for the original sequence manager when we only use its slices (e.g. when we add adapters or need only a subset of model blocks). - If we create a sequence manager just before a fork (e.g. in a web app backend or a multi-thread benchmark), we'd like to avoid excess threads in the original process and only use this thread in the child processes where we actually call `.make_sequence()`. |
||
---|---|---|
scripts | ||
conftest.py | ||
test_aux_functions.py | ||
test_block_exact_match.py | ||
test_chained_calls.py | ||
test_full_model.py | ||
test_priority_pool.py | ||
test_remote_sequential.py | ||
test_sequence_manager.py | ||
test_server_stats.py | ||
test_tensor_parallel.py | ||
test_utils.py | ||
test.id |