Commit 617d70f7dc (2 years ago)

Linear8bitLt now supports pre-Turing GPUs by temporarily upcasting quantized weights.

- Added a test for Linear8bitLt accuracy with the new fallback; accuracy is similar to the real thing (slightly better, due to the non-quantized A).
- Performance is roughly halfway between the default mode and memory_efficient_backward.

Alternatives considered:

- cupy: slow, casts to float internally
- triton: fast but unstable; every third matmul attempt segfaults
- bnb.functional.igemm (no lt): "CuBLAS Error 8" on old GPUs

Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
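The fallback idea described in the commit message can be sketched in a few lines: quantize the weight matrix to int8 with per-row absmax scaling, then temporarily dequantize (upcast) it to float32 for the matmul, which works on GPUs without int8 tensor-core support. This is a minimal numpy illustration with hypothetical helper names, not the actual bitsandbytes kernel:

```python
import numpy as np

def quantize_rowwise(W):
    # Per-row absmax quantization to int8 (illustrative sketch only)
    scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.rint(W / scale), -127, 127).astype(np.int8)
    return q, scale

def linear8bit_fallback(x, q, scale):
    # Temporarily upcast quantized weights to float32 and do a
    # regular float matmul -- the pre-Turing compatibility path
    W_dequant = q.astype(np.float32) * scale
    return x @ W_dequant.T

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32)).astype(np.float32)
x = rng.standard_normal((4, 32)).astype(np.float32)

q, scale = quantize_rowwise(W)
out_q = linear8bit_fallback(x, q, scale)   # quantized-then-upcast result
out_fp = x @ W.T                           # full-precision reference

# Quantization error should stay small relative to the output magnitude
rel_err = np.abs(out_q - out_fp).max() / np.abs(out_fp).max()
```

Note that here the activation `x` is kept in float32 throughout, which matches the commit's observation that accuracy can come out slightly better than the fully quantized path (A is never quantized).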
Directory contents:

- scripts/
- conftest.py
- test.id
- test_block_exact_match.py
- test_chained_calls.py
- test_full_model.py
- test_linear8bitlt.py
- test_priority_pool.py
- test_remote_sequential.py
- test_sequence_manager.py
- test_utils.py