Commit 617d70f7dc (2 years ago)

Linear8bitLt now supports pre-Turing GPUs by temporarily upcasting quantized weights.

- Added a test for Linear8bitLt accuracy with the new fallback; accuracy is similar to the real thing (slightly better, due to the non-quantized A).
- Performance is roughly halfway between the default mode and memory_efficient_backward.

Alternatives considered:

- cupy: slow, casts to float internally
- triton: fast but unstable; every third matmul attempt segfaults
- bnb.functional.igemm (no lt): "CuBLAS Error 8" on old GPUs

Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
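The fallback idea described in the commit message can be sketched in a few lines: quantize the weight matrix to int8 with per-row absmax scaling, then temporarily dequantize (upcast) it to float32 for the matmul, which works on GPUs without int8 tensor-core support. This is a minimal numpy illustration with hypothetical helper names, not the actual bitsandbytes kernel:

```python
import numpy as np

def quantize_rowwise(W):
    # Per-row absmax quantization to int8 (illustrative sketch only)
    scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.rint(W / scale), -127, 127).astype(np.int8)
    return q, scale

def linear8bit_fallback(x, q, scale):
    # Temporarily upcast quantized weights to float32 and do a
    # regular float matmul -- the pre-Turing compatibility path
    W_dequant = q.astype(np.float32) * scale
    return x @ W_dequant.T

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32)).astype(np.float32)
x = rng.standard_normal((4, 32)).astype(np.float32)

q, scale = quantize_rowwise(W)
out_q = linear8bit_fallback(x, q, scale)   # quantized-then-upcast result
out_fp = x @ W.T                           # full-precision reference

# Quantization error should stay small relative to the output magnitude
rel_err = np.abs(out_q - out_fp).max() / np.abs(out_fp).max()
```

Note that here the activation `x` is kept in float32 throughout, which matches the commit's observation that accuracy can come out slightly better than the fully quantized path (A is never quantized).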
Directory contents:

- scripts/
- conftest.py
- test.id
- test_block_exact_match.py
- test_chained_calls.py
- test_full_model.py
- test_linear8bitlt.py
- test_priority_pool.py
- test_remote_sequential.py
- test_sequence_manager.py
- test_utils.py