mirror of https://github.com/bigscience-workshop/petals (synced 2024-11-19 21:25:38 +00:00)
617d70f7dc
- Linear8bitLt now supports pre-Turing GPUs by temporarily upcasting quantized weights.
- Added a test for Linear8bitLt accuracy with the new fallback; accuracy is similar to the real thing (slightly better, since A is not quantized).
- Performance is roughly halfway between the default mode and memory_efficient_backward.

Alternatives considered:
- cupy: slow, casts to float internally
- triton: fast but very unstable; every third attempt to matmul segfaults
- bnb.functional.igemm (without lt): "CuBLAS Error 8" on old GPUs

Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
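The upcasting fallback described above can be sketched roughly as follows. This is not the actual bitsandbytes implementation: it uses NumPy in place of CUDA tensors, and the helper names `quantize_rowwise` and `linear8bit_fallback` are hypothetical. The idea is that the int8 weights are temporarily cast back to float and fed to an ordinary matmul, instead of the int8 tensor-core kernel that pre-Turing GPUs lack; the activations A stay in full precision, which is why accuracy can come out slightly better than the quantized path.

```python
import numpy as np

def quantize_rowwise(W):
    # Absmax row-wise quantization to int8 (hypothetical helper, illustrating
    # the general LLM.int8()-style scheme; not the bitsandbytes internals).
    scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
    Wq = np.round(W / scale).astype(np.int8)
    return Wq, scale

def linear8bit_fallback(A, Wq, scale):
    # Pre-Turing fallback sketch: temporarily upcast the quantized weights to
    # float and use a plain matmul; A is left non-quantized.
    W_up = Wq.astype(np.float32) * scale
    return A @ W_up.T

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32)).astype(np.float32)  # weight, stored as int8
A = rng.standard_normal((8, 32)).astype(np.float32)   # activations, kept fp32

Wq, scale = quantize_rowwise(W)
out = linear8bit_fallback(A, Wq, scale)

# Quantization error relative to the full-precision matmul stays small.
ref = A @ W.T
err = np.abs(out - ref).max()
```

The trade-off matches the commit note: the extra upcast makes this slower than the real int8 kernel, but it avoids cupy's internal float casting and the instability seen with triton.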
scripts
conftest.py
test_block_exact_match.py
test_chained_calls.py
test_full_model.py
test_linear8bitlt.py
test_priority_pool.py
test_remote_sequential.py
test_sequence_manager.py
test_utils.py
test.id