petals/tests
justheuristic 617d70f7dc
Support --load_in_8bit on pre-Turing GPUs (#113)
- Linear8bitLt now supports pre-Turing GPUs by temporarily upcasting quantized weights.
- Added a test for Linear8bitLt accuracy with the new fallback; the accuracy is similar to the real thing (slightly better due to non-quantized A).
- Performance is roughly halfway between the default mode and memory_efficient_backward.
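
The upcasting idea above can be illustrated with a minimal pure-Python sketch. It is not the code from this commit: the real layer relies on bitsandbytes and PyTorch on the GPU, while every function name here (quantize_rows, linear_upcast_fallback, linear_fully_quantized) is hypothetical, and row-wise absmax scaling is assumed as the quantization scheme. The sketch compares a fallback that dequantizes int8 weights and multiplies them with an unquantized A against a path that also quantizes A.

```python
import random

def quantize_rows(matrix):
    """Row-wise absmax quantization to the int8 range [-127, 127] (assumed scheme)."""
    q_rows, scales = [], []
    for row in matrix:
        absmax = max(abs(v) for v in row) or 1.0  # avoid a zero scale for all-zero rows
        scale = absmax / 127.0
        q_rows.append([round(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales

def matmul_t(a, w):
    """Return a @ w.T for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(a_row, w_row)) for w_row in w] for a_row in a]

def linear_upcast_fallback(a, q_weight, w_scales):
    """Hypothetical fallback: upcast int8 weights to float, keep A unquantized."""
    w = [[q * s for q in row] for row, s in zip(q_weight, w_scales)]
    return matmul_t(a, w)

def linear_fully_quantized(a, q_weight, w_scales):
    """Comparison path: quantize A too, multiply in integers, then rescale."""
    q_a, a_scales = quantize_rows(a)
    acc = matmul_t(q_a, q_weight)
    return [[v * a_s * w_s for v, w_s in zip(row, w_scales)]
            for row, a_s in zip(acc, a_scales)]

def max_abs_diff(x, y):
    """Largest elementwise absolute difference between two matrices."""
    return max(abs(u - v) for rx, ry in zip(x, y) for u, v in zip(rx, ry))

random.seed(0)
in_features, out_features, batch = 16, 8, 4
weight = [[random.uniform(-1, 1) for _ in range(in_features)] for _ in range(out_features)]
a = [[random.uniform(-1, 1) for _ in range(in_features)] for _ in range(batch)]

q_weight, w_scales = quantize_rows(weight)
exact = matmul_t(a, weight)  # float reference without any quantization
err_fallback = max_abs_diff(linear_upcast_fallback(a, q_weight, w_scales), exact)
err_full = max_abs_diff(linear_fully_quantized(a, q_weight, w_scales), exact)
print(f"upcast fallback error: {err_fallback:.5f}")
print(f"fully quantized error: {err_full:.5f}")
```

In this sketch only the weight rounding error enters the fallback result, whereas the comparison path also rounds A. That is consistent with the accuracy remark above, though the toy sizes and random data say nothing about real model behavior or speed.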

Alternatives considered:
- cupy - slow, casting to float internally
- triton - fast but very unstable: every third matmul attempt segfaults
- bnb.functional.igemm (no lt) - "CuBLAS Error 8" on old GPUs

Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
2022-12-02 15:10:24 +03:00
scripts | Reduce vocabulary size in test model, fix bug in routing when overlapped (#45) | 2022-08-17 18:50:52 +03:00
conftest.py | Implement RemoteSequential slicing and extra repr, add tests (#30) | 2022-07-19 04:28:04 +03:00
test_block_exact_match.py | Make Petals a pip-installable package (attempt 2) (#102) | 2022-11-30 10:41:13 +04:00
test_chained_calls.py | Make Petals a pip-installable package (attempt 2) (#102) | 2022-11-30 10:41:13 +04:00
test_full_model.py | Make Petals a pip-installable package (attempt 2) (#102) | 2022-11-30 10:41:13 +04:00
test_linear8bitlt.py | Support --load_in_8bit on pre-Turing GPUs (#113) | 2022-12-02 15:10:24 +03:00
test_priority_pool.py | Make Petals a pip-installable package (attempt 2) (#102) | 2022-11-30 10:41:13 +04:00
test_remote_sequential.py | Optimize RemoteSequenceManager (#106) | 2022-12-01 10:25:55 +03:00
test_sequence_manager.py | Hotfix span selection (#110) | 2022-12-01 11:21:10 +03:00
test_utils.py | Implement RemoteSequential slicing and extra repr, add tests (#30) | 2022-07-19 04:28:04 +03:00
test.id | Add automated tests (#23) | 2022-07-16 01:59:23 +03:00