petals/tests
justheuristic 617d70f7dc
Support --load_in_8bit on pre-Turing GPUs (#113)
- Linear8bitLt now supports pre-Turing GPUs by temporarily upcasting quantized weights.
- Added a test for Linear8bitLt accuracy with the new fallback; accuracy is similar to the regular 8-bit implementation (slightly better, because the input A is not quantized).
- Performance is roughly halfway between the default mode and memory_efficient_backward.
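
The fallback idea above can be illustrated with a minimal pure-Python sketch. It assumes row-wise absmax scaling of the weights and is not the bitsandbytes or Petals code; every function name here (quantize_rows, dequantize_rows, linear_fallback, linear_reference) is hypothetical. Weights are stored as int8 values plus one scale per row, upcast to floats just before the matmul, and the input stays non-quantized.

```python
# Illustrative sketch only: a pure-Python stand-in for the fallback idea,
# not the bitsandbytes or Petals implementation. All names are hypothetical.

def quantize_rows(weight):
    """Row-wise absmax quantization: returns (int8-range rows, per-row scales)."""
    quantized, scales = [], []
    for row in weight:
        absmax = max(abs(v) for v in row) or 1.0  # all-zero row: use scale 1/127
        scale = absmax / 127.0
        quantized.append([round(v / scale) for v in row])
        scales.append(scale)
    return quantized, scales

def dequantize_rows(quantized, scales):
    """Upcast quantized rows back to floats (the temporary upcast step)."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]

def linear_fallback(x, quantized, scales):
    """Compute x @ W^T with upcast weights; the input x is never quantized."""
    weight = dequantize_rows(quantized, scales)
    return [[sum(a * w for a, w in zip(x_row, w_row)) for w_row in weight]
            for x_row in x]

def linear_reference(x, weight):
    """Full-precision x @ W^T, used only to measure the quantization error."""
    return [[sum(a * w for a, w in zip(x_row, w_row)) for w_row in weight]
            for x_row in x]

weight = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.7]]
x = [[1.0, 2.0, 3.0]]
q, s = quantize_rows(weight)
approx = linear_fallback(x, q, s)
exact = linear_reference(x, weight)
```

In this sketch the only error source is the rounding of the weights, which is consistent with the note above that accuracy improves slightly when the input is left non-quantized.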

Alternatives considered:
- cupy: slow, casts to float internally
- triton: fast but unstable; every third matmul attempt segfaults
- bnb.functional.igemm (no lt): fails with "CuBLAS Error 8" on old GPUs

Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
2 years ago
scripts Reduce vocabulary size in test model, fix bug in routing when overlapped (#45) 2 years ago
conftest.py Implement RemoteSequential slicing and extra repr, add tests (#30) 2 years ago
test.id Add automated tests (#23) 2 years ago
test_block_exact_match.py Make Petals a pip-installable package (attempt 2) (#102) 2 years ago
test_chained_calls.py Make Petals a pip-installable package (attempt 2) (#102) 2 years ago
test_full_model.py Make Petals a pip-installable package (attempt 2) (#102) 2 years ago
test_linear8bitlt.py Support --load_in_8bit on pre-Turing GPUs (#113) 2 years ago
test_priority_pool.py Make Petals a pip-installable package (attempt 2) (#102) 2 years ago
test_remote_sequential.py Optimize RemoteSequenceManager (#106) 2 years ago
test_sequence_manager.py Hotfix span selection (#110) 2 years ago
test_utils.py Implement RemoteSequential slicing and extra repr, add tests (#30) 2 years ago