Mirror of https://github.com/bigscience-workshop/petals (synced 2024-10-31 09:20:41 +00:00)
Commit 617d70f7dc
- Linear8bitLt now supports pre-Turing GPUs by temporarily upcasting quantized weights.
- Added a test for Linear8bitLt accuracy with the new fallback; accuracy is similar to the real thing (slightly better, in fact, since A stays non-quantized).
- Performance is roughly halfway between the default mode and memory_efficient_backward.

Alternatives considered:
- cupy: slow, casts to float internally
- triton: fast but very unstable; every third matmul attempt segfaults
- bnb.functional.igemm (no lt): fails with "CuBLAS Error 8" on old GPUs

Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
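The fallback described above can be illustrated with a minimal sketch. This is a hypothetical rendition, not the actual Linear8bitLt code: the function name `linear8bitlt_fallback` and the row-wise absmax scaling convention (weight ≈ int8 × scale / 127, in the style of bitsandbytes' LLM.int8()) are assumptions for illustration.

```python
import torch
import torch.nn.functional as F
from typing import Optional

def linear8bitlt_fallback(x: torch.Tensor,
                          weight_int8: torch.Tensor,
                          scale: torch.Tensor,
                          bias: Optional[torch.Tensor] = None) -> torch.Tensor:
    # Temporarily upcast the quantized weight to the activation dtype,
    # run a plain floating-point matmul, and let the fp copy be freed.
    # `weight_int8` holds row-wise quantized weights; `scale` holds the
    # per-row absmax values (assumed convention: w ~ int8 * scale / 127).
    weight_fp = weight_int8.to(x.dtype) * (scale.to(x.dtype) / 127.0).unsqueeze(1)
    # A (the input x) stays non-quantized here, which is why accuracy can
    # come out slightly better than the true int8 matmul path.
    return F.linear(x, weight_fp, bias)

if __name__ == "__main__":
    torch.manual_seed(0)
    w = torch.randn(64, 32)
    scale = w.abs().max(dim=1).values                   # row-wise absmax
    w_int8 = torch.round(w / scale.unsqueeze(1) * 127).to(torch.int8)
    x = torch.randn(4, 32)
    out = linear8bitlt_fallback(x, w_int8, scale)
    print((out - x @ w.t()).abs().max())                # small quantization error
```

Since the weight is dequantized on every call, the floating-point copy is transient: the trade is extra compute and temporary memory in exchange for running on GPUs that lack int8 tensor core support, consistent with the reported performance landing between the default mode and memory_efficient_backward.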
scripts
conftest.py
test_block_exact_match.py
test_chained_calls.py
test_full_model.py
test_linear8bitlt.py
test_priority_pool.py
test_remote_sequential.py
test_sequence_manager.py
test_utils.py
test.id