Commit ab41223b17, 1 year ago

This PR:

1. Makes inference/forward/backward calls on the client remember the dtype and device of the source tensors, then move/cast the outputs to the same dtype/device. This way:
   - Users don't need to change the code launching `RemoteSequential` to run it on a different device.
   - `model.generate()` also starts to support both CPU and GPU.
2. Sets `low_cpu_mem_usage=True` by default and the client's request timeout to 20 sec.
3. Removes excess casts to float32 left in Dmitry's code.
4. (minor) Improves error messages.

Name | Last updated | |
---|---|---|
__init__.py | 2 years ago | |
block.py | 2 years ago | |
from_pretrained.py | 2 years ago | |
model.py | 1 year ago | |
ops.py | 2 years ago |
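
The first item of the commit message (remember the dtype and device of the source tensors, then move/cast the outputs back) can be illustrated with a minimal sketch. This is not the code from this PR: the decorator name `preserve_dtype_and_device` and the parameters `compute_dtype` and `compute_device` are hypothetical, and casting the inputs before the call is an assumption made only for illustration. The wrapper is duck-typed, so it should work with any tensor-like object that exposes `.dtype`, `.device` and `.to(device=..., dtype=...)`, such as `torch.Tensor`.

```python
from functools import wraps


def preserve_dtype_and_device(remote_call, compute_dtype, compute_device="cpu"):
    """Wrap `remote_call` so its output matches the dtype/device of its input.

    Hypothetical helper for illustration only; not part of this repository.
    """

    @wraps(remote_call)
    def wrapper(inputs, *args, **kwargs):
        # Remember where the caller's tensor lives and how it is stored.
        source_dtype, source_device = inputs.dtype, inputs.device
        # Assumption: the call itself expects a fixed dtype/device.
        prepared = inputs.to(device=compute_device, dtype=compute_dtype)
        outputs = remote_call(prepared, *args, **kwargs)
        # Hand the result back in the caller's original dtype and device.
        return outputs.to(device=source_device, dtype=source_dtype)

    return wrapper
```

With a wrapper of this kind, the calling code would not need to change when the input tensor moves from CPU to GPU or switches from float32 to float16, since the output follows whatever the input was.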