Commit Graph

4 Commits (643a0541708e8ada830e195453e9a340fa821a40)

Alexander Borzunov 643a054170
Make server use smart defaults (#115)
Summary:

```python
import argparse

parser = argparse.ArgumentParser()

parser.add_argument('--attn_cache_size', type=str, default=None,
                    help='The size of GPU memory allocated for storing past attention keys/values between inference steps. '
                         'Examples: 500MB, 1.2GB, 1073741824 (bytes). Note that 1KB != 1KiB here. '
                         'Default: 0.5GiB * num_blocks * hidden_size / 14336. '
                         'The latter is the hidden size of the bigscience/bloom-petals model.')

parser.add_argument('--request_timeout', type=float, required=False, default=3 * 60,
                    help='Timeout (in seconds) for the whole rpc_forward/rpc_backward/rpc_forward_stream/rpc_backward_stream request')
parser.add_argument('--session_timeout', type=float, required=False, default=30 * 60,
                    help='Timeout (in seconds) for the whole inference session')
parser.add_argument('--step_timeout', type=float, required=False, default=60,
                    help="Timeout (in seconds) for waiting the next step's inputs inside an inference session")

parser.add_argument('--load_in_8bit', type=bool, default=None,
                    help="Convert the loaded model into mixed-8bit quantized model. Default: True if GPU is available")
```
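
For illustration, the `--attn_cache_size` default described above works out like this (a minimal sketch of the formula from the help string; the helper name and its arguments are mine, not the server's actual code):

```python
def default_attn_cache_size(num_blocks: int, hidden_size: int) -> int:
    """0.5 GiB per served block, scaled by hidden_size relative to bigscience/bloom-petals (14336)."""
    gib = 1024 ** 3
    return int(0.5 * gib * num_blocks * hidden_size / 14336)

# Example: a server hosting 10 blocks of bigscience/bloom-petals reserves 0.5 GiB * 10 = 5 GiB
print(default_attn_cache_size(num_blocks=10, hidden_size=14336) / 1024 ** 3)  # 5.0
```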

Co-authored-by: justheuristic <justheuristic@gmail.com>
2 years ago
Alexander Borzunov 1ea44b0d3c
Measure throughput for different configs, devices, and dtypes separately (#114) 2 years ago
Alexander Borzunov 43ac6016ac
Fix dtypes in backend schemas (#99)
Currently, the schemas use `torch.float32`, so all inputs and outputs are converted to float32 before sending and after receiving on both servers and clients. This creates a huge slowdown for the system (a rough size estimate is sketched after the list below).

* This PR makes the schemas use the server's `--torch_dtype` argument (default is `torch.bfloat16` for BLOOM-176B)
* Adds an option for the client to request a specific output compression. Use case 1: the client sends quantized inputs and expects quantized outputs in return. Use case 2: the client uses quantization for gradients w.r.t. activations, but keeps grads w.r.t. __prompts__ as is for greater precision.
* Adds a comment explaining the purpose of NoSpendingPolicy, since we likely won't have it for the workshop
* Adds a test with custom compression (a janky implementation, for testing purposes)
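
To make the slowdown concrete, here is a back-of-the-envelope comparison (not the PR's actual serialization code; the tensor shape is just an illustrative BLOOM-sized activation):

```python
import torch

# One BLOOM-176B-sized activation (batch=1, seq_len=2048, hidden_size=14336)
hidden = torch.zeros(1, 2048, 14336, dtype=torch.bfloat16)

bytes_as_float32 = hidden.nelement() * 4                       # old schemas: everything upcast to float32 (4 bytes/element)
bytes_as_bfloat16 = hidden.nelement() * hidden.element_size()  # new schemas: keep the server's --torch_dtype (2 bytes/element)
print(bytes_as_float32 / bytes_as_bfloat16)  # 2.0 -> twice the wire traffic (plus extra casts) per hop
```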

Co-authored-by: justheuristic <justheuristic@gmail.com>
2 years ago
Alexander Borzunov 7bd5916744
Make Petals a pip-installable package (attempt 2) (#102)
1. Petals can now be installed using `pip install git+https://github.com/bigscience-workshop/petals`
    - If you have already cloned the repo, you can do `pip install .` or `pip install .[dev]` (a quick import sanity check is sketched after this list)
2. Moved `src` => `src/petals`
    - Replaced `from src.smth import smth` with `from petals.smth import smth`
3. Moved `cli` => `src/petals/cli`
    - Replaced `python -m cli.run_smth` with `python -m petals.cli.run_smth` (all utilities are now available right after pip installation)
4. Moved the `requirements*.txt` contents to `setup.cfg` (a standalone `requirements.txt` is not well supported by modern packaging tools)
5. Increased the package version from `0.2` to `1.0alpha1`
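
After installation, a quick way to confirm the new layout resolves correctly (just a sanity-check sketch, not part of the repo):

```python
import importlib.util

# `src` became `src/petals` and `cli` became `src/petals/cli`, so both should resolve
# under the `petals.*` namespace once the package is pip-installed.
for name in ("petals", "petals.cli"):
    try:
        found = importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        found = False
    print(f"{name}: {'importable' if found else 'not installed'}")
```
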
2 years ago