petals

Commit Graph

Author	SHA1	Message	Date
Alexander Borzunov	ee4e69c254	Enable rebalancing by default (#84)	2 years ago
Alexander Borzunov	f64eb3a665	Update hivemind to 1.1.2, mark `model` argument as required (#81)	2 years ago
Alexander Borzunov	149f433763	Rebalance swarm when necessary (#34)	2 years ago
justheuristic	8caf1145a8	Quality of life changes: update readme, simplify run_server interface (#75) - run_server now accepts model name as both positional and keyword argument - changed names in README to account for interface updates - moved model conversion from README to a separate wiki page - updated requirements.txt	2 years ago
justheuristic	e92487e5d2	Update dependency versions (#71) * update dependency versions * install bitsandbytes cpuonly from pip * remove deprecated API from task pool * clearer startup logs Co-authored-by: Tim Dettmers <dettmers@cs.washington.edu>	2 years ago
Pavel Samygin	50535a8435	Priority tasks (#47) * priority in handlers and backend pools * simple points system on server side * prioritize task in handler before submit task * fix tests * s/expert/block/g Co-authored-by: justheuristic <justheuristic@gmail.com>	2 years ago
justheuristic	d271b75dd4	Let users specify sequence length instead of assuming 2048 (#52) - Maximum length is now provided in `.inference_session(max_length=100)` - previously, we would always assume max length = 2048 - added a generic way to forward *kwargs to inference session - for compatibility with #47 - Note to @borzunov: it does not pass them arbitrarily, but instead checks for kwarg names at the bottom level - run_server can be started with a custom max_length for inference - renamed --cache_size_bytes to --attention_cache_bytes (to avoid collision with --cache_dir) - --attn_cache_bytes can now support humane file sizes (e.g. 300MB instead of 314572800) - made some server-side errors more human-readable to user (e.g. when max length is exceeded) Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com> Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>	2 years ago
Dmitry Baranchuk	11a424837f	integrate mixed-8bit model (#39) * integrate mixed-8bit model * Fix bug with model duplication in RAM * set throughput=1.0 to fix zero throughput problem * add revision support * update hivemind and bitsandbytes * update deploy scripts * update installation instructions	2 years ago
Dmitry Baranchuk	f114a6d417	set default num_handlers=16	2 years ago
Alexander Borzunov	75856e4769	Measure and cache network & compute throughput (#21)	2 years ago
Alexander Borzunov	aba43f1308	Implement block selection on servers (#20)	2 years ago
justheuristic	9c492bbe8c	Infer prefix by default	2 years ago
justheuristic	6e3db6bed6	black-isort	2 years ago
justheuristic	d5c410bb1f	fix auth token	2 years ago
justheuristic	1ab5fb1630	fetch a specific bloom block without downloading the entire model	2 years ago
justheuristic	3b9351de1c	isort	2 years ago
justheuristic	20497f81d1	switch to hivemind-master	2 years ago
Pavel Samygin	57f4e0a899	add identity path	2 years ago
justheuristic	a798ea04a6	add minimalistic benchmarks	2 years ago
justheuristic	14e316b52a	black-isort	2 years ago
justheuristic	7ce7cd7a97	basic backend	2 years ago
justheuristic	1c49bcb741	basic backend	2 years ago

22 Commits (c07a7e081260376e0729b71641d23b441b4d5791)