petals

Commit Graph

Author	SHA1	Message	Date
Alexander Borzunov	c6e1b5a8e5	Add various server timeouts, lower --max_batch_size and --inference_max_length defaults (#97)	2 years ago
Alexander Borzunov	8a73b41a42	Make ServerState announcements work better (#93) - Before this PR, `ServerState.JOINING` was announced only once. This announcement quickly expires in case of the full-size BLOOM, since loading blocks takes several minutes. This PR fixes it, so `ServerState.JOINING` is announced periodically in a thread until blocks are loaded. - This PR also makes the `Server` class a non-thread, so it runs in the main thread and can catch `KeyboardInterrupt`. This is important, since if we are downloading blocks right now, we need to stop it and send the `ServerState.OFFLINE` message. Note that `ModuleContainer` is still a thread. - (minor) For the sake of readability, I moved the `ModuleContainer.create()` definition, so it is now defined before `Server.__init__()` (this is because `.create()` is invoked first).	2 years ago
Alexander Borzunov	dc71574a63	Use public swarm by default (#92) This PR makes servers and clients use public swarm's bootstrap peers if no other initial peers are specified. If you'd like a server to start a new swarm, provide the `--new_swarm` CLI argument.	2 years ago
Alexander Borzunov	ee4e69c254	Enable rebalancing by default (#84)	2 years ago
Alexander Borzunov	f64eb3a665	Update hivemind to 1.1.2, mark `model` argument as required (#81)	2 years ago
Alexander Borzunov	149f433763	Rebalance swarm when necessary (#34)	2 years ago
justheuristic	8caf1145a8	Quality of life changes: update readme, simplify run_server interface (#75) - run_server now accepts model name as both positional and keyword argument - changed names in README to account for interface updates - moved model conversion from README to a separate wiki page - updated requirements.txt	2 years ago
justheuristic	e92487e5d2	Update dependency versions (#71) * update dependency versions * install bitsandbytes cpuonly from pip * remove deprecated API from task pool * clearer startup logs Co-authored-by: Tim Dettmers <dettmers@cs.washington.edu>	2 years ago
Pavel Samygin	50535a8435	Priority tasks (#47) * priority in handlers and backend pools * simple points system on server side * prioritize task in handler before submit task * fix tests * s/expert/block/g Co-authored-by: justheuristic <justheuristic@gmail.com>	2 years ago
justheuristic	d271b75dd4	Let users specify sequence length instead of assuming 2048 (#52) - Maximum length is now provided in `.inference_session(max_length=100)` - previously, we would always assume max length = 2048 - added a generic way to forward **kwargs to inference session - for compatibility with #47 - Note to @borzunov: it does not pass them arbitrarily, but instead checks for kwarg names at the bottom level - run_server can be started with a custom max_length for inference - renamed --cache_size_bytes to --attention_cache_bytes (to avoid collision with --cache_dir) - --attn_cache_bytes can now support humane file sizes (e.g. 300MB instead of 314572800) - made some server-side errors more human-readable to user (e.g. when max length is exceeded) Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com> Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>	2 years ago
justheuristic	a2634001e9	Reduce vocabulary size in test model, fix bug in routing when overlapped (#45) This PR reduces this vocabulary size to save memory during conversion, keeping only the first 50k tokens. As a result, * tests that load client-side embeddings need significantly less RAM * we can now run CI tests with 4 servers instead of 2 - needed to test routing - see bugs uncovered * some of the servers now use load balancing * CI convert_model now takes 4-5 minutes (was 6-7)	2 years ago
Dmitry Baranchuk	11a424837f	integrate mixed-8bit model (#39) * integrate mixed-8bit model * Fix bug with model duplication in RAM * set throughput=1.0 to fix zero throughput problem * add revision support * update hivemind and bitsandbytes * update deploy scripts * update installation instructions	2 years ago
Dmitry Baranchuk	6573076883	Sequential and parallel forward / backward (#36)	2 years ago
justheuristic	e2711a033b	Add automated tests (#23) This PR will run basic tests automatically on each subsequent PR - convert a small model on every PR - run existing tests on every PR - enforce black / isort - require checks on merge - make sure tests are not flappy Co-authored-by: Alexander Borzunov <hxrussia@gmail.com> Co-authored-by: Dmitry Baranchuk <dmitrybaranchuk@gmail.com>	2 years ago
Dmitry Baranchuk	f114a6d417	set default num_handlers=16	2 years ago
Alexander Borzunov	75856e4769	Measure and cache network & compute throughput (#21)	2 years ago
Alexander Borzunov	aba43f1308	Implement block selection on servers (#20)	2 years ago
Dmitry Baranchuk	f055135b08	rm prefix	2 years ago
justheuristic	5695897620	fix imports	2 years ago
justheuristic	de556c99be	straighten import order	2 years ago
justheuristic	90d65e58aa	set default DHT prefix	2 years ago
justheuristic	41e5a95e8e	set client branch to main by default; remove the concept of base branch (redundant)	2 years ago
justheuristic	899cefe588	set client branch to main by default; remove the concept of base branch (redundant)	2 years ago
justheuristic	4695071ad2	WIP: make DistributedBloom compliant with HF interface	2 years ago
justheuristic	4ad845bce3	black-isort	2 years ago
Dmitry Baranchuk	be83e6d0cb	refactoring	2 years ago
Dmitry Baranchuk	d969172208	set requires_grad=False, lm_layer -> h @ word_embeddings, rm lm_layer from comverted_model	2 years ago
justheuristic	d42e8abd38	Merge branch 'client' into main	2 years ago
Dmitry Baranchuk	d7baa9997d	rm deprecated files	2 years ago
Dmitry Baranchuk	f60a7dd183	deploy swarm on local & remote machines	2 years ago
justheuristic	9c492bbe8c	Infer prefix by default	2 years ago
justheuristic	6e3db6bed6	black-isort	2 years ago
justheuristic	894cd5d586	Merge branch 'fix-auth-token' into main	2 years ago
Dmitry Baranchuk	4d27080ccc	add scripts to deploy a swarm	2 years ago
justheuristic	d5c410bb1f	fix auth token	2 years ago
justheuristic	ce9556dcb0	deprecation warning	2 years ago
justheuristic	83cd4412a1	black-isort	2 years ago
justheuristic	1ab5fb1630	fetch a specific bloom block without downloading the entire model	2 years ago
justheuristic	6047a2ffe0	push config and tokenizer separately	2 years ago
justheuristic	b6f3bbfd97	black	2 years ago
justheuristic	84de19fb1a	better status logs	2 years ago
justheuristic	1555d98f66	push converted model to hub	2 years ago
justheuristic	736f1d1085	push converted model to hub	2 years ago
justheuristic	e8241d2915	black everything	2 years ago
justheuristic	3b9351de1c	isort	2 years ago
justheuristic	20497f81d1	switch to hivemind-master	2 years ago
Pavel Samygin	57f4e0a899	add identity path	2 years ago
justheuristic	a798ea04a6	add minimalistic benchmarks	2 years ago
justheuristic	14e316b52a	black-isort	2 years ago
justheuristic	7ce7cd7a97	basic backend	2 years ago


55 Commits (ab41223b17c17dd1035a42318b03d4b92decd063)