justheuristic
8caf1145a8
Quality of life changes: update readme, simplify run_server interface ( #75 )
- run_server now accepts the model name as either a positional or a keyword argument (see the sketch after this list)
- changed names in README to account for interface updates
- moved model conversion from README to a separate wiki page
- updated requirements.txt
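A minimal sketch of the two now-equivalent ways to launch a server after this change. The module path `cli.run_server`, the keyword flag name `--converted_model_name_or_path`, and the model repo are assumptions for illustration, not taken from this commit.

```python
# Hedged sketch: pass the model name positionally or as a keyword flag.
# Module path, flag name, and model repo below are assumptions.
import subprocess

MODEL = "bigscience/test-bloomd-6b3"  # assumed converted model repo

# positional form
subprocess.run(["python", "-m", "cli.run_server", MODEL])

# equivalent keyword form (assumed flag name)
subprocess.run(["python", "-m", "cli.run_server",
                "--converted_model_name_or_path", MODEL])
```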
2022-09-20 03:51:57 +03:00
Alexander Borzunov
54ad745bed
Warn that current instructions involve 6B model but we will replace them soon ( #63 )
2022-09-05 15:05:59 +04:00
Alexander Borzunov
5f0c5329d4
Update readme with arxiv link and more discussions ( #62 )
Co-authored-by: justheuristic <justheuristic@gmail.com>
2022-09-05 12:04:50 +04:00
Alexander Borzunov
9bea7b9ea8
Update bullet points with feedback from Tim and other people ( #61 )
Co-authored-by: Tim Dettmers <tim.dettmers@gmail.com>
2022-09-03 06:38:18 +04:00
Alexander Borzunov
7653562aa1
Use latest version of Petals scheme, shrink Petals logo ( #59 )
2022-09-02 15:38:04 +04:00
Alexander Borzunov
2eb5843852
Update readme for the 1st public release ( #57 )
2022-09-01 08:41:49 +04:00
Pavel Samygin
0be21775af
remove transformer block, implement as sequential of size 1 ( #54 )
* remove the standalone transformer block handler, implement it as a sequential of size 1 (see the analogy sketch after this list)
* reimplement get_remote_module
* fix readme
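Conceptually, a single remote block is now just a remote sequence of length one, so one code path serves both cases. This is only a local PyTorch analogy of that idea, not the repo's `get_remote_module` code:

```python
# Illustrative analogy only: handling one block as a sequence of size 1
# gives a uniform "sequential" interface for one block or many.
import torch
from torch import nn

block = nn.TransformerEncoderLayer(d_model=64, nhead=4, dropout=0.0, batch_first=True).eval()
sequence_of_one = nn.Sequential(block)  # one block, same sequential interface

x = torch.randn(2, 8, 64)
assert torch.allclose(sequence_of_one(x), block(x))
```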
Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>
Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
2022-09-01 04:26:31 +03:00
justheuristic
d271b75dd4
Let users specify sequence length instead of assuming 2048 ( #52 )
- Maximum length is now provided via `.inference_session(max_length=100)` (see the sketch after this list)
- previously, we would always assume max length = 2048
- added a generic way to forward **kwargs to the inference session
- for compatibility with #47
- Note to @borzunov: it does *not* pass them arbitrarily, but instead checks for kwarg names at the bottom level
- run_server can be started with a custom max_length for inference
- renamed --cache_size_bytes to --attn_cache_bytes (to avoid collision with --cache_dir)
- --attn_cache_bytes now accepts human-readable file sizes (e.g. 300MB instead of 314572800)
- made some server-side errors more human-readable to the user (e.g. when the max length is exceeded)
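A hedged sketch of the client-side usage this commit describes; only `.inference_session(max_length=...)` is named in the commit, while the import path and model repo below are assumptions.

```python
from src.client import DistributedBloomForCausalLM  # hypothetical import path

model = DistributedBloomForCausalLM.from_pretrained("bigscience/test-bloomd-6b3")  # assumed repo

# The token budget is stated up front instead of the old implicit 2048.
with model.inference_session(max_length=100) as session:
    pass  # run generation steps against `session`; extra kwargs are name-checked, not forwarded blindly
```

On the server side, the renamed cache flag accepts human-readable sizes, e.g. something like `--attn_cache_bytes 300MB` (exact syntax assumed from the bullet above).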
Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>
Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>
2022-08-29 21:04:37 +03:00
Dmitry Baranchuk
11a424837f
integrate mixed-8bit model ( #39 )
* integrate mixed-8bit model
* Fix bug with model duplication in RAM
* set throughput=1.0 to fix the zero-throughput problem
* add revision support
* update hivemind and bitsandbytes
* update deploy scripts
* update installation instructions
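Not the server's actual code, just a sketch of the general technique the bullets refer to: mixed-8bit loading through transformers/bitsandbytes, with the `revision` argument standing in for the revision support added here. The model repo is a placeholder.

```python
# Sketch of mixed-8bit loading (general technique, not the exact server code).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",   # placeholder model repo
    revision="main",      # revision selection, as added in this PR
    load_in_8bit=True,    # int8 weights via bitsandbytes
    device_map="auto",    # requires `accelerate`
)
```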
2022-08-04 09:57:37 +03:00
Alexander Borzunov
7d39d46966
Use "PETALS" as the readme title ( #40 )
Since we've chosen the system name, let's use it in the repo name and the readme title.
2022-08-02 18:48:54 +04:00
justheuristic
ccdcefe405
Add instructions to test the full model ( #25 )
add instructions to test the full model
2022-07-16 02:45:10 +03:00
justheuristic
eb0a6be716
Clean up readme ( #24 )
Removes some deprecated sections of the README and turns on CI for the main branch
2022-07-16 02:11:17 +03:00
Alexander Borzunov
7e9f337a63
Remove excess line from readme
2022-07-14 22:27:17 +04:00
Alexander Borzunov
aba43f1308
Implement block selection on servers ( #20 )
2022-07-12 14:42:30 +04:00
justheuristic
4eadd00a2c
rm prefix from tests
2022-07-08 19:11:55 +03:00
justheuristic
2e90ac30a0
use default prefix in readme
2022-07-07 13:22:03 +03:00
justheuristic
4695071ad2
WIP: make DistributedBloom compliant with HF interface
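A hedged sketch of what "compliant with the HF interface" aims at in practice: the distributed client model should slot into the standard tokenizer-plus-`generate()` workflow. The import path, the class availability at this WIP commit, and the model repo are assumptions.

```python
# Sketch of the HF-style usage this commit works toward; names below are assumed.
from transformers import AutoTokenizer
from src.client import DistributedBloomForCausalLM  # hypothetical import path

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = DistributedBloomForCausalLM.from_pretrained("bigscience/test-bloomd-6b3")  # assumed repo

inputs = tokenizer("A cat sat on", return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_new_tokens=5)  # standard HF generate() call
print(tokenizer.decode(outputs[0]))
```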
2022-07-07 03:11:28 +03:00
justheuristic
8de7c1687b
list latest additions
2022-07-01 03:57:21 +03:00
justheuristic
d688cb0d22
stupid, slow, fragile, but correct full model inference
2022-07-01 03:53:08 +03:00
Aleksandr Borzunov
b78d713347
refactor, add swarm info
2022-06-29 14:26:47 +03:00
justheuristic
ca3c08acc1
Update README.md
2022-06-24 10:05:39 +03:00
justheuristic
1cdf8a77fb
basic chained inference (multiple blocks per one RPC call)
2022-06-23 16:33:16 +03:00
justheuristic
0e7afea026
Merge remote-tracking branch 'origin/main' into main
2022-06-22 22:02:27 +03:00
justheuristic
2eb47cbedd
support hosting multiple instances of the same block
2022-06-22 22:00:55 +03:00
justheuristic
14b6d04b0f
install hivemind from pip
2022-06-22 17:32:13 +03:00
justheuristic
f3722d52cf
Update README.md
2022-06-20 17:02:29 +03:00
justheuristic
aaaf0c2dad
better testing readme
2022-06-20 16:51:20 +03:00
justheuristic
2bf83b42e5
add testing guide
2022-06-20 16:50:12 +03:00
justheuristic
1ab5fb1630
fetch a specific bloom block without downloading the entire model
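A hedged sketch of the idea using `huggingface_hub` to pull a single block's weights from a converted repo; the repo name, per-block file name, and branch layout are assumptions, not the repo's actual loader.

```python
# Sketch only: download one block's weights instead of the whole model.
# Repo id, filename, and the per-block branch naming are assumptions.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bigscience/test-bloomd-6b3",  # assumed converted model repo
    filename="pytorch_model.bin",          # assumed per-block weight file
    revision="block_3",                    # assumed branch holding block #3
)
print("downloaded a single block to", path)
```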
2022-06-20 15:33:17 +03:00
justheuristic
2d55e6e4fe
instructions to test distributed inference
2022-06-19 22:25:57 +03:00
justheuristic
9be7c81b78
instructions to test distributed inference
2022-06-19 22:22:01 +03:00
justheuristic
cc9a76625d
warn about long runtime
2022-06-19 22:14:52 +03:00
justheuristic
82214699f2
notes on hosting servers
2022-06-19 19:34:18 +03:00
justheuristic
1555d98f66
push converted model to hub
2022-06-19 19:13:48 +03:00
justheuristic
736f1d1085
push converted model to hub
2022-06-19 19:06:35 +03:00
justheuristic
7fba411dff
extended run_server example
2022-06-17 11:40:49 +03:00
justheuristic
5a15c13ca7
switch to hivemind-master
2022-06-17 10:36:34 +03:00
justheuristic
35310698f0
newer hivemind version
2022-06-17 10:12:09 +03:00
justheuristic
8959727dea
add minimalistic benchmarks
2022-06-14 15:18:38 +03:00
justheuristic
a798ea04a6
add minimalistic benchmarks
2022-06-14 15:18:11 +03:00
justheuristic
99059ae667
install script
2022-06-12 04:23:38 +03:00
justheuristic
c1a908dc66
Update README.md
2022-06-12 03:13:40 +03:00
justheuristic
af04479cf9
Initial commit
2022-06-12 03:10:27 +03:00