1 Commit (c0a4d2e3d5ad713cbcb5005148535669b1931420)
Author | SHA1 | Message | Date
---|---|---|---
Alexander Borzunov | 8c546d988a | Test Llama, rebalancing, throughput eval, and all CLI scripts (#452) | 1 year ago

This PR extends CI to:

1. Test Llama code using [TinyLlama-v0](https://huggingface.co/Maykeye/TinyLLama-v0).
2. Test rebalancing (it sets up a situation where the 1st server needs to change its original position).
3. Check that the benchmark scripts run (in case someone breaks their code). Note that the benchmark results are meaningless here, since they're measured on a tiny swarm of CPU servers with a low `--n_steps`.
4. Test `petals.cli.run_dht`.
5. Increase swap space and watch free RAM (a common issue is that actions are cancelled without explanation when there isn't enough RAM, so this serves as a useful reminder and debugging tool).
6. Fix flapping tests for bloom-560m by increasing tolerance.

Other minor changes: fix `--help` messages to show defaults, fix docs, and tune rebalancing constants.