speculative_inference
main
hf_quantization_integration
borzunov-patch-3
test_set_position
test-with-jf160m
step_metadata
speculative_test
test_branch
forward_backward
fix-docker
forward_kwargs
bump
test_main
fix-inference-retry
lora_from_hub
payload-size
partial_rollback
qkv_merge
no_qkv_merge
wip_triton
hivemind-dht-fork-process
repetition-penalty
amd-gpus
bnb-0-41-1
lru
beat-docker-into-submission
measurements
debug-leak
fix-nf4-and-dtypes
declare_adapters
empty-weights
download_8bit_weights
no-cpufeature
versions
test_opt_serving
borzunov-patch-2
borzunov-patch-1
processing_attention
yozh-dev-branch
server-increase-startup-timeout
vectorized_beam_search
friendly-timeout-errors
hivemind-1.1.4
fix3
hotfix_bnb
fix-ptune
server-dtypes
pip-installable-v2
pip-installable
diff-compression
client-convenience
server-timeouts
server-logging
beamsearch
fix-protobuf
fix-requirements
fix-joining-announce
bootstrap-peers
fault-tolerant-inference
examples_fix_hivemind
forward-backward-timeouts
fix-rebalancing-issues
add-sst2-example
enable-rebalancing
update_example_1
fix-too-many-open-files
update-hivemind
extract-module-container
instruction-readability-style
readme-clarifications
justheuristic-patch-5
fix-readme
ptune-example-personachat
rtfd
fix-pb2
investigate-segfault
upd-deps
priority-tasks
justheuristic-patch-4
cache
justheuristic-patch-3
generation-inference
deep_prompt_inference
warn-about-6b-instructions
update-readme-disclaimers-faq
justheuristic-patch-2
update-bullet-points
update-readme-pics
readme-release
remove-remote-block
prompt-inference
fix-cache
optimize_seq
fix-seq-backward-recovery
fix-distr-seq-cls
justheuristic-patch-1
fix-convert-8bit
memory_savings
distributed-deep-ptune
ptune-wip
pytest-verbose
rename-test-model
8bit_backward
8bit-model
8bit_model_inference
petals-readme-title
support-backend-dtypes
deep-prompt-tuning
mockup
efficient-forward-backward
fix-branch-name
dbaranchuk-patch-1
get_sequence
generation
fix-ci
fix-master-ci
test-push
facelift
CI
prompt-tuning
client-attempt2
measure-throughput
lm_head
load-balancing
sequence
demo-1
standardize
diff
rpc
update-model
client
fix-auth-token
multiple-experts
8bit_blocks
inference_chain
main_fix
v1.0.0
v1.1.0
v1.1.1
v1.1.2
v1.1.3
v1.1.4
v1.1.5
v2.0.0.post1
v2.0.0.post2
v2.0.0.post3
v2.0.1
v2.0.1.post1
v2.0.1.post2
v2.1.0
v2.2.0
2 Commits (c0a4d2e3d5ad713cbcb5005148535669b1931420)
Author | SHA1 | Message | Date |
---|---|---|---|
Artem Chumachenko | b9f0a5467f | Support peft LoRA adapters (#335). Implements an option to deploy PEFT adapters to a server; clients can set `active_adapter=...` to use these adapters. Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com> and justheuristic <justheuristic@gmail.com> | 1 year ago |
justheuristic | f0c7383181 | Implement RemoteSequential slicing and extra repr, add tests (#30). Finishes renaming RemoteSequenceInfo -> RemoteSequenceManager (as an *Info, users would expect something like a dataclass, whereas the class actually performs heavy network interactions on its own); implements RemoteSequenceManager.make_sequence (from https://pastebin.com/uXgy2U8B); makes RemoteSequentialInferenceSession use RemoteSequenceManager.make_sequence; makes tests pass again; allows creating an inference session without RemoteTransformerBlock; adds a standalone test for RemoteSequential; rolls back convert-model. Co-authored-by: Tim Dettmers <tim.dettmers@gmail.com> | 2 years ago |
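The `active_adapter` mechanism from the first commit can be sketched abstractly: a layer holds a base weight plus named low-rank (LoRA) deltas, and the adapter name selected by the client decides which delta is applied at forward time. The sketch below is a hypothetical toy in plain Python, not the actual petals/PEFT implementation; all class and method names here are illustrative only.

```python
# Toy sketch of named LoRA adapters on a linear map y = W x, where an adapter
# contributes a low-rank update B @ A. Hypothetical illustration only; the
# real feature lives in petals + PEFT and uses different classes and names.

def matmul(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

class LoraLinear:
    def __init__(self, weight):
        self.weight = weight          # base matrix W (list of rows)
        self.adapters = {}            # adapter name -> (A, B) low-rank factors
        self.active_adapter = None    # which adapter the "client" selected

    def add_adapter(self, name, a, b):
        self.adapters[name] = (a, b)

    def forward(self, x):
        y = matmul(self.weight, x)
        if self.active_adapter is not None:
            a, b = self.adapters[self.active_adapter]
            delta = matmul(b, matmul(a, x))   # B @ (A @ x), rank-limited update
            y = [yi + di for yi, di in zip(y, delta)]
        return y

layer = LoraLinear([[1.0, 0.0], [0.0, 1.0]])                 # W = identity
layer.add_adapter("demo", a=[[1.0, 1.0]], b=[[0.5], [0.0]])  # rank-1 update
print(layer.forward([2.0, 3.0]))   # base output: [2.0, 3.0]
layer.active_adapter = "demo"
print(layer.forward([2.0, 3.0]))   # with adapter delta: [4.5, 3.0]
```

The key design point mirrored here is that adding adapters does not change the base weights: switching `active_adapter` only toggles which small delta is added, which is what lets one server host several adapters for different clients.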
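The slicing and "extra repr" behavior described in the second commit can be illustrated with a toy stand-in: a sequential container whose `__getitem__` accepts a slice and returns a new container over a sub-range of blocks. Again, this is a hypothetical sketch, not the actual petals `RemoteSequential` code.

```python
# Hypothetical illustration of sequential-container slicing, in the spirit of
# the RemoteSequential slicing described above. NOT the actual petals code.

class ToySequential:
    """A minimal sequential container whose __getitem__ supports slices."""

    def __init__(self, blocks):
        self.blocks = list(blocks)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing returns a new container over a sub-range of blocks,
            # analogous to selecting a span of remote transformer layers.
            return ToySequential(self.blocks[index])
        return self.blocks[index]

    def __len__(self):
        return len(self.blocks)

    def __repr__(self):
        # A summary repr, in the spirit of the "extra repr" the commit adds.
        return f"ToySequential(num_blocks={len(self.blocks)})"

layers = ToySequential([f"block{i}" for i in range(8)])
middle = layers[2:5]        # a new container over blocks 2..4
print(repr(middle))         # ToySequential(num_blocks=3)
```

Returning a new container (rather than a plain list) keeps the sliced object usable wherever the full sequence is, which is the point of slicing a distributed layer span.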