petals

Commit Graph

Author	SHA1	Message	Date
Your Name	cc67c332a6	the (still) reasonable version	10 months ago
Alexander Borzunov	11f0d992d7	Report inference, forward, and network RPS separately (#358 ) Inference RPS may be very different from forward RPS. E.g., currently bnb uses a completely different algorithm for NF4 inference. We report detailed RPS info that can be then used for shortest-path routing for inference.	11 months ago
Alexander Borzunov	1a78638c02	Test that bitsandbytes is not imported when it's not used (#351 ) We avoid importing bitsandbytes when it's not used, since bitsandbytes doesn't always find correct CUDA libs and may raise exceptions because of that.	11 months ago
Artem Chumachenko	b9f0a5467f	Support peft LoRA adapters (#335 ) Implement an option to deploy PEFT adapters to a server. Clients can set active_adapter=... to use these adapters. --------- Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com> Co-authored-by: justheuristic <justheuristic@gmail.com>	11 months ago
Alexander Borzunov	de930918a0	Support loading blocks in 4-bit (QLoRA NF4 format, disabled by default) (#333 )	11 months ago
Alexander Borzunov	cb3f018f9f	Add LLaMA support (#323 ) This PR: 1. Abolishes the model conversion procedure. Now, models are downloaded directly from original repositories like https://huggingface.co/bigscience/bloom. Servers download only shards with blocks to be hosted, and clients download only shards with input/output embeddings and layernorms. - BLOOM is loaded from `bigscience/bloom`, but we use the DHT prefix `bigscience/bloom-petals` for backward compatibility. Same with smaller BLOOMs and BLOOMZ. - LLaMA can be loaded from any repo like `username/llama-65b-hf`, but we use the DHT prefix `llama-65b-hf` (without the username) to accommodate blocks from different repos (there're a few of them with minor differences, such as `Llama` vs. `LLaMA` in the class name). 2. Refactors the client to generalize it for multiple models. Now, we have `petals.models` packages that contain model-specific code (e.g. `petals.models.bloom`, `petals.models.llama`). General code (e.g. CPU-efficient LM head, p-tuning) is kept in `petals.client`. 3. Introduces `WrappedLlamaBlock`, `DistributedLlamaConfig`, `DistributedLlamaForCausalLM`, `DistributedLlamaForSequenceClassification`, and `DistributedLlamaModel` compatible with Petals functionality (p-tuning, adapters, etc.). 4. Introduces `AutoDistributedConfig` that automatically chooses the correct config class (`DistributedLlamaConfig` or `DistributedBloomConfig`). The refactored configs contain all model-specific info for both clients and servers. Upgrade instructions: - Remove disk caches for blocks in old (converted) format to save disk space. That is, remove `~/.cache/petals/model--bigscience--bloom-petals` and `~/.cache/petals/model--bigscience--bloomz-petals` directories (if present).	11 months ago
Max Ryabinin	c839173e57	Determine block dtype in a unified manner (#325 ) * Extract backend_dtype, remove duplicate DTYPE_MAP * Use bfloat16 as the default dtype, resolve dtype in load_pretrained_block	12 months ago
Max Ryabinin	3e7ae5116d	Remove unused imports and attributes (#324 ) * Remove unused imports and attributes	12 months ago
Alexander Borzunov	6137b1b4b0	Replace .make_sequence(..., mode="random") with mode="max_throughput" (#313 ) We need to sample the next server using its throughput as the weight to actually achieve max throughput for fine-tuning. As an example, imagine a situation where we have 3 servers with throughputs [1000, 500, 1] hosting the same blocks, then compare the uniform and weighted sampling strategies.	1 year ago
Alexander Borzunov	8f6342a861	Refactor RemoteSequenceManager (#309 ) This PR: 1. Extracts `SequenceManagerConfig` and `SequenceManagerState` subclasses. The config is provided by caller and never changed from inside `RemoteSequenceManager`. The state is a part of the `RemoteSequenceManager`'s state shared between the main manager and its slices. We fix some slicing bugs along the way. 2. Removes `dht_prefix` and `p2p` arguments, makes `dht` argument optional. `dht_prefix` can always be overridden using `config.dht_prefix`. `p2p` is actually needed only under the hood of `RemoteSequenceManager`, so it can extract it by itself without exposing this low-level class to callers. If strictly necessary, a caller can provide `p2p` as a part of `SequenceManagerState`. `dht` is also needed only by `RemoteSequenceManager`, so we can make it optional in the parent classes and create it automatically when it's not provided. 3. Simplifies retry logic. Previously, we could have "nested" retry loops: one in `._update()`, another in inference/forward/backward steps. The loop in `._update()` could introduce issues to concurrent inference/forward/backward calls, since it blocks the entire class if its delay period becomes too high. Now this logic is simplified: `._update()` performs only one attempt to fetch the DHT info, any retries are triggered by the inference/forward/backward steps. 4. Removes deprecated `RemoteTransformerBlock`. `RemoteTransformerBlock` was deprecated a long time ago, before Petals 1.0.0. Its removal is long overdue. 5. Removes `dht_utils.get_remote_module()`, `dht_utils.get_remote_sequence()`. These functions duplicate the functionality of the `RemoteSequential` constructor. 6. (minor) Removes `RemoteSequential.is_subsequence` flag. This flag worked incorrectly and was never used. I am removing it for the sake of simplicity.	1 year ago
Alexander Borzunov	21c3526ec1	Start SequenceManager's thread only after first .make_sequence() (#301 ) Why? - We'd like to avoid excess threads for the original sequence manager in case we only use its slices (e.g. when we add adapters or need only a subset of model blocks): - If we create a sequence manager just before a fork (e.g. in a web app backend or a multi-thread benchmark), we'd like to avoid excess threads in the original process and only use this thread in child processes where we actually call `.make_sequence()`.	1 year ago
Alexander Borzunov	892fa2386a	Remove CustomLinear8bitLt (#297 ) This became a part of https://github.com/TimDettmers/bitsandbytes/releases/tag/0.37.0.	1 year ago
Max Ryabinin	793726b041	Speed up loading blocks using init with meta weights (#285 ) * Init WrappedBloomBlock with meta weights --------- Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>	1 year ago
Alexander Borzunov	fee19e9b9b	Use get_logger(__name__) instead of get_logger(__file__) (#265 )	1 year ago
Alexander Borzunov	702bb5a2c2	CI: Update deprecated actions, don't measure network RPS (#215 ) * CI: Switch to actions/cache@v3 (v2 is deprecated) * Don't run measure_network_rps() in tests since it doesn't work well in CI	1 year ago
justheuristic	5f58f00649	Return available cache size in rpc_info() (#191 ) This PR makes servers return their free cache (in tokens * layers to make it compression-agnostic) To be used when calling make_sequence(optimize="inference")	1 year ago
justheuristic	012f840f7e	Use length-weighted sampling in routing for inference (#204 ) This pull request implements a simple (1) greedy (2) latency-agnostic routing optimization that should speed up both our use cases. Why this exists: our effort to merge full routing (ping-aware, throughput-aware, dijkstra) is in a sorry state between several branches; merging it into main would take many days. Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>	1 year ago
justheuristic	c2cb6d19ae	Increase tolerances in test_tp_block (#196 ) deflapify tests	1 year ago
justheuristic	ae9e71fe8e	Add local tensor-parallel fwd/bwd (#143 ) This pull request adds an option to run Petals server on multiple local GPUs. It uses https://github.com/BlackSamorez/tensor_parallel - 8bit approximation error same as in main (mean~=2% q0.9~=5%) - TP=1, 2, 3 (see screenshots above) - forward, grad w.r.t. input and inference exact match with main with TP=1 - `>=`80% GPU utilization with 3x 1080ti, batch = 8 tokens - throughput measured with and without TP - TP on 1080Tis has near-linear speedup comparable to the benchmarks (see first message) Co-authored-by: Iaroslav Lisniak <yalisnyak@nes.ru> Co-authored-by: Andrei Panferov <andrei@blacksamorez.ru> Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>	1 year ago
Alexander Borzunov	523a7cad33	Fix issues related to `petals` as a module (#159 ) 1. Added `from petals.client import *` to `petals/__init__.py`, so you can write just that: ```python from petals import DistributedBloomForCausalLM ``` I didn't do the same with server, since its classes are supposed to be used by `petals.cli.run_server`, not end-users. Though it's still possible to do `from petals.server.smth import smth` if necessary. 2. Fixed one more logging issue: log lines from hivemind were shown twice due to a bug in #156. 3. Removed unused `runtime.py`, since the server actually uses `hivemind.moe.Runtime`, and `runtime.py` has no significant changes compared to it.	1 year ago
justheuristic	91898c3c90	Switch to speedtest-cli (#157 ) This pull request removes custom speed_test code in favour of speedtest-cli module. This is necessary to ensure that random warnings / print-outs do not mess with our outputs. Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>	1 year ago
Alexander Borzunov	668b736031	Fix logging: do not duplicate lines, enable colors in Colab (#156 )	1 year ago
Max Ryabinin	bd91be27ea	Add missing methods for SamplingAlgorithm, fix docstrings (#107 ) * Add missing methods for SamplingAlgorithm, fix docstrings * Add SamplingAlgorithm to _choose_sample_algorithm * Add test_sampling * Add a warning if sampling options were passed, but do_sample=False * Skip the sampling test for now Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>	1 year ago
Max Ryabinin	a0e8bbd28d	Fix arguments in remove_old_models.py (#153 ) * Fix arguments in remove_old_models.py * Remove unnecessary args.author * Fix the GitHub Action as well	1 year ago
justheuristic	b04982c1a2	Bump transformers to 4.25.1 (#151 ) - latest accelerate, transformers, huggingface_hub - rearrange attention caches to support https://github.com/huggingface/transformers/pull/18344 - remove unused code - fix edge case where session crashes when receiving seq length 0 - assert transformer version when importing WrappedBloomBlock Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com> Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>	1 year ago
justheuristic	617d70f7dc	Support --load_in_8bit on pre-Turing GPUs (#113 ) - Linear8bitLt now supports pre-Turing GPUs by temporarily upcasting quantized weights. - added a test for linear8bitlt accuracy with the new fallback, the accuracy is similar to the real thing, (slightly better due to non-quantized A) - performance is roughly halfway between the default mode and memory_efficient_backward Alternatives considered: - cupy - slow, casting to float internally - triton - fast but unstable af. every 3rd attempt to matmul is a segfault - bnb.functional.igemm (no lt) - "CuBLAS Error 8" on old GPUs Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>	2 years ago
justheuristic	01838f9a99	Fix Linear8bitlt state config, update tests (#112 ) * fix state initializer * update tests to actually use new code * keep bias during quantization	2 years ago
justheuristic	088713912d	Patch Linear8bit to enable CxB backward (#111 ) A patch to bitsandbytes 0.34.0 that introduces an option to run backward pass in default (fast) matrix layout. Authors: cxb inversion by @borzunov, original 8bit code by @timdettmers * optimized layout inversion code by @borzunov ([original code](https://colab.research.google.com/drive/1EJ0MKifajXSSVq7O2_QGwtb0l6gRAGrh?usp=sharing)) to use less forward calls * implemented CustomLinear8bitLt, a child of Linear8bitLt that can do backward without CB * added exact match tests for layouts and linear layers: see tests/test_linear8bitlt.py * switched petals to the new layer type Core idea: layouts apply the same permutation to every tile in the matrix. We can treat this as (batched) gather ops. Reshape input tensor so that ij-th gather operation op will apply to ij-th elements in each tile. Prototype: Layout info: https://github.com/TimDettmers/bitsandbytes/blob/main/csrc/kernels.cu#L2130-L2136 Co-authored-by: Alexander Borzunov <hxrussia@gmail.com> Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com> Co-authored-by: Tim Dettmers <tim.dettmers@gmail.com>	2 years ago
justheuristic	8dc0f513ba	Hotfix span selection (#110 ) Fix an issue in span selection that was introduced in #106	2 years ago
justheuristic	a2066a4096	Optimize RemoteSequenceManager (#106 ) - [x] made RemoteSequenceManager into a background thread that pre-fetches information instead of running just in time - [x] moved routing-related stuff to petals.client.routing - [x] extract remote peer routing information to RemoteSequenceInfo - [x] made sure that the code survives continued use (e.g. one hour) - [x] updated every spot where update_ is called manually - [x] modified get_sequence to check that the thread is alive, warn if not - [x] removed max_retries, switched rpc_info to exponential backoff - [x] fixed a bug that causes RemoteSeq* to lose user-defined hyperparameters (e.g. timeout) upon subsequencing (sequential[3:5]) - [x] moved client-side points strategy to client.routing - [x] ensured that RemoteSequenceManager thread created in get_remote_module properly shuts down when the module is destroyed - [x] resolved minor affected todos - [x] modified tests to no longer use PYTHONPATH - [x] worked around protocol error in rpc_info Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com> Co-authored-by: Artem Chumachenko <artek.chumak@gmail.com>	2 years ago
Alexander Borzunov	43ac6016ac	Fix dtypes in backend schemas (#99 ) Currently, the schemas use `torch.float32`, so all inputs and outputs are converted to float32 before sending and after receiving on both servers and clients. This creates a huge slowdown for the system. * This PR makes the schemas use the server's `--torch_dtype` argument (default is `torch.bfloat16` for BLOOM-176B) * an option for client to request a specific output compression. Use case 1: client sends quantized inputs and expects quantized inputs in return. Use case 2: client uses quantization for gradients w.r.t. activations, but keeps grads w.r.t. __prompts__ as is for greater precision. * a comment explaining the purpose of NoSpendingPolicy - since we likely won't have it for the workshop * a test with custom compression (janky implementation for testing purposes) Co-authored-by: justheuristic <justheuristic@gmail.com>	2 years ago
Alexander Borzunov	7bd5916744	Make Petals a pip-installable package (attempt 2) (#102 ) 1. Petals can now be installed using `pip install git+https://github.com/bigscience-workshop/petals` - If you already cloned the repo, you can do `pip install .` or `pip install .[dev]` 2. Moved `src` => `src/petals` - Replaced `from src.smth import smth` with `from petals.smth import smth` 3. Moved `cli` => `src/petals/cli` - Replaced `python -m cli.run_smth` with `python -m petals.cli.run_smth` (all utilities are now available right after pip installation) 4. Moved the `requirements*.txt` contents to `setup.cfg` (`requirements.txt` for packages is not supported well by modern packaging utils) 5. Increased the package version from `0.2` to `1.0alpha1`	2 years ago
Artem Chumachenko	fdb3583a8c	Add Beam Search decoding algorithm (#87 ) Add beam_search	2 years ago
Alexander Borzunov	11d6ba683c	Make inference, forward, and backward fully fault-tolerant (#91 )	2 years ago
Pavel Samygin	50535a8435	Priority tasks (#47 ) * priority in handlers and backend pools * simple points system on server side * prioritize task in handler before submit task * fix tests * s/expert/block/g Co-authored-by: justheuristic <justheuristic@gmail.com>	2 years ago
Pavel Samygin	0be21775af	remove transformer block, implement as sequential of size 1 (#54 ) * remove transformer block, implement as sequence size 1 * reimplement get_remote_module * fix readme Co-authored-by: Alexander Borzunov <hxrussia@gmail.com> Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com>	2 years ago
justheuristic	d271b75dd4	Let users specify sequence length instead of assuming 2048 (#52 ) - Maximum length is now provided in `.inference_session(max_length=100)` - previously, we would always assume max length = 2048 - added a generic way to forward **kwargs to inference session - for compatibility with #47 - Note to @borzunov : it does *not* pass them arbitrarily, but instead checks for kwarg names at the bottom level - run_server can be started with a custom max_length for inference - renamed --cache_size_bytes to --attention_cache_bytes (to avoid collision with --cache_dir) - --attn_cache_bytes can now support humane file sizes (e.g. 300MB instead of 314572800) - made some server-side errors more human-readable to user (e.g. when max length is exceeded) Co-authored-by: Aleksandr Borzunov <borzunov.alexander@gmail.com> Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>	2 years ago
justheuristic	a2634001e9	Reduce vocabulary size in test model, fix bug in routing when overlapped (#45 ) This PR reduces this vocabulary size to save memory during conversion, keeping only the first 50k tokens As a result, * tests that load client-side embeddings need significantly less RAM * we can now run CI tests with 4 servers instead of 2 - needed to test routing - see bugs uncovered * some of the servers now use load balancing * CI convert_model now takes 4-5 minutes (was 6-7)	2 years ago
Dmitry Baranchuk	6095f58681	Deep distributed prompt tuning (#42 ) * implemented an option to add learnable prompts to intermediate layers * added support for prompts (as input) in rpc_forward and rpc_backward * added a test to check that RemoteSequential works correctly with deep prompts Co-authored-by: justheuristic <justheuristic@gmail.com>	2 years ago
Artem Chumachenko	d989b94614	Pack of Inference Changes (#37 ) * Return multibatch mode * Add tests * fixes	2 years ago
justheuristic	f0cffbf67e	Miscellaneous fixes to automatic tests (#35 ) 1. __Reduce memory usage in test_full_model__ - previously, loading the full model would consistently fail IF github is enforcing memory limit [example](https://github.com/bigscience-workshop/distributed-bloom/runs/7473920049?check_suite_focus=true) - the new version uses accelerate to save 2GB of peak memory, that was previously used when loading both reference model AND its state dict at the same time - only to load that state dict :) 2. __Safer delays when creating servers__ - run-tests will now wait for a few seconds after creating the first server - and before creating the second one, so as to make sure that the first server creates a DHT instance that subsequent servers can connect to. - also increased the wait time after creating servers by 30 seconds to make sure we load the model in time even when bumping into slow remotes on HF side 3. __Fix environment variables in CI to avoid build conflicts__ - the previous code was using a wrong environment variable that was always "main". The current one will correctly resolve branch name, both in main and on pull request. - For reference, below you can find sample environments when running CI in both cases: on pull request and on push to main. 
<details> <summary> Environment variables when building this branch (on pull request) </summary> SELENIUM_JAR_PATH=/usr/share/java/selenium-server.jar GOROOT_1_17_X64=/opt/hostedtoolcache/go/1.17.12/x64 CONDA=/usr/share/miniconda GITHUB_WORKSPACE=/home/runner/work/distributed-bloom/distributed-bloom JAVA_HOME_11_X64=/usr/lib/jvm/temurin-11-jdk-amd64 GITHUB_PATH=/home/runner/work/_temp/_runner_file_commands/add_path_0aba811a-a04b-40a2-ba42-79efb2723e9e GITHUB_ACTION=__run_2 JAVA_HOME=/usr/lib/jvm/temurin-11-jdk-amd64 GITHUB_RUN_NUMBER=98 RUNNER_NAME=GitHub Actions 3 GRADLE_HOME=/usr/share/gradle-7.5 XDG_CONFIG_HOME=/home/runner/.config DOTNET_SKIP_FIRST_TIME_EXPERIENCE=1 ANT_HOME=/usr/share/ant JAVA_HOME_8_X64=/usr/lib/jvm/temurin-8-jdk-amd64 HOMEBREW_PREFIX=/home/linuxbrew/.linuxbrew pythonLocation=/opt/hostedtoolcache/Python/3.9.13/x64 GITHUB_REF_TYPE=branch HOMEBREW_CLEANUP_PERIODIC_FULL_DAYS=3650 BOOTSTRAP_HASKELL_NONINTERACTIVE=1 * PIPX_BIN_DIR=/opt/pipx_bin DEPLOYMENT_BASEPATH=/opt/runner GITHUB_ACTIONS=true ANDROID_NDK_LATEST_HOME=/usr/local/lib/android/sdk/ndk/24.0.8215888 GITHUB_SHA=3b457e8a14e5ecb0d65d6e4c0e9161f7756a8861 POWERSHELL_DISTRIBUTION_CHANNEL=GitHub-Actions-ubuntu20 DOTNET_MULTILEVEL_LOOKUP=0 GITHUB_REF=refs/pull/35/merge RUNNER_OS=Linux GITHUB_REF_PROTECTED=false HOME=/home/runner GITHUB_API_URL=https://api.github.com/ LANG=C.UTF-8 BLOOM_TESTING_WRITE_TOKEN=* RUNNER_TRACKING_ID=github_cc9b46e4-56a1-40c5-ba08-5a91e21f0f95 STATS_KEEPALIVE=false RUNNER_ARCH=X64 RUNNER_TEMP=/home/runner/work/_temp EDGEWEBDRIVER=/usr/local/share/edge_driver GITHUB_ENV=/home/runner/work/_temp/_runner_file_commands/set_env_0aba811a-a04b-40a2-ba42-79efb2723e9e GITHUB_EVENT_PATH=/home/runner/work/_temp/_github_workflow/event.json INVOCATION_ID=8f0072e74f2847c0851e7ff9b5e4af7c GITHUB_EVENT_NAME=pull_request GITHUB_RUN_ID=2720198689 JAVA_HOME_17_X64=/usr/lib/jvm/temurin-17-jdk-amd64 ANDROID_NDK_HOME=/usr/local/lib/android/sdk/ndk-bundle 
GITHUB_STEP_SUMMARY=/home/runner/work/_temp/_runner_file_commands/step_summary_0aba811a-a04b-40a2-ba42-79efb2723e9e HOMEBREW_NO_AUTO_UPDATE=1 GITHUB_ACTOR=justheuristic NVM_DIR=/home/runner/.nvm SGX_AESM_ADDR=1 GITHUB_RUN_ATTEMPT=1 ANDROID_HOME=/usr/local/lib/android/sdk GITHUB_GRAPHQL_URL=https://api.github.com/graphql ACCEPT_EULA=Y RUNNER_USER=runner USER=runner GITHUB_SERVER_URL=https://github.com/ HOMEBREW_CELLAR=/home/linuxbrew/.linuxbrew/Cellar PIPX_HOME=/opt/pipx GECKOWEBDRIVER=/usr/local/share/gecko_driver CHROMEWEBDRIVER=/usr/local/share/chrome_driver SHLVL=0 ANDROID_SDK_ROOT=/usr/local/lib/android/sdk VCPKG_INSTALLATION_ROOT=/usr/local/share/vcpkg HOMEBREW_REPOSITORY=/home/linuxbrew/.linuxbrew/Homebrew RUNNER_TOOL_CACHE=/opt/hostedtoolcache ImageVersion=20220717.1 DOTNET_NOLOGO=1 GITHUB_REF_NAME=35/merge STATS_PFS=true GRAALVM_11_ROOT=/usr/local/graalvm/graalvm-ce-java11-22.1.0 GITHUB_JOB=convert-model LD_LIBRARY_PATH=/opt/hostedtoolcache/Python/3.9.13/x64/lib XDG_RUNTIME_DIR=/run/user/1001 AZURE_EXTENSION_DIR=/opt/az/azcliextensions PERFLOG_LOCATION_SETTING=RUNNER_PERFLOG GITHUB_REPOSITORY=bigscience-workshop/distributed-bloom ANDROID_NDK_ROOT=/usr/local/lib/android/sdk/ndk-bundle CHROME_BIN=/usr/bin/google-chrome GOROOT_1_18_X64=/opt/hostedtoolcache/go/1.18.4/x64 GITHUB_RETENTION_DAYS=90 JOURNAL_STREAM=8:23653 RUNNER_WORKSPACE=/home/runner/work/distributed-bloom LEIN_HOME=/usr/local/lib/lein LEIN_JAR=/usr/local/lib/lein/self-installs/leiningen-2.9.8-standalone.jar GITHUB_ACTION_REPOSITORY= PATH=/opt/hostedtoolcache/Python/3.9.13/x64/bin:/opt/hostedtoolcache/Python/3.9.13/x64:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/snap/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin RUNNER_PERFLOG=/home/runner/perflog GITHUB_BASE_REF=main 
CI=true SWIFT_PATH=/usr/share/swift/usr/bin ImageOS=ubuntu20 GITHUB_REPOSITORY_OWNER=bigscience-workshop GITHUB_HEAD_REF=fix-branch-name GITHUB_ACTION_REF= GITHUB_WORKFLOW=Tests DEBIAN_FRONTEND=noninteractive AGENT_TOOLSDIRECTORY=/opt/hostedtoolcache GOROOT_1_16_X64=/opt/hostedtoolcache/go/1.16.15/x64 _=/usr/bin/env </details> <details> <summary> Environment variables when building in main (on push) </summary> SELENIUM_JAR_PATH=/usr/share/java/selenium-server.jar GOROOT_1_17_X64=/opt/hostedtoolcache/go/1.17.11/x64 CONDA=/usr/share/miniconda GITHUB_WORKSPACE=/home/runner/work/distributed-bloom/distributed-bloom JAVA_HOME_11_X64=/usr/lib/jvm/temurin-11-jdk-amd64 GITHUB_PATH=/home/runner/work/_temp/_runner_file_commands/add_path_cd6c1ed2-0d0f-496d-b7a6-ffa476dcc144 GITHUB_ACTION=__run_2 JAVA_HOME=/usr/lib/jvm/temurin-11-jdk-amd64 GITHUB_RUN_NUMBER=53 RUNNER_NAME=GitHub Actions 3 GRADLE_HOME=/usr/share/gradle-7.4.2 XDG_CONFIG_HOME=/home/runner/.config DOTNET_SKIP_FIRST_TIME_EXPERIENCE=1 ANT_HOME=/usr/share/ant JAVA_HOME_8_X64=/usr/lib/jvm/temurin-8-jdk-amd64 HOMEBREW_PREFIX=/home/linuxbrew/.linuxbrew pythonLocation=/opt/hostedtoolcache/Python/3.9.13/x64 GITHUB_REF_TYPE=branch HOMEBREW_CLEANUP_PERIODIC_FULL_DAYS=3650 BOOTSTRAP_HASKELL_NONINTERACTIVE=1 * PIPX_BIN_DIR=/opt/pipx_bin DEPLOYMENT_BASEPATH=/opt/runner GITHUB_ACTIONS=true ANDROID_NDK_LATEST_HOME=/usr/local/lib/android/sdk/ndk/24.0.8215888 GITHUB_SHA=49242d81006454d687ff3293c49f6bf234793627 POWERSHELL_DISTRIBUTION_CHANNEL=GitHub-Actions-ubuntu20 DOTNET_MULTILEVEL_LOOKUP=0 GITHUB_REF=refs/heads/main RUNNER_OS=Linux GITHUB_REF_PROTECTED=true HOME=/home/runner GITHUB_API_URL=https://api.github.com/ LANG=C.UTF-8 BLOOM_TESTING_WRITE_TOKEN=* RUNNER_TRACKING_ID=github_7668f06a-99e1-4ed1-81e9-46d75fab3f33 STATS_KEEPALIVE=false RUNNER_ARCH=X64 RUNNER_TEMP=/home/runner/work/_temp EDGEWEBDRIVER=/usr/local/share/edge_driver GITHUB_ENV=/home/runner/work/_temp/_runner_file_commands/set_env_cd6c1ed2-0d0f-496d-b7a6-ffa476dcc144 
GITHUB_EVENT_PATH=/home/runner/work/_temp/_github_workflow/event.json INVOCATION_ID=3dadac48981b4a679a33224db89be1ed GITHUB_EVENT_NAME=push GITHUB_RUN_ID=2680158280 JAVA_HOME_17_X64=/usr/lib/jvm/temurin-17-jdk-amd64 ANDROID_NDK_HOME=/usr/local/lib/android/sdk/ndk-bundle GITHUB_STEP_SUMMARY=/home/runner/work/_temp/_runner_file_commands/step_summary_cd6c1ed2-0d0f-496d-b7a6-ffa476dcc144 HOMEBREW_NO_AUTO_UPDATE=1 GITHUB_ACTOR=justheuristic NVM_DIR=/home/runner/.nvm SGX_AESM_ADDR=1 GITHUB_RUN_ATTEMPT=1 ANDROID_HOME=/usr/local/lib/android/sdk GITHUB_GRAPHQL_URL=https://api.github.com/graphql ACCEPT_EULA=Y RUNNER_USER=runner USER=runner GITHUB_SERVER_URL=https://github.com/ HOMEBREW_CELLAR=/home/linuxbrew/.linuxbrew/Cellar PIPX_HOME=/opt/pipx GECKOWEBDRIVER=/usr/local/share/gecko_driver CHROMEWEBDRIVER=/usr/local/share/chrome_driver SHLVL=0 ANDROID_SDK_ROOT=/usr/local/lib/android/sdk VCPKG_INSTALLATION_ROOT=/usr/local/share/vcpkg HOMEBREW_REPOSITORY=/home/linuxbrew/.linuxbrew/Homebrew RUNNER_TOOL_CACHE=/opt/hostedtoolcache ImageVersion=20220710.1 DOTNET_NOLOGO=1 GITHUB_REF_NAME=main STATS_PFS=true GRAALVM_11_ROOT=/usr/local/graalvm/graalvm-ce-java11-22.1.0 GITHUB_JOB=convert-model LD_LIBRARY_PATH=/opt/hostedtoolcache/Python/3.9.13/x64/lib XDG_RUNTIME_DIR=/run/user/1001 AZURE_EXTENSION_DIR=/opt/az/azcliextensions PERFLOG_LOCATION_SETTING=RUNNER_PERFLOG GITHUB_REPOSITORY=bigscience-workshop/distributed-bloom CHROME_BIN=/usr/bin/google-chrome ANDROID_NDK_ROOT=/usr/local/lib/android/sdk/ndk-bundle GOROOT_1_18_X64=/opt/hostedtoolcache/go/1.18.3/x64 GITHUB_RETENTION_DAYS=90 JOURNAL_STREAM=8:22000 RUNNER_WORKSPACE=/home/runner/work/distributed-bloom LEIN_HOME=/usr/local/lib/lein LEIN_JAR=/usr/local/lib/lein/self-installs/leiningen-2.9.8-standalone.jar GITHUB_ACTION_REPOSITORY= 
PATH=/opt/hostedtoolcache/Python/3.9.13/x64/bin:/opt/hostedtoolcache/Python/3.9.13/x64:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/snap/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin RUNNER_PERFLOG=/home/runner/perflog GITHUB_BASE_REF= CI=true SWIFT_PATH=/usr/share/swift/usr/bin ImageOS=ubuntu20 GITHUB_REPOSITORY_OWNER=bigscience-workshop GITHUB_HEAD_REF= GITHUB_ACTION_REF= GITHUB_WORKFLOW=Tests DEBIAN_FRONTEND=noninteractive AGENT_TOOLSDIRECTORY=/opt/hostedtoolcache GOROOT_1_16_X64=/opt/hostedtoolcache/go/1.16.15/x64 _=/usr/bin/env </details> Co-authored-by: Dmitry Baranchuk <dmitrybaranchuk@gmail.com>	2 years ago
justheuristic	f0c7383181	Implement RemoteSequential slicing and extra repr, add tests (#30 ) - finish renaming RemoteSequenceInfo -> RemoteSequenceManager (why: if it was an *Info, user would expect it to be similar - to a dataclass; whereas in actuality, the class is doing heavy network interactions on its own) - implement RemoteSequenceManager.make_sequence (from https://pastebin.com/uXgy2U8B ) - make RemoteSequentialInferenceSession use RemoteSequenceManager.make_sequence - make tests pass again - make it possible to create inference session without RemoteTransformerBlock - make a standalone test for RemoteSequential - rollback convert-model Co-authored-by: Tim Dettmers <tim.dettmers@gmail.com>	2 years ago
justheuristic	e2711a033b	Add automated tests (#23 ) This PR will run basic tests automatically on each subsequent PR - convert a small model on every PR - run existing tests on every PR - enforce black / isort - require checks on merge - make sure tests are not flappy Co-authored-by: Alexander Borzunov <hxrussia@gmail.com> Co-authored-by: Dmitry Baranchuk <dmitrybaranchuk@gmail.com>	2 years ago
justheuristic	4eadd00a2c	rm prefix from tests	2 years ago
justheuristic	e32208c954	black-isort	2 years ago
justheuristic	4ad845bce3	black-isort	2 years ago
Dmitry Baranchuk	4cb986f680	add chained rpc_forward & rpc_backward	2 years ago
Dmitry Baranchuk	e66ab6f1f2	design interface & refactoring	2 years ago
Dmitry Baranchuk	6a603f9cd6	set requires_grad=False, lm_layer -> h @ word_embeddings, rm lm_layer from converted_model	2 years ago
Dmitry Baranchuk	d969172208	set requires_grad=False, lm_layer -> h @ word_embeddings, rm lm_layer from converted_model	2 years ago


58 Commits (lru)
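
The message of commit 6137b1b4b0 asks the reader to compare uniform and throughput-weighted sampling for three servers with throughputs [1000, 500, 1] hosting the same blocks. The sketch below works through that comparison. It is not Petals code: the helper names `selection_probs` and `max_sustainable_rate` are hypothetical, and it assumes a simplified model in which each request is routed to exactly one server and a server saturates once it receives more requests per second than its throughput.

```python
from fractions import Fraction


def selection_probs(throughputs, weighted):
    """Probability of routing one request to each server (hypothetical helper)."""
    if weighted:
        total = sum(throughputs)
        return [Fraction(t, total) for t in throughputs]
    return [Fraction(1, len(throughputs))] * len(throughputs)


def max_sustainable_rate(throughputs, weighted):
    """Largest total request rate R such that no server is overloaded.

    Server i receives R * p_i requests per second and can handle t_i,
    so R is bounded by min_i(t_i / p_i).
    """
    probs = selection_probs(throughputs, weighted)
    return min(Fraction(t) / p for t, p in zip(throughputs, probs))


throughputs = [1000, 500, 1]  # example values from the commit message
uniform_rate = max_sustainable_rate(throughputs, weighted=False)
weighted_rate = max_sustainable_rate(throughputs, weighted=True)

print(f"uniform sampling:  {uniform_rate} requests/s before the slowest server saturates")
print(f"weighted sampling: {weighted_rate} requests/s before any server saturates")
```

Under this simplified model, uniform sampling is limited by the slowest server, which receives a third of all requests, while weighted sampling loads every server in proportion to its capacity, so the group as a whole can absorb the sum of the throughputs.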