diff --git a/.github/workflows/check-style.yaml b/.github/workflows/check-style.yaml
index 29a0f82..94b9517 100644
--- a/.github/workflows/check-style.yaml
+++ b/.github/workflows/check-style.yaml
@@ -2,7 +2,7 @@ name: Check style
 on:
   push:
-    branches: [ master ]
+    branches: [ main ]
   pull_request:
 
 jobs:
diff --git a/.github/workflows/run-tests.yaml b/.github/workflows/run-tests.yaml
index e6bb8ea..a4ca723 100644
--- a/.github/workflows/run-tests.yaml
+++ b/.github/workflows/run-tests.yaml
@@ -2,7 +2,7 @@ name: Tests
 on:
   push:
-    branches: [ master ]
+    branches: [ main ]
   pull_request:
 
 jobs:
diff --git a/README.md b/README.md
index 21182b6..7ffac41 100644
--- a/README.md
+++ b/README.md
@@ -1,38 +1,24 @@
-# bloom-demo
-Early dev prototype for decentralized bloom. Not for public eyes **yet**.
+# Distributed BLOOM
 
-Roadmap: [issue #12](https://github.com/learning-at-home/bloom-demo/issues/12)
+Run the largest open language model by collaborating over the internet.
 
-Latest news @ main branch (max 5):
-- [Jul 4] @dbaranchuk implemented chained rpc_forward and rpc_backward (for prompt tuning)
-- [Jul 3] @dbaranchuk optimized DistributedBloom to reduce embeddings/logits RAM usage
-- [Jul 1] @yozh added RemoteSequential and test for full model exact match
-- [June 28] @dbaranchunk added quick deployment scripts for testnet
+__[EARLY PROTOTYPE]__ - this project is a work in progress. Stuff breaks and gets fixed every day. Docs are nonexistent.
+If you want us to wake you up when it's ready, click Watch -> Custom and tick "Releases".
 
-### install
+Roadmap: [__issue #12__](https://github.com/learning-at-home/bloom-demo/issues/12)
+
+### Installation
 
 ```bash
-conda create -y --name bloom-demo python=3.8.12 pip
-conda activate bloom-demo
-
 conda install -y -c conda-forge cudatoolkit-dev==11.3.1 cudatoolkit==11.3.1 cudnn==8.2.1.32
-pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
-pip install accelerate==0.10.0 huggingface-hub==0.7.0 hivemind==1.1.0
-pip install bitsandbytes-cuda113==0.26.0
-pip install https://github.com/huggingface/transformers/archive/6589e510fa4e6c442059de2fab84752535de9b23.zip
+pip install torch==1.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
+pip install -r requirements.txt
 ```
 
-### run local inference:
-No networking whatsoever, used to verify architecture optimizations
-
-```bash
-# run one bloom block for a few steps -- on a local machine
-python -m cli.inference_one_block --config cli/config.json  # see other args
-```
+### Basic functionality
 
-### run distributed inference / training
+All tests are run on localhost.
 
 First, run one or more servers like this:
 ```bash
@@ -72,7 +58,7 @@ with layer3.inference_session() as sess:
 
 ```
 
-### convert regular bloom to distributed
+### Convert regular BLOOM into distributed
 
 ```bash
 # convert model from HF hub to a distributed format (can take hours depending on your connection!)
@@ -83,7 +69,7 @@ python -m cli.convert_model --model bigscience/bloom-6b3 \
 
 ```
 
-### test local vs remote block (allclose)
+### Test local vs remote block (allclose)
 
 To test distributed inference, run one or more servers, then open a new shell and run pytest with environment variables:
 ```bash
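
The first README hunk ends at the opening fence of the server-launch example, so the actual command is not part of this patch. Below is a hypothetical sketch of what such a command might look like; the `cli.run_server` module name, its flags, and the listen address are all assumptions, modeled on the `cli.inference_one_block` and `cli.convert_model` entry points that do appear in the diff:

```bash
# HYPOTHETICAL sketch: the module name, flags, and address below are assumptions,
# not taken from this patch. It only illustrates the kind of command the README
# implies: serve a few BLOOM blocks on localhost so clients can connect to them.
python -m cli.run_server \
    --prefix bigscience/test-bloomd-6b3 \
    --block_indices 0:4 \
    --host_maddrs /ip4/127.0.0.1/tcp/31337
```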
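
The `cli.convert_model` invocation in the diff is cut off at its trailing backslash by the hunk boundary; only `--model bigscience/bloom-6b3` is attested. A hypothetical completion, with placeholder flags standing in for the output location and storage dtype:

```bash
# Only the first line is taken from the patch; the remaining flags are HYPOTHETICAL
# placeholders for where the converted shards go and which dtype they are saved in.
python -m cli.convert_model --model bigscience/bloom-6b3 \
    --output_path ./converted_model \
    --torch_dtype bfloat16
```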
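
The last hunk likewise stops at the opening fence of the pytest example. A hypothetical shape for that command; the environment-variable names and test path are assumptions, since the patch only states that pytest is driven by environment variables pointing at a running server:

```bash
# HYPOTHETICAL: variable names and the test file path are assumptions. The pattern
# shown, env vars prefixed to a pytest run, is the one the README sentence describes.
INITIAL_PEERS=/ip4/127.0.0.1/tcp/31337/p2p/QmExamplePeerID \
BLOCK_UID=bigscience/test-bloomd-6b3.0 \
python -m pytest tests/test_block_exact_match.py
```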