From 75e516a8c1e3b4dfb8c4519903bdbc565dd48c14 Mon Sep 17 00:00:00 2001
From: Alexander Borzunov
Date: Mon, 28 Aug 2023 19:09:13 +0400
Subject: [PATCH] Refactor readme (#482)

---
 README.md | 80 +++++++++++++++++++++----------------------------------
 1 file changed, 31 insertions(+), 49 deletions(-)

diff --git a/README.md b/README.md
index 255e624..ef3ca3f 100644
--- a/README.md
+++ b/README.md
@@ -14,14 +14,14 @@ Generate text with distributed **Llama 2 (70B)**, **Stable Beluga 2**, **Guanaco
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

+# Choose any model available at https://health.petals.dev
model_name = "petals-team/StableBeluga2"
-# You can also use "meta-llama/Llama-2-70b-hf", "meta-llama/Llama-2-70b-chat-hf",
-# repos with Llama-65B, "bigscience/bloom", or "bigscience/bloomz"

+# Connect to a distributed network hosting model layers
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

-# Embeddings & prompts are on your device, transformer blocks are distributed across the Internet
+# Run the model as if it were on your computer
inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0])) # A cat sat on a mat...
@@ -33,17 +33,15 @@ print(tokenizer.decode(outputs[0])) # A cat sat on a mat...

πŸ¦™ **Want to run Llama 2?** Request access to its weights at the ♾️ [Meta AI website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and πŸ€— [Model Hub](https://huggingface.co/meta-llama/Llama-2-70b-hf), then run `huggingface-cli login` in the terminal before loading the model. Or just try it in our [chatbot app](https://chat.petals.dev).

-πŸ“‹ **Terms of use.** Make sure you follow the model license (see [Llama 2](https://bit.ly/llama2-license), [Stable Beluga 2](https://huggingface.co/stabilityai/StableBeluga2/blob/main/LICENSE.txt), [Llama](https://bit.ly/llama-license), and [BLOOM](https://bit.ly/bloom-license)).
-
πŸ” **Privacy.** Your data will be processed by other people in the public swarm. Learn more about privacy [here](https://github.com/bigscience-workshop/petals/wiki/Security,-privacy,-and-AI-safety). For sensitive data, you can set up a [private swarm](https://github.com/bigscience-workshop/petals/wiki/Launch-your-own-swarm) among people you trust.

πŸ’¬ **Any questions?** Ping us in [our Discord](https://discord.gg/KdThf2bWVU)!

-### Connect your GPU and increase Petals capacity
+## Connect your GPU and increase Petals capacity

-Petals is a community-run system β€” we rely on people sharing their GPUs. You can check out available servers on our [swarm monitor](https://health.petals.dev) and connect your GPU to help serving one of the models!
+Petals is a community-run system β€” we rely on people sharing their GPUs. You can check out [available models](https://health.petals.dev) and help serve one of them!
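Before the hosting steps, one note on the client example above: `generate()` here follows the usual πŸ€— Transformers interface, so standard decoding arguments should carry over. A minimal sketch, assuming the sampling parameters (`do_sample`, `temperature`, `top_p`) are forwarded to the swarm unchanged; the values are illustrative, not recommendations:

```python
# A minimal sketch, assuming standard Transformers sampling arguments
# (do_sample, temperature, top_p) are forwarded to the swarm unchanged.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
# Sampled decoding instead of the greedy default; values are illustrative
outputs = model.generate(inputs, max_new_tokens=20,
                         do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0]))
```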
As an example, here is how to host a part of [Stable Beluga 2](https://huggingface.co/stabilityai/StableBeluga2) on your GPU:

-🐍 **Linux + Anaconda.** Run these commands:
+🐧 **Linux + Anaconda.** Run these commands for NVIDIA GPUs (or follow [this](https://github.com/bigscience-workshop/petals/wiki/Running-on-AMD-GPU) for AMD):

```bash
conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
@@ -51,49 +49,28 @@ pip install git+https://github.com/bigscience-workshop/petals
python -m petals.cli.run_server petals-team/StableBeluga2
```

-πŸͺŸ **Windows + WSL.** Follow the guide on our [Wiki](https://github.com/bigscience-workshop/petals/wiki/Run-Petals-server-on-Windows).
+πŸͺŸ **Windows + WSL.** Follow [this guide](https://github.com/bigscience-workshop/petals/wiki/Run-Petals-server-on-Windows) on our Wiki.

-πŸ‹ **Any OS + Docker.** Run our [Docker](https://www.docker.com) image:
+πŸ‹ **Any OS + Docker.** Run our [Docker](https://www.docker.com) image for NVIDIA GPUs (or follow [this](https://github.com/bigscience-workshop/petals/wiki/Running-on-AMD-GPU) for AMD):

```bash
-sudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm learningathome/petals:main \
+sudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm \
+    learningathome/petals:main \
python -m petals.cli.run_server --port 31330 petals-team/StableBeluga2
```

-These commands will host a part of [Stable Beluga 2](https://huggingface.co/stabilityai/StableBeluga2) on your machine. You can also host `meta-llama/Llama-2-70b-hf`, `meta-llama/Llama-2-70b-chat-hf`, repos with Llama-65B, `bigscience/bloom`, `bigscience/bloomz`, and other compatible models from πŸ€— [Model Hub](https://huggingface.co/models), or [add support](https://github.com/bigscience-workshop/petals/wiki/Run-a-custom-model-with-Petals) for new model architectures.
-
-πŸ¦™ **Want to host Llama 2?** Request access to its weights at the ♾️ [Meta AI website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and πŸ€— [Model Hub](https://huggingface.co/meta-llama/Llama-2-70b-hf), generate an πŸ”‘ [access token](https://huggingface.co/settings/tokens), then use this command for `petals.cli.run_server`:
-
-```bash
-python -m petals.cli.run_server meta-llama/Llama-2-70b-chat-hf --token YOUR_TOKEN_HERE
-```
+<p align="center">
+    πŸ“š &nbsp;<b><a href="https://github.com/bigscience-workshop/petals/wiki/FAQ:-Frequently-asked-questions#running-a-server">Learn more</a></b> (using multiple GPUs, starting on boot, etc.)
+    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
+    πŸ’¬ &nbsp;<b><a href="https://discord.gg/X7DgtxgMhc">Ask for help in Discord</a></b>
+</p>
-πŸ’¬ **FAQ.** Check out our [Wiki](https://github.com/bigscience-workshop/petals/wiki/FAQ:-Frequently-asked-questions#running-a-server) to learn how to use multiple GPUs, restart the server on reboot, etc. If you have any issues, ping us in [our Discord](https://discord.gg/X7DgtxgMhc)!
+πŸ¦™ **Want to host Llama 2?** Request access to its weights at the ♾️ [Meta AI website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and πŸ€— [Model Hub](https://huggingface.co/meta-llama/Llama-2-70b-hf), generate an πŸ”‘ [access token](https://huggingface.co/settings/tokens), then add `--token YOUR_TOKEN_HERE` to the `python -m petals.cli.run_server` command.

πŸ”’ **Security.** Hosting a server does not allow others to run custom code on your computer. Learn more [here](https://github.com/bigscience-workshop/petals/wiki/Security,-privacy,-and-AI-safety).

πŸ† **Thank you!** Once you load and host 10+ blocks, we can show your name or link on the [swarm monitor](https://health.petals.dev) as a way to say thanks. You can specify them with `--public_name YOUR_NAME`.

-### Check out tutorials, examples, and more
-
-Basic tutorials:
-
-- Getting started: [tutorial](https://colab.research.google.com/drive/1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8?usp=sharing)
-- Prompt-tune Llama-65B for text semantic classification: [tutorial](https://colab.research.google.com/github/bigscience-workshop/petals/blob/main/examples/prompt-tuning-sst2.ipynb)
-- Prompt-tune BLOOM to create a personified chatbot: [tutorial](https://colab.research.google.com/github/bigscience-workshop/petals/blob/main/examples/prompt-tuning-personachat.ipynb)
-
-Useful tools and advanced guides:
-
-- [Chatbot web app](https://chat.petals.dev) (connects to Petals via an HTTP/WebSocket endpoint): [source code](https://github.com/petals-infra/chat.petals.dev)
-- [Monitor](https://health.petals.dev) for the public swarm: [source code](https://github.com/petals-infra/health.petals.dev)
-- Launch your own swarm: [guide](https://github.com/bigscience-workshop/petals/wiki/Launch-your-own-swarm)
-- Run a custom foundation model: [guide](https://github.com/bigscience-workshop/petals/wiki/Run-a-custom-model-with-Petals)
-
-Learning more:
-
-- Frequently asked questions: [FAQ](https://github.com/bigscience-workshop/petals/wiki/FAQ:-Frequently-asked-questions)
-- In-depth system description: [paper](https://arxiv.org/abs/2209.01188)
-
## How does it work?

- Petals runs large language models like [Llama](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md) and [BLOOM](https://huggingface.co/bigscience/bloom) **collaboratively** β€” you load a small part of the model, then join people serving the other parts to run inference or fine-tuning.
@@ -105,23 +82,28 @@ Learning more:

- πŸ“š  See FAQ -            πŸ“œ  Read paper +            + πŸ“š  See FAQ

-## Installation
+## πŸ“š Tutorials, examples, and more

-Here's how to install Petals with [Anaconda](https://www.anaconda.com/products/distribution) on Linux:
+Basic tutorials:

-```bash
-conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
-pip install git+https://github.com/bigscience-workshop/petals
-```
+- Getting started: [tutorial](https://colab.research.google.com/drive/1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8?usp=sharing)
+- Prompt-tune Llama-65B for text semantic classification: [tutorial](https://colab.research.google.com/github/bigscience-workshop/petals/blob/main/examples/prompt-tuning-sst2.ipynb)
+- Prompt-tune BLOOM to create a personified chatbot: [tutorial](https://colab.research.google.com/github/bigscience-workshop/petals/blob/main/examples/prompt-tuning-personachat.ipynb)
+
+Useful tools:
+
+- [Chatbot web app](https://chat.petals.dev) (connects to Petals via an HTTP/WebSocket endpoint): [source code](https://github.com/petals-infra/chat.petals.dev)
+- [Monitor](https://health.petals.dev) for the public swarm: [source code](https://github.com/petals-infra/health.petals.dev)

-If you don't use Anaconda, you can install PyTorch in [any other way](https://pytorch.org/get-started/locally/). If you want to run models with 8-bit weights, please install PyTorch with CUDA 11.x or newer for compatibility with [bitsandbytes](https://github.com/timDettmers/bitsandbytes).
+Advanced guides:
-
-See the instructions for macOS and Windows, the full requirements, and troubleshooting advice in our [FAQ](https://github.com/bigscience-workshop/petals/wiki/FAQ:-Frequently-asked-questions#running-a-client).
+- Launch a private swarm: [guide](https://github.com/bigscience-workshop/petals/wiki/Launch-your-own-swarm) (see the sketch below)
+- Run a custom model: [guide](https://github.com/bigscience-workshop/petals/wiki/Run-a-custom-model-with-Petals)

## Benchmarks
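For the private-swarm guide listed above, the client is typically pointed at your own bootstrap peers instead of the public ones. A minimal sketch, assuming the `initial_peers` argument and multiaddress format described in that guide; substitute the address your bootstrap peer prints on startup:

```python
# Hedged sketch for joining a private swarm; `initial_peers` and the example
# multiaddress are assumptions based on the linked guide, not verified here.
from petals import AutoDistributedModelForCausalLM

INITIAL_PEERS = ["/ip4/10.0.0.1/tcp/31337/p2p/QmYourBootstrapPeerID"]

model = AutoDistributedModelForCausalLM.from_pretrained(
    "petals-team/StableBeluga2",
    initial_peers=INITIAL_PEERS,
)
```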