# petals/src/petals/__init__.py

import os
import platform

# Suppress the bitsandbytes welcome banner to keep server logs quieter
os.environ.setdefault("BITSANDBYTES_NOWELCOME", "1")

if platform.system() == "Darwin":
    # Necessary for forks to work properly on macOS, see https://github.com/kevlened/pytest-parallel/issues/93
    os.environ.setdefault("no_proxy", "*")
    os.environ.setdefault("OBJC_DISABLE_INITIALIZE_FORK_SAFETY", "YES")

import hivemind
import transformers
from packaging import version

from petals.client import *
from petals.models import *
from petals.utils import *
from petals.utils.logging import initialize_logs as _initialize_logs

__version__ = "2.3.0.dev2"


# Set PETALS_IGNORE_DEPENDENCY_VERSION to any non-empty value to skip this check
if not os.getenv("PETALS_IGNORE_DEPENDENCY_VERSION"):
    assert (
        version.parse("4.38.2") <= version.parse(transformers.__version__) < version.parse("4.39.0")
    ), 'Please install a proper transformers version: pip install "transformers>=4.38.2,<4.39.0"'


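def _override_bfloat16_mode_default():
    # Use hivemind's efficient bfloat16 serialization by default,
    # unless the user has set USE_LEGACY_BFLOAT16 in the environment
    if os.getenv("USE_LEGACY_BFLOAT16") is None:
        hivemind.compression.base.USE_LEGACY_BFLOAT16 = False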


_initialize_logs()
_override_bfloat16_mode_default()