piper/README.md

![Piper logo](etc/logo.png)

A fast, local neural text to speech system that sounds great and is optimized for the Raspberry Pi 4.

``` sh
echo 'Welcome to the world of speech synthesis!' | \
  ./piper --model en-us-blizzard_lessac-medium.onnx --output_file welcome.wav
```

[Listen to voice samples](https://rhasspy.github.io/piper-samples)

Voices are trained with [VITS](https://github.com/jaywalnut310/vits/) and exported to the [onnxruntime](https://onnxruntime.ai/).

## Voices

Our goal is to support Home Assistant and the [Year of Voice](https://www.home-assistant.io/blog/2022/12/20/year-of-voice/).

Download voices from [the release](https://github.com/rhasspy/piper/releases/tag/v0.0.2).

Supported languages:

* Catalan (ca)
* Danish (da)
* German (de)
* British English (en-gb)
* U.S. English (en-us)
* Spanish (es)
* Finnish (fi)
* French (fr)
* Greek (el-gr)
* Italian (it)
* Kazakh (kk)
* Nepali (ne)
* Dutch (nl)
* Norwegian (no)
* Polish (pl)
* Brazilian Portuguese (pt-br)
* Ukrainian (uk)
* Vietnamese (vi)
* Chinese (zh-cn)


## Installation

Download a release:

* [amd64](https://github.com/rhasspy/piper/releases/download/v0.0.2/piper_amd64.tar.gz) (desktop Linux)
* [arm64](https://github.com/rhasspy/piper/releases/download/v0.0.2/piper_arm64.tar.gz) (Raspberry Pi 4)

If you want to build from source, see the [Makefile](Makefile) and [C++ source](src/cpp). Piper depends on a patched `espeak-ng` in [lib](lib), which includes a way to get access to the "terminator" used to end each clause/sentence.

The ONNX runtime is expected in `lib/Linux-$(uname -m)`, so `lib/Linux-x86_64`, etc. You can change this path in `src/cpp/CMakeLists.txt` if necessary.
Last tested with [onnxruntime](https://github.com/microsoft/onnxruntime) 1.14.1.


## Usage

1. [Download a voice](#voices) and extract the `.onnx` and `.onnx.json` files
2. Run the `piper` binary with text on standard input, `--model /path/to/your-voice.onnx`, and `--output_file output.wav`

For example:

``` sh
echo 'Welcome to the world of speech synthesis!' | \
  ./piper --model blizzard_lessac-medium.onnx --output_file welcome.wav
```

For multi-speaker models, use `--speaker <number>` to change speakers (default: 0).

See `piper --help` for more options.


## Training

See [src/python](src/python)

Start by creating a virtual environment:

``` sh
cd piper/src/python
python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install --upgrade wheel setuptools
pip3 install -r requirements.txt
```

Run the `build_monotonic_align.sh` script in the `src/python` directory to build the extension.

Ensure you have [espeak-ng](https://github.com/espeak-ng/espeak-ng/) installed (`sudo apt-get install espeak-ng`).

Next, preprocess your dataset:

``` sh
python3 -m piper_train.preprocess \
  --language en-us \
  --input-dir /path/to/ljspeech/ \
  --output-dir /path/to/training_dir/ \
  --dataset-format ljspeech \
  --sample-rate 22050
```

Datasets must either be in the [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) format or from [Mimic Recording Studio](https://github.com/MycroftAI/mimic-recording-studio) (`--dataset-format mycroft`).

Finally, you can train:

``` sh
python3 -m piper_train \
    --dataset-dir /path/to/training_dir/ \
    --accelerator 'gpu' \
    --devices 1 \
    --batch-size 32 \
    --validation-split 0.05 \
    --num-test-examples 5 \
    --max_epochs 10000 \
    --precision 32
```

Training uses [PyTorch Lightning](https://www.pytorchlightning.ai/). Run `tensorboard --logdir /path/to/training_dir/lightning_logs` to monitor. See `python3 -m piper_train --help` for many additional options.

It is highly recommended to train with the following `Dockerfile`:

``` dockerfile
FROM nvcr.io/nvidia/pytorch:22.03-py3

RUN pip3 install \
    'pytorch-lightning'

ENV NUMBA_CACHE_DIR=.numba_cache
```

See the various `infer_*` and `export_*` scripts in [src/python/piper_train](src/python/piper_train) to test and export your voice from the checkpoint in `lightning_logs`. The `dataset.jsonl` file in your training directory can be used with `python3 -m piper_train.infer` for quick testing:

``` sh
head -n5 /path/to/training_dir/dataset.jsonl | \
  python3 -m piper_train.infer \
    --checkpoint lightning_logs/path/to/checkpoint.ckpt \
    --sample-rate 22050 \
    --output-dir wavs
```


## Running in Python

See [src/python_run](src/python_run)

Run `scripts/setup.sh` to create a virtual environment and install the requirements. Then run:

``` sh
echo 'Welcome to the world of speech synthesis!' | scripts/piper \
  --model /path/to/voice.onnx \
  --output_file welcome.wav
```

If you'd like to use a GPU, install the `onnxruntime-gpu` package:


``` sh
.venv/bin/pip3 install onnxruntime-gpu
```

and then run `scripts/piper` with the `--cuda` argument. You will need to have a functioning CUDA environment, such as what's available in [NVIDIA's PyTorch containers](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch).
Rename to piper 1 year ago			`![Piper logo](etc/logo.png)`
Add license 1 year ago
Typo 1 year ago			`A fast, local neural text to speech system that sounds great and is optimized for the Raspberry Pi 4.`
Update README 1 year ago
			``` sh
			`echo 'Welcome to the world of speech synthesis!' \| \`
Rename to piper 1 year ago			`./piper --model en-us-blizzard_lessac-medium.onnx --output_file welcome.wav`
Update README 1 year ago			```

Update README 1 year ago			`[Listen to voice samples](https://rhasspy.github.io/piper-samples)`

Minor changes to README 1 year ago			`Voices are trained with [VITS](https://github.com/jaywalnut310/vits/) and exported to the [onnxruntime](https://onnxruntime.ai/).`

Update README 1 year ago			`## Voices`

Minor changes to README 1 year ago			`Our goal is to support Home Assistant and the [Year of Voice](https://www.home-assistant.io/blog/2022/12/20/year-of-voice/).`

Rename to piper 1 year ago			`Download voices from [the release](https://github.com/rhasspy/piper/releases/tag/v0.0.2).`
Update README 1 year ago
Update language list 1 year ago			`Supported languages:`
Update README 1 year ago
Update README 1 year ago			`* Catalan (ca)`
Update README 1 year ago			`* Danish (da)`
			`* German (de)`
Add en-gb to support language list 1 year ago			`* British English (en-gb)`
Update README 1 year ago			`* U.S. English (en-us)`
			`* Spanish (es)`
			`* Finnish (fi)`
			`* French (fr)`
Add el-gr 1 year ago			`* Greek (el-gr)`
Update README 1 year ago			`* Italian (it)`
			`* Kazakh (kk)`
			`* Nepali (ne)`
Update README 1 year ago			`* Dutch (nl)`
Update README 1 year ago			`* Norwegian (no)`
Update README 1 year ago			`* Polish (pl)`
Update language list 1 year ago			`* Brazilian Portuguese (pt-br)`
Update README 1 year ago			`* Ukrainian (uk)`
			`* Vietnamese (vi)`
Update README 1 year ago			`* Chinese (zh-cn)`
Update README 1 year ago

			`## Installation`

			`Download a release:`

Rename to piper 1 year ago			`* [amd64](https://github.com/rhasspy/piper/releases/download/v0.0.2/piper_amd64.tar.gz) (desktop Linux)`
			`* [arm64](https://github.com/rhasspy/piper/releases/download/v0.0.2/piper_arm64.tar.gz) (Raspberry Pi 4)`
Update README 1 year ago
Update README 1 year ago			If you want to build from source, see the [Makefile](Makefile) and [C++ source](src/cpp). Piper depends on a patched `espeak-ng` in [lib](lib), which includes a way to get access to the "terminator" used to end each clause/sentence.
Using patched espeak-ng 1 year ago
Update README 1 year ago			The ONNX runtime is expected in `lib/Linux-$(uname -m)`, so `lib/Linux-x86_64`, etc. You can change this path in `src/cpp/CMakeLists.txt` if necessary.
Using patched espeak-ng 1 year ago			`Last tested with [onnxruntime](https://github.com/microsoft/onnxruntime) 1.14.1.`
Update README 1 year ago

			`## Usage`

			1. [Download a voice](#voices) and extract the `.onnx` and `.onnx.json` files
Rename to piper 1 year ago			2. Run the `piper` binary with text on standard input, `--model /path/to/your-voice.onnx`, and `--output_file output.wav`
Update README 1 year ago
			`For example:`

			``` sh
			`echo 'Welcome to the world of speech synthesis!' \| \`
Rename to piper 1 year ago			`./piper --model blizzard_lessac-medium.onnx --output_file welcome.wav`
Update README 1 year ago			```

			For multi-speaker models, use `--speaker <number>` to change speakers (default: 0).

Rename to piper 1 year ago			See `piper --help` for more options.
Update README 1 year ago

			`## Training`

			`See [src/python](src/python)`

			`Start by creating a virtual environment:`

			``` sh
Rename to piper 1 year ago			`cd piper/src/python`
Update README 1 year ago			`python3 -m venv .venv`
			`source .venv/bin/activate`
			`pip3 install --upgrade pip`
			`pip3 install --upgrade wheel setuptools`
			`pip3 install -r requirements.txt`
			```

Add more info to README 1 year ago			Run the `build_monotonic_align.sh` script in the `src/python` directory to build the extension.

Update README 1 year ago			Ensure you have [espeak-ng](https://github.com/espeak-ng/espeak-ng/) installed (`sudo apt-get install espeak-ng`).

			`Next, preprocess your dataset:`

			``` sh
Rename to piper 1 year ago			`python3 -m piper_train.preprocess \`
Update README 1 year ago			`--language en-us \`
			`--input-dir /path/to/ljspeech/ \`
			`--output-dir /path/to/training_dir/ \`
			`--dataset-format ljspeech \`
			`--sample-rate 22050`
			```

			Datasets must either be in the [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) format or from [Mimic Recording Studio](https://github.com/MycroftAI/mimic-recording-studio) (`--dataset-format mycroft`).

			`Finally, you can train:`

			``` sh
Rename to piper 1 year ago			`python3 -m piper_train \`
Update README 1 year ago			`--dataset-dir /path/to/training_dir/ \`
			`--accelerator 'gpu' \`
			`--devices 1 \`
			`--batch-size 32 \`
			`--validation-split 0.05 \`
			`--num-test-examples 5 \`
			`--max_epochs 10000 \`
			`--precision 32`
			```

Rename to piper 1 year ago			Training uses [PyTorch Lightning](https://www.pytorchlightning.ai/). Run `tensorboard --logdir /path/to/training_dir/lightning_logs` to monitor. See `python3 -m piper_train --help` for many additional options.
Update README 1 year ago
			It is highly recommended to train with the following `Dockerfile`:

			``` dockerfile
			`FROM nvcr.io/nvidia/pytorch:22.03-py3`

			`RUN pip3 install \`
			`'pytorch-lightning'`

			`ENV NUMBA_CACHE_DIR=.numba_cache`
			```

Rename to piper 1 year ago			See the various `infer_` and `export_` scripts in [src/python/piper_train](src/python/piper_train) to test and export your voice from the checkpoint in `lightning_logs`. The `dataset.jsonl` file in your training directory can be used with `python3 -m piper_train.infer` for quick testing:
Update README 1 year ago
			``` sh
			`head -n5 /path/to/training_dir/dataset.jsonl \| \`
Rename to piper 1 year ago			`python3 -m piper_train.infer \`
Update README 1 year ago			`--checkpoint lightning_logs/path/to/checkpoint.ckpt \`
			`--sample-rate 22050 \`
			`--output-dir wavs`
			```

Update README 1 year ago
			`## Running in Python`

			`See [src/python_run](src/python_run)`

			Run `scripts/setup.sh` to create a virtual environment and install the requirements. Then run:

			``` sh
Rename to piper 1 year ago			`echo 'Welcome to the world of speech synthesis!' \| scripts/piper \`
Update README 1 year ago			`--model /path/to/voice.onnx \`
			`--output_file welcome.wav`
			```

			If you'd like to use a GPU, install the `onnxruntime-gpu` package:


			``` sh
			`.venv/bin/pip3 install onnxruntime-gpu`
			```

Rename to piper 1 year ago			and then run `scripts/piper` with the `--cuda` argument. You will need to have a functioning CUDA environment, such as what's available in [NVIDIA's PyTorch containers](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch).
Update README 1 year ago