# Larynx

A fast, local neural text to speech system.

``` sh
echo 'Welcome to the world of speech synthesis!' | \
./larynx --model blizzard_lessac-medium.onnx --output_file welcome.wav
```

## Voices

* [U.S. English](https://github.com/rhasspy/larynx2/releases/download/v0.0.1/voice-english.tar.gz)
* [German](https://github.com/rhasspy/larynx2/releases/download/v0.0.1/voice-german.tar.gz)
* [Danish](https://github.com/rhasspy/larynx2/releases/download/v0.0.1/voice-danish.tar.gz)
* [Norwegian](https://github.com/rhasspy/larynx2/releases/download/v0.0.1/voice-norweigian.tar.gz)
* [Nepali](https://github.com/rhasspy/larynx2/releases/download/v0.0.1/voice-nepali.tar.gz)
* [Vietnamese](https://github.com/rhasspy/larynx2/releases/download/v0.0.1/voice-vietnamese.tar.gz)

## Purpose

Larynx is meant to sound as good as [CoquiTTS](https://github.com/coqui-ai/TTS), but run reasonably fast on the Raspberry Pi 4.

Voices are trained with [VITS](https://github.com/jaywalnut310/vits/) and exported for use with [onnxruntime](https://onnxruntime.ai/).

## Installation

Download a release:

* [amd64](https://github.com/rhasspy/larynx2/releases/download/v0.0.1/larynx_amd64.tar.gz) (desktop Linux)
* [arm64](https://github.com/rhasspy/larynx2/releases/download/v0.0.1/larynx_arm64.tar.gz) (Raspberry Pi 4)

If you want to build from source, see the [Makefile](Makefile) and [C++ source](src/cpp).
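
A minimal sketch of a source build (assuming the default `make` target builds the `larynx` binary; see the [Makefile](Makefile) for the actual targets):

``` sh
git clone https://github.com/rhasspy/larynx2.git
cd larynx2
make
```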

## Usage

1. [Download a voice](#voices) and extract the `.onnx` and `.onnx.json` files (see the sketch after this list)
2. Run the `larynx` binary with text on stdin, `--model /path/to/your-voice.onnx`, and `--output_file output.wav`
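
For step 1, downloading and extracting might look like this (a minimal sketch using the U.S. English voice linked above; the archive is assumed to contain the `.onnx` and `.onnx.json` files):

``` sh
wget https://github.com/rhasspy/larynx2/releases/download/v0.0.1/voice-english.tar.gz
tar -xzf voice-english.tar.gz
```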

For example:

``` sh
echo 'Welcome to the world of speech synthesis!' | \
./larynx --model blizzard_lessac-medium.onnx --output_file welcome.wav
```

For multi-speaker models, use `--speaker <number>` to change speakers (default: 0).
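
For instance (a sketch; `multi-speaker.onnx` is a hypothetical voice file name, and speaker IDs are numbered from 0):

``` sh
echo 'Hello from the second speaker!' | \
./larynx --model multi-speaker.onnx --speaker 1 --output_file hello.wav
```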

See `larynx --help` for more options.

## Training

See [src/python](src/python).

Start by creating a virtual environment:

``` sh
python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install --upgrade wheel setuptools
pip3 install -r requirements.txt
```

Ensure you have [espeak-ng](https://github.com/espeak-ng/espeak-ng/) installed (`sudo apt-get install espeak-ng`).

Next, preprocess your dataset:

``` sh
python3 -m larynx_train.preprocess \
  --language en-us \
  --input-dir /path/to/ljspeech/ \
  --output-dir /path/to/training_dir/ \
  --dataset-format ljspeech \
  --sample-rate 22050
```

Datasets must either be in the [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) format or come from [Mimic Recording Studio](https://github.com/MycroftAI/mimic-recording-studio) (`--dataset-format mycroft`).
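
For reference, an LJSpeech-format dataset is laid out roughly like this (paths and ids are illustrative; LJSpeech itself also carries a third, normalized-text column in `metadata.csv`):

``` sh
# /path/to/ljspeech/
#   metadata.csv        # pipe-delimited lines: id|transcription
#   wavs/
#     LJ001-0001.wav    # audio named after the id column
#     LJ001-0002.wav

# sanity check: print the first metadata line
head -n 1 /path/to/ljspeech/metadata.csv
```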

Finally, you can train:

``` sh
python3 -m larynx_train \
  --dataset-dir /path/to/training_dir/ \
  --accelerator 'gpu' \
  --devices 1 \
  --batch-size 32 \
  --validation-split 0.05 \
  --num-test-examples 5 \
  --max_epochs 10000 \
  --precision 32
```

Training uses [PyTorch Lightning](https://www.pytorchlightning.ai/). Run `tensorboard --logdir /path/to/training_dir/lightning_logs` to monitor. See `python3 -m larynx_train --help` for many additional options.

It is highly recommended to train with the following `Dockerfile`:

``` dockerfile
FROM nvcr.io/nvidia/pytorch:22.03-py3

RUN pip3 install \
    'pytorch-lightning'

ENV NUMBA_CACHE_DIR=.numba_cache
```
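
Building the image and starting a training shell might then look like this (a sketch, not from the original instructions; the `larynx-train` tag and mount paths are illustrative):

``` sh
# Build the training image from the Dockerfile above
docker build -t larynx-train .

# Start a GPU-enabled container with the code and training data mounted
docker run --rm -it --gpus all \
  -v "$PWD:/workspace" \
  -v /path/to/training_dir:/data \
  larynx-train bash
```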

See the various `infer_*` and `export_*` scripts in [src/python/larynx_train](src/python/larynx_train) to test and export your voice from the checkpoint in `lightning_logs`. The `dataset.jsonl` file in your training directory can be used with `python3 -m larynx_train.infer` for quick testing:

``` sh
head -n5 /path/to/training_dir/dataset.jsonl | \
  python3 -m larynx_train.infer \
    --checkpoint lightning_logs/path/to/checkpoint.ckpt \
    --sample-rate 22050 \
    --output-dir wavs
```