Speech synthesis in Linux, with Piper

Sometimes I want to listen to news articles rather than read them, because I am doing several things at once and don't want to be stuck reading a single tab.

Firefox has a Reader View mode that is pleasing to the eye and can also read text aloud using speech-dispatcher, the text-to-speech (TTS) service available on Linux desktops.

Speech-dispatcher can use many TTS engines, but the ones historically available on Linux sounded unnatural. Until recently.

The robotic voice days are over: natural-sounding speech synthesis is now available to Linux users without having to connect to online services such as Google Translate's engine.

Nabu Casa, the company behind Home Assistant, is sponsoring Piper, an open source TTS engine released under the MIT licence, so that the Home Assistant automation system can have its own voice assistant.

There is also Mycroft's Mimic3 engine, but for the time being it is only available as a cloud option, so I am ruling it out. An offline version is expected at some point.

Both Piper and Mimic3 grew out of the need to run voice assistants on a Raspberry Pi, which is why they run well on low-spec systems.

I installed Piper on my Fedora 39 laptop and I am very satisfied with the results.

Here is how I did it:

Download the latest release from the Piper project page: https://github.com/rhasspy/piper

Releases are here: https://github.com/rhasspy/piper/releases
Get the package that matches your CPU architecture. For instance, on a 64-bit Intel/AMD (x86_64) machine:

wget https://github.com/rhasspy/piper/releases/download/2023.9.27-1/piper_linux_x64.tar.gz
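The x64 package above is meant for 64-bit Intel/AMD machines, for which `uname -m` prints `x86_64`. If you are unsure which package to grab, a small check like the one below can tell you. It is only a sketch: the function name `piper_arch_is_x64` is made up for illustration, and it only recognises the one architecture this article covers.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given (or current) machine
# architecture matches the x64 Piper package used in this article.
piper_arch_is_x64() {
    arch="${1:-$(uname -m)}"   # default to the running machine's architecture
    case "$arch" in
        x86_64) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage: report whether the x64 package fits this machine.
if piper_arch_is_x64; then
    echo "x86_64 detected: the piper_linux_x64 package matches this machine"
else
    echo "Not x86_64 ($(uname -m)): pick the matching package on the releases page" >&2
fi
```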

Extract the package into /opt. Writing to /opt requires root privileges, hence sudo:

cd /opt
sudo tar xvf /path/to/piper_linux_x64.tar.gz
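If you prefer to script this step, here is a minimal sketch that extracts the archive and then checks that the binary ended up at /opt/piper/piper, the path the speech-dispatcher configuration relies on later. The helper `install_piper` is hypothetical, and it assumes the tarball contains a top-level `piper/` directory, which is what that path implies.

```shell
#!/bin/sh
# Hypothetical helper: extract a Piper release tarball into a prefix
# (default /opt) and confirm the piper binary landed at <prefix>/piper/piper.
# Assumes the archive has a top-level "piper/" directory.
install_piper() {
    tarball="$1"
    prefix="${2:-/opt}"
    tar -xzf "$tarball" -C "$prefix" || return 1
    if [ -x "$prefix/piper/piper" ]; then
        echo "Piper installed at $prefix/piper/piper"
    else
        echo "Expected executable not found: $prefix/piper/piper" >&2
        return 1
    fi
}

# Usage (from a root shell when the prefix is /opt):
#   install_piper /path/to/piper_linux_x64.tar.gz /opt
```

Pointing it at a scratch directory first is a cheap way to inspect the archive layout before touching /opt.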

Listen to the voice samples and pick the ones you like at https://rhasspy.github.io/piper-samples/

Piper is more than just a TTS engine, but that is how I am going to use it, so all I need are existing pre-trained voice models.

Download the .onnx and .onnx.json files that correspond to the voices you have selected. For instance, if you want a US English (North American accent) voice and a French voice, you could pick Lessac (en_US):

wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

and UPMC Jessica (fr_FR):

wget https://huggingface.co/rhasspy/piper-voices/resolve/main/fr/fr_FR/upmc/medium/fr_FR-upmc-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/fr/fr_FR/upmc/medium/fr_FR-upmc-medium.onnx.json
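Since voice names seem to follow a `<locale>-<speaker>-<quality>` pattern, the download URLs can be derived from the name alone. This is a sketch under that assumption: `piper_voice_url` is a hypothetical helper, and the `<language>/<locale>/<speaker>/<quality>/` folder layout is inferred from the two voices in this article rather than from any documented rule. Using `resolve` rather than `blob` in the path is what makes Hugging Face serve the raw file instead of its web page.

```shell
#!/bin/sh
# Hypothetical helper: build a direct-download URL for a Piper voice file.
# Assumes names like en_US-lessac-medium and a repository layout of
# <language>/<locale>/<speaker>/<quality>/<file>.
piper_voice_url() {
    voice="$1"                 # e.g. en_US-lessac-medium
    ext="${2:-onnx}"           # "onnx" or "onnx.json"
    locale="${voice%%-*}"      # en_US
    rest="${voice#*-}"         # lessac-medium
    speaker="${rest%-*}"       # lessac
    quality="${voice##*-}"     # medium
    family="${locale%%_*}"     # en
    printf '%s\n' "https://huggingface.co/rhasspy/piper-voices/resolve/main/$family/$locale/$speaker/$quality/$voice.$ext"
}

# Usage: fetch both files for each chosen voice.
# for v in en_US-lessac-medium fr_FR-upmc-medium; do
#     for e in onnx onnx.json; do wget "$(piper_voice_url "$v" "$e")"; done
# done
```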

Place the model files in /opt/piper/models (you may have to create this directory manually, as root).
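Each voice needs both of its files: the `.onnx` model and its `.onnx.json` configuration. The sketch below copies them into the models directory and refuses to continue if either is missing. `install_voice` is a hypothetical name, and the idea that Piper looks for the `.onnx.json` file next to the model (piper.conf only passes `--model`) is an assumption worth keeping in mind.

```shell
#!/bin/sh
# Hypothetical helper: copy a voice's .onnx model and .onnx.json config
# into the models directory, failing early if either file is missing.
install_voice() {
    voice="$1"                         # e.g. en_US-lessac-medium
    src="${2:-.}"                      # directory holding the downloads
    models="${3:-/opt/piper/models}"   # destination used in piper.conf
    for f in "$voice.onnx" "$voice.onnx.json"; do
        [ -f "$src/$f" ] || { echo "Missing $src/$f" >&2; return 1; }
    done
    mkdir -p "$models" || return 1
    cp "$src/$voice.onnx" "$src/$voice.onnx.json" "$models/"
}

# Usage (root is needed to write under /opt):
#   install_voice en_US-lessac-medium ~/Downloads
#   install_voice fr_FR-upmc-medium ~/Downloads
```

With the files in place, a voice can be checked outside speech-dispatcher by running the same pipeline by hand, e.g. `echo 'Hello' | /opt/piper/piper --model /opt/piper/models/en_US-lessac-medium.onnx --output_raw | aplay -r 22050 -f S16_LE -t raw -`. If that plays, any remaining problem lies in the speech-dispatcher configuration.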

Create local speech-dispatcher configuration

As a regular desktop user (i.e. not root), create the directory ~/.config/speech-dispatcher and its subdirectory ~/.config/speech-dispatcher/modules.

Create the file ~/.config/speech-dispatcher/modules/piper.conf with the following content:

#English US
AddVoice	"en"			"FEMALE1"	"en_US-lessac-medium"
AddVoice	"en_US"		"FEMALE1"	"en_US-lessac-medium"

#French FR
AddVoice	 "fr"			"FEMALE1"	"fr_FR-upmc-medium"
AddVoice	 "fr_FR"		"FEMALE1"	"fr_FR-upmc-medium"

DefaultVoice "en_US-lessac-medium"

GenericExecuteSynth "echo \'$DATA\' | /opt/piper/piper --model /opt/piper/models/$VOICE.onnx --output_raw | aplay -r 22050 -f S16_LE -t raw - "
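The same file can be created from a terminal. This is a sketch with `write_piper_conf` as a hypothetical helper name; it overwrites any existing piper.conf, and it uses spaces instead of tabs on the AddVoice lines, assuming (as I understand it) that speech-dispatcher only needs whitespace between fields. The key detail is the quoted `'EOF'` delimiter: it stops the shell from expanding `$DATA` and `$VOICE`, which speech-dispatcher has to substitute itself when it runs GenericExecuteSynth.

```shell
#!/bin/sh
# Hypothetical helper: write piper.conf under a speech-dispatcher config
# directory (default ~/.config/speech-dispatcher). Overwrites existing file.
write_piper_conf() {
    conf_dir="${1:-$HOME/.config/speech-dispatcher}"
    mkdir -p "$conf_dir/modules" || return 1
    # Quoted 'EOF': $DATA and $VOICE are written literally, not expanded here.
    cat > "$conf_dir/modules/piper.conf" <<'EOF'
#English US
AddVoice "en"    "FEMALE1" "en_US-lessac-medium"
AddVoice "en_US" "FEMALE1" "en_US-lessac-medium"

#French FR
AddVoice "fr"    "FEMALE1" "fr_FR-upmc-medium"
AddVoice "fr_FR" "FEMALE1" "fr_FR-upmc-medium"

DefaultVoice "en_US-lessac-medium"

GenericExecuteSynth "echo \'$DATA\' | /opt/piper/piper --model /opt/piper/models/$VOICE.onnx --output_raw | aplay -r 22050 -f S16_LE -t raw - "
EOF
}

# Usage:
#   write_piper_conf
```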

Create the file ~/.config/speech-dispatcher/speechd.conf with the following content:

AddModule "piper" "sd_generic" "/home/username/.config/speech-dispatcher/modules/piper.conf"

DefaultModule piper
AudioOutputMethod "pulse"
AudioPulseDevice "default"

Don't forget to replace username with the actual user name.
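To avoid editing the user name by hand, speechd.conf can be generated with the home directory filled in by the shell. A minimal sketch: `write_speechd_conf` is a hypothetical helper, it defaults to ~/.config/speech-dispatcher, and it overwrites an existing speechd.conf, so back yours up first if you already have one. The heredoc delimiter is deliberately unquoted so that the shell expands the path.

```shell
#!/bin/sh
# Hypothetical helper: write speechd.conf with the real config path
# substituted, instead of editing "username" manually.
write_speechd_conf() {
    conf_dir="${1:-$HOME/.config/speech-dispatcher}"
    mkdir -p "$conf_dir" || return 1
    # Unquoted EOF: $conf_dir is expanded when the file is written.
    cat > "$conf_dir/speechd.conf" <<EOF
AddModule "piper" "sd_generic" "$conf_dir/modules/piper.conf"

DefaultModule piper
AudioOutputMethod "pulse"
AudioPulseDevice "default"
EOF
}

# Usage:
#   write_speechd_conf
```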

Reboot

This may not be strictly needed, but it could save you some headaches.

Log back in

Test speech synthesis with the default voice:

spd-say "This is a speech synthesis test"
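If spd-say stays silent, a quick file check can rule out simple mistakes before digging into speech-dispatcher itself. The sketch below is hypothetical (`check_piper_setup` is not part of any tool) and assumes the paths and the two voices used in this article. spd-say's `-o` option selects the output module and `-l` the language, which makes it easy to exercise the French voice too.

```shell
#!/bin/sh
# Hypothetical preflight check: report every file this setup relies on
# that is missing. Returns non-zero if anything is absent.
check_piper_setup() {
    piper_dir="${1:-/opt/piper}"
    conf_dir="${2:-$HOME/.config/speech-dispatcher}"
    ok=0
    [ -x "$piper_dir/piper" ] || { echo "missing or not executable: $piper_dir/piper" >&2; ok=1; }
    for voice in en_US-lessac-medium fr_FR-upmc-medium; do
        for ext in onnx onnx.json; do
            f="$piper_dir/models/$voice.$ext"
            [ -f "$f" ] || { echo "missing: $f" >&2; ok=1; }
        done
    done
    for f in "$conf_dir/speechd.conf" "$conf_dir/modules/piper.conf"; do
        [ -f "$f" ] || { echo "missing: $f" >&2; ok=1; }
    done
    return "$ok"
}

# Usage: run the check, then try each voice explicitly.
#   check_piper_setup && spd-say -o piper "This is a speech synthesis test"
#   check_piper_setup && spd-say -o piper -l fr "Ceci est un test de synthèse vocale"
```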

If this worked, start Firefox, open a news article in Reader View, and press 'N' to listen to it. You should now have good-quality TTS for articles in both French and English.