Re: Creating a new synthesizer

Quentin Christensen

Thank you, Reece,

This is an excellent set of resources!

And yes, developing a TTS engine is a very extensive job, which is pretty much the main reason we didn't get to polishing Speech Player.

The code is freely available and open source, if anyone is interested in working on it.

Kind regards


On Fri, Sep 13, 2019 at 5:04 PM Reece H. Dunn <msclrhd@...> wrote:

I'll have more later, but here is a start (welcome to the rabbit hole).

A speech synthesizer voice typically consists of two parts:
1.  the text to phonemes part;
2.  the phonemes to audio part.

The text to phonemes part typically consists of a dictionary mapping words to phonemes and a set of rules for how to pronounce certain word patterns (like "EE" in English).
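The dictionary-plus-rules split described above can be sketched roughly like this. Everything here is an illustrative placeholder (the words, the phoneme symbols, and the rules are made up, not taken from any real engine), but it shows the shape of the lookup: consult an exception dictionary first, then fall back to ordered letter-to-sound rules, with longer patterns like "ee" matched before single letters.

```python
# Minimal sketch of the text-to-phonemes stage.
# Exception dictionary: words the rules would get wrong
# (phonemes in a rough ASCII-IPA style; purely illustrative).
LEXICON = {
    "one": ["w", "V", "n"],
    "two": ["t", "u:"],
}

# Ordered letter-to-sound rules: longest match first, so the
# "ee" digraph wins over a lone "e".
RULES = [
    ("ee", ["i:"]),
    ("e", ["E"]),
    ("s", ["s"]),
    ("t", ["t"]),
    ("r", ["r"]),
]

def to_phonemes(word):
    """Return a phoneme list for `word`: lexicon lookup first,
    then the letter-to-sound rules as a fallback."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    phonemes, i = [], 0
    while i < len(word):
        for pattern, output in RULES:
            if word.startswith(pattern, i):
                phonemes.extend(output)
                i += len(pattern)
                break
        else:
            i += 1  # no rule matched: skip the letter
    return phonemes
```

Real engines have hundreds of context-sensitive rules (what comes before and after the letter matters), but the control flow is essentially this.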

Phonemes (General)
1. used by linguists for transcribing languages (see also all the references from this for phoneme theory)
2. (English)
3. see also the different IPA references for a given language

Phoneme Transcription Schemes
1. Language-specific SAMPA transcriptions; used by MBROLA voices
2. Used as the basis of the CMU/FestVox voices
3. Conlang X-SAMPA
4. Kirshenbaum / ASCII-IPA
5. X-SAMPA

Pronunciation Dictionaries
1. Python tools for working with CMU-dictionary-like pronunciation dictionaries
2. historical view of the CMU pronunciation dictionary for American English
3. my attempts to clean up and extend the cmudict to make it more consistent
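For anyone curious what working with the CMU dictionary involves, here is a minimal parser sketch. The file format is real (one entry per line as "WORD  PH1 PH2 ...", alternate pronunciations marked "WORD(2)", comment lines starting with ";;;"), but the function itself is just an illustration, not any of the tools linked above.

```python
import re

def parse_cmudict(lines):
    """Parse CMUdict-format lines into {word: [pronunciations]},
    where each pronunciation is a list of ARPABET phonemes
    (with stress digits on the vowels)."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # blank line or comment
        head, *phones = line.split()
        # Strip a variant marker such as "(2)" from the head word.
        word = re.sub(r"\(\d+\)$", "", head).lower()
        entries.setdefault(word, []).append(phones)
    return entries

sample = [
    ";;; a comment line",
    "HELLO  HH AH0 L OW1",
    "READ  R EH1 D",
    "READ(2)  R IY1 D",
]
```

Note how "READ" gets two pronunciations; disambiguating which one to use in context is part of what makes the text-to-phonemes stage hard.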

Formant Synthesizers
1. Dennis Klatt's original 1980 paper
2. Dennis Klatt's follow-up 1990 paper

Creating a Voice
1. A set of 7 English voices with US, Canadian, Indian, and Scottish accents
2. FestVox documentation on building a voice
3. MBROLA documentation on creating a voice
4. eSpeak NG docs on adding a language; the other files in the docs folder contain more information, and the documentation can definitely be improved

Quentin Christensen
Training and Support Manager
