Re: Creating a new synthesizer


Reece H. Dunn
 


I'll have more later, but here is a start (welcome to the rabbit hole).

A speech synthesizer voice typically consists of two parts:
1.  the text-to-phonemes part;
2.  the phonemes-to-audio part.

The text-to-phonemes part typically consists of a dictionary mapping words to phonemes and a set of rules for pronouncing particular letter patterns (like "EE" in English).
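
As a rough illustration of that dictionary-plus-rules split, here is a minimal Python sketch. Everything in it is made up for the example: the tiny `LEXICON`, the `RULES` table, and the `text_to_phonemes` name are hypothetical, and the ARPABET-style symbols are only there for readability. It also assumes the dictionary is tried first and the rules are the fallback, which is one possible arrangement; real synthesizers have far larger tables and context-sensitive rules.

```python
# Hypothetical toy data: a real voice has thousands of entries and rules.
LEXICON = {
    "the": ["DH", "AH"],
    "one": ["W", "AH", "N"],
}

# Letter-pattern rules: (letters, phonemes).
RULES = [
    ("e", ["EH"]),
    ("ee", ["IY"]),
    ("k", ["K"]),
    ("n", ["N"]),
    ("p", ["P"]),
    ("s", ["S"]),
    ("t", ["T"]),
]
# Try longer patterns first so "ee" wins over "e".
RULES.sort(key=lambda rule: len(rule[0]), reverse=True)


def text_to_phonemes(word):
    """Look the word up in the dictionary, else fall back to the rules."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])

    phonemes = []
    i = 0
    while i < len(word):
        for pattern, output in RULES:
            if word.startswith(pattern, i):
                phonemes.extend(output)
                i += len(pattern)
                break
        else:
            # No rule matches this letter; this sketch simply skips it.
            i += 1
    return phonemes


print(text_to_phonemes("the"))
print(text_to_phonemes("teen"))
```

The phonemes-to-audio part would then take a list like this and turn it into sound, which is where the formant and voice-building references below come in.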

Phonemes (General)
1.  https://en.wikipedia.org/wiki/International_Phonetic_Alphabet -- used by linguists for transcribing languages (see also the references from that article for phoneme theory)
1.  https://en.wikipedia.org/wiki/Lexical_set (English)
1.  https://en.wikipedia.org/wiki/International_Phonetic_Alphabet_chart_for_English_dialects -- see also the different IPA references for a given language

Phoneme Transcription Schemes
1.  https://www.phon.ucl.ac.uk/home/sampa/ -- Language-specific SAMPA transcriptions; used by MBROLA voices
1.  https://en.wikipedia.org/wiki/ARPABET -- Used as the basis of the CMU/FestVox voices
1.  https://github.com/espeak-ng/espeak-ng/blob/master/docs/phonemes/cxs.md -- Conlang X-SAMPA
1.  https://github.com/espeak-ng/espeak-ng/blob/master/docs/phonemes/kirshenbaum.md -- Kirshenbaum / ASCII-IPA
1.  https://github.com/espeak-ng/espeak-ng/blob/master/docs/phonemes/xsampa.md -- X-SAMPA

Pronunciation Dictionaries
1.  https://github.com/rhdunn/cmudict-tools -- Python tools for working with pronunciation dictionaries in the style of the CMU dictionary
1.  https://github.com/rhdunn/cmudict -- historical view of the CMU pronunciation dictionary for American English
1.  https://github.com/rhdunn/amepd -- my attempts to clean up and extend the cmudict to make it more consistent
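
To give a feel for what these dictionaries contain, here is a small Python sketch that parses one entry in the general CMU dictionary style: a word followed by ARPABET phonemes, with a trailing digit on vowels marking stress. The `parse_entry` helper is hypothetical (it is not part of cmudict-tools), the sample line is only illustrative, and real files also have comment lines and markers for alternative pronunciations that this sketch ignores.

```python
def parse_entry(line):
    """Split 'WORD  PH1 PH2 ...' into (word, [(phoneme, stress), ...]).

    stress is an int when the phoneme ends in a digit, otherwise None.
    """
    word, *phones = line.split()
    parsed = []
    for phone in phones:
        if phone[-1].isdigit():
            parsed.append((phone[:-1], int(phone[-1])))
        else:
            parsed.append((phone, None))
    return word, parsed


word, phones = parse_entry("HELLO  HH AH0 L OW1")
print(word, phones)
```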

Formant Synthesizers
1.  http://www.fon.hum.uva.nl/david/ma_ssp/2010/Klatt-1980-JAS000971.pdf -- Dennis Klatt's original 1980 paper
1.  http://www.fon.hum.uva.nl/david/ma_ssp/doc/Klatt-1990-JAS000820.pdf -- Dennis Klatt's follow-up 1990 paper
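
The basic building block in the 1980 paper is a second-order digital resonator, one per formant, defined by a centre frequency and a bandwidth. Below is a minimal Python sketch of that single resonator as I understand it from the paper; the function names are my own, and a full synthesizer adds a voicing source, noise source, and cascade/parallel branches on top of this.

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate_hz  # sample period in seconds
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c  # normalises the gain to 1 at 0 Hz
    return a, b, c


def resonate(samples, freq_hz, bandwidth_hz, sample_rate_hz):
    """Run the samples through one formant resonator."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz)
    y1 = y2 = 0.0  # previous two outputs
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


# Example: a 500 Hz formant with 60 Hz bandwidth at a 10 kHz sample rate,
# excited by a single impulse.
impulse = [1.0] + [0.0] * 99
response = resonate(impulse, 500.0, 60.0, 10000.0)
```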

Creating a Voice
1.  http://festvox.org/cmu_arctic/ -- A set of 7 English voices with US, Canadian, Indian, and Scottish accents
1.  http://festvox.org/festvox/festvox_toc.html -- FestVox documentation on building a voice
1.  https://github.com/numediart/MBROLATOR -- MBROLA documentation on creating a voice
1.  https://github.com/espeak-ng/espeak-ng/blob/master/docs/add_language.md -- eSpeak NG docs on adding a language; the other files in the docs folder contain more information, though the documentation can definitely be improved
