Re: Creating a new synthesizer
Reece H. Dunn
I'll have more later, but here is a start (welcome to the rabbit hole).

A speech synthesizer voice typically consists of two parts:
1. the text to phonemes part;
2. the phonemes to audio part.

The text to phonemes part typically consists of a dictionary mapping words to phonemes and a set of rules for how to pronounce certain word patterns (like "EE" in English).

Phonemes (General)
1. https://en.wikipedia.org/wiki/International_Phonetic_Alphabet -- used by linguists for transcribing languages (see also all the references from this for phoneme theory)
2. https://en.wikipedia.org/wiki/Lexical_set (English)
3. https://en.wikipedia.org/wiki/International_Phonetic_Alphabet_chart_for_English_dialects -- see also the different IPA references for a given language

Phoneme Transcription Schemes
1. https://www.phon.ucl.ac.uk/home/sampa/ -- Language-specific SAMPA transcriptions; used by MBROLA voices
2. https://en.wikipedia.org/wiki/ARPABET -- Used as the basis of the CMU/FestVox voices
3. https://github.com/espeak-ng/espeak-ng/blob/master/docs/phonemes/cxs.md -- Conlang X-SAMPA
4. https://github.com/espeak-ng/espeak-ng/blob/master/docs/phonemes/kirshenbaum.md -- Kirshenbaum / ASCII-IPA
5. https://github.com/espeak-ng/espeak-ng/blob/master/docs/phonemes/xsampa.md -- X-SAMPA

Pronunciation Dictionaries
1. https://github.com/rhdunn/cmudict-tools -- Python tools for working with CMU-dictionary-like pronunciation dictionaries
2. https://github.com/rhdunn/cmudict -- historical view of the CMU pronunciation dictionary for American English
3. https://github.com/rhdunn/amepd -- my attempts to clean up and extend the cmudict to make it more consistent

Formant Synthesizers
1. http://www.fon.hum.uva.nl/david/ma_ssp/2010/Klatt-1980-JAS000971.pdf -- Dennis Klatt's original 1980 paper
2. http://www.fon.hum.uva.nl/david/ma_ssp/doc/Klatt-1990-JAS000820.pdf -- Dennis Klatt's follow-up 1990 paper

Creating a Voice
1. http://festvox.org/cmu_arctic/ -- A set of 7 English voices with US, Canadian, Indian, and Scottish accents
2. http://festvox.org/festvox/festvox_toc.html -- FestVox documentation on building a voice
3. https://github.com/numediart/MBROLATOR -- MBROLA documentation on creating a voice
4. https://github.com/espeak-ng/espeak-ng/blob/master/docs/add_language.md -- eSpeak NG docs on adding a language; the other docs in the docs folder contain more information, and the documentation can definitely be improved
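
To make the text to phonemes part described at the top of this post a bit more concrete, here is a minimal Python sketch of the usual arrangement: look the word up in a pronunciation dictionary first, and fall back to letter-to-sound rules (such as "EE") when the word is missing. Everything in it is an assumption for illustration: the names LEXICON, RULES, apply_rules and text_to_phonemes are made up, the entries use an ARPABET-like notation without stress marks, and the rules are far cruder than the context-sensitive rules a real synthesizer uses.

```python
# Toy pronunciation dictionary (hypothetical entries, ARPABET-like, no stress marks).
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "hello": ["HH", "AH", "L", "OW"],
}

# Hypothetical letter-to-sound rules. Multi-letter patterns come first so that
# "ee" is matched before the single letter "e". Real rules also look at context.
RULES = [
    ("ee", ["IY"]), ("sh", ["SH"]), ("ch", ["CH"]), ("th", ["TH"]),
    ("a", ["AE"]), ("b", ["B"]), ("c", ["K"]), ("d", ["D"]), ("e", ["EH"]),
    ("f", ["F"]), ("g", ["G"]), ("h", ["HH"]), ("i", ["IH"]), ("j", ["JH"]),
    ("k", ["K"]), ("l", ["L"]), ("m", ["M"]), ("n", ["N"]), ("o", ["AA"]),
    ("p", ["P"]), ("q", ["K"]), ("r", ["R"]), ("s", ["S"]), ("t", ["T"]),
    ("u", ["AH"]), ("v", ["V"]), ("w", ["W"]), ("x", ["K", "S"]),
    ("y", ["Y"]), ("z", ["Z"]),
]


def apply_rules(word):
    """Convert a word to phonemes using the first matching rule at each position."""
    phonemes = []
    i = 0
    while i < len(word):
        for pattern, output in RULES:
            if word.startswith(pattern, i):
                phonemes.extend(output)
                i += len(pattern)
                break
        else:
            i += 1  # no rule covers this character (digit, punctuation); skip it
    return phonemes


def text_to_phonemes(text):
    """Return one phoneme list per word: dictionary first, rules as fallback."""
    result = []
    for word in text.lower().split():
        word = word.strip(".,!?;:\"'")
        if not word:
            continue
        if word in LEXICON:
            result.append(list(LEXICON[word]))
        else:
            result.append(apply_rules(word))
    return result


if __name__ == "__main__":
    for phonemes in text_to_phonemes("Hello tree"):
        print(" ".join(phonemes))
```

In this sketch "hello" comes from the dictionary while "tree" falls through to the rules, where the "ee" pattern produces IY. The phonemes to audio part would then take these phoneme lists as its input.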