Creating a new synthesizer


Quentin Christensen
 

Hi everyone,

We get asked about creating a new synthesizer from time to time - either simply for ANY synthesizer to support a particular language, or someone wants to create a brand new one, perhaps with their own voice.

Normally I suggest looking at the eSpeak NG project as a starting point.  I was wondering, does anyone have any useful links on how to go about this, that I can pass on when I get such inquiries, please?

I appreciate it's a very complex undertaking, which is why I'm asking here.  If I'm going to pass on anything, I'd rather pass on something of some quality, rather than just a random link off eBay.

Kind regards

Quentin.

--
Quentin Christensen
Training and Support Manager


 

Shaun Everiss
 

You know, that really does bring up an interesting project which seems forgotten.

What happened to our own synth, the one called Speech Player?

Why did we stop developing it? Or did NV Access stop developing it?

It probably won't be compatible with Windows 10 1903, but that hardly matters if development has stopped, so maybe we should start working on it again.

I mean, I like eSpeak NG, but just about every screen reader company has their own synth.

Dolphin uses Orpheus; JAWS uses Eloquence.

Window-Eyes used DECtalk; Microsoft stuff used Microsoft stuff.

Nuance used Vocalizer and Eloquence for Talks.

True, we use eSpeak, which is used on Linux as well, but the point is that we started our own synth, and I think we should continue.

For whatever reason it just stopped.

The synth was OK sounding, probably a bit outdated and crappy now, but still, we really should bring it back, for bragging rights at least.





Sky Mundell
 

It would be wonderful if Speech Player could be developed again. I should also add that the other commercial products you mentioned did allow you to change synthesizers, and NVDA allows you to change synthesizers as well.

 


Gene
 

Bragging rights mean nothing unless the product is superior.  People don't care whether NVDA has its own synthesizer or not, and the resources can be put to much better use.  Creating a really good synthesizer is a very specialized area, and NVDA doesn't have the expertise or the resources to divert to such a product. 
 
JAWS never had its own voice.  Neither did Window-Eyes, nor does System Access.  Clearly, this is hardly something screen-reader developers or the consuming public are concerned about.
 
Gene


 

Shaun Everiss
 

True, but my point was that just about every screen reader, at least in the old days, had its own flagship synth of choice, right back to the DOS days.

Just like every sound card had an FM MIDI chip.

Even though that's strictly no longer necessary, some old habits die hard.

I would like us on NVDA to have our own synth, because we started it, because we can, and why not?

Just about everyone else had one, maybe not so much now, but I remember the days when sound cards had FM MIDI chips and when you used a particular synth for a particular program.

That's a bit more fluid nowadays, but even so.




 

Shaun Everiss
 

Probably not, but still, we started it.

Why did we stop?




Gene
 

JAWS and Window-Eyes used almost identical synthesizers, among the largest in the business in earlier days.  JAWS used Eloquence, not its own developed synthesizer.  Window-Eyes used ViaVoice, almost identical in sound and performance, whatever technical differences the two synthesizers had. 
 
System Access to Go used ViaVoice.  DOS screen readers didn't have their own speech.  They supported a lot of synthesizers, but they didn't come bundled with one.
 
Gene


Luke Davis
 

On Fri, 13 Sep 2019, Shaun Everiss wrote:

Even though that's strictly no longer necessary, some old habits die hard.
Habit is no good reason for a massive undertaking. As Gene said, unless we can make something worth having--i.e., better than eSpeak in some fundamental way--what's the point? Bragging rights are not enough if nobody actually wants what we're bragging about.

I would like us on NVDA to have our own synth, because we started it, because we can, and why not?
Gene already explained why not. It may not be a satisfactory answer--personally, I would love for Speech Player to be a viable synth that brings something interesting to the table--but the simple fact is that speech synth development takes a dedicated team of linguistic experts, audio experts, and programmers, and years of effort dedicated to nothing else. NV Access has four people, and they have the task of overseeing the entire screen reader. If you can't give such an effort the development resources it deserves, you end up with an experiment and a novelty, which is what Speech Player became.

It would be very nice if it could have been more, and maybe something will still happen one day, but there are so many good speech synth options right now that the urgency and justification for dedicating such astonishingly limited resources to the project just isn't there.

If someone wants to fund a team to do this development work, I'm sure NV Access would be happy to reconsider its priority; but until somebody does, it seems unlikely, for good reason, to continue with any kind of pleasing momentum.

Luke


Luke Davis
 

On Fri, 13 Sep 2019, Gene wrote:

JAWS and Window-Eyes used almost identical synthesizers, among the largest in the business in earlier days.  JAWS used Eloquence, not its own developed synthesizer.  Window-Eyes used ViaVoice, almost identical in sound and performance, whatever technical differences the two synthesizers had.
Which was developed by IBM.

Ironically, the IBM Screen Reader didn't even use IBM's own speech; it used DEC hardware. The Myna palmtop, for example, back in the 90s, used the IBM Screen Reader and a torn-down DECtalk Express.

System Access to Go used ViaVoice.  DOS screen readers didn't have their own speech.  They supported a lot of synthesizers, but they didn't come bundled with one.
At least one did. Tiny Talk, by Eric Poelman (forgive me if I got his name wrong--it's been fifteen years since I had to remember that), could use the software speech available on an SB16 sound card. Remember Dr. Sbaitso, anyone? It could use that voice, which makes the worst robotic eSpeak voice sound like silk.

https://en.wikipedia.org/wiki/Dr._Sbaitso

Luke


 

I remember the site I'm about to give you from ages ago. They are still at it. This comes at it from a slightly different angle than what you might be looking for. Here it is:

www.modeltalker.org







Reece H. Dunn
 


I'll have more later, but here is a start (welcome to the rabbit hole).

A speech synthesizer voice typically consists of two parts:
1.  the text to phonemes part;
2.  the phonemes to audio part.

The text to phonemes part typically consists of a dictionary mapping words to phonemes and a set of rules for how to pronounce certain word patterns (like "EE" in English).
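
As a concrete (and heavily simplified) illustration of that split, here is a toy version of the text-to-phonemes part in Python. The lexicon entries and letter-to-sound rules are invented for the example, not taken from any real synthesizer:

```python
"""Toy text-to-phonemes front end: dictionary lookup first, then
letter-to-sound rules as a fallback. Phoneme names are ARPABET-like;
the entries and rules are invented for this example."""

LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Ordered letter-to-sound rules, longest spelling first, so that
# "ee" and "sh" match before "e" and "s" (cf. the "EE" example).
LTS_RULES = [
    ("ee", ["IY"]), ("sh", ["SH"]), ("th", ["TH"]),
    ("a", ["AE"]), ("e", ["EH"]), ("i", ["IH"]),
    ("o", ["AA"]), ("u", ["AH"]), ("b", ["B"]), ("d", ["D"]),
    ("l", ["L"]), ("m", ["M"]), ("n", ["N"]), ("r", ["R"]),
    ("s", ["S"]), ("t", ["T"]),
]

def letter_to_sound(word):
    """Spelling-based fallback for words not in the dictionary."""
    phonemes, i = [], 0
    while i < len(word):
        for spelling, phones in LTS_RULES:
            if word.startswith(spelling, i):
                phonemes.extend(phones)
                i += len(spelling)
                break
        else:
            i += 1  # no rule for this character: skip it
    return phonemes

def to_phonemes(text):
    return [LEXICON.get(w, letter_to_sound(w)) for w in text.lower().split()]

print(to_phonemes("hello sheet"))
# [['HH', 'AH', 'L', 'OW'], ['SH', 'IY', 'T']]
```

Real front ends layer part-of-speech tagging, stress assignment, and much larger rule sets on top of this, but the dictionary-plus-rules shape is the same.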

Phonemes (General)
1.  https://en.wikipedia.org/wiki/International_Phonetic_Alphabet -- used by linguists for transcribing languages (see also all the references from this for phoneme theory)
2.  https://en.wikipedia.org/wiki/Lexical_set -- lexical sets (English)
3.  https://en.wikipedia.org/wiki/International_Phonetic_Alphabet_chart_for_English_dialects -- see also the different IPA references for a given language

Phoneme Transcription Schemes
1.  https://www.phon.ucl.ac.uk/home/sampa/ -- Language-specific SAMPA transcriptions; used by MBROLA voices
2.  https://en.wikipedia.org/wiki/ARPABET -- Used as the basis of the CMU/FestVox voices
3.  https://github.com/espeak-ng/espeak-ng/blob/master/docs/phonemes/cxs.md -- Conlang X-SAMPA
4.  https://github.com/espeak-ng/espeak-ng/blob/master/docs/phonemes/kirshenbaum.md -- Kirshenbaum / ASCII-IPA
5.  https://github.com/espeak-ng/espeak-ng/blob/master/docs/phonemes/xsampa.md -- X-SAMPA

Pronunciation Dictionaries
1.  https://github.com/rhdunn/cmudict-tools -- Python tools for working with CMU-dictionary-style pronunciation dictionaries
2.  https://github.com/rhdunn/cmudict -- historical view of the CMU pronunciation dictionary for American English
3.  https://github.com/rhdunn/amepd -- my attempts to clean up and extend cmudict to make it more consistent

Formant Synthesizers (a minimal code sketch follows these links)
1.  http://www.fon.hum.uva.nl/david/ma_ssp/2010/Klatt-1980-JAS000971.pdf -- Dennis Klatt's original 1980 paper
2.  http://www.fon.hum.uva.nl/david/ma_ssp/doc/Klatt-1990-JAS000820.pdf -- Dennis Klatt's follow-up 1990 paper
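
For a feel of how the Klatt model works, here is a minimal sketch (assuming NumPy and SciPy are available): a pulse train at the fundamental frequency is passed through a cascade of second-order resonators, one per formant. The formant values are rough textbook-style figures for an /a/-like vowel, chosen for illustration only:

```python
"""Minimal cascade formant synthesizer sketch after Klatt (1980):
an impulse train at F0 is fed through second-order resonators,
one per formant."""
import numpy as np
from scipy.signal import lfilter

FS = 16000      # sample rate (Hz)
F0 = 120        # fundamental frequency (Hz)
DUR = 0.5       # duration (seconds)
FORMANTS = [(730, 90), (1090, 110), (2440, 170)]  # (freq, bandwidth) Hz

# Glottal source: a bare impulse train. A real synthesizer would use
# a shaped glottal pulse and mix in aspiration/frication noise.
n = int(FS * DUR)
signal = np.zeros(n)
signal[::FS // F0] = 1.0

for freq, bw in FORMANTS:
    # Klatt's resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]
    C = -np.exp(-2 * np.pi * bw / FS)
    B = 2 * np.exp(-np.pi * bw / FS) * np.cos(2 * np.pi * freq / FS)
    A = 1.0 - B - C
    signal = lfilter([A], [1.0, -B, -C], signal)

signal /= np.abs(signal).max()  # normalize to [-1, 1]
# e.g. scipy.io.wavfile.write("vowel.wav", FS, signal.astype(np.float32))
```

The papers above cover everything this leaves out: parallel branches for fricatives, time-varying parameter tracks, and the glottal source model.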

Creating a Voice
1.  http://festvox.org/cmu_arctic/ -- A set of 7 English voices with US, Canadian, Indian, and Scottish accents
2.  http://festvox.org/festvox/festvox_toc.html -- FestVox documentation on building a voice
3.  https://github.com/numediart/MBROLATOR -- MBROLA documentation on creating a voice
4.  https://github.com/espeak-ng/espeak-ng/blob/master/docs/add_language.md -- eSpeak NG docs on adding a language; the other files in the docs folder contain more information, and the documentation can definitely be improved


Quentin Christensen
 

Thank you Reece,

This is an excellent set of resources!

And yes, developing a TTS engine is a very extensive job, which is pretty much the main reason we didn't get to polishing Speech Player.

The code is freely available and open source if anyone is interested in working on it.

Kind regards

Quentin.



--
Quentin Christensen
Training and Support Manager


Reece H. Dunn
 

More links, resources, and information...

# Natural Language Processing -- Lexers

A lexer or tokenizer (e.g. a Finite State Machine or FSM) will tokenize a string of text -- splitting that text into sequences of characters that represent a full stop, comma, word, number, etc. Things to consider:
1.  Unicode General_Category property.
2.  Unicode Script property (e.g. for mixed Hiragana, Katakana, and Kanji in Japanese).

Be aware that some word sequences can be contracted in speech ("we will" to "we'll") and there can be other suprasegmental (across word) pronunciations (e.g. the "d" and "j" (/dZ/) in "Said John" are typically geminated [1]).

[1] https://en.wikipedia.org/wiki/Gemination

1. https://www.unicode.org/versions/components-12.1.0.html -- Includes Unicode character and emoji data
2. http://cldr.unicode.org/index/downloads/cldr-35 -- Unicode Common Locale Data Repository (includes emoji translations)
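
A minimal sketch of such a tokenizer, using Python's unicodedata module to read the General_Category property (a real lexer handles many more cases, e.g. apostrophes, hyphens, and mixed scripts):

```python
"""Toy tokenizer driven by the Unicode General_Category property:
runs of letters become words, runs of digits become numbers, and
punctuation is emitted one character at a time."""
import unicodedata

def kind(ch):
    cat = unicodedata.category(ch)  # e.g. 'Lu', 'Nd', 'Po', 'Zs'
    if cat.startswith("L"):
        return "word"
    if cat.startswith("N"):
        return "number"
    if cat.startswith("Z") or ch in "\t\n":
        return "space"
    return "punct"

def tokenize(text):
    tokens, current, current_kind = [], "", None
    for ch in text:
        k = kind(ch)
        if k == current_kind and k in ("word", "number"):
            current += ch  # extend the current word or number
        else:
            if current and current_kind != "space":
                tokens.append((current_kind, current))
            current, current_kind = ch, k
    if current and current_kind != "space":
        tokens.append((current_kind, current))
    return tokens

print(tokenize("Said John, in 1980."))
# [('word', 'Said'), ('word', 'John'), ('punct', ','),
#  ('word', 'in'), ('number', '1980'), ('punct', '.')]
```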

# Natural Language Processing -- Part of Speech Tagging

This is used to differentiate words with the same spelling but different pronunciations or stresses ("the object" vs "I object"; "read"; "lead"; "St" read as "Saint" before a name but "Street" after one, as in "St Noun" vs "Noun St" vs "East St Noun St"; etc.). This includes context/usage (as in Chinese "Tai Chi" vs Greek "Chi squared").

In the modern state of the art, this is typically done by a Hidden Markov Model (HMM).

1. https://en.wikipedia.org/wiki/Part-of-speech_tagging
2. https://www.freecodecamp.org/news/an-introduction-to-part-of-speech-tagging-and-the-hidden-markov-model-953d45338f24/
3. https://www.nltk.org/book/ -- Natural Language Processing in Python (see chapter 5 for part-of-speech tagging)
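
For example, NLTK's off-the-shelf tagger distinguishes the two uses of "object"; this assumes the punkt and averaged_perceptron_tagger data packages have been downloaded via nltk.download():

```python
"""Tagging "object" in two contexts with NLTK. The tag difference
(verb vs noun) is what tells a synthesizer which stress pattern
to use."""
import nltk

for sentence in ("I object to that.", "Pass me the object."):
    print(nltk.pos_tag(nltk.word_tokenize(sentence)))
# Expect 'object' tagged as a verb (e.g. VBP) in the first sentence
# and as a noun (NN) in the second.
```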

# Numbers, Abbreviations, etc.

This is identifying the correct context and inserting words in place of the numbers, abbreviations, or other contractions. For example, replacing "214" with "two hundred and fourteen".

1. https://en.wikipedia.org/wiki/Comparison_of_American_and_British_English#Numbers
2. https://en.wikipedia.org/wiki/Names_of_large_numbers
3. http://home.kpn.nl/vanadovv/BignumbyN.html -- English large numbers
4. http://home.kpn.nl/vanadovv/Bignum.html -- Dutch large numbers
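
A sketch of the number part, producing British-style wording with "and" as in the "214" example above (real normalizers also handle ordinals, years, currency, phone numbers, and so on):

```python
"""Number expansion sketch producing British-style wording
(with "and"), e.g. 214 -> "two hundred and fourteen".
Handles integers up to 999 billion."""

UNITS = ["zero", "one", "two", "three", "four", "five", "six",
         "seven", "eight", "nine", "ten", "eleven", "twelve",
         "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
         "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = [(10 ** 9, "billion"), (10 ** 6, "million"), (1000, "thousand")]

def number_to_words(n):
    if n < 20:
        return UNITS[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + UNITS[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        words = UNITS[hundreds] + " hundred"
        # British English inserts "and" here; American English doesn't.
        return words + (" and " + number_to_words(rest) if rest else "")
    for scale, name in SCALES:
        if n >= scale:
            major, rest = divmod(n, scale)
            words = number_to_words(major) + " " + name
            return words + (" " + number_to_words(rest) if rest else "")

print(number_to_words(214))  # two hundred and fourteen
```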

# Natural Language Processing -- Stemmers

A stemmer is an algorithm that identifies and removes prefixes and suffixes. This can be used to identify the prefixes, suffixes, and base words to be pronounced. The classic stemmer is the Porter stemmer.

1. https://tartarus.org/martin/PorterStemmer/def.txt -- An algorithm for suffix stripping, 1980.
2. https://tartarus.org/martin/PorterStemmer/
3. http://snowball.tartarus.org/ -- stemmers in other languages
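
NLTK ships an implementation of the Porter stemmer; note that the stems are normal forms rather than necessarily real words, so a pronunciation front end would pair the stem's dictionary entry with rules for the stripped suffix:

```python
"""NLTK's Porter stemmer on a few words from Porter's own examples."""
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ("running", "ponies", "caresses"):
    print(word, "->", stemmer.stem(word))
# running -> run
# ponies -> poni   (a normal form, not a real word)
# caresses -> caress
```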

# Grapheme to Phoneme Translation

1.  http://handle.dtic.mil/100.2/ADA021929 -- Automatic Translation of English Text to Phonetics by Means of Letter-to-Sound Rules (NRL Report 7948), 1976.
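
The NRL rules have the general shape left-context [ match ] right-context = phonemes. Here is a toy interpreter for a few invented rules; the real rule set and its context notation (vowel classes, "one or more consonants", etc.) are much richer:

```python
"""Toy interpreter for NRL-style context-sensitive letter-to-sound
rules (cf. NRL Report 7948). Each rule is (left, match, right,
phonemes); '#' in a context means a word boundary. The rules and
phoneme names are invented for this example."""

RULES = [
    # (left context, match, right context, phonemes)
    ("",  "ee", "",  ["IY"]),  # "ee" is /iy/, as in "see"
    ("#", "e",  "",  ["EH"]),  # word-initial "e", as in "end"
    ("",  "e",  "#", []),      # word-final "e" is silent
    ("",  "e",  "",  ["EH"]),  # default "e"
    ("",  "s",  "",  ["S"]),
    ("",  "t",  "",  ["T"]),
    ("",  "n",  "",  ["N"]),
]

def transcribe(word):
    padded = "#" + word.lower() + "#"  # make word boundaries matchable
    phonemes, i = [], 1
    while i < len(padded) - 1:
        for left, match, right, phones in RULES:
            if (padded.startswith(match, i)
                    and padded[:i].endswith(left)
                    and padded.startswith(right, i + len(match))):
                phonemes.extend(phones)
                i += len(match)
                break
        else:
            i += 1  # no rule for this letter: skip it
    return phonemes

print(transcribe("tense"))  # ['T', 'EH', 'N', 'S'] -- final "e" silent
print(transcribe("teen"))   # ['T', 'IY', 'N']
```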

# Audio Synthesis / Vocoders

A vocoder (voice encoder/decoder) is a device or application that encodes and decodes voice audio. Vocoders are used in telephone systems, in music (e.g. Cher's "Believe"), and in speech synthesis.

A diphone database (e.g. MBROLA's) stores audio for phoneme pairs, running from the midpoint of one phoneme to the midpoint of the next. This makes it easier to join the audio segments with reduced artifacts.

1. https://en.wikipedia.org/wiki/Linear_predictive_coding -- Linear Predictive Coding (LPC) is the basis for a number of diphone synthesizers
2. Residual-excited LPC (RELP) vocoders. This is the model used by the festvox/festival diphone voices.
3. https://en.wikipedia.org/wiki/PSOLA -- This is the other method typically used in concatenative synthesizers (joining audio segments together). A variant of this is used by MBROLA.

More recent vocoders (like WaveNet) use neural networks. I'm not as familiar with this approach.

It is also useful to be familiar with Digital Signal Processing (DSP) techniques and terminology, e.g. spectrum and formants; the sketch below shows how formants can be read off an LPC model.
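
To make the LPC/formant connection concrete, here is a sketch (assuming NumPy and SciPy) that fits an all-pole LPC model to a frame of audio using the autocorrelation method and reads rough formant estimates off the roots of the LPC polynomial:

```python
"""LPC analysis sketch: fit an all-pole model to a speech frame and
estimate formants from the angles of the LPC polynomial's roots."""
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=12):
    """Predictor coefficients a[1..order] for x[n] ~ sum a[k]*x[n-k].
    A common rule of thumb for order is 2 + fs/1000."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    return solve_toeplitz(r[:order], r[1:order + 1])

def estimate_formants(frame, fs):
    a = np.concatenate(([1.0], -lpc_coefficients(frame)))
    roots = [z for z in np.roots(a) if z.imag > 0]  # one per pole pair
    freqs = sorted(np.angle(z) * fs / (2 * np.pi) for z in roots)
    return [f for f in freqs if f > 90]  # drop near-DC poles

# Example: analyze a 25 ms Hamming-windowed frame of 16 kHz speech.
# fs, samples = 16000, ...  (e.g. from scipy.io.wavfile.read)
# frame = samples[:400] * np.hamming(400)
# print(estimate_formants(frame, fs))
```

In a RELP vocoder, the residual left after removing this all-pole filter from the signal is kept and used to re-excite the filter on synthesis.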

# Some YouTube Resources

1. https://www.youtube.com/watch?v=xzL-pxcpo-E -- Prof. Simon King - Using Speech Synthesis to give Everyone their own Voice
2. https://www.youtube.com/channel/UCvn_XCl_mgQmt3sD753zdJA -- Rachel's English, for American English pronunciation. Includes demonstrations of how the mouth moves when producing vowels, etc.
3. https://www.youtube.com/channel/UCMk_WSPy3EE16aK5HLzCJzw -- NativLang. Has information about the structure and pronunciation of different languages, especially ancient and difficult languages.
4. https://www.youtube.com/user/LinguisticsMarburg -- The Virtual Linguistics Campus. Has resources on different aspects of linguistics and phonology. It also has a series on the evolution of English.
5. https://www.youtube.com/channel/UCXCxNFxw6iq-Mh4uIjYvufg -- Jackson Crawford. Old Norse, with some information on related languages (Old English, Old Icelandic, Old Norwegian).

Kind regards,
Reece