A few questions about NVDA, eSpeak NG, and languages (non-programmer)


Daniel Parker
 

[Version note: In this general query and my notes that follow, I will be referring both to NVDA 2020.4 and the most recent beta of NVDA 2021.1; if no version is specified, assume the comment applies to both.]
 
Hello all,
 
I would first like to say that I'm a newcomer to the group, and I very much appreciate the dedication, enthusiasm, and zeal of all the contributors who share my dream of a free screen reader that not only outshines the best paid software, but also has possibly the best language support of any screen reader I've come across, and that's saying something! My questions are primarily aimed at developers, but help or knowledge from anyone is very welcome.
 
I am now mostly a Mac user, but before 2017 I primarily used Windows, and I still occasionally do for certain programs or, yes, for viewing text in languages not supported by VoiceOver. Ever since 2016, when I was told about NVDA's rapid progress and tried it for myself, I have continually been impressed by the huge and ever-expanding list of TTS languages supported, especially through eSpeak NG. I am aware that this synthesizer is its own independent project, but I do have questions about how much the NVDA team, or community, modifies the synthesizer's output before packaging eSpeak with a release. To preface, I have absolutely no knowledge of programming or development, and I will likely ask foolish questions and make foolish assumptions, for which I apologize.
 
[Note: I have put together some informal notes about eSpeak TTS languages, including which languages translate which ASCII symbols, which have phonetic problems, etc. They are at the bottom of this message if anyone cares.]
 
First, I have a question regarding TTS ASCII symbol translation. Some eSpeak languages have translated numbers, punctuation symbols (e.g., dot/period, comma, colon), and also symbols like the carriage return. Some languages only translate numbers but leave the English punctuation symbols. And some don't translate either. One language, Hebrew, translates only punctuation (more on Hebrew later).
 
I am American, and so I find some of NVDA's default punctuation names (dot, tick, bang, semi) a bit idiosyncratic compared with, say, JAWS's 'period, apostrophe, exclaim, semicolon'. But when dealing with the TTS languages that do not have punctuation translated, it's always the NVDA set of names. I am wondering whether those names originate with NVDA or eSpeak (or perhaps both), and thus whether it is the NVDA community or the eSpeak community that provides those translations. I have the same question about TTS languages like Hebrew, Japanese and Burmese which, instead of defaulting to eSpeak English for Latin-script words, spell out all Latin letters. Is that a function of how NVDA uses or packages eSpeak, or of eSpeak itself?
 
Second, as I mentioned, I am no programmer. However, I do have some knowledge of linguistics and phonology from having taken a few linguistics classes in university, as well as having studied as a hobby on my own time. I have found certain TTS languages with sometimes glaring phonetic problems, some of which I know I could fix easily if only I knew how to program. These include doubled phonemes which should actually be lengthened, incorrect prosody/stress, diphthongs in some languages which could be better handled, and in the case of Haitian Creole, a plainly incorrect phonemic inventory, full stop. I am aware that this is most likely under the purview of the eSpeak NG community, but this leads back to the first question about translation. While I regrettably do not speak any minority language and so cannot translate the interface, it might be possible for me to look up what punctuation mark A or number A is called in language X and add it to the speech output. But this is where I'm unsure. If I want to help with eSpeak NG phonemics, I'm going to have to learn Speech Synthesis Markup Language (SSML) and probably also some C, which eSpeak is written in. But if any of the things I have mentioned (along with languages with scrambled letters) are controlled by NVDA, then I will likely also have to learn Python. Learning three programming languages seems like a lot for a complete beginner. If that's what it takes I'll do my best, but I want to be sure of what I need to do. I guess I'm just wondering if developers or the NVDA team have ideas about the best way to prepare myself for this work, which I am very passionate about doing, but for which I lack the knowledge as yet.
 
I am also aware that eSpeak has many more TTS languages that NVDA does not include, at least according to a cursory glance at Wikipedia and GitHub. Is there a reason for this, or a process by which the NVDA team or community determines which new languages are ready to be included in the next release?
 
Thank you very much in advance for your responses to these elementary questions, and for this excellent community that has worked so hard to realize our shared goal.
 
All the best,
Daniel
 
------------------
 
eSpeak NG TTS language notes
 
Numbers and punctuation translated: Arabic (note 1), Aragonese, Bulgarian, Catalan, Czech, Danish, German, Greek, Spanish (both versions), Persian (both versions) (note 2), Finnish, French (all versions), Gaelic (Irish), Hindi (note 2), Hungarian, Armenian (West Armenian), Icelandic, Italian, Japanese, Georgian, Kannada, Korean, Kyrgyz, Lithuanian, Macedonian, Dutch, Polish, Portuguese (Brazil), Romanian, Russian (both versions), Slovak, Slovenian, Albanian, Serbian, Swedish, Tamil, Turkish, Ukrainian, Vietnamese (all versions).
 
Only numbers translated: Afrikaans, Amharic, Assamese, Azerbaijani, Bashkir, Bengali, Bishnupriya Manipuri, Bosnian, Chinese Mandarin (both versions), Welsh, Esperanto, Estonian, Basque, Gaelic (Scottish), Hakka Chinese, Hawaiian, Armenian (East Armenian), Interlingua, Indonesian, Ido, Lojban, Greenlandic (note 3), Konkani, Kurdish, Latin, Lingua Franca Nova, Latgalian, Latvian, Malayalam, Marathi, Malay, Maltese, Maori (as of beta version), Myanmar Burmese, Norwegian Bokmal, Nahuatl Classical, Nepali, Oromo, Oriya, Punjabi, Papiamento, Klingon, Portuguese (Portugal), Lang Belta, Pyash, Quechua, Shantaiyai, Sindhi, Sinhala, Swahili, Telugu, Thai, Setswana, Tatar, Uzbek, Urdu, Chinese (Cantonese) (both versions).
 
Only punctuation translated: Hebrew (note 1).
 
None translated: Cherokee, Chuvash, Guarani, Greek (Ancient), Kazakh, Nogai, Turkmen.
 
Phonetic improvements (as of the most recent beta of 2021.1): Arabic, Maori.
 
Latin words spelled: Bashkir, Chuvash, Hebrew, Armenian (West Armenian), Japanese, Konkani, Kyrgyz, Myanmar Burmese, Shantaiyai.
 
Latin letters scrambled: Greek (Ancient)
 
Phonemic problems (pronunciations in X-Sampa; not exhaustive):
 
Arabic: vowel qualities and some consonants are not accurate. Long vowels often doubled as of 2020.4; this is better as of the most recent beta. Stress needs work.
Estonian: Lengthened phonemes doubled.
Gaelic (Irish): some. For example, 'ae' should not make the following consonant slender; 'aei' serves that purpose. 'Aei' should sound like [e:], not [e:I].
Gaelic (Scottish): some, possibly. R sounds strange in many circumstances; need to investigate.
Hawaiian: Some diphthongs could use work. The phonemic inventory may not actually reflect current spoken Hawaiian.
Haitian Creole: Major problems. No nasal vowels. Several digraphs are incorrectly pronounced. 'Ou' is pronounced [o"u] instead of [u], 'ch' is pronounced [Sh] instead of [S], 'j' is pronounced [j] instead of [Z], 'ui' is pronounced [u"i] instead of [Hi], and 'r' is pronounced [h] instead of [R] or [G]. In addition, the number 1 is incorrectly pronounced [y.o"un] instead of the correct [ju~].
Italian: minor; single 'r' between vowels could be [4] instead of [r] and everywhere else 'r' and double 'rr' might be [r] instead of [r:]. Some other phonemes like 'ff' could be lengthened, not doubled. Check raddoppiamento sintattico.
Japanese: in some cases like in the numbers, alveolar consonants before [i] are not palatalized. Lengthened consonants are incorrectly doubled.
Maltese: Some, but I do not know enough yet to fully document.
Portuguese (Portugal): 'r' except between vowels and double 'rr' could be [r] or [R] instead of [r\].
Romanian: 'r' could more distinctly be [4] or [r] rather than [r\].
Lule Saami: lengthened phonemes doubled?
Setswana: unsure about vowel qualities. Lengthened phonemes doubled.
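
To make the "doubled versus lengthened" point in the notes above concrete, here is a minimal Python sketch. It does not show how eSpeak NG works internally; the function name `lengthen_doubled` and the list-of-symbols input are assumptions made purely for illustration. It takes a sequence of X-SAMPA symbols and replaces two identical neighbours with one symbol followed by the X-SAMPA length mark [:].

```python
# Hypothetical helper for illustration only; it is not part of eSpeak NG or NVDA.
LENGTH_MARK = ":"  # X-SAMPA length mark


def lengthen_doubled(phonemes):
    """Merge two adjacent identical X-SAMPA symbols into one lengthened symbol.

    Only exact adjacent duplicates are merged, so ["f", "f"] becomes ["f:"].
    Symbols that differ from their neighbour are left untouched.
    """
    result = []
    for symbol in phonemes:
        if result and result[-1] == symbol:
            # The previous symbol is the same: mark it as long instead of
            # emitting the phoneme a second time.
            result[-1] = symbol + LENGTH_MARK
        else:
            result.append(symbol)
    return result


# A doubled 'ff', as in the Italian example, becomes a single long [f:].
print(lengthen_doubled(["b", "a", "f", "f", "o"]))  # ['b', 'a', 'f:', 'o']
```

A real fix would presumably belong in the language's pronunciation rules rather than in a post-processing step like this one; the sketch only shows the intended difference in output.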
 
Sounds for some native, Latin or foreign letter names: Cherokee, Greek (Ancient), Hakka Chinese, Hawaiian, Haitian Creole, Lojban, Georgian, Maori (beta), Nahuatl Classical, Quiche, Uyghur.
 
Pronunciation of foreign letter names in the middle of words, or native letters scrambled when foreign letters appear: Cherokee, Chinese Mandarin (Latin as Pinyin), Greek (Ancient), Hakka Chinese, Hawaiian, Maori (as of beta version), Klingon, Lang Belta, Quiche, Lule Saami, Turkmen, Uyghur, Uzbek.
 
Alphabet name (e.g., Cyrillic) appended to native letters: Kyrgyz, Nogai. 
 
English letter names: Chinese Mandarin (Latin as Pinyin), Persian Pinglish, Gaelic (both versions), Kannada, Maori (new version), Malayalam, Malay, Shantaiyai, Tamil, Telugu, Setswana (some).
 
Other problems:
Cherokee: Does not recognize 'q' as a native letter; pronounces letter name in the middle of words, which is disruptive. Instead of reverting to English, words with non-Cherokee Latin letters are scrambled.
Greek (ancient): cannot pronounce names of letters and letters with diacritics (modern Greek is fully able to do this); instead, uses the sound of the letter or the encoding number for some letters with diacritics. Reads with distinct pause between words. Pronounces most Greek words fine, but no tone distinctions. Pronounces letter name of 'c' in the middle of words.
Hakka Chinese: scrambles the following Latin letters in the middle of a word: B, C, D, G, J, Q, R, W, X, Z. Some of this is understandable, since the Hakka voice was likely developed for the Latin orthography.
Hawaiian: Appears to be based on a Slavic-language inventory. The use of 'y' in a word is disruptive.
Lang Belta: when foreign letters appear, spells in phonetic alphabet, but maybe this is intentional?
Quechua: names for 'k' and 'q' are not distinct. May be based partially on Slavic inventory?
Tatar: does not recognize some native Cyrillic letters; uses encoding number.
 
1) Tokenization for Hebrew and Arabic is obviously a major problem. Punctuation and numbers, when translated at all, are not voweled and make no sense. Perhaps adding vowels to the specifications would solve this.
2) Persian and Hindi do not translate all punctuation. They also say a few random marks out loud, such as the comma and the right parenthesis.
3) In some cases, Greenlandic uses an approximation of Danish numbers (e.g., when reading by character), but otherwise it uses its own (e.g., when reading by line).


Louder Pages
 

Daniel,

Have you come across RHVoice? It is an Open Source TTS with several languages.  RHVoice has its own NVDA add-on.

https://github.com/RHVoice

It uses HTS technology, which provides more natural voices than eSpeak.

We here at https://louderpages.org are just completing a Macedonian voice commissioned by the government of North Macedonia, and are starting work on Albanian.

Reading about your interest in linguistics and phonetics, I think we could use your help in future projects.  We use a mix of U.K. and U.S. English as a working language, so we can save you from those Aussie punctuation marks. :)

I hope you will get in touch -  inq@...

Kind regards, Mark


Luis Carlos González Moráles
 

Wow, great collaboration with the government! I'm still waiting for an answer about creating a Spanish-language voice for the synth.



Daniel Parker
 

On Wed, Jun 2, 2021 at 10:19 PM, Louder Pages wrote:


Daniel Parker
 

This was done in error; disregard.


Daniel Parker
 

Absolutely will do, Mark. Expect an email shortly. However, I'm still interested in what the NVDA team has to say about their use of eSpeak. No disrespect to your project, but eSpeak is still good, and I don't mind robotic voices, as many blind people will tell you. If a response from them is not forthcoming, I may contact them directly.

Best,
Daniel


Robert Mendoza
 

Hi,


I looked at the link you mentioned but could not find the RHVoice add-on files for NVDA, or the English voice. I would appreciate it if you could share a direct link instead. Thanks.


Kind regards,


Robert

On 6/3/2021 9:00 AM, Louder Pages wrote:


Louder Pages
 

Hi Robert, You are looking for the RHVoice NVDA add-on, right? Try this page: https://github.com/RHVoice/RHVoice/blob/master/doc/en/Binaries.md

It also has links to the English voices, which in my opinion are not as good as those for other languages. The male voice, Alan, is the best, I think, although it has a Scottish accent, not a US one.

We here at Louder Pages collaborate closely with RHVoice and would like to make an improved English voice. But there are so many more deserving languages to cover! LOL

- Mark


Luis Carlos González Moráles
 

You said you're working on Spanish voices. I sent you an email but haven't received an answer. Can I apply?



Louder Pages
 

Hi Daniel, 

In case you don't hear back from the NVDA team soon, can I ask: have you looked at the affiliation of the authors of each of those eSpeak NG voices? I think that, for the most part, those authors may be unaware of NVDA's existence. Any phonetic issues are the responsibility of the eSpeak maintainers and not, I would suggest, within NVDA's purview.

And coming back to RHVoice: do you prefer robotic voices because they can be played at speed? HTS voices remain intelligible at high speed too.

- Mark


Luke Davis
 

Daniel Parker wrote:

Absolutely will do Mark. Expect an email shortly. However, I'm still interested in what the NVDA team has to say on this regarding their use of eSpeak.

Most of NV Access does not participate on this list. Therefore, if you want to know what the NVDA team has to say, you may want to post your message where they are more likely to see it, either as a GitHub issue or on the nvda-devel list.

https://github.com/nvaccess/nvda/issues

https://groups.io/g/nvda-devel

Luke


Louder Pages
 

Luis, I'm glad you made contact with me through email.

Our Spanish work has been suspended for a while now.  We are looking for a good reason to resume.

Below is a link to a short mp3. It is an example of our initial experiment. It is a European, Madrid-area accent. It should be easy to move to Lat-Am accents. But I have been told that a ceceo accent is helpful to blind Latin Americans for making the spelling of words clear.

https://louderpages.org/imgs/audio/carlos_1.mp3


Daniel Parker
 

Thank you for this, Luke. Will post there.


Daniel Parker
 

Mark,

I posted this here under the assumption that what you said about the authors of eSpeak is correct. However, I wanted to know exactly what NVDA does to eSpeak before packaging it with a release, whether they are in charge of ASCII symbol translation and the like. I will post the question to the development group as Luke suggested. To your point about HTS voices being intelligible at speed, fair enough. I do still think eSpeak is a worthy project. I hope it won't raise too many eyebrows with your group if I end up working on both projects in some capacity.

Speaking of, in your initial message you wrote the email address as "inq@...", without the s. On your website, the s in "Louderpages" is included. I just wanted to make sure which is correct before dropping you a line.

Best,
Daniel


Louder Pages
 

Daniel, Sorry about the email issue - yes, inq@... with an "S".

I meant to say - you could check out RHVoice's Tatar language to see how it compares with that issue you identified with eSpeak NG.

Meanwhile, if the other readers will indulge me, I would like to show off our latest voice, Kiko. It will make an appearance with an NVDA and SAPI download in a week or two.

Here are some comparison clips for people who are interested. Same text, three ways: eSpeak, Kiko, Kiko double speed. I hope the m4a format of the eSpeak clip works for people. Until now, eSpeak has been the only Macedonian TTS available.

https://louderpages.org/imgs/audio/mk_espeak_c19.m4a

https://louderpages.org/imgs/audio/mk_kiko_c19.mp3

https://louderpages.org/imgs/audio/mk_kikox2_c19.mp3

- Mark


David Goldfield
 

Hi, Mark.

It had been perhaps over a decade since I last used these voices when your most recent message reminded us that they are still available. I was intrigued, and so I downloaded all of them. I personally prefer BDL, as it offers very good high-frequency response. I hope that you might consider making the voices a bit easier to locate from the GitHub page, as some users might have a difficult time understanding which add-ons are needed and how best to find them. Many thanks for making these available.

 

David Goldfield,

Blindness Assistive Technology Specialist

JAWS Certified, 2019

Subscribe to the Tech-VI announcement list to receive emails regarding news and events in the blindness assistive technology field.

Email: tech-vi+subscribe@groups.io

 

www.DavidGoldfield.org

 

 

 

From: nvda@nvda.groups.io <nvda@nvda.groups.io> On Behalf Of Louder Pages
Sent: Friday, June 4, 2021 6:38 PM
To: nvda@nvda.groups.io
Subject: Re: [nvda] A few questions about NVDA, eSpeak NG, and languages (non-programer)

 
