[Version note: In this general query and my notes that follow, I will be referring both to NVDA 2020.4 and the most recent beta of NVDA 2021.1; if no version is specified, assume the comment applies to both.]
Hello all,
I would first like to say that I'm a newcomer to the group, and I very much appreciate the level of dedication, enthusiasm, and zeal of all the contributors who share my dream of a free screen reader which not only shows up the best in paid software, but also has possibly the best language support of any reader I've come across, and that's saying something! This question is primarily aimed at developers, but help/knowledge from anyone is very welcome.
I am now mostly a Mac user, but before 2017 I primarily used Windows, and still occasionally do for certain programs or, yes, viewing text in languages not supported by VoiceOver. Ever since 2016, when I was told about NVDA's rapid progress and tried it for myself, I have continually been impressed by the huge and ever-expanding list of TTS languages supported, especially through eSpeak NG. I am aware that said synthesizer is its own independent project, but I do have questions about how much modification the NVDA team, or community, does to the synthesizer's output before packaging eSpeak with a release. To preface, I have absolutely no knowledge of programming or development, and I will likely ask foolish questions and make foolish assumptions, for which I apologize.
[Note: I have developed some informal notes about eSpeak TTS languages, including which have translated what ASCII symbols, which have phonetic problems, etc. They will be at the bottom of this message if anyone cares.]
First, I have a question regarding TTS ASCII symbol translation. Some eSpeak languages have translated numbers, punctuation symbols (e.g., dot/period, comma, colon), and also symbols like the carriage return. Some languages only translate numbers but leave the English punctuation symbols. And some don't translate either. One language, Hebrew, translates only punctuation (more on Hebrew later).
I am American, and so I find some of NVDA's default punctuation names (dot, tick, bang, semi) a bit idiosyncratic as compared with, say, JAWS's 'period, apostrophe, exclaim, semicolon'. But when dealing with the TTS languages that do not have punctuation translated, it's always the NVDA set of names. I am wondering whether those names originate with NVDA or eSpeak (or perhaps both), and thus whether it is the NVDA community or the eSpeak community which provides those translations. I also have this question about TTS languages like Hebrew, Japanese and Burmese which, instead of defaulting to eSpeak English for Latin-script words, spell out all Latin letters. Is that a function of how NVDA uses or packages eSpeak, or of eSpeak itself?
Second, as I mentioned, I am no programmer. However, I do have some knowledge of linguistics and phonology from having taken a few linguistics classes in university, as well as having studied as a hobby on my own time. I have found certain TTS languages with sometimes glaring phonetic problems, some of which I know I could fix easily if only I knew how to program. These include doubled phonemes which should actually be lengthened, incorrect prosody/stress, diphthongs in some languages which could be better handled, and in the case of Haitian Creole, a plainly incorrect phonemic inventory, full stop. I am aware that this is most likely under the purview of the eSpeak NG community, but this leads back to the first question about translation. While I regrettably do not speak any minority language and so cannot translate the interface, it might be possible for me to look up what punctuation mark A or number A is called in language X and add it to the speech output. But this is where I'm unsure. If I want to help with eSpeak NG phonemics, I'm going to have to learn Speech Synthesis Markup Language (SSML) and probably also some C, which eSpeak is written in. But if any of the things I have mentioned (along with languages with scrambled letters) are controlled by NVDA, then I will likely also have to learn Python. Learning three programming languages seems like a lot for a complete beginner. If that's what it takes, I'll do my best, but I want to be sure of what I need to do. I guess I'm just wondering if developers or the NVDA Team have ideas about what would be the best way to prepare myself for this work, which I am very passionate about doing, but for which I lack knowledge as yet.
I am also aware that eSpeak has many more TTS languages that NVDA does not include, at least according to a cursory glance at Wikipedia and Github. Is there a reason for this, or a process by which the NVDA Team or community determines which new languages are ready to be included in the next release?
Thank you very much in advance for your responses to these elementary questions, and for this excellent community that has worked so hard to realize our shared goal.
All the best,
Daniel
------------------
eSpeak NG TTS language notes
Numbers and punctuation translated: Arabic1, Aragonese, Bulgarian, Catalan, Czech, Danish, German, Greek, Spanish (both versions), Persian (both versions)2, Finnish, French (all versions), Gaelic (Irish), Hindi2, Hungarian, Armenian (West Armenian), Icelandic, Italian, Japanese, Georgian, Kannada, Korean, Kyrgyz, Lithuanian, Macedonian, Dutch, Polish, Portuguese (Brazil), Romanian, Russian (both versions), Slovak, Slovenian, Albanian, Serbian, Swedish, Tamil, Turkish, Ukrainian, Vietnamese (all versions).
Only numbers translated: Afrikaans, Amharic, Assamese, Azerbaijani, Bashkir, Bengali, Bishnupriya Manipuri, Bosnian, Chinese Mandarin (both versions), Welsh, Esperanto, Estonian, Basque, Gaelic (Scottish), Hakka Chinese, Hawaiian, Armenian (East Armenian), Interlingua, Indonesian, Ido, Lojban, Greenlandic3, Konkani, Kurdish, Latin, Lingua Franca Nova, Latgalian, Latvian, Malayalam, Marathi, Malay, Maltese, Myanmar Burmese, Norwegian Bokmal, Nahuatl Classical, Nepali, Oromo, Oriya, Punjabi, Papiamento, Klingon, Portuguese (Portugal), Pyash, Quechua, Shantaiyai, Sindhi, Sinhala, Swahili, Telugu, Thai, Setswana, Tatar, Uzbek, Urdu, Chinese (Cantonese) (both versions).
Only punctuation translated: Hebrew1
None translated: Cherokee, Chuvash, Guarani, Greek (Ancient), Kazakh, Nogai, Turkmen, Tatar.
Phonetic improvements (as of the most recent beta of 2021.1): Arabic, Maori.
Latin words spelled: Bashkir, Chuvash, Hebrew, Armenian (West Armenian), Japanese, Konkani, Kyrgyz, Shantaiyai.
Latin letters scrambled: Greek (Ancient)
Phonemic problems (pronunciations in X-Sampa):
Arabic: vowel qualities and some consonants are not accurate. Long vowels often doubled as of 2020.4; this is better as of the most recent beta. Stress needs work.
Gaelic (Irish): some. For example, 'ae' should not make the following consonant slender; 'aei' serves that purpose. 'Aei' should sound like [e:], not [e:I].
Gaelic (Scottish): some, possibly. R sounds strange in many circumstances; need to investigate.
Hawaiian: Some diphthongs could use work. The phonemic inventory may not actually reflect current spoken Hawaiian.
Haitian Creole: Major problems. No nasal vowels. Several digraphs are incorrectly pronounced. 'Ou' is pronounced [o"u] instead of [u], 'ch' is pronounced [Sh] instead of [s], 'j' is pronounced [j] instead of [Z], 'ui' is pronounced [u"i] instead of [Hi], and 'r' is pronounced [h] instead of [R] or [G]. In addition, the number 1 is incorrectly pronounced [y.o"un] instead of the correct [ju~].
Italian: minor; single 'r' between vowels could be [4] instead of [r], and everywhere else 'r' and double 'rr' might be [r] instead of [r:]. Some other phonemes like 'ff' could be lengthened, not doubled. Check raddoppiamento sintattico.
Japanese: in some cases like in the numbers, alveolar consonants before [i] are not palatalized. Lengthened consonants are incorrectly doubled.
Maltese: Some, but I do not know enough yet to fully document.
Portuguese (Portugal): 'r' except between vowels and double 'rr' could be [r] or [R] instead of [r\].
Romanian: 'r' could more distinctly be [4] or [r] rather than [r\]
Lule Saami: lengthened phonemes doubled?
Setswana: unsure about vowel qualities. Lengthened phonemes doubled.
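For readers more used to IPA than to X-SAMPA, here is a minimal Python sketch that converts the handful of X-SAMPA symbols used in the notes above into IPA. The table and the function name `xsampa_to_ipa` are made up for this illustration and are not part of eSpeak NG or NVDA. The table covers only the symbols that appear in these notes, not the whole X-SAMPA inventory, and characters not in the table are passed through unchanged.

```python
# Hypothetical helper: X-SAMPA -> IPA for the symbols used in these notes only.
# This is an illustration, not code from eSpeak NG or NVDA.
XSAMPA_TO_IPA = {
    "r\\": "ɹ",     # alveolar approximant (X-SAMPA r\)
    "S": "ʃ",       # voiceless postalveolar fricative
    "Z": "ʒ",       # voiced postalveolar fricative
    "H": "ɥ",       # labial-palatal approximant
    "R": "ʀ",       # uvular trill
    "G": "ɣ",       # voiced velar fricative
    "4": "ɾ",       # alveolar tap
    "I": "ɪ",       # near-close near-front unrounded vowel
    ":": "ː",       # length mark
    "~": "\u0303",  # nasalization (combining tilde on the preceding vowel)
    '"': "ˈ",       # primary stress mark
}


def xsampa_to_ipa(text: str) -> str:
    """Convert an X-SAMPA string to IPA, leaving unknown characters as-is."""
    # Try longer symbols first so that "r\" is not split into "r" and "\".
    keys = sorted(XSAMPA_TO_IPA, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(XSAMPA_TO_IPA[key])
                i += len(key)
                break
        else:
            # No table entry starts here; copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Print a few of the transcriptions from the notes in both notations.
    for sample in ["e:I", "Hi", "r:", "r\\"]:
        print(sample, "->", xsampa_to_ipa(sample))
```

For example, `xsampa_to_ipa("e:I")` gives `eːɪ`, the pronunciation the Irish note says 'aei' should not have.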
Sounds for some native, Latin or foreign letter names: Cherokee, Greek (Ancient), Hakka Chinese, Hawaiian, Haitian Creole, Lojban, Georgian, Maori (beta), Nahuatl Classical, Quiche, Uyghur.
Pronunciation of foreign letter names in the middle of words, or native letters scrambled when foreign letters appear: Cherokee, Chinese Mandarin (Latin as pinyin), Greek (Ancient), Hakka Chinese, Hawaiian, Maori (as of beta version), Klingon, Lang Belta, Quiche, Lule Saami, Turkmen, Uyghur, Uzbek.
Alphabet name (e.g., Cyrillic) appended to native letters: Kyrgyz, Nogai.
English letter names: Chinese Mandarin (Latin as Pinyin), Persian Pinglish, Gaelic (both versions), Kannada, Maori (new version), Malayalam, Malay, Shantaiyai, Tamil, Telugu, Setswana (some).
Other problems:
Cherokee: Does not recognize 'q' as a native letter; pronounces letter name in the middle of words, which is disruptive. Instead of reverting to English, words with non-Cherokee Latin letters are scrambled.
Greek (Ancient): cannot pronounce names of letters and letters with diacritics (modern Greek is fully able to do this); instead, uses the sound of the letter or the encoding number for some letters with diacritics. Reads with a distinct pause between words. Pronounces most Greek words fine, but makes no tone distinctions. Pronounces the letter name of 'c' in the middle of words.
Hakka Chinese: scrambles the following Latin letters in the middle of a word: B, C, D, G, J, Q, R, W, X, Z. Some is understandable since Hakka was likely developed for the Latin orthography.
Hawaiian: Appears to be based on a Slavic-language inventory. The use of 'y' in a word is disruptive.
Lang Belta: when foreign letters appear, spells in phonetic alphabet, but maybe this is intentional?
Quechua: names for 'k' and 'q' are not distinct. May be based partially on Slavic inventory?
Tatar: does not recognize some native Cyrillic letters; uses encoding number.
1) Tokenization for Hebrew and Arabic is obviously a major problem. Punctuation and numbers, when translated at all, are not voweled and make no sense. Perhaps adding vowels to the specifications would solve this.
2) Persian and Hindi do not translate all punctuation. They also say a few random marks out loud, such as the comma and the right parenthesis.
3) In some cases, Greenlandic uses an approximation of Danish numbers (e.g., when reading by character), but otherwise it uses its own (e.g., when reading by line).