Re: regular expression and speech dic


Mr. Wong Chi Wai, William <cwwong.pro@...>
 

Haha, oh that is the problem.
I refined your suggestion to
(lv)([^\u4e00-\u9fa5a-z0-9]+)
where \u4e00-\u9fa5 is, from what I found on the web, the Unicode range for Chinese characters, and used level\2 as the replacement.
It turns out that
the lv in 你的lv是? ("What is your lv?") is read as "level",
but the lv in lv121 is not.

I would like to adopt your suggestion, but I also need to cover the case where the "lv" is followed by something that is not Chinese, not English and not a number.

 
lv 121
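A minimal Python sketch of one way to cover both cases, assuming NVDA applies regular-expression dictionary entries the way Python's `re.sub` does. The helper names `william` and `expand_lv` and the exact rule (read "lv" as "level" unless the next character is a lowercase ASCII letter or falls in \u4e00-\u9fa5) are my own assumptions about what is wanted. The negative lookahead `(?!...)` checks the next character without consuming it, so the digits in "lv121" stay in the text:

```python
import re

# The refined pattern above: its negated class also excludes 0-9,
# so "lv121" can never match.
WILLIAM = re.compile(r"(lv)([^\u4e00-\u9fa5a-z0-9]+)")

# Hypothetical combined rule: "lv" becomes "level" unless the next character
# is a lowercase ASCII letter or a character in U+4E00..U+9FA5.
# Digits, spaces, punctuation and end of text are all accepted after "lv".
COMBINED = re.compile(r"lv(?![a-z\u4e00-\u9fa5])")


def william(text: str) -> str:
    return WILLIAM.sub(r"level\2", text)


def expand_lv(text: str) -> str:
    # The lookahead only inspects the following character; it does not
    # consume it, so whatever follows "lv" is left in the output.
    return COMBINED.sub("level ", text)


if __name__ == "__main__":
    for sample in ["lv121", "lv 121", "lv", "lvx"]:
        print(f"{sample!r}: {william(sample)!r} -> {expand_lv(sample)!r}")
```

With this sketch "lv 121" comes out as "level  121" (two spaces), because the space already in the text is kept after the inserted one.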





On 24/8/2018 11:02, Brian Vogel wrote:

William,

           You need to read up more on regular expression syntax.  As I explained earlier, "lv" matches the string "lv" literally, and [^a-z] matches one, and only one, character that is not in the range lowercase a to z.  So when the string being matched is "lv12", your expression matches "lv" first and then "1", and nothing more.  If you want a character class to match more than one character, you must follow it with a quantifier: '*' for zero or more, '?' for zero or one, or '+' for one or more.
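A short Python illustration of the four cases, using Python's `re` module as a stand-in for NVDA's regular expression handling (an assumption). The class `[^a-z]` takes exactly one character unless a quantifier follows it:

```python
import re

text = "lv12"

# No quantifier: the class matches exactly one character.
one = re.match(r"lv([^a-z])", text).group(1)
# '+' = one or more, '*' = zero or more, '?' = zero or one.
plus = re.match(r"lv([^a-z]+)", text).group(1)
star = re.match(r"lv([^a-z]*)", "lv").group(1)
opt = re.match(r"lv([^a-z]?)", text).group(1)

print(one, plus, repr(star), opt)  # 1 12 '' 1
```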

           Your regular expression matches the "lv" and eats it, then matches a SINGLE character following "lv" that is not a lowercase letter a through z, and that's all.  Because it also "eats" that character as part of the match, only the "2" is left over.
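A quick Python check of that behaviour (again assuming Python's `re` behaves like NVDA's engine here): the one-character class consumes "lv1", so the replacement leaves only the "2" behind.

```python
import re

# "lv[^a-z]" consumes "lv1"; only the trailing "2" is left unmatched.
result = re.sub(r"lv[^a-z]", "level ", "lv12")
print(result)  # level 2
```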

           You also need to understand that regular expressions "eat" the things they match, and if you want to use those things later you MUST enclose the matching sequence in parentheses so you can refer to it in the replacement.  In the case of 'lv' you can safely toss that away because you know you want "level" for that regardless of what follows it.  You can't say the same about the digit sequence that follows the lv, which you most likely want to have read out as the number it represents.
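The difference parentheses make, sketched with Python's `re.sub` (assumed to match NVDA's behaviour): without a group the digits are eaten and lost; with one they can be put back through a backreference.

```python
import re

# Without parentheses the digits are consumed and discarded.
lost = re.sub(r"lv[0-9]+", "level", "lv12")
# With parentheses the digits are captured and reinserted via \1.
kept = re.sub(r"lv([0-9]+)", r"level \1", "lv12")
print(lost)  # level
print(kept)  # level 12
```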

Again, I offer what I said last night:
--------------------------------------------------------------------------
          Your problem is that "lv" matches those two characters exactly, but "([^a-z])" captures ONE character that is not lowercase a through lowercase z.

           If you want something of the form "lv12" pronounced as "level 12", you would use the regular expression:   lv([0-9]+)
and the substitution would be:   level \1

           The regular expression I gave says, "Match the characters 'lv' literally, then match one or more repetitions of the numeric digit characters, saving them for later use."  The parentheses around that part of the expression are what save it.  Since you know the "lv" is to be pronounced "level," there's no need to capture it for later use, but you need the digit sequence, whatever it may be, to be spoken later, and the parentheses allow it to be referred to later, as a unit, as "\1" [the first and only captured group; "lv" gets no group number because there are no parentheses around it].
--------------------------------------------------------------------------
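A sketch of how groups are numbered in Python's `re` module (assumed to apply to NVDA dictionary entries too): numbers come only from opening parentheses, left to right, so with lv([0-9]+) the digits are group 1, and a reference to a group the pattern does not define is rejected.

```python
import re

# Only parenthesised parts are numbered; the bare "lv" gets no number.
one_group = re.match(r"lv([0-9]+)", "lv12").groups()     # ('12',)
two_groups = re.match(r"(lv)([0-9]+)", "lv12").groups()  # ('lv', '12')

try:
    re.sub(r"lv([0-9]+)", r"level \2", "lv12")
    bad_ref_ok = True
except re.error:
    # \2 is invalid here because the pattern defines only one group.
    bad_ref_ok = False

print(one_group, two_groups, bad_ref_ok)
```

This is also why the refined pattern (lv)([^\u4e00-\u9fa5a-z0-9]+) correctly uses level\2: putting parentheses around lv makes it group 1 and shifts the second class to group 2.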
--

Brian - Windows 10 Home, 64-Bit, Version 1803, Build 17134  

    A little kindness from person to person is better than a vast love for all humankind.

           ~ Richard Dehmel
