How to spell out Roman numerals


 

On Mon, Nov 9, 2020 at 02:53 AM, Luke Davis wrote:
why in the world are we still using Roman Numerals?
-
1. Because they look pretty on a clock face.
2. Because they are handy for differentiating the parts of an outline structure.
3. Because of the vagaries that are the history of writing, whether jargon or regular. If I see the word stage followed by a Roman numeral I immediately think cancer or other medical staging, while if it's an Arabic numeral I think something else, whether a physical location, one of a number of options (stage 3 as opposed to stage 2 in a multi-stage performing arts center), etc.

No one ever said natural language, or its conventions, "make perfect sense." Heck, natural language, whether spoken or written, always involves unresolvable ambiguity on the reader's or listener's end, or else "misunderstandings" in the literal sense would not be possible.
 
--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  

It’s hard waking up and realizing it’s not always black and white.

     ~ Kelley Boorn

 


 

Janet,

           Thanks for the follow-up.  I am now looking at this closely due to an unexpected peculiarity in how the pattern matching in the NVDA dictionary works.  Not substituting for the single letter I will help quite a bit, as the rest of the Roman numerals will work.  By the way, I just realized after typing out the rest of the message that the two entries for RN 5 and RN 10 should be entered in your dictionary BEFORE the one that covers all the other RNs, due to the peculiarity I allude to.

            Because the pronoun I is now out of the picture, I will suggest that you use a variant on Luke's proposed regular expression for matching, as it eliminates a number of issues related to the peculiarity to which I make reference.  Since you have said you only need the Roman numerals one through ten, and we've now dropped one/the pronoun I, I am going to modify his regular expression only to accommodate that, and make it shorter for copy and paste.  Were you looking for any Roman numeral up to 9 "Roman digits" long, I wouldn't even consider touching it.  Give this a try as the pattern, and I can omit the colon part of matching now, too:

\b([IXV])([IXV])([I])?([I])?([a-zA-Z])?\b

and this as the replacement:

\1 \2 \3 \4 \5

Oddly enough, the above will work for every Roman numeral from two through nine except five, because the pattern requires at least two numeral letters and five is a single V followed by nothing.  It also misses RN ten, because that is a single letter X.  So you need two additional entries to cover those two.  Here's the pattern for RN 5:

\b(V)([a-zA-Z])?\b

and the replacement:

\1 \2

and for RN 10:

\b(X)([a-zA-Z])?\b

and the replacement:

\1 \2

All of the above presume the Roman numeral with possible optional letter afterward will be free-standing, that is, with whitespace (whether space, tab, newline, etc.) immediately before or after, at a word boundary.  Essentially they're their own "words."  If you know that you could have any specific punctuation coming immediately afterward, and know what specific punctuation marks are involved, I'd need to tweak to account for that.
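For anyone who wants to try these three entries outside NVDA first, here is a minimal sketch in Python. It assumes the dictionary applies its entries one after another using Python-style regular expression substitution, which fits what has been described in this thread but is not taken from NVDA's source. The names ENTRIES and spell_out are made up for the illustration. The whitespace collapsing at the end is only there to make the results easy to compare, since an optional group that matches nothing still leaves its separating space behind.

```python
import re

# The three entries from this message, in the suggested order:
# RN 5 and RN 10 first, then the general pattern.
ENTRIES = [
    (r"\b(V)([a-zA-Z])?\b", r"\1 \2"),
    (r"\b(X)([a-zA-Z])?\b", r"\1 \2"),
    (r"\b([IXV])([IXV])([I])?([I])?([a-zA-Z])?\b", r"\1 \2 \3 \4 \5"),
]


def spell_out(text):
    """Apply every entry in order (assumed behaviour), then tidy the spaces."""
    for pattern, replacement in ENTRIES:
        # Python 3.5+ substitutes an empty string for an unmatched group.
        text = re.sub(pattern, replacement, text)
    return " ".join(text.split())


for sample in ["Stage IV", "Type IIb", "VIII", "IX", "I went home"]:
    print(repr(sample), "->", repr(spell_out(sample)))
```

The pronoun I in the last sample is left alone because the general pattern needs at least two numeral letters in a row.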

Unless you're dealing with a text that includes Greek letter names written out, the above should not prove problematic.
--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  


 


Janet Brandly
 

Hi Brian,

 

Okay, I won’t install that list yet. With regard to the word “I” versus the Roman numeral I, because the word is more common in language than the Roman numeral, let’s just leave it as it is; i.e., NVDA will say the word “I” regardless and I will just have to double-check if I am working in a context where the Roman numeral may be a possibility. Hopefully that makes it easier all around.

 

Janet

 

From: nvda@nvda.groups.io <nvda@nvda.groups.io> On Behalf Of Brian Vogel
Sent: November 8, 2020 3:20 PM
To: nvda@nvda.groups.io
Subject: Re: [nvda] How to spell out Roman numerals

 

Janet,

            If you have not already entered those rules, hold off on doing so for two reasons:

1.  I have just learned that dictionary processing does not terminate after the first successful match, but continues down the list, using the substituted text for subsequent processing each time a match is found, until the end of the list is reached.  So I need to consider the order again to make this successful.

2.  It's likely I can remove the trailing colon requirement for all but Roman numeral one.  If it works better to assume the colon is there, then nothing needs to change.
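The difference between the two processing models in point 1 can be sketched in a few lines of Python. This is only an illustration: the two rules are hypothetical ones chosen to make the effect visible, the function names are invented, and nothing here is taken from NVDA's own code.

```python
import re

# Two hypothetical entries, listed in dictionary order.
RULES = [
    (r"\bIV\b", "I V"),  # spell out Roman numeral four
    (r"\bV\b", "5"),     # speak a lone V as the number five
]


def apply_all(text, rules):
    """Every entry runs, each one seeing the output of the one before."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def stop_at_first_match(text, rules):
    """The model originally assumed: stop after the first entry that matches."""
    for pattern, replacement in rules:
        new_text, count = re.subn(pattern, replacement, text)
        if count:
            return new_text
    return text


print(apply_all("Stage IV", RULES))            # Stage I 5
print(stop_at_first_match("Stage IV", RULES))  # Stage I V
```

Under the run-everything model the V produced by the first entry is picked up by the second one, which is why the order and wording of the replacements need a second look.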

The big problem here is really the Roman numeral one, and having some very clear way of differentiating it from the pronoun I, and that is not at all easy sans some other delimiter used with it to make that differentiation clear.

I'm not going to work on this further until I see a response from you, as it's clear that this is going to be an iterative process until we nail down precisely what will work based on the text you're generally processing, which looks to be medical in nature.
--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  


 


Janet Brandly
 

Hi Brian,

 

Thank you for doing this. I will try it and let you know how it works.

 

Janet

 

 

 

 

 

From: nvda@nvda.groups.io <nvda@nvda.groups.io> On Behalf Of Brian Vogel
Sent: November 7, 2020 7:07 PM
To: nvda@nvda.groups.io
Subject: Re: [nvda] How to spell out Roman numerals

 

Janet,

Below is the list of 11 regular expressions, followed by what you use for the replacement, that you need to enter in your speech dictionary in the order listed.  I emphasize again: in the order listed.

This is important because I believe (and am waiting for confirmation) that the speech dictionary (or any dictionary) has its entries processed in order, and on the first match the replacement is passed to the synthesizer and the processing for that "word/character cluster" stops.  If you had the entry for Roman numeral one first, it would snag Roman numerals 4, 3, and 2 incorrectly since all of them are composed of a collection of capital Is.

In addition, I am going to presume from your example that all of these Roman numeral, with possible optional letter, sequences must have a colon after the last character of the sequence with no space between the two.  If there is no colon then the match will not work, and that's by design, as I do not want the pronoun I to be captured as Roman numeral one.

If you want the word "Roman" or something else in front of the individual characters of the numeral before they're read out one by one then stick that in front of the first character of the replacement string.  I just went for the individual letters making up the numeral, along with the letter following it, if that letter is present.

The regular expressions all start with a backslash and end with a question mark.  The replacement strings all end with backslash one (the digit 1).  When working with the dictionary to add entries, the regular expression goes in the Pattern edit box, the replacement string in the Replacement box, and the Type radio button must be set to regular expression.

\s?IIII([a-z])?:\s?    I I I I \1

 

\s?III([a-z])?:\s?    I I I \1

 

\s?II([a-z])?:\s?    I I \1

 

\s?I([a-z])?:\s?    I \1

 

\s?IV([a-z])?:\s?    I V \1

 

\s?VIII([a-z])?:\s?    V I I I \1

 

\s?VII([a-z])?:\s?    V I I \1

 

\s?VI([a-z])?:\s?    V I \1

 

\s?V([a-z])?:\s?    V \1

 

\s?IX([a-z])?:\s?    I X \1

 

\s?X([a-z])?:\s?    X \1
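As a quick check of the colon requirement, the entry for Roman numeral one can be tried against a few strings in Python. This is a sketch using Python's re module only; the helper name has_roman_one is invented for the example, and it tests whether the pattern matches, not how NVDA would speak the result.

```python
import re

# Pattern for Roman numeral one, copied from the list above.
ROMAN_ONE = re.compile(r"\s?I([a-z])?:\s?")


def has_roman_one(text):
    """True if the pattern is found anywhere in the text."""
    return ROMAN_ONE.search(text) is not None


for sample in ["I:", "Ib: findings", "I went to the store", "II:"]:
    print(repr(sample), has_roman_one(sample))
```

The pronoun in "I went to the store" is skipped because no colon follows it. The pattern does, however, find the final I of "II:", since nothing anchors the start of the match, which is the reason for entering the longer numerals ahead of the shorter ones.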



--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  


 


Janet Brandly
 

Hi Brian,

 

Sorry for taking so long to respond. The letters following the Roman numerals, whether they are in lower-case or upper-case, would likely only need to go to E.

 

Thanks,

 

Janet

 

 

From: nvda@nvda.groups.io <nvda@nvda.groups.io> On Behalf Of Brian Vogel
Sent: November 7, 2020 4:06 PM
To: nvda@nvda.groups.io
Subject: Re: [nvda] How to spell out Roman numerals

 

Janet,

        Thanks.  I'll post what you need to use a bit later this evening.  I will ask just one more time: are the letters after the numeral limited to a certain set?  If not, I'll look for lowercase A through lowercase Z, but if it should only be A through F (or something similar) let me know.  That's very easy to tweak.

--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  


 


Luke Davis
 

On Sun, 8 Nov 2020, Brian Vogel wrote:

I'd have to say that I disagree with you about regular expressions not being confusing, and particularly to those who don't have a programming background
Okay, I was probably too cavalier in saying that.
They can be confusing, especially in the more advanced concepts.
I have been using them for nearly 25 years, and I still have to refresh myself on certain details from time to time.
But the basics of what they are doing do not have to be confusing. You can learn to pick them apart and understand more about what's going on than just seeing a pile of weird parentheticals and punctuation. You may not always understand the depths of some esoteric construction, but I believe the basics can be learned without a programming background, at least enough to use them for solving moderate problems.

Getting mad ninja skills with them takes years of familiarity, as does anything, even spoken languages.

with regular expressions is clearly fresher and better than mine is. As you know from my private e-mail to you, while I can figure out how one can capture
I don't know about that, but it may be more twisty. I started with writing and debugging expressions used in the very old days to fight email spam, before the big companies took over that business. Some of those expressions could be mind blowing, hair pulling, and thousands of characters long. I was very much thrown in the deep end and had to learn to swim or else.

the Roman numerals one through four, or five through eight, with a single "compact" regular expression pattern match I have no idea how one would then have
that match parsed out for replacement as individual letters.  If you want the individual letters, things get much more messy.
As far as I can tell, it can't be done except as we've done it. There is no cleaner way. Though I'd love to learn of one.

soon as Janet had noted that her needed range was the Roman numerals one through ten, I took the easy way out, and a way that I felt was "more
I never saw that, not having received her messages.

understandable" to the person who was going to use it. I only wish I could figure out a way to handle Roman numeral one reliably sans a delimiting colon.  I
can find no way to do that which doesn't require linguistic analysis rather than pattern matching.  The pronoun I is just so common which makes it a
nightmare.  You can pretty much count on a capital I followed by a verb being the pronoun, not a Roman numeral, but there is no way I know of to express
that in a regex.
Agreed. I suspect machine learning and neural nets would be the class of solutions preferred for that one.

It does beg the question though: why in the world are we still using Roman Numerals? We don't have to carve straight lines into rock in order to record our numbers any more, for Jupiter's sake. But that's a question for another day, and another list.

Luke


Luke Davis
 

On Sat, 7 Nov 2020, Brian Vogel wrote:

[Regarding my question about the use of \s?]
My thinking is that there can be no whitespace after the colon, or an instance of a single whitespace character, but not multiple whitespace characters.
Definitely not the same as .* at all.
Are you sure? Just using egrep -e on my local machine, all of the following expressions will match the string "test":

.*st
te.*
\s?st
te\s?

I agree that one could probably use \b, but I was thinking "whitespace" and used whitespace matching.
I understand why whitespace might be interesting there. But searching for it as optional, without an anchor or pre/post text, is the same as not searching for it at all.

It is an old truism in regex building, that if your match potentially includes zero of something at the start or end of an expression, it may as well not be there unless there is an anchor.

For example, my tests above, rewritten as:

^\s?st
te\s?$

would fail as they should. But the naked zero space matches in the originals function exactly as a full wildcard (.*).
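The same experiment can be repeated with Python's re module instead of egrep. This is a small sketch; the helper name matches is made up, and re.search is used because, like grep, it looks for the pattern anywhere in the line.

```python
import re

LINE = "test"


def matches(pattern, line=LINE):
    """True if the pattern is found anywhere in the line, as grep would report."""
    return re.search(pattern, line) is not None


# Unanchored: the optional whitespace matches zero characters, so all four hit.
for pattern in [r".*st", r"te.*", r"\s?st", r"te\s?"]:
    print(pattern, matches(pattern))

# Anchored: now the optional whitespace has to sit at the edge of the line.
for pattern in [r"^\s?st", r"te\s?$"]:
    print(pattern, matches(pattern))
```

The equivalence holds for the yes-or-no question of whether a line matches. The text actually consumed still differs (.*st takes in all of "test", while \s?st takes only "st"), which matters once a replacement is involved.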

Luke


 

 


 

On Sun, Nov 8, 2020 at 01:20 PM, Luke Davis wrote:
The major problem with this approach is its lack of flexibility.
                Agreed.  Using fixed word patterns is to be avoided when it can reasonably be avoided because you always end up not catching things you want to catch and vice versa.   And even with regular expressions, that can often be the case, though more rarely, until you've actually encountered exception conditions you had never anticipated when writing them.

Using a regular expression based method will likely be much more efficient. Even if you don't understand them, and don't want to take a couple hours to learn them (they only look confusing, they aren't really), you can ask someone to make one for you as was done here.
I'd have to say that I disagree with you about regular expressions not being confusing, particularly to those who don't have a programming background where they've had to deal with the "abstract, multiple-possible patterns within a larger pattern" way of thinking.   It took me a very long time to wrap my head around the more complex and nuanced aspects of regular expressions, and I've never dealt with recursive ones at all.  The level of abstraction you start having to think about to craft regular expressions that are both pieces of art and often nightmarishly dense, even to the initiated, is cultivated slowly over time.  And lots of the typical matches are not expressed in the documentation in the way that they'd be talked about in typical conversation.  I was once "the regex maker": I'd sit and have the person describe to me all (or as many as they knew of) the examples of what they wanted to catch, and when, and then translate it to a regex.  That is a non-trivial task once you get beyond the most basic matching.  But you already know this, as your skill with regular expressions is clearly fresher and better than mine is.

As you know from my private e-mail to you, while I can figure out how one can capture the Roman numerals one through four, or five through eight, with a single "compact" regular expression pattern match I have no idea how one would then have that match parsed out for replacement as individual letters.  If you want the individual letters, things get much more messy.

By the way, I absolutely love your solution for the generic case of a Roman Numeral that's 9 "Roman Digits" or fewer long, optionally followed by a letter, followed by the colon.

But as soon as Janet had noted that her needed range was the Roman numerals one through ten, I took the easy way out, and a way that I felt was "more understandable" to the person who was going to use it.

I only wish I could figure out a way to handle Roman numeral one reliably sans a delimiting colon.  I can find no way to do that which doesn't require linguistic analysis rather than pattern matching.  The pronoun I is just so common which makes it a nightmare.  You can pretty much count on a capital I followed by a verb being the pronoun, not a Roman numeral, but there is no way I know of to express that in a regex.

And I apologize to those who feel plowed under by these discussions of regular expressions.  But since they are such a powerful feature of NVDA's pattern matching for the various dictionaries they do fall under the category of NVDA related and there may be readers who do want to know more about how they're used.  If not, delete these messages or mute a topic once it deep dives into something like this discussion.
 
--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  


 


Luke Davis
 

The major problem with this approach is its lack of flexibility. You literally have to individually code every possible RN you might encounter so that it pronounces correctly. Otherwise, the first time you get an IXII you weren't expecting, the whole thing becomes ineffective.

A secondary problem is that you will have issues any time you deal with a numbered list in Word or wherever, which includes lettered subpoints.
A. This
B. Is
C. a
D. Test

Would come out as:

A. This
B. Is
100. a
D. Test

I believe, which would be disconcerting.
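To see this side effect without touching a real dictionary, here is a rough sketch in Python. It approximates a whole-word dictionary entry with \b word boundaries and case-insensitive matching; that is an assumption made for illustration, not a statement of how NVDA implements its whole word setting. The entries (iv, v, x, c) and the name speak are hypothetical.

```python
import re

# Hypothetical whole-word entries: written form -> what should be spoken.
WORD_ENTRIES = {"iv": "4", "v": "5", "x": "10", "c": "100"}


def speak(text):
    """Replace each entry wherever it stands alone as a word, ignoring case."""
    for word, spoken in WORD_ENTRIES.items():
        pattern = r"\b" + re.escape(word) + r"\b"
        text = re.sub(pattern, spoken, text, flags=re.IGNORECASE)
    return text


print(speak("Chapter IV"))  # Chapter 4
print(speak("exclusive"))   # exclusive
print(speak("A. This\nB. Is\nC. a\nD. Test"))
```

The last call turns the third list label into 100, exactly the kind of surprise described above, while ordinary words that merely contain the letters are left alone.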

Using a regular expression based method will likely be much more efficient. Even if you don't understand them, and don't want to take a couple hours to learn them (they only look confusing, they aren't really), you can ask someone to make one for you as was done here.

Luke

On Sun, 8 Nov 2020, Gene wrote:

with a period after, the dictionary read iv as 4. This may be of considerable value for those who don't know how to work with regular expressions and want to make Roman numeral pronunciation rules that work properly. The only thing I can think of that shouldn't be placed in the dictionary is a single I and 1 in the pronounced as field. You would constantly hear I spoken as in One went to the store.


Gene
 

I said in an earlier message that the dictionary shouldn't work as I described. On further thought, it should. It doesn't know when moving character by character if the letters are part of a word or not. If the dictionary didn't work this way, my method wouldn't work. The reason it does is that when the dictionary sees a Roman numeral, even if it is a single letter by itself on a line, or a single letter with a space on both or either sides, it thinks it is a word and that's why it pronounces it as a Roman numeral.

In short, the dictionary does what it is supposed to do, as far as I can see, and capitalizing the numerals and making them case sensitive will, as I said, mostly eliminate the problem.

Gene

-----Original Message-----
From: Gene
Sent: Sunday, November 08, 2020 6:33 AM
To: nvda@nvda.groups.io
Subject: Re: [nvda] How to spell out Roman numerals

I just changed the entries as I described and they work properly.

Gene
-----Original Message-----
From: Gene via groups.io
Sent: Sunday, November 08, 2020 6:12 AM
To: nvda@nvda.groups.io
Subject: Re: [nvda] How to spell out Roman numerals

I just found out a way in which the speech dictionary acts improperly and
causes a problem. While what I said is true and I just tested it by making
a lot of entries and having them read in a novel that uses Roman numerals, I
found that when I type and have speak characters on, when I type something
like the letter v or x, the Roman numeral is read. This shouldn't happen
because I'm using the whole word setting. I'm going to change the entries
to case sensitive and change all my entries to capital letters. That should
eliminate most such unwanted behavior.

I also just found out that when I move character by character through a
word, letters are announced as Roman numerals. For example, in the word
having, if I move by letter through the word, the v is spoken as 5.

Again, making such dictionary entries capitalized and making them case
sensitive should eliminate most such behavior, not all unfortunately. If I
type the name Victoria or Xavier, the capitalized letter will be announced
as a Roman numeral.

However, what I've discussed may be useful to some people and if desired, a
portable copy of NVDA can be used with such dictionary entries for reading.

Gene
-----Original Message-----
From: Gene
Sent: Sunday, November 08, 2020 5:55 AM
To: nvda@nvda.groups.io
Subject: Re: [nvda] How to spell out Roman numerals

I think I figured it out. When using the whole word setting, if I don't
include a space before and after the numeral, it works. I made an entry iv
and in the pronounced as field I placed 4. I didn't make it case sensitive
because I wanted to test what the dictionary would do in general.

When it saw iv in a word such as exclusive, it read the word properly. When
it saw iv just as letters, whether they were at the beginning of a line with
a space after, in a sentence with a space before and after, or at the end
with a period after, the dictionary read iv as 4. This may be of
considerable value for those who don't know how to work with regular
expressions and want to make Roman numeral pronunciation rules that work
properly. The only thing I can think of that shouldn't be placed in the
dictionary is a single I and 1 in the pronounced as field. You would
constantly hear I spoken as in One went to the store.

Gene
-----Original Message-----
From: Gene
Sent: Saturday, November 07, 2020 5:41 PM
To: nvda@nvda.groups.io
Subject: Re: [nvda] How to spell out Roman numerals

It isn't clear to me after experimenting with the speech dictionary how the
whole word setting works. I had originally thought that those who wanted to
have something like Roman numerals spoken as standard numbers might need
another choice in the speech dictionary. There is now whole word and
anywhere as choices. As I thought about it, I thought that whole word
should work and that no other option would be necessary. If IV were seen as
a whole word and the dictionary spoke 4 when it saw IV with spaces on either
side, if you include spaces in the entry, then that would not result in
extraneous speaking of 4. So another choice in the list of radio buttons,
as I originally suggested, wouldn't be needed.

I experimented with this and iv by itself with spaces in the pattern field
and 4 in the pronounced as field doesn't work. IV is still spoken as IV
when written in this way. What does NVDA consider a whole word? When I try
a word such as alive and use the whole word setting, that works. Perhaps
what NVDA sees as a whole word needs to be changed.

Since most people won't know how to work with regular expressions, the
ability to do this sort of thing using the whole word option might be
valuable.

Gene
-----Original Message-----
From: Brian Vogel
Sent: Saturday, November 07, 2020 5:01 PM
To: nvda@nvda.groups.io
Subject: Re: [nvda] How to spell out Roman numerals

On Sat, Nov 7, 2020 at 04:53 PM, Gene wrote:
This could be made much easier,-
What could be?

Everything you write after this is essentially what I proposed: using the
speech dictionary with regular expression matching to very strictly limit
what is captured and substituted.

One of the beauties of regular expressions is how they can be crafted to
catch only what you want.

--


Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041



Gene
 

I just changed the entries as I described and they work properly.

Gene



Gene
 

I just found out a way in which the speech dictionary acts improperly and causes a problem. While what I said is true and I just tested it by making a lot of entries and having them read in a novel that uses Roman numerals, I found that when I type and have speak characters on, when I type something like the letter v or x, the Roman numeral is read. This shouldn't happen because I'm using the whole word setting. I'm going to change the entries to case sensitive and change all my entries to capital letters. That should eliminate most such unwanted behavior.

I also just found out that when I move character by character through a word, letters are announced as Roman numerals. For example, in the word having, if I move by letter through the word, the v is spoken as 5.

Again, making such dictionary entries capitalized and making them case sensitive should eliminate most such behavior, not all unfortunately. If I type the name Victoria or Xavier, the capitalized letter will be announced as a Roman numeral.

However, what I've discussed may be useful to some people and if desired, a portable copy of NVDA can be used with such dictionary entries for reading.

Gene

-----Original Message-----
From: Gene
Sent: Sunday, November 08, 2020 5:55 AM
To: nvda@nvda.groups.io
Subject: Re: [nvda] How to spell out Roman numerals

I think I figured it out. When using the whole word setting, if I don't
include a space before and afgter the numeral, it works. I made an entry iv
and in the pronounced as field I placed 4. I didn't make it case sensative
because I wanted to test what the dictionary would do in general.

When it saw iv in a word such as exclusive, it read the word properly. When
it saw iv just as letters, whether they were at the beginning of a line with
a space after, in a sentence with a space before and after, or at the end
with a period afgter, the dictionary read iv as 4. This may be of
considerable value for those who don't know how to work with regular
expressions and want to make Roman numeral pronunciation rules that work
properly. The only thing I can think of that shouldn't be placed in the
dictionary is a single I and 1 in the pronounced as field. You would
constantly hear I spoken as in One went to the store.

Gene
-----Original Message-----
From: Gene
Sent: Saturday, November 07, 2020 5:41 PM
To: nvda@nvda.groups.io
Subject: Re: [nvda] How to spell out Roman numerals

It isn't clear to me after experimenting with the speech dictionary how the
whole word setting works. I had originally thought that those who wanted to
have something like Roman numerals spoken as standard numbers might need
another choice in the speech dictionary. There is now whole word and
anywhere as choices. As I thought about it, I thought that whole word
should work and that no other option would be necessary. If IV were seen as
a whole word and the dictionary spoke 4 when it saw IV with spaces on either
side, then including spaces in the entry would not result in
extraneous speaking of 4. So another choice in the list of radio buttons,
as I originally suggested, wouldn't be needed.

I experimented with this and iv by itself with spaces in the pattern field
and 4 in the pronounced as field doesn't work. IV is still spoken as IV
when written in this way. What does NVDA consider a whole word? When I try
a word such as alive and use the whole word setting, that works. Perhaps
what NVDA sees as a whole word needs to be changed.

Since most people won't know how to work with regular expressions, the
ability to do this sort of thing using the whole word option might be
valuable.

Gene
-----Original Message-----
From: Brian Vogel
Sent: Saturday, November 07, 2020 5:01 PM
To: nvda@nvda.groups.io
Subject: Re: [nvda] How to spell out Roman numerals

On Sat, Nov 7, 2020 at 04:53 PM, Gene wrote:
This could be made much easier,-
What could be?

Everything you write after this is essentially what I proposed: using the
speech dictionary with regular expression matching to very strictly limit
what is captured and substituted.

One of the beauties of regular expressions is how they can be crafted to
catch only what you want.

--


Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041

It’s hard waking up and realizing it’s not always black and white.

~ Kelley Boorn


Sean Randall
 

The problem with this approach is that letters other than I would lose their meaning, such as x, v, c and so on.

Whether NVDA should pronounce Iii as "3" or "I I I" is a matter for the synthesizer and the user, though. Surely we shouldn't prescribe that level of detail for users in the general case.












Gene
 

I think I figured it out. When using the whole word setting, if I don't include a space before and after the numeral, it works. I made an entry iv and in the pronounced as field I placed 4. I didn't make it case sensitive because I wanted to test what the dictionary would do in general.

When it saw iv in a word such as exclusive, it read the word properly. When it saw iv just as letters, whether they were at the beginning of a line with a space after, in a sentence with a space before and after, or at the end with a period after, the dictionary read iv as 4. This may be of considerable value for those who don't know how to work with regular expressions and want to make Roman numeral pronunciation rules that work properly. The only thing I can think of that shouldn't be placed in the dictionary is a single I with 1 in the pronounced as field. You would constantly hear the pronoun I spoken as one, as in "One went to the store."
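
The behaviour described above is what one would expect if a whole word entry is matched at word boundaries. The following is only a minimal Python sketch of that idea, not NVDA's actual code: it assumes a whole word entry behaves roughly like the literal text wrapped in \b anchors with case ignored, and the function name whole_word_sub is made up for illustration.

```python
import re

def whole_word_sub(pattern: str, replacement: str, text: str) -> str:
    """Replace `pattern` only where it stands alone as a word, ignoring case."""
    # Assumption: "whole word" is modelled as the literal entry between
    # \b word-boundary anchors. This is an illustration, not NVDA's source.
    regex = re.compile(r"\b" + re.escape(pattern) + r"\b", re.IGNORECASE)
    return regex.sub(replacement, text)

# "iv" inside "exclusive" is left alone; a standalone iv becomes 4.
print(whole_word_sub("iv", "4", "An exclusive look at section iv."))
# Works at the start of a line and mid-sentence, in either case.
print(whole_word_sub("IV", "4", "IV comes first, then chapter iv follows"))
# The problem entry: a lone I mapped to 1 also catches the pronoun.
print(whole_word_sub("i", "1", "I went to the store"))
```

Under this assumption the entry needs no surrounding spaces, because the boundary check already does that job, and it also shows why a single I entry is best left out.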

Gene



Luke Davis
 

An alternative to Brian's method might be something longish like the following. Although again, I don't fully understand the issue, not having gotten Janet's messages, so it might fail the use case after all.

I spent about an hour trying to figure out some more elegant way of doing this, and couldn't come up with anything shorter than the below. Brian's method is probably easier to understand, although this cuts and pastes as a single entry, so I guess it has that going for it. :)

The idea below is to match, at the start of any word, any RN between one and nine characters long, and additionally to match one optional subsequent non RN character, and a required final colon. That was what I understood from Brian's messages anyway.

Match type: regular expression
Case sensitive: yes
Pattern:

\b([MCLXVI])([MCLXVI])?([MCLXVI])?([MCLXVI])?([MCLXVI])?([MCLXVI])?([MCLXVI])?([MCLXVI])?([MCLXVI])?([a-zA-Z])?(?=:)

Replacement:

\1 \2 \3 \4 \5 \6 \7 \8 \9 \10

I tested a version of this in a temporary dictionary, and it appeared to work.
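
For anyone who wants to try the pattern outside NVDA first, here is a small Python sketch. It assumes the dictionary's regular expressions behave like Python's re module (version 3.5 or later, where an unmatched optional group is replaced by an empty string); the names PATTERN, REPLACEMENT and spell_out are made up for this illustration.

```python
import re

RN = "([MCLXVI])"
# One required Roman digit, eight optional ones, one optional trailing
# letter, and a colon that must follow but is not consumed.
# This expands to the same pattern string given above.
PATTERN = re.compile(r"\b" + RN + (RN + "?") * 8 + r"([a-zA-Z])?(?=:)")
REPLACEMENT = r"\1 \2 \3 \4 \5 \6 \7 \8 \9 \10"

def spell_out(text: str) -> str:
    """Separate the letters of a Roman numeral that is followed by a colon."""
    return PATTERN.sub(REPLACEMENT, text)

# Unused groups leave extra spaces behind, so split() is used here only
# to show the pieces without the padding.
print(spell_out("IV: Scope").split())
print(spell_out("XIIb: notes").split())
# No colon after the numeral, so nothing changes.
print(spell_out("Chapter IV"))
```

The extra spaces left by the empty groups should not matter for speech, but they are worth knowing about if the output is ever inspected as text.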

The odd-looking construct for the colon at the end is there because the colon is punctuation. I don't know when NVDA applies punctuation processing relative to this chain of dictionaries, so I thought it better to make sure the colon was there, but let it actually be processed by the normal rules, by using a lookahead. I did not test that part in the temp dictionary, as I only just thought of it. If this fails, try replacing "(?=:)" with just ":", and put a colon at the end of the replacement string as well.
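
The difference between the two colon variants can be seen in a short Python sketch. It uses a cut-down pattern (just IV) rather than the full expression, and assumes Python-style regular expressions; nothing here has been checked against NVDA's punctuation handling.

```python
import re

text = "IV: Scope"

# Variant 1: the colon must be present but is not part of the match.
lookahead = re.search(r"\bIV(?=:)", text)
# Variant 2: the colon is consumed, so the replacement must put it back.
consuming = re.search(r"\bIV:", text)

print(repr(lookahead.group(0)))
print(repr(consuming.group(0)))

with_lookahead = re.sub(r"\bIV(?=:)", "I V", text)
with_colon = re.sub(r"\bIV:", "I V:", text)
print(with_lookahead == with_colon)
```

Both variants produce the same text here. The only difference is whether the colon passes through the dictionary untouched or is rewritten by it.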

Luke


 

My thinking is that there can be no whitespace after the colon, or a single whitespace character, but not multiple whitespace characters.  Definitely not the same as .* at all.

I agree that one could probably use \b, but I was thinking "whitespace" and used whitespace matching.  And remember whitespace is not just a space, but includes space, tab stop and line break.

Also, I sometimes change my mind about what I'm going to capture, and \b is non-capturing.
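
To make the two points above concrete, here is a short Python sketch, assuming Python-style regular expressions. The sample text and the variable names are invented for the example.

```python
import re

# \s matches a space, a tab stop and a line break alike.
whitespace_hits = [bool(re.fullmatch(r"\s", ch)) for ch in (" ", "\t", "\n")]
print(whitespace_hits)

sample = "Part\tIV: Scope"

# A captured whitespace character can be put back by the replacement.
# \g<1> is used because a plain \1 followed by 4 would be read as group 14.
kept = re.sub(r"(\s)IV:", r"\g<1>4:", sample)
print(repr(kept))

# \b only asserts a position; it contributes no text to the match.
boundary = re.search(r"\bIV", sample)
print(repr(boundary.group(0)))
```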

There's a reason I have said, repeatedly, that I am doing "quick and dirty" to get the result I'm looking for.  It's entirely possible, nay, probable, that certain of my regexes could be expressed more elegantly.  If it works on the tests I'm running, as I expect it to, it's "good enough."
--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  

It’s hard waking up and realizing it’s not always black and white.

     ~ Kelley Boorn

 


Luke Davis
 

Brian

I'm not receiving Janet's messages for some reason, so I'm not sure of every detail of her requirement for this, but I am left with a question.

What is the \s? at each end doing?
I mean obviously it is looking for zero or one space characters, but why?

If you can have zero space characters, that means you can have any character there, including a space character, since the space matching is un-anchored.
In fact, it is the same as \s*, for the same reason. (Or, possibly even the same as .*)

So I think the expression should work identically with or without the "\s?", although I could understand a "\b".

What am I missing?
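
The question can be checked with a small Python sketch. Brian's actual pattern is not quoted in this message, so a bare IV stands in for it here, and the helper name finds is made up; Python-style regular expressions are assumed.

```python
import re

def finds(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches anywhere in `text`."""
    return re.search(pattern, text) is not None

# With optional whitespace at each end, IV is still found inside a word,
# exactly as it is with no whitespace matching at all.
print(finds(r"\s?IV\s?", "CIVIC duty"))
print(finds(r"IV", "CIVIC duty"))
# Word boundaries reject the same text.
print(finds(r"\bIV\b", "CIVIC duty"))
# The only visible effect of \s? is that an adjacent space gets consumed.
print(repr(re.search(r"\s?IV\s?", "see IV: Scope").group(0)))
```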

Luke


 

Janet,

          A second quick addendum: I just realized that what I've given so far may not work for cancer staging, as I presume that would be the word "stage" followed by the Roman numerals one through four, depending on which stage.

          We can create four more patterns specific to the word stage preceding the Roman numerals one through four.  Luckily, you won't have to worry about reordering, as the prior matches all require a colon, and would all fail for the Roman numerals one through four that don't have a colon immediately following.
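
The four stage patterns are not spelled out in this message. Purely as an illustration of what they might look like, here is a Python sketch assuming Python-style regular expressions; the patterns, STAGE_RULES and apply_stage_rules are guesses made up for this example, not Brian's entries.

```python
import re

# Hypothetical rules: the word "stage" (either capitalisation), whitespace,
# then a Roman numeral from one to four that ends at a word boundary.
STAGE_RULES = [
    (re.compile(r"\b([Ss]tage)\s+I\b"), r"\g<1> 1"),
    (re.compile(r"\b([Ss]tage)\s+II\b"), r"\g<1> 2"),
    (re.compile(r"\b([Ss]tage)\s+III\b"), r"\g<1> 3"),
    (re.compile(r"\b([Ss]tage)\s+IV\b"), r"\g<1> 4"),
]

def apply_stage_rules(text: str) -> str:
    """Apply each stage rule in turn, as a list of dictionary entries would."""
    for regex, replacement in STAGE_RULES:
        text = regex.sub(replacement, text)
    return text

print(apply_stage_rules("Diagnosis: Stage IV, previously stage II"))
print(apply_stage_rules("I left the stage"))
```

Because each numeral must end at a word boundary, the rule for I cannot fire on II, III or IV, so in this sketch the order of the four rules does not matter.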
--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  

It’s hard waking up and realizing it’s not always black and white.

     ~ Kelley Boorn

 


 

Janet,

           A quick addendum: I don't know how the letter A following a Roman numeral will end up being pronounced, as that's based on the synthesizer, so you may get ah or you may get A.  I just can't be sure.  I think letters B through Z are more likely to be read as the character itself.
--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  

It’s hard waking up and realizing it’s not always black and white.

     ~ Kelley Boorn