Does the Case Sensitive checkbox have any interaction with pattern matching when a Regex is used?


Dave Grossoehme
 

Hi:  It's been years since I dealt with this as a programmer. However, if I remember correctly, upper case sorts first and lower case second.  If you don't specify case handling in your expression, and the text has an S and then an s, doesn't the match follow the program's default for expressions unless you put a change into your written expression?

Dave

On 1/14/2023 2:20 AM, Luke Davis wrote:
On Jan 13, Brian Vogel wrote:

The regex processing as currently implemented by default also makes ([A-Z]) actually capture ([A-Za-z]), so it's no surprise that I had gibberish, as it is capturing all kinds of letters I don't intend it to unless I turn case sensitivity on. Just plain dumb in the regex world.
True, but as pointed out: every tool in the Unix world (where these were developed), that does regex matching, provides an "ignore case" switch that will do this same thing.

Though it's never by default, and I agree that it is not what anyone using character classes would expect to be the default, unless they notice the checkbox.

If you put in a feature request to have a warning pop-up when using regular expressions and case sensitivity is off, I will implement the feature when I get the chance, assuming the devs don't object.

The main objection will likely be: anyone using regular expressions is technically savvy enough to pay attention to how their checkboxes are set.
If you do put in a feature request, I suggest you address that up front.

Luke




Luke Davis
 

Brian Vogel wrote:

In addition, I think that checkbox should be moved to be the very last item in the dialog.  That way, no matter what you may have chosen as your matching type, you'd naturally review its case sensitivity last.
I like this idea.

I suspect the current expected use case is to just have users press enter when they've entered the minimum needed data to get the replacement to work, and so changing case sensitivity was thought more important than match type.
Moving it to the end would require going there, and eliminate this expediency.

Just a guess, though.

Luke


 

On Sat, Jan 14, 2023 at 10:07 AM, Brian Vogel wrote:
The environment should support typical expectations of those who write regular expressions.  Since one has to specify insensitive (and, believe it or not, I have never done so even though I know that option does exist) under all other circumstances, that should be mimicked as the default when Regular Expression is used.
-
In addition, I think that checkbox should be moved to be the very last item in the dialog.  That way, no matter what you may have chosen as your matching type, you'd naturally review its case sensitivity last.

And I think that the default state for anywhere and word should be case insensitive (checkbox unchecked) and for regex sensitive (checkbox checked) and that box state would be toggled to either one of those defaults based upon the match type choice you made.
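To make the proposal above concrete, here is a minimal sketch of the toggling rule. It is not NVDA code: the `MatchType` enum and `default_case_sensitive` function are hypothetical names invented for illustration, and the sketch only encodes the suggested defaults (unchecked for anywhere and word, checked for regex).

```python
from enum import Enum


class MatchType(Enum):
    # Hypothetical stand-ins for the three match types in the dialog.
    ANYWHERE = "anywhere"
    WORD = "word"
    REGEX = "regex"


def default_case_sensitive(match_type: MatchType) -> bool:
    """Return the proposed default state of the case sensitive checkbox."""
    # Proposed rule: only regular expressions default to case sensitive.
    return match_type is MatchType.REGEX


for match_type in MatchType:
    print(match_type.value, default_case_sensitive(match_type))
```

The user could still change the checkbox afterwards; the rule would only pick the starting state when the match type changes.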
--

Brian Virginia, USA Windows 11 Pro, 64-Bit, Version 22H2, Build 22621; Office 2016, Version 16.0.15726.20188, 32-bit

It is much easier to be critical than to be correct.

       ~ Benjamin Disraeli, 1804-1881


 

On Sat, Jan 14, 2023 at 08:16 AM, Travis Siegel wrote:

Regexp is something I've always had a hard time with as well, and I've been doing the programming thing since the mid 80s, so don't feel bad.

 

-
Absolutely.  You have to have a somewhat twisted mind to wrap your head around the syntax, and particularly the kind of syntax that allows you to catch all sorts of variants and where conditionals are used.

A simple character class is not too terribly difficult, but you can (and often do) get into the weeds very quickly.  And even skilled regex writers do, too.


 

On Sat, Jan 14, 2023 at 02:21 AM, Luke Davis wrote:
True, but as pointed out: every tool in the Unix world (where these were developed), that does regex matching, provides an "ignore case" switch that will do this same thing.

Though it's never by default, and I agree that it is not what anyone using character classes would expect to be the default, unless they notice the checkbox.
-
Luke, 

Even in Python it's not case insensitive by default.

I noticed that checkbox, and always have.  And I made the entirely reasonable assumption that it had zero effect when using a regular expression because those of us who use them construct most of them based on matching exactly what we're looking for, case included.

It absolutely floors me that selecting the Regular Expression radio button does not automatically cause that checkbox to be checked as well as the default action.  No one in their right mind who uses regular expressions regularly (in all senses of that adverb) would presume case insensitivity as the default - ever.

The environment should support typical expectations of those who write regular expressions.  Since one has to specify insensitive (and, believe it or not, I have never done so even though I know that option does exist) under all other circumstances, that should be mimicked as the default when Regular Expression is used.


 

On Sat, Jan 14, 2023 at 04:34 AM, Tara Roys wrote:
And you can tell when I wrote my ‘here’s my janky hack to speak proper nouns out loud’ post, I didn’t even remember to add the ‘you need to check ‘case sensitive’ box to make this work’ because it’s the sort of detail that slipped my mind.  It’s a very mind-slipping detail.
-
As well it would be.  As I've said, and Cyrille said, under any "normal" circumstance a regular expression is case sensitive by default, and you have to take special measures to make it insensitive.

Your original regex, ([A-Z]), written anywhere but in the NVDA dictionary, would ONLY capture capital letters, because that's the exact range specified.  Most regex writers who wanted lowercase and uppercase would write ([a-zA-Z]), because no one writing what you did would presume a default state of case insensitivity.
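For anyone who wants to check this outside NVDA, here is a small sketch using Python's standard re module. It assumes nothing about NVDA's internals; it only shows how the character class behaves with and without an ignore-case flag. Note that ranges inside a class are written back to back, because a comma inside the brackets is matched as a literal comma.

```python
import re

sample = "Hello, World"

# Default (case sensitive): only the capital letters are captured.
capitals_only = re.findall(r"[A-Z]", sample)

# With the ignore-case flag, the same class captures every letter.
with_ignore_case = re.findall(r"[A-Z]", sample, flags=re.IGNORECASE)

# Writing both ranges explicitly gives the same result without the flag.
both_ranges = re.findall(r"[A-Za-z]", sample)

# A comma inside the class is a literal comma, so it gets captured too.
with_comma = re.findall(r"[a-z,A-Z]", sample)

print(capitals_only)      # ['H', 'W']
print(with_ignore_case == both_ranges)  # True
print("," in with_comma)  # True
```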

You learn to write these things with case sensitivity in mind.

Until we had this discussion, I would never, in a million years, have thought that a well constructed regex that uses case to carefully specify exactly what should be captured would be rendered case insensitive by default when NVDA dictionary processing uses it.

It's the exact opposite of what those who use this tool regularly, virtually anywhere else, would expect.


mike mcglashon
 

Quoting:
The main objection will likely be: anyone using regular expressions is
technically savvy enough to pay attention to how their checkboxes are set.
End quote:
I don't know much here,
But the words regular expression as they apply here don't seem very regular
at all do they?



Please advise as you like.

Mike M.

Mike mcglashon
Email: Michael.mcglashon@...
Ph: 618 783 9331



Travis Siegel
 

Tara, you did specify that the case sensitive box needed to be checked.  That's why I got it to work the first time I tried it.  Later, I went back and didn't check the box, just to see what happened, and that created quite the mess.  Had to use narrator to get out of that one.

Regexp is something I've always had a hard time with as well, and I've been doing the programming thing since the mid 80s, so don't feel bad.


On 1/14/2023 4:34 AM, Tara Roys wrote:

…I am very grudgingly ‘tech savvy enough’ to deal with regexes.  Barely.  Because they break my brain and the only reason I learned them was to be able to coerce NVDA into telling me if a word is capitalized when reading text.

That’s why I possess most of my tech skills, to be honest: I can’t force the software to do what I need it to do until I pick up all sorts of low-level skills like regexes.

The fewer unexpected brain-breaking things I have to learn, the better, and a weird checkbox that is sort of hanging out in the middle of the dialog, just begging to be skipped by a grumpy user in a hurry to make things work…that’s why I would check that box by default, because the first regex I made captured EVERY letter and not just the capitals and barred out gibberish.

And you can tell when I wrote my ‘here’s my janky hack to speak proper nouns out loud’ post, I didn’t even remember to add the ‘you need to check ‘case sensitive’ box to make this work’ because it’s the sort of detail that slipped my mind.  It’s a very mind-slipping detail.



-Tara







Tara Roys
 

…I am very grudgingly ‘tech savvy enough’ to deal with regexes.  Barely.  Because they break my brain and the only reason I learned them was to be able to coerce NVDA into telling me if a word is capitalized when reading text.

That’s why I possess most of my tech skills, to be honest: I can’t force the software to do what I need it to do until I pick up all sorts of low-level skills like regexes.

The fewer unexpected brain-breaking things I have to learn, the better, and a weird checkbox that is sort of hanging out in the middle of the dialog, just begging to be skipped by a grumpy user in a hurry to make things work…that’s why I would check that box by default, because the first regex I made captured EVERY letter and not just the capitals and barred out gibberish.

And you can tell when I wrote my ‘here’s my janky hack to speak proper nouns out loud’ post, I didn’t even remember to add the ‘you need to check ‘case sensitive’ box to make this work’ because it’s the sort of detail that slipped my mind.  It’s a very mind-slipping detail.







Luke Davis
 

On Jan 13, Brian Vogel wrote:

The regex processing as currently implemented by default also makes ([A-Z]) actually capture ([A-Za-z]), so it's no surprise that I had gibberish, as it is capturing all kinds of letters I don't intend it to unless I turn case sensitivity on. Just plain dumb in the regex world.
True, but as pointed out: every tool in the Unix world (where these were developed), that does regex matching, provides an "ignore case" switch that will do this same thing.

Though it's never by default, and I agree that it is not what anyone using character classes would expect to be the default, unless they notice the checkbox.

If you put in a feature request to have a warning pop-up when using regular expressions and case sensitivity is off, I will implement the feature when I get the chance, assuming the devs don't object.

The main objection will likely be: anyone using regular expressions is technically savvy enough to pay attention to how their checkboxes are set.
If you do put in a feature request, I suggest you address that up front.

Luke


 

The regex processing as currently implemented by default also makes ([A-Z]) actually capture ([A-Za-z]), so it's no surprise that I had gibberish, as it is capturing all kinds of letters I don't intend it to unless I turn case sensitivity on.

Just plain dumb in the regex world.


 

Gentlemen,

Thank you for this information.  In all my years using regular expressions, case sensitivity based on the content of the expression was ON by default, and that was across a number of mostly Unix or Unix-like (Linux) contexts.  Cyrille pointed out that in standard regexps, case sensitivity is on by default.

Were I "the one in charge here" I would automatically make case sensitivity default to on if the regular expression radio button is selected, as this makes the regular expression behave as almost anyone who's ever written one, anywhere, would expect it to.

The way it works now is confusing, and violates reasonable preconceptions of anyone who has ever used regexes routinely.  It is, as the colloquialism goes, ass backwards.


Cyrille
 

Hi Brian

In Python, case insensitivity has always been an option for regexps, via the re.I (or re.IGNORECASE) flag. And I guess that such an "ignore case" option was also available in the Unix world, at least for the grep command, which can take the -i flag to be case insensitive.

What may have confused you with the NVDA dialog box is that by default the search is case insensitive, i.e. the "Match case" checkbox is not checked. On the contrary, in standard regexps, the search/match/substitution is case sensitive by default and you have to add a flag to make it case insensitive; you may well have worked with regexps without ever using such a flag.
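As a quick illustration of that default, here is a short sketch in plain Python (standard re module only, nothing NVDA-specific). The sample text is made up for the example.

```python
import re

text = "No, no, NO"

# Default behaviour: case sensitive, so only the lowercase "no" matches.
sensitive = re.findall(r"no", text)

# Passing re.IGNORECASE (re.I is the short alias) matches every casing.
insensitive = re.findall(r"no", text, flags=re.IGNORECASE)

print(sensitive)    # ['no']
print(insensitive)  # ['No', 'no', 'NO']
```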

Cheers,

Cyrille


On Fri, Jan 13, 2023 at 05:51 AM, Brian Vogel wrote:
Tyler,

Using your example of testing, which I should have done, I am using this sentence as the test sentence:
I have no idea why there is No one who can say NO and stick to it.

The first no is lowercase, the second is capital N with lowercase O, and the last is all uppercase.  My expectation, prior to any testing, was that with the regex
[Nn]o
as the matching pattern and
Yes
as the replacement, NVDA would read that sentence as: I have yes idea why there is yes one who can say NO and stick to it.
Checking the "case sensitive" checkbox should have zero effect, as this should not be applied when a regex pattern is used.

But when I do use exactly that regex, all three "no"s in that sentence get matched when case sensitivity is NOT checked.  Yet the regex itself should NOT match an all uppercase NO.

This is freakin' confusing to someone who has a very long history indeed with regular expressions.  A regular expression pattern match should use precisely what is specified by that regular expression, nothing more, nothing less.  If I had wanted to capture and substitute all three cases the regex would have been [Nn][oO], and that should capture nO (lowercase N with uppercase O), too, but the original one should not.


Travis Siegel
 

In this case, the case sensitive setting makes all the difference.

Apparently, the voices capitalize everything when they send it to the synth (no clue why), so if you don't put the check on the case sensitive option, you're definitely going to get garbage.  If that's checked, you should get the initially discussed feedback.


On 1/12/2023 11:16 PM, Brian Vogel wrote:

In all my years using Regular Expressions, I have never used any case sensitivity indicator such as the checkbox when creating NVDA dictionary entries, because the structure of the regex should be handling that itself.

But, I noticed in the SayCap.dic file its creator has case sensitivity turned on, even though the regular expressions do handle case sensitivity as written.

I don't know anything about how NVDA may handle a checked case sensitive box with a regular expression match.  I would think (hope, fervently) that it would be ignored if checked for a regular expression match, but I have learned the hard way that very peculiar things can happen at times with the NVDA dictionaries.

I suspect that the fact that the dictionary processing does not drop out when a match and substitution is initially made, but keeps passing the replacement string "down the line" to be checked against subsequent entries, may be causing some of my "gibberish" issues.  But I have no idea how/whether/if the case sensitive option could have any impact whatsoever in processing if a regex match is used.


 

Tyler,

Using your example of testing, which I should have done, I am using this sentence as the test sentence:
I have no idea why there is No one who can say NO and stick to it.

The first no is lowercase, the second is capital N with lowercase O, and the last is all uppercase.  My expectation, prior to any testing, was that with the regex
[Nn]o
as the matching pattern and
Yes
as the replacement, NVDA would read that sentence as: I have yes idea why there is yes one who can say NO and stick to it.
Checking the "case sensitive" checkbox should have zero effect, as this should not be applied when a regex pattern is used.

But when I do use exactly that regex, all three "no"s in that sentence get matched when case sensitivity is NOT checked.  Yet the regex itself should NOT match an all uppercase NO.

This is freakin' confusing to someone who has a very long history indeed with regular expressions.  A regular expression pattern match should use precisely what is specified by that regular expression, nothing more, nothing less.  If I had wanted to capture and substitute all three cases the regex would have been [Nn][oO], and that should capture nO (lowercase N with uppercase O), too, but the original one should not.
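The test above can be reproduced outside NVDA with Python's re module. This is only a sketch of the matching behaviour; it assumes that an unchecked case sensitive box behaves like an ignore-case flag, which is what the results in this thread suggest, and it says nothing about how the speech itself is produced.

```python
import re

sentence = "I have no idea why there is No one who can say NO and stick to it."

# The regex exactly as written: matches "no" and "No", but not "NO".
as_written = re.sub(r"[Nn]o", "Yes", sentence)

# The same regex with case ignored: all three words are replaced.
ignoring_case = re.sub(r"[Nn]o", "Yes", sentence, flags=re.IGNORECASE)

# Spelling out both cases for each letter also catches "NO" (and "nO").
explicit = re.sub(r"[Nn][oO]", "Yes", sentence)

print(as_written)
print(ignoring_case)
print(explicit)
```

The first line printed keeps the all-uppercase NO; the other two replace it.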


Tyler Spivey
 

This is easy enough to test:
Pattern: \btest\b; Replacement: testing; case: off; Type: Regular expression
Results: test = Testing, Test = Testing
With case on, results: test = testing, Test = test
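The same kind of test can be sketched in plain Python. The `apply_entry` helper is hypothetical and only models the matching step, on the assumption that "case: off" corresponds to an ignore-case flag; it does not model how NVDA or the synthesizer speaks the result.

```python
import re

PATTERN = r"\btest\b"
REPLACEMENT = "testing"


def apply_entry(text: str, case_sensitive: bool) -> str:
    """Apply one regex replacement, ignoring case unless case_sensitive is set."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.sub(PATTERN, REPLACEMENT, text, flags=flags)


for word in ("test", "Test"):
    print(word, "| case off:", apply_entry(word, case_sensitive=False),
          "| case on:", apply_entry(word, case_sensitive=True))
```

With case off both words are replaced; with case on only the lowercase one is, and "Test" is left unchanged.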

On 1/12/2023 8:16 PM, Brian Vogel wrote:
In all my years using Regular Expressions, I have never used any case sensitivity indicator such as the checkbox when creating NVDA dictionary entries, because the structure of the regex should be handling that itself.
But, I noticed in the SayCap.dic file its creator has case sensitivity turned on, even though the regular expressions do handle case sensitivity as written.
I don't know anything about how NVDA may handle a checked case sensitive box with a regular expression match.  I would think (hope, fervently) that it would be ignored if checked for a regular expression match, but I have learned the hard way that very peculiar things can happen at times with the NVDA dictionaries.
I suspect that the fact that the dictionary processing does not drop out when a match and substitution is initially made, but keeps passing the replacement string "down the line" to be checked against subsequent entries, may be causing some of my "gibberish" issues.  But I have no idea how/whether/if the case sensitive option could have any impact whatsoever in processing if a regex match is used.


 

In all my years using Regular Expressions, I have never used any case sensitivity indicator such as the checkbox when creating NVDA dictionary entries, because the structure of the regex should be handling that itself.

But, I noticed in the SayCap.dic file its creator has case sensitivity turned on, even though the regular expressions do handle case sensitivity as written.

I don't know anything about how NVDA may handle a checked case sensitive box with a regular expression match.  I would think (hope, fervently) that it would be ignored if checked for a regular expression match, but I have learned the hard way that very peculiar things can happen at times with the NVDA dictionaries.

I suspect that the fact that the dictionary processing does not drop out when a match and substitution is initially made, but keeps passing the replacement string "down the line" to be checked against subsequent entries, may be causing some of my "gibberish" issues.  But I have no idea how/whether/if the case sensitive option could have any impact whatsoever in processing if a regex match is used.
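Here is a simplified model of the "down the line" behaviour I'm describing, to show how the two effects could combine. It is a sketch, not NVDA's implementation: the two entries and the `process` function are hypothetical, and it assumes each entry is applied in order to the output of the previous one, with case ignored when the box is unchecked.

```python
import re

# Hypothetical dictionary entries, applied in order: (pattern, replacement).
ENTRIES = [
    (r"([A-Z])", r"cap \1"),   # announce capital letters
    (r"\bcap\b", "capital"),   # a later entry that matches the word "cap"
]


def process(text: str, case_sensitive: bool) -> str:
    """Run every entry over the text, feeding each result to the next entry."""
    flags = 0 if case_sensitive else re.IGNORECASE
    for pattern, replacement in ENTRIES:
        text = re.sub(pattern, replacement, text, flags=flags)
    return text


print(process("Al", case_sensitive=True))   # capital Al
print(process("Al", case_sensitive=False))  # capital Acap l
```

In the case sensitive run, the second entry rewrites the "cap" that the first entry inserted. In the case insensitive run, the first entry also fires on the lowercase l, which is where the output starts to turn into gibberish.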