Will NVDA eventually use AI for better GUI recognition?


Luke Robinett <blindgroupsluke@...>
 

So one thing I enjoy about VoiceOver on my iPhone is that it has gotten really good at using AI to make otherwise inaccessible UI elements available to interact with. More than just simple OCR, it can ascertain the layout and make educated guesses about controls like buttons and tabs, greatly expanding the usability of apps that would otherwise be partially or totally inaccessible.

Is there any chance NVDA will eventually reach that level of sophistication? I know there are third-party add-ons that attempt to bridge that gap for specific types of apps, for example the great Sibiac add-on, which helps make certain music production apps and plugins accessible with NVDA, but it would be great to see these capabilities broadened and rolled into the core functionality of the product.

 

Thanks,

Luke

 


William
 

That is a good idea, and I also like VoiceOver's ability to let me label inaccessible elements myself.


Luke Robinett wrote on 12/1/2021 7:48:



 

Supanut Leepaisomboon <supanut2000@...>
 

That is actually a great idea; I can imagine this capability being used to make PC games more accessible with NVDA.


Rob Hudson <rob_hudson3182@...>
 

Is that capability even available in Python?

----- Original Message -----
From: "Supanut Leepaisomboon" <supanut2000@...>
To: nvda@nvda.groups.io
Date: Mon, 11 Jan 2021 16:18:00 -0800
Subject: Re: [nvda] Will NVDA eventually use AI for better GUI recognition?

That is actually a great idea; I can imagine this capability being used to make PC games more accessible with NVDA.







Sascha Cowley
 

Yes. What about it wouldn't be, and why?

On 2021-01-12 11:29, Rob Hudson wrote:
Is that capability even available in Python?


Jaffar Sidek <jaffar.sidek10@...>
 

You can do machine learning with Python, so it may not be impossible to implement AI.
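For instance, here is a toy sketch, in plain Python with no external libraries, of the kind of logic machine learning builds on. This is purely illustrative (real work would use libraries such as scikit-learn or PyTorch, and the feature vectors here are made up):

```python
import math

def nearest_neighbour(train, query):
    """Return the label of the training sample closest to `query`.

    `train` is a list of (feature_vector, label) pairs; this is a
    bare-bones 1-nearest-neighbour classifier.
    """
    def distance(a, b):
        # Euclidean distance between two equal-length feature vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    best_label, _ = min(
        ((label, distance(vec, query)) for vec, label in train),
        key=lambda pair: pair[1],
    )
    return best_label

# Toy feature vectors standing in for, say, features of control images.
training_data = [
    ((0.0, 0.0), "button"),
    ((1.0, 1.0), "checkbox"),
]
print(nearest_neighbour(training_data, (0.9, 0.8)))  # checkbox
```

Nothing NVDA-specific here, just a demonstration that basic classification logic is perfectly expressible in the language NVDA is written in.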

On 12/1/2021 8:29 am, Rob Hudson wrote:
Is that capability even available in Python?


Rob Hudson <rob_hudson3182@...>
 

Yeah, all I know is that it's a scripting language. I didn't know how deep it got into machine learning, fuzzy logic, and AI.

----- Original Message -----
From: "Jaffar Sidek" <jaffar.sidek10@...>
To: nvda@nvda.groups.io
Date: Tue, 12 Jan 2021 08:55:24 +0800
Subject: Re: [nvda] Will NVDA eventually use AI for better GUI recognition?

You can do machine learning with Python, so it may not be impossible to implement AI.


Jaffar Sidek <jaffar.sidek10@...>
 

Python has become quite a powerful programming language in its own right. It is a multi-purpose tool for writing GUIs, scientific applications, and games; it drives the web with its Django- and Flask-built apps; and it has frameworks built on top of it for making Android and iPhone apps. It is powerful, yet relatively easier to learn than Java, C, or C++. Gone are the days when Python was just a mere scripting language. In fact, NVDA wouldn't be where it is today if Python were just a mere scripting language. I should think more appropriate research should be done before sweeping statements like this are made. Cheers!

On 12/1/2021 9:00 am, Rob Hudson wrote:
Yeah, all I know is that it's a scripting language. I didn't know how deep it got into machine learning, fuzzy logic, and AI.


Suhas D
 

That would be so amazing.
Oh, the possibilities!


---

--Suhas
Sent from Thunderbird

“Things can turn out differently, Apollo.
That's the nice thing about being human.
We only have one life, but we can choose what kind of story it's going to be.”
Rick Riordan
On 1/12/2021 5:18, Luke Robinett wrote:



Sean Murphy
 

AI is only as smart as the data used to train it. All object-recognition AI depends on someone having applied a specific description to icons, parts of an image, and so on. One place this breaks down is when a developer does not use standard icons for navigation concepts like back, forward, and OK: the user will get some description from the object recognition, but it might not be meaningful. The term used for that description in other countries might not even be known:

Let's say an icon indicates the delete button, and the AI object recognition detects it as a trash can. Outside the USA the term "trash can" is not used, so the user might not understand that this is the delete button. Suppose the button is tied to some important piece of information; they press it to try to work out what it does, and oops, that information is gone. The object recognition would have to know what country your device is set to and then use that country's term, such as "rubbish bin" (the Australian term for trash can). This also assumes the icon really is a delete button; there is no guarantee the designer of the app used internationally standard icons, even though there is an organisation that defines all the icons, emoticons, and so on.
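To make the point concrete, here is a tiny hypothetical sketch in Python of the kind of locale-aware label table a screen reader would need for exactly this situation. The icon IDs, locale codes, and function name are all invented for the example; this is nothing from NVDA or VoiceOver:

```python
# Hypothetical mapping from a recognised icon to locale-specific spoken labels.
LOCALISED_ICON_LABELS = {
    "trash_can": {
        "en_US": "trash can",
        "en_AU": "rubbish bin",
        "en_GB": "bin",
    },
}

def announce_icon(icon_id, locale, fallback="en_US"):
    """Return the locale-appropriate spoken label for a recognised icon.

    Falls back to a default locale, and finally to the raw icon ID,
    when no localised label is available.
    """
    labels = LOCALISED_ICON_LABELS.get(icon_id, {})
    return labels.get(locale) or labels.get(fallback) or icon_id

print(announce_icon("trash_can", "en_AU"))  # rubbish bin
print(announce_icon("trash_can", "fr_FR"))  # trash can (fallback)
```

Even this toy version shows the maintenance burden: someone has to curate that table for every icon and every locale, which is exactly the data problem described above.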

 

Beyond simple standardised icons, object recognition has a long way to go and requires far more research. I do hope it will eventually be able to handle complex images like organisation charts and the like.

 

Note, the above is just an example demonstrating that AI is not a silver bullet. It does help, but I have seen situations on Apple platforms where it causes more problems than it fixes, especially when the button or link already has an accessible name that VoiceOver reads out; the Apple solution still announces the recognised icon as well, which makes things too verbose. Before this is introduced into any screen reader, careful thought needs to go into how it is adopted, as I can see it becoming more of an annoyance than a solution.

 

 

I am not saying it should not be explored to see how it will help. One area is AI dynamically changing the UI of a program to make it easier for a keyboard user, which is something people are already working on.

 

Sean

 

 

From: nvda@nvda.groups.io <nvda@nvda.groups.io> On Behalf Of Luke Robinett
Sent: Tuesday, January 12, 2021 10:49 AM
To: nvda@nvda.groups.io
Subject: [nvda] Will NVDA eventually use AI for better GUI recognition?

 


 


Luke Robinett <blindgroupsluke@...>
 

Rob, it for sure is. In fact, Python is touted as one of the better languages for machine learning.

On Jan 11, 2021, at 4:30 PM, Rob Hudson <rob_hudson3182@...> wrote:

Is that capability even available in Python?


Luke Robinett <blindgroupsluke@...>
 

Modern Python is just as capable a language as Java, C++, JavaScript, and others. Also, you can write modules in C for NVDA, if I'm not mistaken; I believe some of the application's core is written in that language.

On Jan 11, 2021, at 5:01 PM, Rob Hudson <rob_hudson3182@...> wrote:

Yeah, all I know is that it's a scripting language. I didn't know how deep it got into machine learning, fuzzy logic, and AI.


Luke Robinett <blindgroupsluke@...>
 

Sean,
I'm not fully on board with your assessment. Machine learning isn't a static thing; by its very definition, you feed it content in a particular domain and it gets better and better at what it does. Just because something is still in its infancy doesn't mean it isn't worth pursuing; if that were the attitude, we wouldn't have screen readers in the first place. Besides, much like current OCR capabilities, this wouldn't be a replacement for how NVDA normally works, just another supplement. And like I said, it's already being implemented quite successfully in VoiceOver, so the argument that the tech just isn't reliable enough kind of goes out the window. That doesn't mean it's perfect, but when has anything about accessibility ever been perfect? LOL.

On Jan 12, 2021, at 1:32 AM, Sean Murphy <mhysnm1964@...> wrote:




Devin Prater
 

OCR wasn't all that great 10 years ago; now it's very usable. I expect the same will be true of screen and user-interface recognition a few years from now.

On 1/12/21 1:39 PM, Luke Robinett wrote:


Pranav Lal
 

Hi all,

As others have mentioned, the problem is the model. We need someone to compile data on what user interfaces look like, probably at the control level, and then determine how to interact with each control.

Suppose there is a non-standard edit box. NVDA would need to recognise that there is an edit box and then be able to interact with it. It may not get the relevant events from it, so it is hard to say how good such a feature would be, but it is very possible to do. Does anyone have a database of pictures of Windows controls?


Pranav
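As a purely hypothetical sketch of what such a database of control pictures could feed, here is a toy Python example that guesses a control's role by comparing a tiny binary "screenshot" against stored templates. A real system would use a trained vision model rather than pixel matching, and every name and template here is invented for illustration:

```python
# Invented templates standing in for a database of Windows control images.
CONTROL_TEMPLATES = {
    "edit_box": (
        "#######",
        "#.....#",
        "#######",
    ),
    "button": (
        "#######",
        "##...##",
        "#######",
    ),
}

def hamming(a, b):
    """Count differing pixels between two same-sized pixel grids."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )

def guess_role(pixels):
    """Return the role whose template differs from `pixels` the least."""
    return min(
        CONTROL_TEMPLATES,
        key=lambda role: hamming(CONTROL_TEMPLATES[role], pixels),
    )

screenshot = (
    "#######",
    "#....@#",   # nearly an edit box, with one stray pixel
    "#######",
)
print(guess_role(screenshot))  # edit_box
```

The interesting (and hard) part is everything this sketch leaves out: tolerating different themes, sizes, and fonts, and then wiring the guessed role back into the screen reader's object model, which is where the event problem Pranav raises comes in.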


Luke Robinett <blindgroupsluke@...>
 

I'm good at coming up with great ideas, but that doesn't necessarily mean I have any idea how the heck to make them happen, LOL. All joking at my own expense aside, I would think this kind of information already exists in some API somewhere, AWS perhaps? Like I say, Apple's VoiceOver screen reader is already doing this, so the tech is definitely out there somewhere.

On Jan 12, 2021, at 4:22 PM, Pranav Lal <pranav.lal@...> wrote:
