Will NVDA eventually use AI for better GUI recognition?
Luke Robinett <blindgroupsluke@...>
So one thing I enjoy about VoiceOver on my iPhone is that it has gotten really good at using AI to make otherwise inaccessible UI elements available to interact with. More than simple OCR, it can ascertain the layout and make educated guesses about controls like buttons and tabs, greatly expanding the usability of apps that would otherwise be partially or totally inaccessible. Is there any chance NVDA will eventually reach that level of sophistication? I know there are third-party add-ons that attempt to bridge that gap for specific types of apps, for example the great Sibiac add-on, which helps make certain music production apps and plugins accessible with NVDA, but it would be great to see these capabilities broadened and rolled into the core functionality of the product.
Thanks, Luke
William
That is a good idea, and I also like VoiceOver's ability to let me label inaccessible elements myself.
Luke Robinett wrote on 12/1/2021 7:48:
Supanut Leepaisomboon <supanut2000@...>
That is actually a great idea. I can imagine this capability being used to make PC games more accessible with NVDA.
Rob Hudson <rob_hudson3182@...>
Is that capability even available in Python?
----- Original Message -----
From: "Supanut Leepaisomboon" <supanut2000@...>
To: nvda@nvda.groups.io
Date: Mon, 11 Jan 2021 16:18:00 -0800
Subject: Re: [nvda] Will NVDA eventually use AI for better GUI recognition?
Sascha Cowley
Yes. What about it wouldn't be, and why?
On 2021-01-12 11:29, Rob Hudson wrote:
Jaffar Sidek <jaffar.sidek10@...>
You can do machine learning with Python, so it may not be impossible to implement AI.
On 12/1/2021 8:29 am, Rob Hudson wrote:
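To make Jaffar's point concrete: even without any libraries, the core idea of machine learning fits in a few lines of Python. The sketch below is a toy nearest-centroid classifier with invented "features" for UI controls; real GUI recognition would use libraries such as scikit-learn, PyTorch, or TensorFlow, all of which have first-class Python APIs.

```python
# Toy sketch: a nearest-centroid classifier in pure Python.
# The feature vectors and labels below are invented for illustration.
from collections import defaultdict
import math

def train(samples):
    """samples: list of (feature_vector, label). Returns label -> centroid."""
    sums = defaultdict(lambda: None)
    counts = defaultdict(int)
    for vec, label in samples:
        if sums[label] is None:
            sums[label] = list(vec)
        else:
            sums[label] = [a + b for a, b in zip(sums[label], vec)]
        counts[label] += 1
    return {lbl: [v / counts[lbl] for v in s] for lbl, s in sums.items()}

def predict(centroids, vec):
    """Return the label whose centroid is nearest to vec (Euclidean)."""
    def dist(c):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(c, vec)))
    return min(centroids, key=lambda lbl: dist(centroids[lbl]))

# Pretend features: (width/height ratio, has_text_flag) for UI controls.
training = [
    ((3.0, 1), "button"), ((3.2, 1), "button"),
    ((8.0, 0), "edit_box"), ((7.5, 0), "edit_box"),
]
model = train(training)
print(predict(model, (3.1, 1)))  # prints "button"
```

This is only a sketch of the principle; a production recognizer would learn from labelled screenshots, not hand-picked numbers.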
Rob Hudson <rob_hudson3182@...>
Yeah, all I know is that it's a scripting language. I didn't know how deep it got into machine learning, fuzzy logic, and AI.
----- Original Message -----
From: "Jaffar Sidek" <jaffar.sidek10@...>
To: nvda@nvda.groups.io
Date: Tue, 12 Jan 2021 08:55:24 +0800
Subject: Re: [nvda] Will NVDA eventually use AI for better GUI recognition?
Jaffar Sidek <jaffar.sidek10@...>
Python has become quite a powerful programming language in its own right. It is a multi-purpose language for writing GUIs, scientific applications, and games. It drives the web with its Django and Flask apps, and it has frameworks built on top of it for making Android and iPhone apps. It is powerful, yet relatively easier to learn than Java, C, or C++. Gone are the days when Python was just a mere scripting language. In fact, NVDA wouldn't be where it is today if Python were just a scripting language. I should think more appropriate research should be done before sweeping statements like this are made. Cheers!
On 12/1/2021 9:00 am, Rob Hudson wrote:
Suhas D
That would be so amazing.
-- Suhas
Sent from Thunderbird. "Things can turn out differently, Apollo." (Rick Riordan)
On 1/12/2021 5:18, Luke Robinett wrote:
Sean Murphy
AI is not that smart, as it is entirely dependent on the data used to train it. All object-recognition AI relies on someone having applied a specific description to icons, parts of an image, and so on. This breaks down when a developer does not use standard icons to indicate navigation like backwards, forward, OK, etc.: the user will get some description from the object recognition, and it might not be meaningful.
The term used in other countries for that description also might not be known. Let's say an icon is used to indicate the delete button, and the AI detects it as a trash can. Outside the USA, the term "trash can" is not used, so the user might not understand this is the delete button. Suppose the delete button is tied to some important bit of information: they press it to try and work out what it does, and oops, that information is gone. The object recognition would then have to know which country your device is set to and use that country's term, such as "rubbish bin" (the Australian term for trash can). This also assumes the icon really is a delete button; there is no guarantee the designer of the app used internationally standard icons, even though there is an organisation that defines all the icons, emoticons, etc.
Outside of simple standardised icons, object recognition has a long way to go and requires far more research. I do hope it will eventually be able to handle complex images like organisation charts.
Note, the above is just an example demonstrating that AI is not a silver bullet. It does help, but I have seen situations on Apple devices where it causes more problems than it fixes, especially when a button or link already has an accessible name which VoiceOver reads out: the Apple solution still gives you the recognised icon as well, which makes things too verbose. Before this is introduced into any screen reader, really careful thought needs to go into how it is adopted, as I can see it becoming more of an annoyance than a solution.
I am not saying it should not be explored to see how it will help. One area is AI dynamically changing the UI of a program to make it easier for a keyboard user; people are already doing work on this.
Sean
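Sean's localisation point can be sketched in a few lines: even a correct classification ("trash can") has to be rendered in the user's locale before it is meaningful. The icon IDs, locale codes, and mapping below are invented for illustration, not from any real screen reader.

```python
# Hypothetical sketch: mapping a recognised icon to a locale-appropriate
# description, with a fallback when the locale is unknown.
ICON_TERMS = {
    "delete_icon": {
        "en-US": "trash can",
        "en-AU": "rubbish bin",
        "en-GB": "bin",
    },
}

def describe(icon_id, locale, fallback="en-US"):
    """Return the locale's term for an icon, falling back to a default
    locale, then to the raw icon ID if nothing is known."""
    terms = ICON_TERMS.get(icon_id, {})
    return terms.get(locale) or terms.get(fallback) or icon_id

print(describe("delete_icon", "en-AU"))  # prints "rubbish bin"
print(describe("delete_icon", "fr-FR"))  # unknown locale, prints "trash can"
```

Even this toy version shows the hard part Sean describes: the table itself has to be built and maintained by someone, per icon, per locale.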
From: nvda@nvda.groups.io <nvda@nvda.groups.io> On Behalf Of Luke Robinett
Sent: Tuesday, January 12, 2021 10:49 AM
To: nvda@nvda.groups.io
Subject: [nvda] Will NVDA eventually use AI for better GUI recognition?
Luke Robinett <blindgroupsluke@...>
Rob, it for sure is. In fact, Python is touted as one of the better languages for machine learning.
On Jan 11, 2021, at 4:30 PM, Rob Hudson <rob_hudson3182@...> wrote:
Luke Robinett <blindgroupsluke@...>
Modern Python is just as capable a language as Java, C++, JavaScript, and others. Also, you can write modules in C for NVDA, if I'm not mistaken. I believe some of the application's core is written in that language.
On Jan 11, 2021, at 5:01 PM, Rob Hudson <rob_hudson3182@...> wrote:
Luke Robinett <blindgroupsluke@...>
Sean, I'm not fully on board with your assessment. Machine learning isn't a static thing; by its very definition, you feed it content in a particular domain and it gets better and better at what it does. Just because something is still in its infancy doesn't mean it isn't worth pursuing; if that were the attitude, we wouldn't have screen readers in the first place. Besides, much like current OCR capabilities, this wouldn't be a replacement for how NVDA normally works, just another supplement. And like I said, it's already being implemented quite successfully in VoiceOver, so the argument that the tech just isn't reliable enough yet kind of goes out the window. That doesn't mean it's perfect, but when has anything about accessibility ever been perfect? LOL.
On Jan 12, 2021, at 1:32 AM, Sean Murphy <mhysnm1964@...> wrote:
Devin Prater
OCR wasn't all that great 10 years ago. Now, it's very usable. I expect the same to be true of screen/user interface recognition years from now as well.
On 1/12/21 1:39 PM, Luke Robinett wrote:
Pranav Lal <pranav.lal@...>
Hi all,
As others have mentioned, the problem is the model. We need someone to compile data on what user interfaces look like, probably at the control level, and then determine how to interact with them. Suppose there is a non-standard edit box: NVDA would need to know that there is an edit box and then would also need to interact with it. It may not get the relevant events from it, so it is hard to say how good such a feature would be, but this is very possible to do. Does anyone have a database of pictures of Windows controls?
Pranav
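Building the labelled dataset Pranav describes also means deciding how to score a model against it. One standard piece of that, in object detection generally, is intersection-over-union (IoU) between a predicted control bounding box and a hand-labelled one. The boxes below are invented coordinates purely for illustration.

```python
# Sketch: intersection-over-union between two screen-pixel bounding
# boxes, each given as (left, top, right, bottom). An IoU near 1.0
# means the predicted control closely matches the labelled one.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (zero-sized if the boxes don't intersect).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A labelled edit box at (10, 10, 110, 40); the model predicted
# (15, 12, 112, 41) for the same control.
print(round(iou((10, 10, 110, 40), (15, 12, 112, 41)), 3))  # prints 0.844
```

A dataset of control screenshots plus a metric like this would let anyone compare recognition approaches on equal footing.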
Luke Robinett <blindgroupsluke@...>
I'm good at coming up with great ideas, but that doesn't necessarily mean I have any idea how the heck to make them happen, LOL. All joking at my own expense aside, I would think this kind of information already exists in some API somewhere, AWS perhaps? Like I said, Apple's VoiceOver screen reader is already doing this, so the tech is definitely out there somewhere.
On Jan 12, 2021, at 4:22 PM, Pranav Lal <pranav.lal@...> wrote: