AI is not that smart, as it depends entirely on the data used to train it. Every object recognition model relies on someone having applied a specific description to icons, parts of an image, etc. An example where this breaks down is when the developer does not use standard icons to indicate navigation such as back, forward, OK, etc. The user will get a description from the object recognition that might not be meaningful, because of the description that was provided. The term used for that object in other countries might also not be known:
Let's say an icon is used to indicate to the user that this is the delete button. The AI object recognition detects it as a trash can. Outside the USA the term "trash can" is not generally used, so the user might not understand that this is the delete button. Suppose the delete button is tied to some important piece of information. The user presses it to try and work out what it does. Oops, that information is gone. The object recognition would then have to know what country your device is set to and use that country's term for a rubbish bin (the Australian term for a trash can). Even this assumes the icon is a delete button. There is no guarantee the app's designer will use internationally standard icons. Yes, there is an organisation that defines all the icons, emoticons, etc.
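A minimal sketch of the localisation step described above, assuming a recognizer that returns a US-English label such as "trash can" and a hypothetical lookup table keyed by region code. The table, its entries and the function name are illustrative assumptions, not any screen reader's actual implementation. Note that it only renames the object; it still cannot tell the user that the icon means "delete".

```python
# Hypothetical table: detected US-English label -> regional term.
# Entries are illustrative assumptions, not taken from any real product.
REGIONAL_TERMS = {
    "trash can": {"AU": "rubbish bin", "US": "trash can"},
}


def localise_label(label: str, region: str) -> str:
    """Return the regional term for a detected object label.

    Falls back to the original label when the label or region is unknown.
    This only renames the object; it cannot infer the button's purpose.
    """
    return REGIONAL_TERMS.get(label, {}).get(region, label)


print(localise_label("trash can", "AU"))  # rubbish bin
print(localise_label("trash can", "CA"))  # trash can (no entry, falls back)
```

The fallback shows the core weakness: any region or icon missing from the table gets the original, possibly unfamiliar, term.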
Beyond simple standardised icons, object recognition has a long way to go and requires far more research. I do hope it will eventually be able to handle complex images like organisation charts, etc.
Note that the above is just an example demonstrating that AI is not a silver bullet. It does help, but I have seen situations on Apple devices where it has caused more problems than it fixes, especially when the button, link, etc. already has an accessible name which VoiceOver reads out. Apple's solution still gives you the object recognition description as well, which makes things too verbose. Before it is introduced into any screen reader, real careful thought needs to go into how it is adopted, as I can see it becoming more of an annoyance than a solution.
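A minimal sketch of one way a screen reader could avoid the verbosity described above, assuming a hypothetical element record carrying the app-supplied accessible name and an optional object-recognition label: the recognition result is announced only when the element has no accessible name, and is flagged as a guess. This is not how VoiceOver or NVDA actually decides; the class, fields and function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    role: str                                # e.g. "button", "link"
    accessible_name: str                     # name exposed by the app ("" if missing)
    recognised_label: Optional[str] = None   # hypothetical object-recognition result


def announcement(el: Element) -> str:
    """Build the spoken text for an element.

    Prefer the developer-supplied accessible name; fall back to the
    object-recognition guess only when no name exists, marking it as a guess.
    """
    name = el.accessible_name.strip()
    if name:
        return f"{name} {el.role}"
    if el.recognised_label:
        return f"possibly {el.recognised_label} {el.role}"
    return f"unlabelled {el.role}"


print(announcement(Element("button", "Delete", "trash can")))  # Delete button
print(announcement(Element("button", "", "trash can")))        # possibly trash can button
```

Preferring the accessible name keeps speech short for well-built apps, while still giving some help where the developer supplied nothing.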
I am not saying it should not be explored to see how it can help. One area is AI dynamically changing a program's UI to make it easier for keyboard users to operate; people are already working on this.
From: firstname.lastname@example.org <email@example.com> On Behalf Of Luke Robinett
Sent: Tuesday, January 12, 2021 10:49 AM
Subject: [nvda] Will NVDA eventually use AI for better GUI recognition?
So one thing I enjoy about VoiceOver on my iPhone is that it has gotten really good at using AI to make otherwise inaccessible UI elements available to interact with. More than just simple OCR, it can ascertain the layout and make educated guesses about controls like buttons and tabs, greatly expanding the usability of apps that would otherwise be partially or totally inaccessible.
Is there any chance NVDA will eventually reach that level of sophistication? I know there are third-party add-ons that attempt to bridge that gap for specific types of apps, for example the great Sibiac add-on, which helps make certain music production apps and plugins accessible with NVDA, but it would be great to see these capabilities broadened and rolled into the core functionality of the product.