AI is not that smart: it is entirely dependent on the data used to train it.
Thus all object recognition AI relies on someone having applied a specific description to icons, parts of an image, etc. This breaks down when a developer does not use standard icons to indicate navigation such as back, forward, okay, etc. The user will get some description from the object recognition, but it might not be meaningful. The term used for that description may also be unknown in other countries:
Let's say an icon is used to indicate to the user that this is the delete button. The AI object recognition detects this as a trash can. Outside the USA the term trash can is not used, so the user might not understand this is the delete button. Suppose the delete button is tied to some important bit of information. They press it to try and work out what it does. Oops, that information is gone. The object recognition therefore has to know what country your device is set to and use that country's term, such as rubbish bin (the Australian term for trash can).
This again assumes the icon is a delete button; there is no guarantee the designer of that app will use internationally standard icons. Yes, there is an organisation that defines standardised icons, emoticons, etc. Object recognition has a long way to go and requires far more research, though I do have hope it will eventually be able to handle complex images like organisation charts. Note, the above is just an example demonstrating AI is not a silver bullet.
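To make the localisation problem concrete, here is a minimal sketch of the kind of lookup a recogniser would need. Everything here is hypothetical (the table, the function name, the locale codes chosen); no real screen reader exposes this exact API:

```python
# Hypothetical sketch: choosing a locale-appropriate label for a
# recognised "trash can" icon, so a non-US user hears a familiar term.
# The table and function are illustrative, not a real screen reader API.

DELETE_ICON_LABELS = {
    "en-US": "trash can",
    "en-AU": "rubbish bin",
    "en-GB": "rubbish bin",
}

def label_for_delete_icon(device_locale: str) -> str:
    """Return the locale's term for the recognised delete icon,
    falling back to a neutral description for unknown locales."""
    return DELETE_ICON_LABELS.get(device_locale, "delete button")

print(label_for_delete_icon("en-AU"))  # rubbish bin
print(label_for_delete_icon("fr-FR"))  # delete button
```

Even this toy version shows the scale of the problem: someone has to maintain such a table for every recognisable icon in every locale, and it still only helps if the app's icon matches the recogniser's guess.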
It does help, but I have seen situations on Apple where it has caused more problems than it fixed, especially when the button, link, etc. already has an accessible name which VoiceOver reads out. The Apple solution still announces the object recognition result as well, which makes things too verbose. Before it is introduced into any screen reader, real careful thought needs to be given to how it is adopted, as I can see it becoming more of an annoyance than a solution.
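The verbosity problem has a simple shape: when a control already carries an author-supplied accessible name, announcing the recognition result on top of it duplicates information. A sketch of one possible fallback rule, assuming hypothetical names (this is not how VoiceOver or NVDA actually implement it):

```python
# Hypothetical sketch: only fall back to the object recognition result
# when no accessible name exists, to avoid the double-announcement
# problem described above. Names are illustrative, not a real API.

from typing import Optional

def speech_for_control(accessible_name: Optional[str],
                       recognised: Optional[str]) -> str:
    if accessible_name:   # author-supplied label wins outright
        return accessible_name
    if recognised:        # otherwise use the recogniser's guess
        return recognised
    return "unlabelled control"

print(speech_for_control("Delete", "trash can icon"))  # Delete
print(speech_for_control(None, "trash can icon"))      # trash can icon
```

A rule like this keeps recognition as a safety net for genuinely unlabelled controls instead of extra noise on well-authored ones.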
I am not saying it
should not be explored to see how it will help. One area
is AI dynamically changing the UI of a program to make
it easier for a keyboard user. This is something people are already working on.
So one thing I enjoy about VoiceOver on my iPhone is that it has gotten really good at using AI to make otherwise inaccessible UI elements available to interact with. More than just simple OCR, it can ascertain the layout and make educated guesses about controls like buttons and tabs, greatly expanding the usability of apps that would otherwise be partially or totally inaccessible.
Is there any chance
NVDA will eventually reach that level of sophistication?
I know there are third-party add-ons that attempt to bridge that gap for specific types of apps, for example
the great Sibiac add-on which helps make certain music
production apps and plugins accessible with NVDA, but it
would be great to see these capabilities broadened and
rolled into the core functionality of the product.