A more intelligent NVDA OCR?


Luke Robinett <blindgroupsluke@...>
 

First of all, just having OCR capabilities in NVDA has been a game changer! I only recently learned that you can press Enter on text in the OCR window and NVDA will try to invoke the corresponding control in the application.
With that in mind, it seems like this could be evolved even further. Is the technology available for NVDA to not only recognize text from the app but also ascertain the structure of the user interface, perhaps organizing things into form elements like buttons, text fields, etc., and then presenting them in the same element browser we use for web pages?
Barring that, perhaps a more immediately reachable upgrade would be to offer different actions we can take on items in the OCR result. For example, instead of just the option to press Enter to invoke something, what about a context menu that lets you either invoke it or simply position the mouse cursor where that element was found?
I’m not a Python developer, so take these as suggestions from a layman, but if it’s possible, I think it would frankly revolutionize NVDA. I work with a lot of music recording/production software, and much of it is completely inaccessible, relying on highly visual interfaces that screen readers have no hooks into. Something like what I’ve described would really be amazing for these types of programs. Thanks!
