A more intelligent NVDA OCR?


Luke Robinett
 

First of all, just having OCR capabilities in NVDA has been a game changer! It was only recently that I learned you can hit enter on text in the OCR window and NVDA would try to invoke that corresponding control in the application.
With that in mind, it seems like this could be evolved even further. Is the technology available such that NVDA could try to not only recognize text from the app but ascertain the structure of the user interface, perhaps organizing things into form elements like buttons, text fields etc and then presenting them in the same element browser we use for web pages?
Barring that, perhaps a more immediately reachable upgrade would be different actions we can take on items in the OCR result. For example, instead of just the option to hit enter to invoke something, what about a Context menu that lets you invoke it or simply position the mouse cursor where that element was found?
I’m not a Python developer, so take these as suggestions from a layman, but if this is possible, I think it would frankly revolutionize NVDA. I work with a lot of music recording/production software, and much of it is completely inaccessible, relying on highly visual interfaces that screen readers have no handle into. Something like what I’ve described would really be amazing for these types of programs. Thanks!


Hope Williamson
 

I've never used the OCR with NVDA, so how do you use it?


Rob Hudson
 

Find a graphic with text and hit NVDA+r. OCR results pop up in a little virtual window. Hit escape to close.

----- Original Message -----
From: "Hope Williamson" <webspinner@...>
To: nvda@nvda.groups.io
Date: Mon, 28 Sep 2020 14:19:52 -0700
Subject: Re: [nvda] A more intelligent NVDA OCR?

I've never used the OCR with NVDA, so how do you use it?






Hope Williamson
 

OK will do. Does this work with image PDF documents?


Hope Williamson
 

Ok nope it really doesn't. I guess that's the one thing I'll keep the
other screen reader for.


Rob Hudson
 

It works with content immediately visible on the screen.

----- Original Message -----
From: "Hope Williamson" <webspinner@...>
To: nvda@nvda.groups.io
Date: Mon, 28 Sep 2020 14:36:25 -0700
Subject: Re: [nvda] A more intelligent NVDA OCR?

OK will do. Does this work with image PDF documents?






Quentin Christensen
 

It should work with image PDF files.  A couple of things to bear in mind:
- It will recognise printed or typed text, but not so much handwritten text (a limitation of basically every OCR).
- It works, as Rob noted, with what is currently visible on screen.  If you are zoomed in such that only 1/10th of the page is visible, that's all it will be able to run OCR on.  If, however, you zoom out so you can fit 100 pages on screen, that won't work either, as the text will be too small to recognise.  One page should work well though.

Regards

Quentin.

On Tue, Sep 29, 2020 at 8:05 AM Rob Hudson <rob_hudson3182@...> wrote:
It works with content immediately visible on the screen.

--
Quentin Christensen
Training and Support Manager


Supanut Leepaisomboon
 

I think I totally agree with the original post.
NVDA OCR, in its current form, is already quite good, but I think it could be improved further if the technology allows it.
From my tests, OCR seems to work even with games such as Sims 3, allowing me to read in-game text without using Windows Magnifier (well, at least most of the time).


Rob Hudson
 

Games move pretty fast though. Are you able to OCR and read and act with real speed?

----- Original Message -----
From: "Supanut Leepaisomboon" <@supanut2000>
To: nvda@nvda.groups.io
Date: Mon, 28 Sep 2020 18:55:51 -0700
Subject: Re: [nvda] A more intelligent NVDA OCR?

I think I totally agree with the original post.
NVDA OCR, in the current form, is quite good already but I think this could be improved further, if technology allows it.
From my tests, OCR seems to work even with games such as Sims 3, thus allowing me to read in-game texts without the use of Windows Magnifier (well, at least for most of the time).


Supanut Leepaisomboon
 

I haven't actually tried the OCR function with games that involve fast-moving text or that require constant real-time interaction with text, so I can't say how well OCR would work in that kind of situation.


rowen brian <manchen0528@...>
 

Hi Rob Hudson,
I think Lion OCR, with its automatic recognition, can do this.
 


Larry Wang
 

It is only available in Windows 10, and it is also limited by your system language. For OCR on older versions of Windows, you can try this add-on that I wrote.

https://github.com/larry801/online_ocr/releases/download/0.19.5-dev/onlineOCR-0.19-dev.nvda-addon

On 2020/9/29 5:39, Hope Williamson wrote:

Ok nope it really doesn't. I guess that's the one thing I'll keep the
other screen reader for.




Bhavya shah
 

Dear all,

I look forward to using this neat OCR functionality when I switch to a
Windows 10 system. I am not sure about interpreting control types and
interface structure from scanned text, but I am quite surprised to
learn that while you can press Enter to simulate a left click on a
portion of the scanned text, you cannot press the Applications key to
right click it currently. Is there an existing GitHub ticket for this
feature request? If not, I encourage Luke and others to consider
filing one.

Thanks.

On 9/29/20, Larry Wang <larry.wang.801@...> wrote:
It is only available in Windows 10, it is also limited by your system
language. For ocr on lower version of windows you can try this addon
written by me.

https://github.com/larry801/online_ocr/releases/download/0.19.5-dev/onlineOCR-0.19-dev.nvda-addon

On 2020/9/29 5:39, Hope Williamson wrote:
Ok nope it really doesn't. I guess that's the one thing I'll keep the
other screen reader for.








--
Best Regards
Bhavya Shah
Stanford University | Class of 2024
E-mail Address: bhavya.shah125@...
LinkedIn: https://www.linkedin.com/in/bhavyashah125/


Lukasz Golonka
 

Hello,

On Mon, 28 Sep 2020 10:03:12 -0700
"Luke Robinett" <blindgroupsluke@...> wrote:

> With that in mind, it seems like this could be evolved even further. Is the technology available such that NVDA could try to not only recognize text from the app but ascertain the structure of the user interface, perhaps organizing things into form elements like buttons, text fields etc and then presenting them in the same element browser we use for web pages?

Have you heard about Sibiac (http://www.azslow.com/index.php?topic=372.0)?
I'm personally not a musician, so I haven't tested it, but some people
report great success with it, and it more or less does what you've asked
for.

> Barring that, perhaps a
> more immediately reachable upgrade would be different actions we can
> take on items in the OCR result. For example, instead of just the option
> to hit enter to invoke something, what about a Context menu that lets
> you invoke it or simply position the mouse cursor where that element was
> found?

It is possible already. To position your mouse on the recognized item you
just need to press NVDA+numpad Slash to route the mouse to the review
cursor position. After that you can perform any mouse operation.
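
As a rough illustration of why routing the mouse to a recognized item can work at all: OCR engines generally report a bounding box for each recognized word, so a position in the result text can be mapped back to a point on screen. The sketch below is not NVDA's code. `OcrWord`, `click_point`, and the capture-offset and scale parameters are hypothetical names chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    """Hypothetical OCR result: a word and its box in image pixels."""
    text: str
    left: int
    top: int
    width: int
    height: int


def click_point(word, capture_left=0, capture_top=0, scale=1.0):
    """Return the screen (x, y) at the centre of a recognized word.

    capture_left/capture_top: screen position of the captured region
    (assumed to be known from when the screenshot was taken).
    scale: factor by which the image was enlarged before recognition
    (1.0 means no resizing).
    """
    centre_x = word.left + word.width / 2
    centre_y = word.top + word.height / 2
    # Undo the resize, then shift from image space to screen space.
    return (
        round(capture_left + centre_x / scale),
        round(capture_top + centre_y / scale),
    )


if __name__ == "__main__":
    play = OcrWord("Play", left=40, top=10, width=60, height=20)
    print(click_point(play, capture_left=100, capture_top=200))
```

Once a point like this is known, moving the mouse there and performing a left or right click is an ordinary mouse operation, which is why the same position can serve for more than one action.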

--
Regards
Lukasz


Lukasz Golonka
 

Hello Bhavya,


On Wed, 30 Sep 2020 04:46:35 +0530
"Bhavya shah" <bhavya.shah125@...> wrote:

> Dear all,
>
> I look forward to using this neat OCR functionality when I switch to a
> Windows 10 system.

I assume you're aware of the Tesseract-based OCR add-on for older
versions of Windows? Assuming that you have tried it, what exactly is so
neat about the Windows 10 OCR in comparison with the Tesseract-based one?
Since I'm currently maintaining it, I'm genuinely interested in your
feedback.

--
Regards
Lukasz