Re: new addon: NVDA Advanced OCR.


benmoxey@...
 

Hi everyone

 

First, a big thank you to all of the developers who freely give their time to create these excellent additions to NVDA. I’m looking forward to trying this one. 😊

 

Second, Joseph’s comment about educating content creators about the importance of accessibility is extremely relevant. It is simply not true that documents that are originally created as PDFs are generally accessible.

 

When PDFs are created accessibly, they have a proper heading structure, table structures (with appropriately marked headers), accessible links, alt text, etc. In fact, if tagged correctly, they give the best reading experience when loaded in Adobe Reader DC; they navigate like a well-designed website. The reason that loading PDFs in a web browser has become so popular in the blind community is that they are so often created with little accessibility in mind. The browser presents the document’s contents the way it thinks they’re supposed to be displayed. This is why you often notice a heading structure that doesn’t really make sense, no images and no tables.

 

I think this is an important point to make because there is a bit of a perception out there that being able to access some information is good enough. We in the blind community deserve better and it starts with education. These add-ons are a very valuable and appreciated work-around in the meantime.

 

All the best.

 

Ben

 

From: nvda@nvda.groups.io <nvda@nvda.groups.io> On Behalf Of Brian Vogel
Sent: Thursday, 9 December 2021 4:30 AM
To: nvda@nvda.groups.io
Subject: Re: [nvda] new addon: NVDA Advanced OCR.

 

On Wed, Dec 8, 2021 at 12:13 PM, Joseph Lee wrote:

For PDF files, provided that they are generated with accessibility in mind,

-
Joseph,

I haven't seen any PDF originally created as PDF that's not accessible, fully accessible, with the possible exception of the lack of Alt Text for images.

That being said, I always presume these OCR functions are going to need to exist for a very long time simply because there exist so many image scanned PDF files that were created long before OCR became a standard part of scanning (or even existed).

I'll tell you what I told several of my former clients who were grad students, and who routinely were handed ancient image scanned PDFs that have been in use for years to decades:  OCR process them, save the text layer with the file itself, then try like the dickens to get whoever it is that maintains the archive from which the original was pulled to ditch that original and replace it with the OCRed version.

It's really not anyone's fault that inaccessible PDFs exist that were scanned in "another era."  But those documents can easily be made accessible via OCR done so that the result can be saved as part of the source file.  Those who are the digital archivists should be willing to replace inaccessible versions with accessible ones with just the slightest bit of vetting of the result.  And if they don't want to accept an OCRed version from someone else, a system needs to be in place to report image scanned PDFs for permanent OCR processing by staff, with that processing done promptly.  This isn't time intensive when you're working on demand rather than on a search and destroy mission for every PDF that might be image scanned.
 
--

Brian - Windows 10, 64-Bit, Version 21H1, Build 19043  

The difference between a top-flight creative man and the hack is his ability to express powerful meanings indirectly.

         ~ Vance Packard

 
