From: firstname.lastname@example.org [mailto:email@example.com] On Behalf Of Brian Vogel
Sent: Saturday, April 11, 2020 4:44 PM
Subject: Re: [nvda] google chrome version 81
On Sat, Apr 11, 2020 at 03:43 AM, Shaun Everiss wrote:
A properly designed pdf should just work.
And something created as a PDF from the get-go generally will. A huge number of commercial documents are generated initially as PDFs, much as one would create them in Word, and when the document is text-based to begin with, it reads.
The problem, and you identified it, is that there are millions upon millions of image-scanned PDFs that were created as such before OCR became an automatic part of that process. Unless you (and it's generally the individual you) run OCR processing on them, they won't be accessible regardless of what is being used in an attempt to read them. It is what it is, and in the case of image PDFs that are images of text, running OCR processing generally resolves the problem if the original image is even just decent.
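For anyone who wants to check a pile of PDFs in bulk for a missing text layer, here is a rough sketch in Python. It assumes the third-party pypdf library is installed, and the names needs_ocr, page_texts_from_pdf, and the 20-character threshold are purely illustrative choices, not part of any tool mentioned in this thread. It is only a heuristic: a PDF with a poor or partial text layer can still pass.

```python
from typing import Iterable, List


def needs_ocr(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """Return True when no page has a meaningful amount of extractable text.

    min_chars is an arbitrary illustrative threshold, not a standard value.
    """
    return all(len(text.strip()) < min_chars for text in page_texts)


def page_texts_from_pdf(path: str) -> List[str]:
    """Return the extractable text of each page, using pypdf (third-party)."""
    # Imported here so needs_ocr can be used even where pypdf is not installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

Usage would look something like needs_ocr(page_texts_from_pdf("scan.pdf")), where "scan.pdf" is a placeholder file name. A True result suggests the file is an image-only scan that should be run through OCR.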
But you've hit the nail on the head in that something generated as a PDF from scratch, that is, designed from the ground up in PDF format, is an entirely different animal from an old image-scanned PDF. Given how long electronic documents circulate in their original forms, it's likely that image-scanned PDFs, and the need to post-process them with OCR when the image is of something like a magazine article, letter, book page, or the like, will be with us long after I'm dead.
I wrote a brief tutorial on a specific piece of software I like for doing this because its OCR is really, really good. You can also save the text layer it generates along with the file, so it's there when you need to open it again in the future or send it along to someone else. https://nvda.groups.io/g/nvda/message/33512
Brian - Windows 10 Pro, 64-Bit, Version 1909, Build 18363
Power is being told you're not loved and not being destroyed by it.