Require help to convert image PDFs into text PDF documents


ss sarfudeen
 

Dear list members,

I have an image or a scanned PDF document which I am unable to read
using NVDA. When I open it with Acrobat Reader and try to read it with
NVDA, I am not able to make sense of the document. I would like to
know the method or the program with which I can convert it into an
accessible PDF document with proper alignment of the links, headings,
tables and the text.
Your inputs are highly appreciated.

Thanking you for any and all assistance,

Regards,

Sultana


Luke Davis
 

If they really are image PDFs, then you will never get what you want.

If you use OCR software on them (others can suggest what OCR software is best these days; I don't use OCR packages on Windows), you can get some version of the text out of these.
For example, if I process an image PDF through KNFB Reader, or one of the network-based recognition apps, I can get somewhere between 70 percent and 95 percent accurate recognition of the contents.
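
To make a figure like that more concrete, here is a minimal Python sketch of one way to score OCR output against text you already know to be correct. It is only an illustration: the sample strings are made up, the `similarity` helper is a hypothetical name, and the score is the rough character-level similarity from Python's standard `difflib` module, not the accuracy measure that KNFB Reader or any other OCR product reports.

```python
from difflib import SequenceMatcher


def similarity(expected: str, recognized: str) -> float:
    """Return a rough character-level similarity between 0.0 and 1.0."""
    # ratio() is 2 * (matching characters) / (total characters in both strings).
    return SequenceMatcher(None, expected, recognized).ratio()


# Hypothetical example: the text as printed, and an OCR result in which
# "l" was read as "1" and "0" was read as "O".
expected = "Please reply by 30 April 2021."
recognized = "P1ease rep1y by 3O Apri1 2021."

score = similarity(expected, recognized)
print(f"{score:.0%} similar")
```

A score like this only tells you how close the characters are; it says nothing about whether headings, tables or links survived.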

NVDA's built-in OCR feature (NVDA+R) can possibly read these one page at a time, although it's not designed for full-document recognition.

However, you won't get links and headings and all of that. Software simply isn't smart enough to figure out which pictures of text are supposed to become those formatting and connective elements.

Also, while I could be mistaken about this, I don't believe an image PDF can have links.

Lastly, this is not an NVDA problem. You probably want to take this conversation to a general Windows list or the chat subgroup.

Luke

"I have no idea what I'm supposed to do. I only know what I can do." -James T. Kirk


Cearbhall O'Meadhra
 

Hi,

Try ABBYY FineReader. It is very accurate.

All the best,

Cearbhall

m +353 (0)833323487 Ph: _353 (0)1-2864623 e: cearbhall.omeadhra@blbc.ie



 

Free & Good OCR Software for Image Scanned PDFs (https://nvda.groups.io/g/nvda/message/33512)
--

Brian - Windows 10 Pro, 64-Bit, Version 20H2, Build 19042  

Always remember others may hate you but those who hate you don't win unless you hate them.  And then you destroy yourself.

       ~ Richard M. Nixon

 


ss sarfudeen
 

Dear Members,

Thank you all for your suggestions and inputs. Will try one of these
recommendations and see if it solves my purpose.

With kind regards,

Sultana

On 5/1/21, Brian Vogel <britechguy@gmail.com> wrote:
Free & Good OCR Software for Image Scanned PDFs (https://nvda.groups.io/g/nvda/message/33512)






 

Hi, open the document in Acrobat Reader, then press NVDA+L to OCR it and read the text. HTH

No trees were destroyed in the sending of this message, however, a significant number of electrons were terribly inconvenienced.


enes sarıbaş
 

You actually can get images, links, headings, etc. FineReader is able to do this.





Luke Davis
 

enes sarıbaş wrote:

You actually can get images, links, headings, etc. FineReader is able to do this.

I sit corrected. I have never used that product, so wasn't aware.

Links, as raw http addresses, I can understand. For domain names, it's easy enough; but when you get to things after the first slash or question mark, they tend to get hairy, and you are at the mercy of the recognition engine--one character goes wrong, and poof.
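
As a small illustration of that fragility, the Python sketch below compares a URL as printed with a hypothetical OCR reading of it and reports which parts differ. The URLs and the `differing_parts` helper are invented for the example; only `urlsplit` from the standard library is a real API. It is not how FineReader or any other OCR product handles links.

```python
from urllib.parse import urlsplit


def differing_parts(original: str, recognized: str) -> list:
    """Return the names of the URL components that differ between two URLs."""
    a, b = urlsplit(original), urlsplit(recognized)
    parts = ("scheme", "netloc", "path", "query", "fragment")
    return [name for name in parts if getattr(a, name) != getattr(b, name)]


# Hypothetical URL as printed on the page, and a hypothetical OCR result
# in which the letter "l" in the query string was read as the digit "1".
printed = "https://example.com/docs/file?id=aBl0"
scanned = "https://example.com/docs/file?id=aB10"

print(differing_parts(printed, scanned))
```

The domain comes through intact here, but the single wrong character in the query is enough to point the link at a different resource, or at nothing.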

Luke


enes sarıbaş
 

If the image is reasonably good, links convert well. Actually, I have clicked on URLs in scanned letters from my mailbox and they work.





 

Hi, I apologize for my mistake above.
It's NVDA+R, not NVDA+L as I had stated earlier. Thanks,
Arvind





ss sarfudeen
 

Hello list members,

The built-in OCR feature of NVDA doesn't work in this document.

Regards,

Sultana
