NVDA Addon PDF2Text and python 3


felix keller
 

Hello list,
There was once an NVDA add-on called PDF2Text.
Could someone check whether it can be converted to Python 3?
The add-on can be downloaded from the following link:
https://jeff.tdrealms.com/Add-Ons/PDF2Text.nvda-addon
I look forward to your feedback.
Kind regards
Felix Keller


 

What did this add-on do that NVDA OCR cannot do, or a third-party OCR engine in a PDF Reader cannot do?  I am unfamiliar with the add-on or its function, but the name suggests that it does OCR on image scanned PDFs.

--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  

It’s hard waking up and realizing it’s not always black and white.

     ~ Kelley Boorn

 


Daniel Damacena
 

I use a tool to convert pdf files to txt ones. It is called xpdf. The link is below:

https://www.xpdfreader.com/download.html

Download the command line tools. Put a shortcut to pdftotext.exe on the context menu's Send to submenu as follows:

Copy the shortcut, press Win + R, type "shell:sendto", press Enter, and paste the shortcut into the folder that opens.

Now all you have to do to convert a file is press the Applications key, choose Send to, then pdftotext.

A text file with the same name will be created in the same folder.
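
For anyone who would rather script the conversion than use the Send to menu, the same step can be sketched in Python 3. This is only a minimal sketch, assuming pdftotext from the Xpdf command line tools is installed and on the PATH; the function names are made up for illustration. It relies on the basic "pdftotext input.pdf output.txt" form of the command.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_command(pdf_path, exe="pdftotext"):
    """Return the argument list: pdftotext <input.pdf> <output.txt>."""
    pdf = Path(pdf_path)
    # Same name, same folder, .txt extension (what the Send to shortcut produces).
    txt = pdf.with_suffix(".txt")
    return [exe, str(pdf), str(txt)]


def convert(pdf_path, exe="pdftotext"):
    """Run pdftotext on one file and return the path of the text file."""
    command = build_command(pdf_path, exe)
    # check=True raises CalledProcessError if pdftotext reports a failure.
    subprocess.run(command, check=True)
    return Path(command[-1])


if __name__ == "__main__":
    # Assumption: pdftotext is installed and discoverable on the PATH.
    exe = shutil.which("pdftotext")
    for arg in sys.argv[1:]:
        if exe and arg.lower().endswith(".pdf"):
            print(convert(arg, exe))
```

Saved as a script, this could be called with one or more PDF paths as arguments; files that are not PDFs are skipped.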






 

Daniel,

Thanks very much for the link to the download page for Xpdf and XpdfReader. They also appear to offer language support packs for a variety of languages, though, with the exceptions of Greek and Turkish, no European languages are in the list (other than English, which I presume is the default, given that the site is in English).

It's nice that there is an easy way to get the image-to-OCRed-PDF converter integrated directly into the Windows shell for ease of use, too.

Just curious: do you use XpdfReader, and is it accessible?

 


Daniel Damacena
 

As far as I know, it's not able to execute OCR. It converts directly. I use only the command line tool, not the reader.

Good luck!


 


 

On Mon, Oct 26, 2020 at 02:49 PM, Daniel Damacena wrote:
As far as I know, it's not able to execute OCR. It converts directly.
-
The only way an image PDF, where the image contains text, gets converted to a text-based PDF (or, more likely, an image PDF with a separate text layer) is via OCR.
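
A rough way to tell the two kinds of PDF apart: if a direct conversion yields almost no text, the file is probably an image scan and needs OCR first. Below is a minimal Python sketch, assuming the text has already been extracted (for example by pdftotext). The function name and the threshold of 20 characters are arbitrary choices for illustration, not part of any real tool.

```python
def probably_needs_ocr(extracted_text, min_chars=20):
    """Guess whether a PDF is image-only from its directly extracted text."""
    # Ignore whitespace and form feeds (page breaks); count what is left.
    visible = "".join(extracted_text.split())
    return len(visible) < min_chars
```

A scanned document would typically produce nothing but page breaks and blank lines, so the check returns True for it.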

I've loved the OCR function of Tracker Software's PDF XChange Viewer for years.  The software is free, and it supports a slew of other languages besides English.  Sadly, it's not accessible for its main function of reading/viewing PDF files, but the OCR process is accessible.  A couple of my clients who were graduate students found its capabilities as far as OCR goes extraordinary, as do I.  I know that OCR has improved radically over the course of the last several decades, but so far it's the most accurate I've found on some pretty crappy source image PDFs.

I need to look into whether there is a command line invocation for the OCR function that would allow output to be redirected conveniently for PDF XChange Viewer.  I never bothered to look into that.
 

 


 

On Mon, Oct 26, 2020 at 02:49 PM, Daniel Damacena wrote:
I use only the command line tool, not the reader.
-
Also, on further inspection, I see the pdftotext tool does literally what it says.  The resulting output is in plain text.  And I'm virtually certain that would be easier to navigate than anything in native PDF format with a text layer would be.

Thanks again for this. It looks like it could be very, very handy indeed in certain circumstances.
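
Plain text is also easy to post-process. As a small sketch, assuming the output file still contains the form feed characters that pdftotext writes at page breaks by default, the text can be split into pages in Python. The function name is hypothetical.

```python
def split_pages(text):
    """Split pdftotext output into a list of page strings."""
    pages = text.split("\f")
    # A form feed after the last page leaves an empty trailing item; drop it.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages
```

Each item in the returned list can then be read, searched, or saved on its own.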
 

 


Rob Hudson
 

I could google it, I know. But can you provide a link for this software you're discussing?

----- Original Message -----
From: "Brian Vogel" <@britechguy>
To: nvda@nvda.groups.io
Date: Mon, 26 Oct 2020 12:19:18 -0700
Subject: Re: [nvda] NVDA Addon PDF2Text and python 3

On Mon, Oct 26, 2020 at 02:49 PM, Daniel Damacena wrote:


As far as I know, it's not able to execute OCR. It converts directly.
-
The only way an image PDF, where the image contains text, gets converted to a text-base (or, more likely, image with separate text layer) PDF is via OCR.

I've loved the OCR function of Tracker Software's PDF XChange Viewer ( https://www.tracker-software.com/product/downloads/discontinued ) for years. The software is free, and it supports a slew of other languages ( https://www.tracker-software.com/pdf-xchange-ocr ) besides English. Sadly, it's not accessible for its main function of reading/viewing PDF files, but the OCR process is accessible. A couple of my clients who were graduate students found its capabilities as far as OCR goes extraordinary, as do I. I know that OCR has improved radically over the course of the last several decades, but so far it's the most accurate I've found on some pretty crappy source image PDFs.

I need to look into whether there is a command line invocation for the OCR function that would allow output to be redirected conveniently for PDF XChange Viewer. I never bothered to look into that.








 

On Mon, Oct 26, 2020 at 03:32 PM, Rob Hudson wrote:
But can you provide a link for this software you're discussing?
-
I already did, right in the e-mail message.  It is not my custom, nor common custom in general anymore, to provide naked URLs.

There are click-through links for both the software and the language packs in the original message.
 

 


Rob Hudson
 

Not here there aren't.
So I guess I'll ask Aunt Google.








 

On Mon, Oct 26, 2020 at 04:45 PM, Rob Hudson wrote:
Not here there aren't.
-
Then you need to check your mail settings.  You had my message bottom quoted on the one asking for links, and the links themselves appear in that very bottom quoted material.  I have no idea why they would not be visible to you unless you have the screen reader set in reading mode not to announce links (which is often very useful) and are not using the links list function to see what's there if that's the case.

You need to figure out what's going on at your end, as mine won't be the only links you're not seeing.
 

 


 

Rob,

For the purposes of your testing I am quoting the single line from my original message again, here, so you can determine whether you see the links this time:

I've loved the OCR function of Tracker Software's PDF XChange Viewer for years.  The software is free, and it supports a slew of other languages besides English.

"PDF XChange Viewer" and "slew of other languages" are the click-through links in the preceding sentence.

 


Daniel Damacena
 

How do I use it? I have installed it already, but I did not understand how to use the OCR function.

Thank you so much!


 


 

Daniel, see this post I made all the way back in January 2018 in the topic Free & Good OCR Software for Image Scanned PDFs.

 


Louise Pfau
 

Hi Brian.  Your example links, as well as the links from the post that were originally asked about have appeared correctly.  I didn't realize they were called "click through" links.  I was confusing them with "click here" links.  I don't think I've changed any of my settings that would affect the way links are read.

Louise


 

On Tue, Oct 27, 2020 at 03:27 PM, Louise Pfau wrote:
I didn't realize they were called "click through" links.
-
Louise,

            Thanks.  The term "click through" link is mine, as the technical term, hypertext, is so seldom used and not understood by many.

             When the text itself has the link there to "click through" to see what it refers to, that's what I call them.  Perhaps I should begin using the formal term instead.

              I know that when I've been working with clients who want to actually "read an article" (or whatever) that contains hypertext they'll often disable whatever setting it is in their screen reader that announces, "Link," along with the hypertext itself since that makes narrative flow so disjointed.  I'd imagine that some may keep this off all the time and either turn it on "on demand" or bring up the list of links after having read the actual text if they suspect hypertext was present in a message/article/etc.
 

 


Sarah k Alawami
 

I actually have mine make a sound. Not NVDA, as I don't think that can be done, at least not in the core, but VoiceOver makes a mid-pitched click when a hyperlink starts and a lower-pitched click when the link ends. For example, this link would lead to Google, if my Markdown didn't break. If NVDA did that, or at least had an option for it in the core, that would be good. I might check the GitHub, as I'm following that repo.

--

Sarah Alawami, owner of TFFP. . For more info go to our website.

Check out my adventures with a shadow machine.

to subscribe to the feed click here and you can also follow us on twitter

Our discord is where you will know when we go live on twitch. Feel free to give the channel a follow and see what is up there.

For stream archives, products you can buy and more visit my main lbry page and my tffp lbry page You will also be able to buy some of my products and eBooks there.

Finally, to become a patron and help support the podcast go here


 


Gene
 

What are the terms for links with text and links that are just an address written out?

Doesn't "hyperlinks" refer to both? And what is the original questioner referring to?

Gene



 

On Tue, Oct 27, 2020 at 07:02 PM, Gene wrote:
What are the terms for links with text and links that are just an address written out?
The first are hypertext, and the second are hyperlinks (or just links; many people refer to hypertext as links, too).

All links, whether as hypertext or hyperlinks, perform the function of directing you to a web resource.

But my much earlier point is that it is very rare these days for most writers to include "naked" hyperlinks.  Hypertext is far more common, and I use it almost exclusively, as do virtually all online publications.  I can't explain why hypertext would not have been announced by the screen reader, either, and the original poster who stated they were not has never clarified further.
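
The distinction can be made concrete with a short Python sketch using the standard library's html.parser module. It labels a link "naked" when its visible text is the address itself and "hypertext" otherwise, following the terms used in this message. The class name, function name, and labels are illustrative only.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Collect (kind, text, href) for every <a href=...> in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href are ignored (href stays None).
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            # Visible text identical to the address means a "naked" link.
            kind = "naked" if text == self._href else "hypertext"
            self.links.append((kind, text, self._href))
            self._href = None


def classify_links(html):
    parser = LinkClassifier()
    parser.feed(html)
    parser.close()
    return parser.links
```

A mail client showing only the plain-text part of a message will often render the first kind as the text followed by the address in parentheses, so whether a link is announced can depend on which part of the message is being read.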
 

 


Gene
 

I hope the original poster gives more information. Which e-mail program is being used, and whether messages are read as plain text or HTML, may be important.

Gene
