Re: New version of TesseractOCR add-on


Rui Fontes
 

Hello!


Since January we have published several new versions...

To get the most recent, use the following link:

 https://github.com/ruifontes/tesseractOCR/releases/download/2022.07.13/tesseractOCR-2022.07.13.nvda-addon


Best regards,

Rui Fontes
NVDA Portuguese team


At 15:02 on 15/07/2022, Dave Grossoehme wrote:

Hi Rui: When was this update written? I have your add-on for NVDA, but it is dated January 2022. Is that the latest update? If not, I missed the address to go to for the update, in case it won't update with Alt+H in the NVDA add-ons. I apologize for being a little late on this information. I had in my mind that this was a different add-on from the Now app. Would you be so kind as to help out here? Thanks in advance.

Dave


On 7/13/2022 11:59 AM, Rui Fontes wrote:

Here is the documentation:


TesseractOCR
• Authors: Rui Fontes rui.fontes@... and Angelo Abrantes ampa4374@...
• Updated on 13/07/2022
• Download stable version
• Compatibility: NVDA version 2019.3 and beyond
Information
This add-on uses the free and open source Tesseract OCR engine to perform optical character recognition on a PDF, JPG, TIF or other image file, without the need to open it. It also uses wia-cmd-scanner to access WIA-enabled scanners and perform OCR on a paper document. In the NVDA menu, under Preferences, a TesseractOCR section is added, where you can configure the languages to be used in recognition and the type of documents to be recognized. English and Portuguese are already included in the add-on; any other language is downloaded and installed the first time you select it. Note that as the number of selected recognition languages increases, the OCR process takes longer, so we recommend that you select only the languages you need. Note also that the quality of recognition may vary according to the order of the languages, so if the recognition result is not satisfactory, you may want to try another language ordering.
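To illustrate the point about language order, here is a minimal sketch of how an ordered language list maps onto the Tesseract command line, where languages are joined with "+" after the -l option. The function name, the file names and the language choice are hypothetical, and nothing here is taken from the add-on's own code, which may call the engine differently.

```python
def build_tesseract_command(image_path, output_base, languages):
    """Return a Tesseract CLI argument list using the languages in the given order."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract accepts several languages joined with "+" after -l;
    # the order of the codes is preserved exactly as given.
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


# Hypothetical example: the same image with two different language orderings.
portuguese_first = build_tesseract_command("scan.png", "ocr", ["por", "eng"])
english_first = build_tesseract_command("scan.png", "ocr", ["eng", "por"])
print(portuguese_first)
print(english_first)
```

Swapping the order only changes the value passed to -l, which is why trying a different ordering is a cheap experiment when the result is poor.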
Shortcuts
The default commands are:
• Windows+Control+r: recognize the selected document;
• Windows+Control+w: scan and recognize a document through the scanner.
Then just wait for ocr.txt to open with the recognized text. If you want to keep the recognized text, don't forget to save the document under another name and in another location, as all files in the temporary directory are deleted at the start of the next OCR process!
These commands can be modified in the "Input gestures" dialog, in the "TesseractOCR" section.
Automatic update
This add-on includes an automatic update feature. When it is enabled, the check for a new version runs every time NVDA is loaded. To enable it, go to the NVDA menu, Preferences, Options, and check the check box in the add-on's category.
Known problems
• On some systems the add-on may not work because of a comtypes error. On some machines it is enough to go to the temp folder and delete the comtypes_cache folder.
• When the "Various" option is selected in the "Documents type" combo box, the recognized text will probably contain many blank lines. This is a known problem with Tesseract and, short of consuming a lot of processing time, I haven't yet found a solution. But I still haven't given up!
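For the comtypes workaround above, a minimal sketch of the same clean-up done from Python might look like this. It assumes the temp folder is the one reported by tempfile.gettempdir(); the function name is hypothetical and is not part of the add-on.

```python
import shutil
import tempfile
from pathlib import Path


def clear_comtypes_cache(temp_dir=None):
    """Delete the comtypes_cache folder inside temp_dir; return True if it was removed."""
    # Assumption: when no folder is given, use the temp folder Python reports.
    base = Path(temp_dir) if temp_dir is not None else Path(tempfile.gettempdir())
    cache = base / "comtypes_cache"
    if cache.is_dir():
        shutil.rmtree(cache)
        return True
    return False


# Demonstration on a throwaway folder, so the real temp folder is not touched.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "comtypes_cache").mkdir()
    first_call = clear_comtypes_cache(demo_dir)
    second_call = clear_comtypes_cache(demo_dir)
    print(first_call, second_call)
```

Calling clear_comtypes_cache() with no argument would act on the real temp folder, which is the manual step described in the first item.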
Languages supported
The supported languages in this version are: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani (Latin), Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Catalan/Valencian, Cebuano, Cherokee, Chinese simplified, Chinese traditional, Corsican, Croatian, Czech, Danish, German, Dhivehi, Dutch (Flemish), Dzongkha, English, Esperanto, Estonian, Faroese, Filipino, Finnish, French, Galician, Georgian, Greek, Gujarati, Haitian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Inuktitut, Irish, Italian, Javanese, Japanese, Kannada, Kazakh, Khmer (Central), Kirghiz, Korean, Kurdish Kurmanji, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malay, Malayalam, Maltese, Maori, Marathi, Math / equation detection module, Mongolian, Nepali, Norwegian, Occitan, Oriya, Panjabi, Pashto, Persian, Polish, Portuguese, Quechua, Romanian/Moldavian, Russian, Sanskrit, Scottish Gaelic, Serbian (Latin), Slovak, Slovenian, Sindhi, Sinhalese, Spanish, Sundanese, Swahili, Swedish, Syriac, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Tigrinya, Tonga, Turkish, Uighur, Ukrainian, Urdu, Uzbek (Latin), Vietnamese, Welsh, West Frisian, Yiddish, Yoruba.
Image types supported
This add-on supports the following types of files: pdf, jpg, tif, png, bmp, pnm, pbm, pgm, jp2, gif, jfif, jpeg, tiff, spix, webp.
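If you want to check in advance whether a file is of one of these types, a small sketch like the following could do it by looking at the file extension. The names SUPPORTED_EXTENSIONS and is_supported are hypothetical, and the add-on itself may decide this in a different way.

```python
from pathlib import Path

# File types taken from the list in the documentation above.
SUPPORTED_EXTENSIONS = {
    "pdf", "jpg", "tif", "png", "bmp", "pnm", "pbm", "pgm",
    "jp2", "gif", "jfif", "jpeg", "tiff", "spix", "webp",
}


def is_supported(file_name):
    """Return True if the file extension is in the supported list (case-insensitive)."""
    extension = Path(file_name).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS


print(is_supported("Invoice.PDF"))
print(is_supported("notes.docx"))
```

The check only looks at the name, not at the file contents, so a wrongly named file would still pass.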


Best regards,

Rui Fontes
NVDA Portuguese team


At 15:09 on 13/07/2022, Anthony tom wrote:
How is it used?
