Here is the documentation:
TesseractOCR
• Authors: Rui Fontes rui.fontes@... and
Angelo Abrantes ampa4374@...
• Updated on 13/07/2022
• Download stable version
• Compatibility: NVDA version 2019.3 and later
Information
This add-on uses the free and open source Tesseract OCR engine
to perform optical character recognition on an image file (PDF,
JPG, TIF or other) without the need to open it. It also uses
wia-cmd-scanner to access WIA-enabled scanners and perform OCR
on a paper document.
In the NVDA menu, under Preferences, a TesseractOCR section is
added, where you can configure the languages to be used in
recognition and the type of documents to be recognized. English
and Portuguese are already included in the add-on; any other
language is downloaded and installed when you select it and it
does not already exist in the add-on.
Note that the OCR process takes longer as the number of selected
recognition languages increases, so we recommend that you select
only the languages you need. Note also that the quality of
recognition may vary with the order of the languages, so if the
result is not satisfactory, you may want to try another language
ordering.
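To illustrate why the number and order of languages matter: the Tesseract command line takes the languages joined with "+" in its -l option, in order of priority. The sketch below only builds such a command. How the add-on itself calls Tesseract is not described here, so the helper name and the file names are illustrative assumptions.

```python
def build_tesseract_command(image_path, output_base, languages):
    """Build a Tesseract CLI argument list (hypothetical helper, not the add-on's code)."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract accepts several languages joined with "+"; the order is significant.
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


# English first, Portuguese second; swap them to try another ordering.
command = build_tesseract_command("scan.png", "ocr", ["eng", "por"])
print(" ".join(command))
```

Passing the list to subprocess.run would execute it, provided a tesseract executable is on the PATH.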
Shortcuts
The default commands are:
• Windows+Control+r - to recognize the selected document;
• Windows+Control+w - to scan and recognize a document through
the scanner.
Then just wait for ocr.txt to open with the recognized text. If
you want to keep the recognized text, don't forget to save the
document under another name and in another location, as all
files in the temporary directory are deleted at the start of the
next OCR process!
These commands can be modified in the "Input gestures" dialog,
in the "TesseractOCR" section.
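If you prefer to keep the recognized text with a script rather than with "Save as", a minimal sketch follows. The location of ocr.txt is not stated here, so the helper takes it as a parameter; the function name and the paths in the usage comment are hypothetical.

```python
import shutil
from pathlib import Path


def keep_recognized_text(ocr_file, destination_dir, new_name):
    """Copy the OCR result to another folder under a new name (hypothetical helper)."""
    destination = Path(destination_dir)
    # Create the destination folder if it does not exist yet.
    destination.mkdir(parents=True, exist_ok=True)
    target = destination / new_name
    # copy2 leaves the original file in place and keeps its timestamps.
    shutil.copy2(ocr_file, target)
    return target


# Usage (placeholder paths, adjust to your system):
# keep_recognized_text(r"C:\path\to\ocr.txt", r"C:\Users\me\Documents", "letter.txt")
```

The copy has to be made before the next recognition starts, since that is when the temporary files are deleted.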
Automatic update
This add-on includes an automatic update feature. The check for
a new version is executed every time NVDA is loaded. If you
want this, go to the NVDA menu, Preferences, Options, and check
the check box in the add-on category.
Known problems
• On some systems the add-on may not work due to a comtypes
error. On some machines it is enough to go to the temp folder
and delete the comtypes_cache folder.
• When selecting the "Various" option in the "Documents type"
combobox, the recognized text will probably appear with many
blank lines. This is a known problem with Tesseract and, without
consuming lots of processing time, I haven't yet found any
solution. But I still haven't given up!
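For the comtypes problem in the first item above, the manual clean-up can be sketched as a small script. It assumes that the "temp folder" is the one Python reports through tempfile.gettempdir(); if the cache sits elsewhere on your machine, pass that folder explicitly. The function name is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def remove_comtypes_cache(temp_dir=None):
    """Delete the comtypes_cache folder inside temp_dir, if present.

    Returns True when a folder was removed, False when none was found.
    """
    # Assumption: fall back to the current user's temp folder.
    base = Path(temp_dir) if temp_dir else Path(tempfile.gettempdir())
    cache = base / "comtypes_cache"
    if cache.is_dir():
        shutil.rmtree(cache)
        return True
    return False
```

Calling remove_comtypes_cache() with no argument targets the default temp folder; it is probably safest to do this while NVDA is not running.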
Languages supported
The supported languages in this version are: Afrikaans,
Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani
(Latin), Basque, Belarusian, Bengali, Bosnian, Breton,
Bulgarian, Burmese, Catalan/Valencian, Cebuano, Cherokee,
Chinese simplified, Chinese traditional, Corsican, Croatian,
Czech, Danish, Deutsch (German), Dhivehi, Dutch (Flemish),
Dzongkha, English, Esperanto, Estonian, Faroese, Filipino,
Finnish, French, Galician, Georgian, Greek, Gujarati, Haitian,
Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Inuktitut,
Irish, Italian, Javanese, Japanese, Kannada, Kazakh, Khmer
(Central), Kirghiz, Korean, Kurdish Kurmanji, Lao, Latin,
Latvian, Lithuanian, Luxembourgish, Macedonian, Malay,
Malayalam, Maltese, Maori, Marathi, Math / equation detection
module, Mongolian, Nepali, Norwegian, Occitan, Oriya, Panjabi,
Pashto, Persian, Polish, Portuguese, Quechua,
Romanian/Moldavian, Russian, Sanskrit, Scottish Gaelic, Serbian
(Latin), Slovak, Slovenian, Sindhi, Sinhalese, Spanish,
Sundanese, Swahili, Swedish, Syriac, Tajik, Tamil, Tatar,
Telugu, Thai, Tibetan, Tigrinya, Tonga, Turkish, Uighur,
Ukrainian, Urdu, Uzbek (Latin), Vietnamese, Welsh, West Frisian,
Yiddish, Yoruba.
Image types supported
This add-on supports the following types of files: PDF, jpg,
tif, png, bmp, pnm, pbm, pgm, jp2, gif, jfif, jpeg, tiff, spix,
webp.
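A quick way to check in advance whether a file has one of the extensions listed above is sketched here. It only looks at the file name, not at the file contents, and the names used are illustrative rather than part of the add-on.

```python
from pathlib import Path

# Extensions taken from the list above, in lower case.
SUPPORTED_EXTENSIONS = {
    "pdf", "jpg", "tif", "png", "bmp", "pnm", "pbm", "pgm",
    "jp2", "gif", "jfif", "jpeg", "tiff", "spix", "webp",
}


def is_supported(file_name):
    """Return True if the file extension is in the supported list (case-insensitive)."""
    extension = Path(file_name).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS


print(is_supported("Invoice.PDF"))
print(is_supported("notes.docx"))
```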
Best regards,
Rui Fontes
NVDA Portuguese team
At 15:09 on 13/07/2022, Anthony tom wrote:
How is it used?