Have you tried the new version of the TesseractOCR add-on?
I ran OCR on the file from the issue, and I sent you the
file and the result...
Best regards,
Rui Fontes
NVDA Portuguese team
On 30/08/2022 at 21:18, nvdainth@... wrote:
Hi Rui Fontes,
I have the same issue as in this topic, for the Thai language: https://github.com/tesseract-ocr/tesseract/issues/2702
How can I change a little of the code to fix it in your add-on?
Hi Rui: When was this update written? I have your add-on for
NVDA, but it is dated January 2022. Is that the latest update?
If not, I missed the address to go to for the update, in case it
won't update from the Alt+H key in NVDA's add-ons. I apologize
for being a little late with this. I had in my mind that this
was a different add-on from the Now app. Can you be so kind as
to help out here? Thanks in advance.
Dave
On 7/13/2022 11:59 AM, Rui Fontes wrote:
Here is the documentation:
TesseractOCR
• Authors: Rui Fontes rui.fontes@... and Angelo Abrantes ampa4374@...
• Updated on 13/07/2022
• Download stable version
• Compatibility: NVDA version 2019.3 and beyond
Information
This add-on uses the free and open source Tesseract OCR
engine to perform optical character recognition on an image
file (PDF, JPG, TIF or other) without the need to open it. It
also uses wia-cmd-scanner to access WIA-enabled scanners and
perform OCR on a paper document. In the NVDA menu, under
Preferences, a TesseractOCR section is added, where you can
configure the languages to be used in recognition and the
type of documents to be recognized. English and Portuguese
are already included in the add-on; any other language is
downloaded and installed when you select it for the first
time.
Note that as the number of selected recognition languages
increases, the OCR process will take longer. We therefore
recommend that you use only the languages you need. Note also
that the quality of recognition may vary according to the
order of the languages. Therefore, if the recognition result
is not satisfactory, you may want to try another language
ordering.
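As an illustration of why the order matters, the Tesseract command-line tool takes its languages through the -l option, with the language codes joined by "+" in the order they should be tried. The sketch below only builds such a command line. The function name is hypothetical, and this is not necessarily how the add-on itself calls Tesseract.

```python
from typing import List


def build_tesseract_command(image_path: str, output_base: str,
                            languages: List[str]) -> List[str]:
    """Build a Tesseract command line using the languages in the given order.

    Hypothetical helper for illustration; the add-on may build its
    command differently.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract's -l option accepts several language codes joined by "+".
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


# Tamil first, English second ("tam" and "eng" are Tesseract language codes).
print(build_tesseract_command("page.png", "ocr", ["tam", "eng"]))
```

Swapping the list to ["eng", "tam"] changes only the -l argument, which is the kind of reordering suggested above when results are poor.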
Shortcuts
The default commands are: Windows+Control+R - to recognize the
selected document; Windows+Control+W - to scan and recognize a
document through the scanner.
Then just wait for ocr.txt to open with the recognized text. If
you want to preserve the recognized text, don't forget to save
the document under another name and in another location, as
all files in the temporary directory are deleted at the start
of the next OCR process!
These commands can be modified in the "Input gestures" dialog,
in the "TesseractOCR" section.
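Since ocr.txt is deleted when the next OCR process starts, the result has to be copied somewhere else first. A minimal sketch of that step is below. The function name and the paths are hypothetical, because the documentation does not say where the temporary directory is.

```python
import shutil
from pathlib import Path


def keep_ocr_result(ocr_file: Path, destination_dir: Path, new_name: str) -> Path:
    """Copy the recognized text to another folder under another name.

    Hypothetical helper: ocr_file is wherever ocr.txt was opened from.
    """
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / new_name
    # copy2 also preserves the file's timestamps.
    shutil.copy2(ocr_file, target)
    return target
```

For example, keep_ocr_result(Path("ocr.txt"), Path.home() / "Documents", "letter.txt") would leave a copy that the next OCR run does not touch.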
Automatic update
This add-on includes an automatic update feature. The check
for a new version will be executed every time NVDA is loaded.
If you want this, go to NVDA, Preferences, Options and, in the
add-on's category, check the check box.
Known problems
• On some systems the add-on may not work due to a comtypes
error... On some machines it is enough to go to the temp
folder and delete the comtypes_cache folder.
• When selecting the "Various" option in the "Documents type"
combo box, the recognized text will probably appear with many
blank lines. This is a known problem with Tesseract and,
without consuming lots of processing time, I haven't yet found
any solution. But I still haven't given up!
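The first workaround above (deleting the comtypes_cache folder) can be sketched as follows. It assumes that "the temp folder" is the directory Python reports through tempfile.gettempdir(). The function name is hypothetical and this is not part of the add-on.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def remove_comtypes_cache(temp_dir: Optional[Path] = None) -> bool:
    """Delete the comtypes_cache folder; return True if it existed.

    Assumption: the cache lives directly inside the user's temp folder.
    """
    base = Path(temp_dir) if temp_dir is not None else Path(tempfile.gettempdir())
    cache = base / "comtypes_cache"
    if not cache.is_dir():
        return False
    shutil.rmtree(cache)
    return True
```

Calling remove_comtypes_cache() with no argument targets the current user's temp folder. Passing a path makes it easy to try on a test directory first.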
Languages supported
The supported languages in this version are: Afrikaans Albanian
Amharic Arabic Armenian Assamese Azerbaijani (Latin) Basque
Belarusian Bengali Bosnian Breton Bulgarian Burmese
Catalan/Valencian Cebuano Cherokee Chinese simplified Chinese
traditional Corsican Croatian Czech Danish German Dhivehi
Dutch (Flemish) Dzongkha English Esperanto Estonian Faroese
Filipino Finnish French Galician Georgian Greek Gujarati
Haitian Hebrew Hindi Hungarian Icelandic Indonesian Inuktitut
Irish Italian Javanese Japanese Kannada Kazakh Khmer (Central)
Kirghiz Korean Kurdish Kurmanji Lao Latin Latvian Lithuanian
Luxembourgish Macedonian Malay Malayalam Maltese Maori Marathi
Math / equation detection module Mongolian Nepali Norwegian
Occitan Oriya Panjabi Pashto Persian Polish Portuguese Quechua
Romanian/Moldavian Russian Sanskrit Scottish Gaelic Serbian
(Latin) Slovak Slovenian Sindhi Sinhalese Spanish Sundanese
Swahili Swedish Syriac Tajik Tamil Tatar Telugu Thai Tibetan
Tigrinya Tonga Turkish Uighur Ukrainian Urdu Uzbek (Latin)
Vietnamese Welsh West Frisian Yiddish Yoruba
Image types supported
This add-on supports the following types of files: PDF jpg tif
png bmp pnm pbm pgm jp2 gif jfif jpeg tiff spix webp
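A quick way to tell whether a file is of one of the types listed above is to compare its extension against that list. The sketch below does this case-insensitively. The names are hypothetical, and the add-on may decide this differently.

```python
from pathlib import Path

# Extensions copied from the list of supported file types above.
SUPPORTED_EXTENSIONS = frozenset(
    "pdf jpg tif png bmp pnm pbm pgm jp2 gif jfif jpeg tiff spix webp".split()
)


def is_supported(file_name: str) -> bool:
    """Return True if the file's extension is in the supported list."""
    suffix = Path(file_name).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS
```

With this, is_supported("Scan.PDF") is true, while a .docx file or a file without an extension is rejected.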
Best regards,
Rui Fontes
NVDA Portuguese team
On 13/07/2022 at 15:09, Anthony tom wrote:
How is it used?
Yes, you can see in the documentation the languages it supports:
Afrikaans Albanian Amharic Arabic Armenian Assamese Azerbaijani (Latin) Basque Belarusian Bengali Bosnian Breton Bulgarian Burmese Catalan/Valencian Cebuano Cherokee Chinese simplified Chinese traditional Corsican Croatian Czech Danish German Dhivehi Dutch (Flemish) Dzongkha English Esperanto Estonian Faroese Filipino Finnish French Galician Georgian Greek Gujarati Haitian Hebrew Hindi Hungarian Icelandic Indonesian Inuktitut Irish Italian Javanese Japanese Kannada Kazakh Khmer (Central) Kirghiz Korean Kurdish Kurmanji Lao Latin Latvian Lithuanian Luxembourgish Macedonian Malay Malayalam Maltese Maori Marathi Math / equation detection module Mongolian Nepali Norwegian Occitan Oriya Panjabi Pashto Persian Polish Portuguese Quechua Romanian/Moldavian Russian Sanskrit Scottish Gaelic Serbian (Latin) Slovak Slovenian Sindhi Sinhalese Spanish Sundanese Swahili Swedish Syriac Tajik Tamil Tatar Telugu Thai Tibetan Tigrinya Tonga Turkish Uighur Ukrainian Urdu Uzbek (Latin) Vietnamese Welsh West Frisian Yiddish Yoruba
Best regards,
Rui Fontes
NVDA Portuguese team
On 15/07/2022 at 08:19, mukesh jain wrote:
Hello, does it support the Hindi language? Thanks, Mukesh
On 7/14/22, Rui Fontes <rui.fontes@...> wrote:
Hello!
Yes, you should select Tamil and, if necessary, other languages, and place Tamil first.
Regarding your problem navigating the results, it is strange, since it is a normal text file in the Notepad application...
I suppose the threading problem when updating is already solved...
Waiting for your future observations...
Best regards,
Rui Fontes
NVDA Portuguese team
On 14/07/2022 at 15:24, Ravindran V.S. wrote:
Hello,
Thank you for clarifying. So, it will automatically pick the languages from the selected language list, in order.
Meanwhile, I encountered a new problem when I started my PC. Once Windows loaded, the NVDA startup sound played, but there was no speech. I tried restarting NVDA a few times with the shortcut key (Ctrl+Alt+N), with the same result. Then I loaded an alternative screen reader, and it announced "New version of TesseractOCR add-on is available; do you want to install? Yes/No". When I clicked "Yes", it took a short while, but NVDA did not restart, although the voice started to speak. I restarted NVDA again, and the result was the same as before. Then I saw this link and updated via this download. Now there is no issue. Only this afternoon, before shutting down the PC, I had checked the checkbox to check for new updates in the TesseractOCR add-on settings in NVDA. I use Windows 10 64-bit, NVDA 2022.1 and Vocalizer voices.
Also, the OCR results in Tamil seem to be a bit unclear. I mean, after the OCR, the results do not seem to be in an order that makes the sentences meaningful. And moving through them with the normal cursor is difficult; I have to use object navigation. This is my initial experience. I will give it more attempts and confirm. Please advise if I am missing anything.
Thanks,
Ravi.
V.S.Ravindran.
"Excuses lead to failure!"
-----Original Message-----
From: nvda@nvda.groups.io <nvda@nvda.groups.io> On Behalf Of Rui Fontes
Sent: Thursday, July 14, 2022 5:39 PM
To: nvda@nvda.groups.io
Subject: Re: [nvda] New version of TesseractOCR add-on
Hello!
A new version, 2022.07.13, is already available.
The change log is:
- Corrected the threading for the update routine;
- Updated Turkish translation;
- Small code corrections...
The direct link is:
https://github.com/ruifontes/tesseractOCR/releases/download/2022.07.13/tesseractOCR-2022.07.13.nvda-addon
In NVDA, Preferences, Options, you will find a TesseractOCR section.
There you can select the languages to be used in the recognition process and their order...
Best regards,
Rui Fontes
NVDA Portuguese team
On 14/07/2022 at 06:24, Ravindran V.S. wrote:
Hello,
Just a question about the item below:
"- Introduced the option to select a second language to be used in OCR of documents with multiple languages and a button to forget it;"
How can we select the second language? Where are these options, please? I have added the required second language (Tamil) to the list. I am running Windows 10 64-bit, NVDA 2022.1 and TesseractOCR add-on version 2022.07 (downloaded from the direct link you shared).
Thanks,
Ravi
V.S.Ravindran.
"Excuses lead to failure!"
-----Original Message-----
From: nvda@nvda.groups.io <nvda@nvda.groups.io> On Behalf Of Rui Fontes
Sent: Wednesday, July 13, 2022 4:56 PM
To: nvda@nvda.groups.io
Subject: Re: [nvda] New version of TesseractOCR add-on
Hello!
From 2022.06 to 2022.06.27:
- Updated Tesseract from version 5.0 Alpha (64-bit) to 5.1 (32-bit);
- Added several more recognition languages;
- Introduced the option to select a second language to be used in OCR of documents with multiple languages, and a button to forget it;
- Introduced a new document type, "With auto-orientation", that allows the OCR engine to rotate the image as necessary;
- Introduced beeps to signal that the add-on is working;
- Corrected the code that left the download languages combo box unpopulated;
- Corrected a problem with controlTypes roles that prevented compatibility with NVDA 2020.4;
- Added Russian translation.
From 2022.06.27 to 2022.07:
- Allows using any number of recognition languages;
- Complete code rewrite, including:
- Split into various modules to make the code clearer;
- Stopped using batch files;
- Allows recognizing files on the Desktop;
- Added translations to Spanish, French, Russian and Ukrainian.
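The version numbers in this change log (2022.06, 2022.06.27, 2022.07, 2022.07.13) are dotted numbers, so they can be compared by turning them into tuples of integers. This is only a sketch with hypothetical function names. It is not the add-on's actual update check.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "2022.07.13" into (2022, 7, 13)."""
    return tuple(int(part) for part in version.split("."))


def is_newer(candidate: str, installed: str) -> bool:
    """Return True if candidate is a later version than installed."""
    # Tuples compare element by element; a longer tuple with the same
    # prefix (2022.07.13 vs 2022.07) counts as later.
    return parse_version(candidate) > parse_version(installed)
```

Under this ordering 2022.07.13 is newer than 2022.07, which in turn is newer than 2022.06.27.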
Best regards,
Rui Fontes
NVDA Portuguese team
On 13/07/2022 at 07:12, Brian's Mail list account via groups.io wrote:
What is the difference between the old and new ones?
Brian
That is not possible, since we do not know how long the process will take...
Best regards,
Rui Fontes
NVDA Portuguese team
On 14/07/2022 at 06:43, Ravindran V.S. wrote:
Hello,
"- Introduced beeps to signal the add-on is working;"
This is a great and useful addition. It keeps us informed that the OCR is running. It would be better if it could be a progress beep like in NVDA, so that we could know what percentage is completed.
Thanks,
Ravi.
V.S.Ravindran.
"Excuses lead to failure!"
Hello!
Since version 2022.06.27, it uses Tesseract 5.1 for 32-bit...
So, it is very strange...
By the way, you can update to version 2022.07.13, already released...
The changes are:
- Corrected the threading for the update routine;
- Updated Turkish translation;
- Small code corrections...
Best regards,
Rui Fontes NVDA portuguese team
|
|
Hello,
"- Introduced beeps to signal the add-on is working;"
This is a great and useful addition. It keeps us informed that the OCR is running. It would be even better if it could be a progress beep like in NVDA, so that we could know what percentage is completed.
Thanks, Ravi.
V.S.Ravindran. "Excuses lead to failure!"
|
|
Hello,
Just a question about the item below:
"- Introduced the option to select a second language to be used in OCR of documents with multiple languages and a button to forget it;"
How can we select the second language? Where are these options, please? I have added the required second language (Tamil) to the list.
I am running Win 10 64-bit; NVDA 2022.1; TesseractOCR add-on v. 2022.07 (downloaded from the direct link you shared).
Thanks, Ravi
V.S.Ravindran. "Excuses lead to failure!"
|
|
Hi,
About the version: it is 2022.7, but it happened when I installed the version that uses Tesseract 5.1, because I run it on a 32-bit system.
Note that English worked when I installed that version, but Spanish doesn't.
|
|
Hello!
Which version of TesseractOCR?
Is your system 32-bit or 64-bit?
Best regards,
Rui Fontes
NVDA Portuguese team
|
|
Hi,
Two problems here:
If I set Spanish as my preferred language, it gives an error and never displays ocr.txt. This is the log after pressing Control+Windows+R:
ERROR - stderr (19:42:00.880) - Thread-22 (5292):
Exception in thread Thread-22:
Traceback (most recent call last):
  File "threading.pyc", line 926, in _bootstrap_inner
  File "threading.pyc", line 870, in run
  File "C:\Users\usuario\AppData\Roaming\nvda\addons\tesseractOCR\globalPlugins\tesseractOCR\__init__.py", line 176, in _doRoutines
    self.convertPDFToPNG()
  File "C:\Users\usuario\AppData\Roaming\nvda\addons\tesseractOCR\globalPlugins\tesseractOCR\__init__.py", line 113, in convertPDFToPNG
    self.backgroundProcessing(command)
  File "C:\Users\usuario\AppData\Roaming\nvda\addons\tesseractOCR\globalPlugins\tesseractOCR\__init__.py", line 103, in backgroundProcessing
    p = subprocess.Popen(self.command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, startupinfo=si)
  File "subprocess.pyc", line 800, in __init__
  File "subprocess.pyc", line 1207, in _execute_child
OSError: [WinError 216] Esta versión de %1 no es compatible con la versión de Windows que está ejecutando. Compruebe la información de sistema del equipo para consultar si necesita una versión x86 (32 bits) o x64 (64 bits) del programa, y después póngase en contacto con el editor del software
(In English: "This version of %1 is not compatible with the version of Windows you're running. Check your computer's system information to see whether you need an x86 (32-bit) or x64 (64-bit) version of the program, and then contact the software publisher.")
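WinError 216 is Windows reporting that an executable targets a processor architecture the running system cannot execute, and the traceback shows it being raised while an external command is launched from convertPDFToPNG. One way to check which architecture a given .exe targets is to read the machine field of its PE header. The sketch below is a generic diagnostic, not part of the add-on; `pe_machine` is a hypothetical name, and which executable to inspect is left to the reader.

```python
import struct

# Machine field values from the PE/COFF header.
MACHINE_NAMES = {
    0x014C: "x86 (32-bit)",
    0x8664: "x64 (64-bit)",
    0xAA64: "ARM64",
}


def pe_machine(path):
    """Return the target architecture recorded in a Windows executable."""
    with open(path, "rb") as f:
        dos_header = f.read(64)
        if len(dos_header) < 64 or dos_header[:2] != b"MZ":
            raise ValueError("not a Windows executable (missing MZ header)")
        # Offset 0x3C of the DOS header holds the position of the PE header.
        (pe_offset,) = struct.unpack_from("<I", dos_header, 0x3C)
        f.seek(pe_offset)
        pe_start = f.read(6)
    if len(pe_start) < 6 or pe_start[:4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 2-byte machine field follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", pe_start, 4)
    return MACHINE_NAMES.get(machine, hex(machine))
```

On a 32-bit Windows installation, any helper program for which this reports "x64 (64-bit)" would be expected to fail in the way shown in the log.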
Also, sometimes when I start NVDA it never speaks and I need to restart it. Here is the log from when it finally starts with voice feedback:
NVDA initialized
ERROR - unhandled exception (19:46:35.521) - MainThread (2396):
Traceback (most recent call last):
  File "wx\core.pyc", line 3407, in <lambda>
  File "C:\Users\usuario\AppData\Roaming\nvda\addons\tesseractOCR\globalPlugins\tesseractOCR\update.py", line 80, in upgradeVerify
    r = urllib.request.urlopen(p).read()
  File "urllib\request.pyc", line 222, in urlopen
  File "urllib\request.pyc", line 525, in open
  File "urllib\request.pyc", line 543, in _open
  File "urllib\request.pyc", line 503, in _call_chain
  File "urllib\request.pyc", line 1393, in https_open
  File "urllib\request.pyc", line 1350, in do_open
  File "http\client.pyc", line 1277, in request
  File "http\client.pyc", line 1323, in _send_request
  File "http\client.pyc", line 1272, in endheaders
  File "http\client.pyc", line 1032, in _send_output
  File "http\client.pyc", line 972, in send
  File "http\client.pyc", line 1439, in connect
  File "http\client.pyc", line 944, in connect
  File "socket.pyc", line 707, in create_connection
  File "socket.pyc", line 752, in getaddrinfo
LookupError: unknown encoding: idna
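The second traceback ends in `LookupError: unknown encoding: idna`, raised while the host name is being resolved. In Python this error is commonly associated with the `idna` codec being looked up for the first time at an awkward moment, such as from a background thread or in a frozen build, and a frequently cited workaround is to import `encodings.idna` explicitly before any network call. Whether that applies to this add-on is not stated in the thread; the sketch below only illustrates the workaround, and `hostname_to_ascii` is a hypothetical helper.

```python
# Imported only for its side effect: the codec module is loaded up front,
# so a later lookup of "idna" does not need to import anything.
import encodings.idna  # noqa: F401


def hostname_to_ascii(hostname):
    """Encode a host name the same way the socket layer does before resolving it."""
    return hostname.encode("idna").decode("ascii")
```

If `hostname_to_ascii("example.com")` succeeds at start-up, the codec is available and a later `urllib.request.urlopen` call should not fail for this particular reason.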
|
|
Please, send it directly to me:
rui.fontes@...
Rui Fontes
|
|
How can I send a log here?
|
|
Here is the documentation:
TesseractOCR
• Authors: Rui Fontes rui.fontes@... and Angelo
Abrantes ampa4374@...
• Updated on 13/07/2022
• Download stable version
• Compatibility: NVDA version 2019.3 and beyond
Information
This add-on uses the free and open source Tesseract OCR engine to
perform optical character recognition on a PDF, JPG, TIF or other
image file, without the need to open it. It also uses
wia-cmd-scanner to access WIA-enabled scanners and
perform OCR on a paper document. In the NVDA menu, Preferences, a
TesseractOCR section is added, where you can configure the
languages to be used in recognition and the type of documents to
be recognized. With the exception of English and Portuguese, which
are already included in the add-on, the other languages will be
downloaded and installed when you select a language that does not
already exist in the add-on. Note that as the number of selected
recognition languages increases, the OCR process will take longer.
We therefore recommend that you use only the languages you need.
Note also that the quality of recognition may vary according to
the order of languages. Therefore, if the recognition result is
not satisfactory, you may want to try another language ordering.
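To make the point about language order concrete: the Tesseract command line takes the recognition languages as a single `-l` value, with the language codes joined by `+` in the order they should be tried. The helper below is a sketch under that assumption, not the add-on's code; `build_tesseract_command`, the file names and the presence of a `tesseract` executable on the PATH are all hypothetical.

```python
def build_tesseract_command(image_path, output_base, languages, exe="tesseract"):
    """Build the argument list for one Tesseract run.

    `languages` is an ordered list of Tesseract language codes,
    for example ["por", "eng"]; the order is preserved in the -l value.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    return [exe, image_path, output_base, "-l", "+".join(languages)]


# Same languages, different priority: these are two different OCR runs.
portuguese_first = build_tesseract_command("scan.png", "ocr", ["por", "eng"])
english_first = build_tesseract_command("scan.png", "ocr", ["eng", "por"])
```

Either list could be passed to `subprocess.run`; with an output base of `ocr`, Tesseract writes its text result to `ocr.txt`.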
Shortcuts
The default commands are: Windows+Control+r - to recognize the
selected document; Windows+Control+w - to scan and recognize a
document through the scanner.
Then just wait for ocr.txt to open with the recognized text. If you
want to preserve the recognized text, don't forget to save the
document under another name and in another location, as all files
in the temporary directory are deleted at the start of the next
OCR process!
These commands can be modified in the "Input gestures" dialog, in
the "TesseractOCR" section.
Automatic update
This add-on includes an automatic update feature. The check for a
new version is run every time NVDA is loaded. If you want
this, go to NVDA, Preferences, Options and, in the add-on category,
check the check box.
Known problems
• On some systems the add-on may not work because of a
comtypes error. On some machines it is enough to go to the temp
folder and delete the comtypes_cache folder.
• When the "Various" option is selected in the "Documents type"
combobox, the recognized text will probably appear with many blank
lines. This is a known problem with Tesseract and, without
consuming lots of processing time, I haven't yet found a
solution. But I still haven't given up!
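The comtypes workaround in the first item can also be scripted. This is a minimal sketch, assuming that "the temp folder" is the directory returned by Python's `tempfile.gettempdir()` (normally the user's %TEMP% on Windows); `remove_comtypes_cache` is a hypothetical name and is not part of the add-on.

```python
import shutil
import tempfile
from pathlib import Path


def remove_comtypes_cache(temp_dir=None):
    """Delete the comtypes_cache folder, returning True if one was removed.

    `temp_dir` defaults to the current user's temporary directory.
    """
    base = Path(temp_dir) if temp_dir is not None else Path(tempfile.gettempdir())
    cache = base / "comtypes_cache"
    if cache.is_dir():
        shutil.rmtree(cache)
        return True
    return False
```

Calling `remove_comtypes_cache()` with no argument targets the default temp folder; passing a path makes it easy to try the function on a scratch directory first.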
Languages supported
The supported languages in this version are: Afrikaans Albanian
Amharic Arabic Armenian Assamese Azerbaijani (Latin) Basque
Belarusian Bengali Bosnian Breton Bulgarian Burmese
Catalan/Valencian Cebuano Cherokee Chinese simplified Chinese
traditional Corsican Croatian Czech Danish Deutsch (German) Dhivehi
Dutch (Flemish) Dzongkha English Esperanto Estonian Faroese Filipino
Finnish French Galician Georgian Greek Gujarati Haitian Hebrew
Hindi Hungarian Icelandic Indonesian Inuktitut Irish Italian
Javanese Japanese Kannada Kazakh Khmer (Central) Kirghiz Korean
Kurdish Kurmanji Lao Latin Latvian Lithuanian Luxembourgish
Macedonian Malay Malayalam Maltese Maori Marathi Math / equation
detection module Mongolian Nepali Norwegian Occitan Oriya Panjabi
Pashto Persian Polish Portuguese Quechua Romanian/Moldavian Russian
Sanskrit Scottish Gaelic Serbian (Latin) Slovak Slovenian Sindhi
Sinhalese Spanish Sundanese Swahili Swedish Syriac Tajik Tamil
Tatar Telugu Thai Tibetan Tigrinya Tonga Turkish Uighur Ukrainian
Urdu Uzbek (Latin) Vietnamese Welsh West Frisian Yiddish Yoruba
Image types supported
This add-on supports the following types of files: PDF, jpg, tif,
png, bmp, pnm, pbm, pgm, jp2, gif, jfif, jpeg, tiff, spix, webp
Best regards,
Rui Fontes
NVDA portuguese team
On 13/07/2022 at 15:09, Anthony tom wrote:
|
|
|
|