New version of TesseractOCR add-on


Ravindran V.S.
 

Hello,

"- Introduced beeps to signal the add-on is working;"
This is a great and useful addition. It keeps us informed that the OCR is running. It would be even better if it were a progress beep like the one in NVDA, so that we could tell what percentage is completed.

Thanks,
Ravi.
V.S.Ravindran.
"Excuses leads to failure!"

-----Original Message-----
From: nvda@nvda.groups.io <nvda@nvda.groups.io> On Behalf Of Rui Fontes
Sent: Wednesday, July 13, 2022 4:56 PM
To: nvda@nvda.groups.io
Subject: Re: [nvda] New version of TesseractOCR add-on

Hello!


From 2022.06 to 2022.06.27:

- Updated Tesseract from version 5.0 Alpha (64-bit) to 5.1 (32-bit);
- Added several more recognition languages;
- Introduced the option to select a second language to be used in OCR of documents with multiple languages and a button to forget it;
- Introduced a new document type, "With auto-orientation", that allows the OCR engine to rotate the image as necessary;
- Introduced beeps to signal the add-on is working;
- Corrected code so that the download languages combobox no longer fails to populate;
- Corrected a problem with controlTypes roles preventing compatibility with NVDA 2020.4;
- Added Russian translation.


From 2022.06.27 to 2022.07:

- Allow using any number of recognition languages;
- Complete code rewrite, including:
    - Split into various modules to make the code clearer;
    - Stopped using batch files;
    - Allow recognizing files on the Desktop;
- Added translations to Spanish, French, Russian and Ukrainian.


Best regards,

Rui Fontes
NVDA Portuguese team


At 07:12 on 13/07/2022, Brian's Mail list account via groups.io wrote:
What is the difference between the old and new ones?
Brian


Rui Fontes
 

Hello!


Since version 2022.06.27, it uses Tesseract 5.1 for 32-bit...

So, it is very strange...


By the way, you can update to version 2022.07.13, already released...

The changes are:

- Corrected the threading for the update routine;
- Updated Turkish translation;
- Small code corrections...


Best regards,

Rui Fontes
NVDA Portuguese team



At 02:23 on 14/07/2022, mk360 wrote:

Hi,

The version is 2022.7, but it happened when I installed the version that uses Tesseract 5.1, because I run it on a 32-bit system.

Note that English worked when I installed that version, but Spanish doesn't.

On 13/07/2022 at 21:14, Rui Fontes wrote:
Hello!


Which version of TesseractOCR?

Is your system 32-bit or 64-bit?


Best regards,

Rui Fontes
NVDA Portuguese team



At 00:49 on 14/07/2022, mk360 wrote:
Hi,

Two problems here:

If I set Spanish as my preferred language, it gives an error and never displays ocr.txt. This is the log after pressing Windows+Control+R:

ERROR - stderr (19:42:00.880) - Thread-22 (5292):
Exception in thread Thread-22:
Traceback (most recent call last):
  File "threading.pyc", line 926, in _bootstrap_inner
  File "threading.pyc", line 870, in run
  File "C:\Users\usuario\AppData\Roaming\nvda\addons\tesseractOCR\globalPlugins\tesseractOCR\__init__.py", line 176, in _doRoutines
    self.convertPDFToPNG()
  File "C:\Users\usuario\AppData\Roaming\nvda\addons\tesseractOCR\globalPlugins\tesseractOCR\__init__.py", line 113, in convertPDFToPNG
    self.backgroundProcessing(command)
  File "C:\Users\usuario\AppData\Roaming\nvda\addons\tesseractOCR\globalPlugins\tesseractOCR\__init__.py", line 103, in backgroundProcessing
    p = subprocess.Popen(self.command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, startupinfo=si)
  File "subprocess.pyc", line 800, in __init__
  File "subprocess.pyc", line 1207, in _execute_child
OSError: [WinError 216] Esta versión de %1 no es compatible con la versión de Windows que está ejecutando. Compruebe la información de sistema del equipo para consultar si necesita una versión x86 (32 bits) o x64 (64 bits) del programa, y después póngase en contacto con el editor del software
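
WinError 216 is Windows reporting that the program being launched was built for a different CPU architecture than the running system, for example a 64-bit executable on 32-bit Windows. The traceback does not show which helper executable was started, so the following is only a minimal sketch for checking a candidate file by reading its PE header; exe_architecture is a hypothetical helper and is not part of the add-on.

```python
import struct

# Machine-type codes from the PE/COFF file header.
MACHINE_TYPES = {
    0x014C: "32-bit (x86)",
    0x8664: "64-bit (x64)",
    0xAA64: "64-bit (ARM64)",
}


def exe_architecture(path):
    """Return a label for the CPU architecture a Windows executable targets."""
    with open(path, "rb") as f:
        if f.read(2) != b"MZ":
            raise ValueError("not a Windows executable: missing MZ header")
        # Offset 0x3C of the DOS header stores the position of the PE header.
        f.seek(0x3C)
        (pe_offset,) = struct.unpack("<I", f.read(4))
        f.seek(pe_offset)
        if f.read(4) != b"PE\0\0":
            raise ValueError("not a Windows executable: missing PE signature")
        # The first field after the signature is the 16-bit machine type.
        (machine,) = struct.unpack("<H", f.read(2))
    return MACHINE_TYPES.get(machine, "unknown (0x%04X)" % machine)
```

Running it on each executable shipped in the add-on folder and comparing the result with the system type would show which file, if any, is a 64-bit build.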


Also, sometimes when I start NVDA it never speaks and I need to restart it. Here is the log from when it finally starts with voice feedback:

NVDA initialized
ERROR - unhandled exception (19:46:35.521) - MainThread (2396):
Traceback (most recent call last):
  File "wx\core.pyc", line 3407, in <lambda>
  File "C:\Users\usuario\AppData\Roaming\nvda\addons\tesseractOCR\globalPlugins\tesseractOCR\update.py", line 80, in upgradeVerify
    r = urllib.request.urlopen(p).read()
  File "urllib\request.pyc", line 222, in urlopen
  File "urllib\request.pyc", line 525, in open
  File "urllib\request.pyc", line 543, in _open
  File "urllib\request.pyc", line 503, in _call_chain
  File "urllib\request.pyc", line 1393, in https_open
  File "urllib\request.pyc", line 1350, in do_open
  File "http\client.pyc", line 1277, in request
  File "http\client.pyc", line 1323, in _send_request
  File "http\client.pyc", line 1272, in endheaders
  File "http\client.pyc", line 1032, in _send_output
  File "http\client.pyc", line 972, in send
  File "http\client.pyc", line 1439, in connect
  File "http\client.pyc", line 944, in connect
  File "socket.pyc", line 707, in create_connection
  File "socket.pyc", line 752, in getaddrinfo
LookupError: unknown encoding: idna
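
The traceback above shows the update check calling urllib.request.urlopen on the main thread, so any failure or delay there can hold up startup. The sketch below shows one way to move such a check to a background thread and to contain its errors. It is not the add-on's actual code: check_in_background, fetch and the callback signature are hypothetical, and importing encodings.idna up front is a workaround often suggested for this LookupError whose effect in this particular case is an assumption.

```python
import encodings.idna  # noqa: F401  (assumed workaround: load the codec module early)
import threading
import urllib.request


def fetch(url, timeout=10):
    """Download url and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def check_in_background(url, on_result, fetcher=fetch):
    """Run fetcher(url) on a daemon thread and report through on_result.

    on_result receives (data, error); error is None when the fetch succeeded.
    """
    def worker():
        try:
            data = fetcher(url)
        except Exception as exc:  # a failed check must not stop startup
            on_result(None, exc)
        else:
            on_result(data, None)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

In a wxPython application such as NVDA, anything on_result does to the GUI would need to be handed back to the main thread, for example with wx.CallAfter.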


Rui Fontes
 

That is not possible since we do not know how long the process will take...
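
When the total duration is unknown, a percentage cannot be reported, but a steady beep can still show that the job is alive. The following is a minimal sketch of that idea, not the add-on's implementation: run_with_heartbeat is a hypothetical helper, and beep is any callable supplied by the caller (inside NVDA it could wrap tones.beep).

```python
import threading
import time


def run_with_heartbeat(task, beep, interval=1.0):
    """Run task() on a worker thread, calling beep() every interval seconds until it ends."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = task()
        except Exception as exc:
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    while thread.is_alive():
        beep()
        # Wait up to interval seconds, returning early if the task is done.
        thread.join(interval)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

The loop blocks its caller until the task finishes, so in a GUI program it would itself have to run outside the main thread.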


Best regards,

Rui Fontes
NVDA Portuguese team


At 06:43 on 14/07/2022, Ravindran V.S. wrote:

Hello,

"- Introduced beeps to signal the add-on is working;"
This is a great and useful addition. It keeps us informed that the OCR is running. It would be even better if it were a progress beep like the one in NVDA, so that we could tell what percentage is completed.

Thanks,
Ravi.
V.S.Ravindran.
"Excuses leads to failure!"


Rui Fontes
 

Hello!


A new version, 2022.07.13, is already available.

The change log is:

- Corrected the threading for the update routine;
- Updated Turkish translation;
- Small code corrections...


The direct link is:

https://github.com/ruifontes/tesseractOCR/releases/download/2022.07.13/tesseractOCR-2022.07.13.nvda-addon


In the NVDA menu, under Preferences, Options, you will find a TesseractOCR section.

There you can select the languages to be used in the recognition process and their order...


Best regards,

Rui Fontes
NVDA Portuguese team


At 06:24 on 14/07/2022, Ravindran V.S. wrote:

Hello,
Just a question about the item below:
"- Introduced the option to select a second language to be used in OCR of documents with multiple languages and a button to forget it;"

How can we select the second language? Where are these options, please?
I have added the required second language (Tamil) to the list.
I am running Windows 10 64-bit, NVDA 2022.1 and TesseractOCR add-on version 2022.07 (downloaded from the direct link you shared).
Thanks,
Ravi
V.S.Ravindran.
"Excuses leads to failure!"


Ravindran V.S.
 

Hello,

Thank you for clarifying.
So, it will automatically pick the languages from the selected language list, in order.
Meanwhile, I encountered a new problem just now, when I started my PC.
Once Windows loaded, the NVDA startup sound played, but there was no speech. I tried restarting NVDA a few times with the shortcut key (Ctrl+Alt+N), with the same result.
Then I loaded an alternative screen reader, and it announced "New version of TesseractOCR add-on is available; do you want to install? Yes/No".
When I clicked "Yes", it took a short while and NVDA did not restart, but the voice started to speak.
I restarted NVDA again, and the result was the same as before.
Then I saw this link and updated via this download.
Now there is no issue.
Only this afternoon, before shutting down the PC, I had checked the checkbox to check for new updates in the TesseractOCR add-on settings in NVDA.
I use Windows 10 64-bit, NVDA 2022.1 and Vocalizer voices.
Also, the OCR results in Tamil seem to be a bit unclear.
I mean, after the OCR, the results do not seem to be in an order that makes the sentences meaningful.
And moving through them with the normal cursor is difficult; I have to use object navigation.
This is my initial experience. I will give it more attempts and confirm.
Please advise if I am missing anything.
Thanks,
Ravi.
V.S.Ravindran.
"Excuses leads to failure!"


Rui Fontes
 

Hello!


Yes, you should select Tamil, and if necessary, other languages, and place Tamil in first place.
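
For reference, the Tesseract command-line program takes several recognition languages as a single "+"-joined value of its -l option, in the order given. How the add-on builds its own command is not shown in this thread, so the helper below (build_tesseract_command, a hypothetical name) is only a sketch of that convention, using the Tesseract language codes tam and eng for Tamil and English.

```python
def build_tesseract_command(image_path, output_base, languages, executable="tesseract"):
    """Return the argument list for a Tesseract run using languages in the given order."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract takes several languages as one "+"-joined value for -l.
    return [executable, image_path, output_base, "-l", "+".join(languages)]
```

For example, build_tesseract_command("page.png", "ocr", ["tam", "eng"]) puts Tamil first; the resulting list can be passed to subprocess.run.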


Regarding your problem navigating the results, it is strange, since it is a normal text file in the Notepad application...


I suppose the threading problem in the update routine is already solved...


I look forward to your future observations...


Best regards,

Rui Fontes
NVDA Portuguese team



mukesh jain
 

Hello,
Does it support the Hindi language?
thanks,
Mukesh


Rui Fontes
 

Yes, the documentation lists the languages it supports:

Afrikaans
Albanian
Amharic
Arabic
Armenian
Assamese
Azerbaijani (Latin)
Basque
Belarusian
Bengali
Bosnian
Breton
Bulgarian
Burmese
Catalan/Valencian
Cebuano
Cherokee
Chinese simplified
Chinese traditional
Corsican
Croatian
Czech
Danish
German
Dhivehi
Dutch (Flemish)
Dzongkha
English
Esperanto
Estonian
Faroese
Filipino
Finnish
French
Galician
Georgian
Greek
Gujarati
Haitian
Hebrew
Hindi
Hungarian
Icelandic
Indonesian
Inuktitut
Irish
Italian
Javanese
Japanese
Kannada
Kazakh
Khmer (Central)
Kirghiz
Korean
Kurdish Kurmanji
Lao
Latin
Latvian
Lithuanian
Luxembourgish
Macedonian
Malay
Malayalam
Maltese
Maori
Marathi
Math / equation detection module
Mongolian
Nepali
Norwegian
Occitan
Oriya
Panjabi
Pashto
Persian
Polish
Portuguese
Quechua
Romanian/Moldavian
Russian
Sanskrit
Scottish Gaelic
Serbian (Latin)
Slovak
Slovenian
Sindhi
Sinhalese
Spanish
Sundanese
Swahili
Swedish
Syriac
Tajik
Tamil
Tatar
Telugu
Thai
Tibetan
Tigrinya
Tonga
Turkish
Uighur
Ukrainian
Urdu
Uzbek (Latin)
Vietnamese
Welsh
West Frisian
Yiddish
Yoruba

Best regards,

Rui Fontes
NVDA Portuguese team




Dave Grossoehme
 

Hi Rui: When was this update written? I have your app as an add-on for NVDA, but it is dated January of 2022. Is that the latest update? If not, I missed the address to go to for the update, in case it won't update from the Alt+H key in the NVDA add-ons. I apologize for being a little late on this information. I had in my mind that this was a different add-on from the Now app. Could you be so kind as to help out here? Thanks in advance.

Dave


On 7/13/2022 11:59 AM, Rui Fontes wrote:

Here is the documentation:


TesseractOCR
• Authors: Rui Fontes rui.fontes@... and Angelo Abrantes ampa4374@...
• Updated on 13/07/2022
• Download stable version
• Compatibility: NVDA version 2019.3 and beyond
Information
This add-on uses the free and open source Tesseract OCR engine to perform optical character recognition on an image file (PDF, JPG, TIF or other) without the need to open it. It also uses wia-cmd-scanner to access WIA-enabled scanners and perform OCR on a paper document. In the NVDA menu, under Preferences, a TesseractOCR section is added, where you can configure the languages to be used in recognition and the type of documents to be recognized. With the exception of English and Portuguese, which are already included in the add-on, languages are downloaded and installed when you select one that does not already exist in the add-on. Note that as the number of selected recognition languages increases, the OCR process takes longer. We therefore recommend that you use only the languages you need. Note also that the quality of recognition may vary according to the order of the languages. Therefore, if the recognition result is not satisfactory, you may want to try another language order.
Shortcut
The default commands are: Windows+Control+r - to recognize the selected document; Windows+Control+w - to scan and recognize a document through the scanner.
Then just wait for ocr.txt to open with the recognized text. If you want to preserve the recognized text, don't forget to save the document under another name and in another location, as all files in the temporary directory are deleted at the start of the next OCR process!
These commands can be modified in the "Input gestures" dialog, in the "TesseractOCR" section.
Automatic update
This add-on includes an automatic update feature. The check for a new version runs every time NVDA is loaded. To enable it, go to NVDA, Preferences, Options and check the check box in the add-on category.
Known problems
• On some systems the add-on may not work due to a comtypes error. On some machines it is enough to go to the temp folder and delete the comtypes_cache folder.
• When the "Various" option is selected in the "Documents type" combobox, the recognized text will probably appear with many blank lines. This is a known problem with Tesseract and, without consuming lots of processing time, I haven't yet found any solution. But I still haven't given up!
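Both workarounds above can be sketched in a few lines of Python. This is a hedged illustration, not part of the add-on: it assumes "the temp folder" is the user's temporary directory as reported by tempfile.gettempdir(), and the blank-line clean-up is a simple post-processing step applied to the text after recognition, not a fix inside Tesseract. Both function names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def remove_comtypes_cache(temp_dir: Optional[Path] = None) -> bool:
    """Delete the comtypes_cache folder; return True if it was present."""
    # Assumption: the cache lives directly inside the user's temp directory.
    base = Path(temp_dir) if temp_dir is not None else Path(tempfile.gettempdir())
    cache = base / "comtypes_cache"
    if not cache.is_dir():
        return False
    shutil.rmtree(cache)
    return True


def collapse_blank_lines(text: str) -> str:
    """Reduce every run of blank or whitespace-only lines to one empty line."""
    result = []
    previous_blank = False
    for line in text.splitlines():
        blank = not line.strip()
        if blank and previous_blank:
            continue  # skip the second and later blank lines of a run
        result.append("" if blank else line)
        previous_blank = blank
    return "\n".join(result)
```

The second function costs almost no processing time, but it cannot tell an intended paragraph break from a spurious one, so it keeps exactly one empty line per run.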
Languages supported
The supported languages in this version are: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani (Latin), Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Catalan/Valencian, Cebuano, Cherokee, Chinese simplified, Chinese traditional, Corsican, Croatian, Czech, Danish, German (Deutsch), Dhivehi, Dutch (Flemish), Dzongkha, English, Esperanto, Estonian, Faroese, Filipino, Finnish, French, Galician, Georgian, Greek, Gujarati, Haitian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Inuktitut, Irish, Italian, Javanese, Japanese, Kannada, Kazakh, Khmer (Central), Kirghiz, Korean, Kurdish Kurmanji, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malay, Malayalam, Maltese, Maori, Marathi, Math / equation detection module, Mongolian, Nepali, Norwegian, Occitan, Oriya, Panjabi, Pashto, Persian, Polish, Portuguese, Quechua, Romanian/Moldavian, Russian, Sanskrit, Scottish Gaelic, Serbian (Latin), Slovak, Slovenian, Sindhi, Sinhalese, Spanish, Sundanese, Swahili, Swedish, Syriac, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Tigrinya, Tonga, Turkish, Uighur, Ukrainian, Urdu, Uzbek (Latin), Vietnamese, Welsh, West Frisian, Yiddish, Yoruba
Image types supported
This add-on supports the following types of files: PDF, jpg, tif, png, bmp, pnm, pbm, pgm, jp2, gif, jfif, jpeg, tiff, spix, webp
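If you want to check in advance whether a file is worth sending to the add-on, the list above translates into a simple extension test. The sketch below is an assumption about how such a check could look, with a hypothetical function name. It only looks at the file name, not at the file's content, and it is not the add-on's own logic.

```python
from pathlib import Path

# Extensions copied from the list of supported file types above.
SUPPORTED_EXTENSIONS = frozenset({
    "pdf", "jpg", "tif", "png", "bmp", "pnm", "pbm", "pgm",
    "jp2", "gif", "jfif", "jpeg", "tiff", "spix", "webp",
})


def is_supported(file_name: str) -> bool:
    """Return True if the extension is in the supported list (case-insensitive)."""
    extension = Path(file_name).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS
```

With this, is_supported("Scan.PDF") is true, while is_supported("notes.docx") and a name without any extension are both rejected.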


Best regards,

Rui Fontes
NVDA portuguese team


On 13/07/2022 at 15:09, Anthony tom wrote:
How is it used?


Rui Fontes
 

Hello!


Since January we have published several new versions...

To get the most recent, use the following link:

 https://github.com/ruifontes/tesseractOCR/releases/download/2022.07.13/tesseractOCR-2022.07.13.nvda-addon


Best regards,

Rui Fontes
NVDA portuguese team


On 15/07/2022 at 15:02, Dave Grossoehme wrote:

Hi Rui: When was this update written? I have your add-on for NVDA, but it is dated January 2022. Is that the latest update? If not, I missed the address to go to for the update, in case it won't update from the Alt+H key in the NVDA add-ons manager. I apologize for being a little late on this. I had in my mind that this was a different add-on, other than the Now app. Could you be so kind as to help out here? Thanks in advance.

Dave


On 7/13/2022 11:59 AM, Rui Fontes wrote:

Here is the documentation:




nvdainth@...
 

Hi Rui Fontes

I have the same issue as in this topic, for the Thai language:
https://github.com/tesseract-ocr/tesseract/issues/2702

How can I change a little of the code to fix it in your add-on?


Rui Fontes
 

Have you tried the new version of TesseractOCR add-on?


I ran the OCR on the file from the issue, and I have sent you the file and the result...


Best regards,

Rui Fontes
NVDA portuguese team
On 30/08/2022 at 21:18, nvdainth@... wrote:

Hi Rui Fontes

I have the same issue as in this topic, for the Thai language:
https://github.com/tesseract-ocr/tesseract/issues/2702

How can I change a little of the code to fix it in your add-on?