Google Cloud Voices


Luke Davis
 

On Jan 8, Brian's Mail list account via groups.io wrote:

Maybe for those producing audio from text they might be useful, but if it's pay as you go, it sounds like it won't be used by anyone.
Oh, it will. Voices like this are already being used to perform "indistinguishable from human" voice-overs for YouTube videos and similar content.

It is a lot cheaper to hire a high-quality computerized voice, or even two, to read your video ad copy than it is to hire a good-quality human. And every time the ad changes slightly, for a human it is, in some cases, a whole new job. For a bot it is not.

Of course, if you're blind, and therefore used to listening to artificial voices (or just parsing spoken content at a high level in general), the "human indistinguishability" of these voices is questionable, and I often notice them in YouTube ads and other such situations. But they're more difficult to pick out than the highest-quality voices from even a few years ago, and I suspect that the casual listener will have a more difficult time of it.
Even for me, if they only read a sentence or two, I'm sure I miss some of them.

I have read claims from these companies that 90% of listeners think it's a human speaking. Some even go as far as quoting higher percentages. Though all of that is marketing puffery, to be sure, I'm also sure there is a good amount of truth to it.

My "head standard" for indistinguishable-from-human voices is the computer from the Enterprise-D. So until they can mimic the lifelikeness of the voice of Majel Barrett, I ain't buying. :)

In truth, though, they're practically there with the voice quality itself. Where they have yet to make progress, in my opinion, is in the randomized expressiveness of a human speaker. If you listen to any of them long enough, they will eventually get the emphasis wrong on something, and that will just sound unnatural. So until the algorithm generating the voice can also understand the context of the text, and how to apply it to speech (which will include look-back and look-ahead), I think they will continue to miss the mark. OpenAI is probably on its way there, though, with GPT-3 and such.

Luke


tim
 

Have fun there. Even an email from them just tells you to use their help page for answers.

The most useless company for help on the internet.

On 1/7/2023 11:22 AM, Ame wrote:

Right. But I was thinking they may have them to purchase and download. I should’ve clarified that. Didn’t think about it. Lol. The ten voices I found are really good, and the adjustments you can make add flexibility you don’t have with a lot of voices. I just tried to contact sales via chat and it’s all automated. I’d like to be able to email them and get a response from a real human.


 

Hi all,

It appears Google Cloud voices cannot be used as a speech synthesizer the way people hope (at this time). As Luke pointed out, and as the documentation states, users can send between 1 million and 4 million characters per month for free, depending on voice type; after that, it costs 4 to 16 US dollars per one million characters. The engine also supports SSML (Speech Synthesis Markup Language) commands, and these commands count as text as far as billing is concerned.
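To make the pricing arithmetic concrete, here is a rough Python sketch of a monthly bill under the per-character model described above. It assumes billing is a simple free allowance followed by a flat, pro-rata rate per million characters. The two example tiers (4 million free at $4 per million, and 1 million free at $16 per million) are built from the ranges quoted in this message; which voice family falls into which tier, and whether real invoices round differently, are assumptions to check against Google's current pricing page.

```python
def monthly_cost(chars_used: int, free_chars: int, usd_per_million: float) -> float:
    """Estimate one month's bill: free allowance first, then a per-million rate.

    Assumption: characters beyond the free allowance are billed pro rata at
    usd_per_million. Real billing may round or apply other rules.
    """
    billable = max(0, chars_used - free_chars)
    return billable / 1_000_000 * usd_per_million


# Hypothetical tiers built from the figures quoted above
# (1 to 4 million free characters, then $4 to $16 per million).
TIERS = {
    "cheaper tier": (4_000_000, 4.0),
    "premium tier": (1_000_000, 16.0),
}

if __name__ == "__main__":
    usage = 5_000_000  # hypothetical month of heavy screen-reader use
    for name, (free, rate) in TIERS.items():
        print(f"{name}: ${monthly_cost(usage, free, rate):.2f}")
```

With a hypothetical 5 million characters in a month, this prints $4.00 for the cheaper tier and $64.00 for the premium tier. The point is not the exact figures but that the bill grows with every character a screen reader sends.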

To demonstrate, imagine that you, as an individual, decide to purchase these voices to use as a speech synthesizer. Every time NVDA must say something, it must send the entire text to the cloud for processing. For example, when NVDA says, "Loading NVDA, please wait," that's already 25 characters. If you add SSML commands in the middle of this text (say, adding a pause between "NVDA" and "please"), that adds to the character count. Now suppose you are reading by paragraph and you come across a lengthy paragraph, say over 400 characters. Even if you intend to read just one sentence from this paragraph (say, about 60 characters), NVDA must send the entire paragraph to the cloud for processing. Now suppose you are in the middle of a conversation on a social network site and you get replies in real time. Then NVDA must send not only the text, but also any additional text that might be generated. Given that screen readers must work not only with text but also with other information from apps, and must generate appropriate text for various situations (including text that might be cut off when you press the Control key), you can see how this adds up to more than one million characters per month.
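The character counts in the NVDA example can be checked with a short Python sketch. It counts the plain announcement, then the same announcement wrapped in SSML with a pause inserted, assuming (as the message states) that every markup character is billed. The `<speak>` wrapper and `<break time="500ms"/>` tag are standard SSML elements; the 500 ms pause and the daily reading volume in the projection are made-up figures for illustration only.

```python
def billed_chars(text: str) -> int:
    # Assumption: every character sent, including spaces and SSML markup, is billed.
    return len(text)


def monthly_chars(chars_per_request: int, requests_per_day: int, days: int = 30) -> int:
    # Rough projection assuming a constant (hypothetical) daily volume.
    return chars_per_request * requests_per_day * days


if __name__ == "__main__":
    plain = "Loading NVDA, please wait"
    # Same announcement with a pause between "NVDA" and "please".
    ssml = '<speak>Loading NVDA, <break time="500ms"/>please wait</speak>'
    print(billed_chars(plain))  # 25
    print(billed_chars(ssml))   # 61
    # Hypothetical: 200 paragraph reads a day, each sending a 400-character paragraph.
    print(monthly_chars(400, 200))  # 2400000
```

Adding a single pause more than doubles the cost of that one announcement, and sending whole 400-character paragraphs a couple of hundred times a day already exceeds the larger free allowance within a month.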

To make things a bit more complicated, suppose Google offers a generous discount to organizations such as NV Access, knowing that some may need a higher character limit (say, 10 to 20 million characters). Imagine that an add-on is developed to interface with cloud voices and it gains popularity (Internet access is required). If NV Access is responsible for paying usage bills, then you can imagine a situation where, sooner or later, a donation campaign might be established to help NV Access cover the costs of these voices (NV Access might receive praise for thinking outside the box and embracing the latest technology, while also receiving criticism for not looking for cheaper alternatives and for leaving folks without Internet access behind).

If you want a real-life example of all this, take a look at a once-promising add-on called Online Image Describer. The idea behind this add-on was to use the power of the cloud to describe images you come across. But since it used a cloud service, it had to deal with API limits (this is stated in the add-on documentation). If I remember correctly, this add-on was discontinued some time ago due to costs and API limits.

The key takeaway is this: just because a technology looks promising doesn't mean it is applicable everywhere. There are costs involved (there is no such thing as a free lunch; someone must pay for services like this). When in doubt, read the documentation carefully to see if a service comes with quotas like the cloud voices we are discussing. Therefore, I'm afraid Google Cloud voices cannot be used in NVDA in any form for now.

Cheers,

Joseph


Brian's Mail list account
 

So what is the use case for them, is this the next incarnation of a smart assistant that you get when you call a number?

Besides, you may need a voice when not online. Perhaps when there is enough computing power locally, that will be the time for these sorts of voices.
To be honest, no matter how 'real' they sound, they will hardly be of much use if productivity is your aim with a screen reader. They just don't speed up intelligibly, in my experience.

Maybe for those producing audio from text they might be useful, but if it's pay as you go, it sounds like it won't be used by anyone.
Brian

--
bglists@...
Sent via blueyonder.(Virgin media)
Please address personal E-mail to:-
briang1@..., putting 'Brian Gaff'
in the display name field.

----- Original Message -----
From: "Luke Davis" <luke@...>
To: <nvda@nvda.groups.io>
Sent: Sunday, January 08, 2023 12:01 AM
Subject: Re: [nvda] Google Cloud Voices




Luke Davis
 

Ame wrote:

Right.  But, I was thinking they may have them to purchase and download.
Aren't they called "cloud voices"? According to the subject line they are. If so, by definition, they aren't downloadable.

Further, a brief look at their docs indicates that these voices are generated by trained models, and everything they talk about (including their pricing) indicates that a cloud-based solution is all they offer. Their pricing is only given in per-character increments, unless you request a quote for some other workload. At least, that's from as much of it as I read. I'm not saying it's impossible, just that I didn't find anything to suggest you can do that.

That implies they don't want you to have unlimited (pay-once) access, which a screen reader would require, and which you would likely want if you didn't want a rather huge bill.

Everywhere I have come across these neural-net-generated voices, be it from Google, Microsoft, or that company that did the fake Joe Rogan podcast with Steve Jobs [*], they sell on a cloud-based, constant-connection model, which is not well suited to a screen reader.

[*] https://www.youtube.com/watch?v=rbK5Q9y7QEw

Gene wrote:

I also doubt screen-readers are designed to use such voices.
I'm curious whether ChromeOS can use them. If not, I'd suspect that is a significant clue.
Although again, with the per-character pricing, it seems highly unlikely that screen readers are the intended application.

Not that they couldn't be used that way if Google chose: they state that one of the model-based voices in the cloud line is the source for Google Assistant, which does suggest that they could package these into locally runnable forms if they chose.

Luke


 

On Sat, Jan 7, 2023 at 11:22 AM, Ame wrote:
I just tried to contact sales via chat and it’s all automated.  I’d like to be able to email them and get a response from a real human. 
-
You might get a "real human" way in via one of the following:

Google Accessibility Get In Touch Page

 

Contact the Google Disability Support Team

 
--

Brian Virginia, USA Windows 11 Pro, 64-Bit, Version 22H2, Build 22621; Office 2016, Version 16.0.15726.20188, 32-bit

It is much easier to be critical than to be correct.

       ~ Benjamin Disraeli, 1804-1881


Ame
 

Right. But I was thinking they may have them to purchase and download. I should’ve clarified that. Didn’t think about it. Lol. The ten voices I found are really good, and the adjustments you can make add flexibility you don’t have with a lot of voices. I just tried to contact sales via chat and it’s all automated. I’d like to be able to email them and get a response from a real human.


Gene
 

I would think there would be at least some, but I also doubt screen-readers are designed to use such voices.  I don't recall hearing of a screen-reader using a voice not fully installed on the system.

Gene

On 1/7/2023 9:28 AM, Chris Smart wrote:

If the processing for those voices is done online, wouldn’t that introduce significant latency?

 

 

 

 




Chris Smart
 

If the processing for those voices is done online, wouldn’t that introduce significant latency?

 

 

 

 

From: nvda@nvda.groups.io <nvda@nvda.groups.io> On Behalf Of Ame
Sent: January 7, 2023 10:12 AM
To: nvda@nvda.groups.io Integration <nvda@nvda.groups.io>
Subject: [nvda] Google Cloud Voices

 

Hi all!  I’m forever scouring the net for new voices.  I like Windows OneCore Mark but I’m getting bored.  I stumbled across Google Cloud and I love their ten WaveNet voices.  I wonder if there’s a way to get those and use them with NVDA. 

 

Thanks in advance for your feedback.


Ame
 

Hi all!  I’m forever scouring the net for new voices.  I like Windows OneCore Mark but I’m getting bored.  I stumbled across Google Cloud and I love their ten WaveNet voices.  I wonder if there’s a way to get those and use them with NVDA. 

 

Thanks in advance for your feedback.