Text to Speech software

Thu Jul 4 00:33:20 UTC 2024

At Wed, 03 Jul 2024 23:35:40 +0100 "Ubuntu user technical support, not for general discussions" <ubuntu-users at lists.ubuntu.com> wrote:

> 
> Hi
> 
> Looking for good "Text to Speech" software. I would like something that does
> not sound like Stephen Hawking trying to decide on lunch choices but is not
> too hard to install and set up. A search around the net found eSpeak; all the
> others I found seem to be aimed at software devs to combine into other
> projects, and need very high coding levels to use (so really a /w dev)
> 
> I don't mind if it's CmdLine or GUI, but it should sound natural(ish). English
> is fine so multi-language support is not that important. A few voices would be
> nice, but a good male & female will do at a push

The problem with "cheap" "Text to Speech" software is that it is phoneme based,
and that runs into two problems. First, English spelling is horrible from a
phoneme point of view: many English words are NOT spelled the way they are
spoken. Second, in "natural" spoken English, some words are spoken differently
(actually different phonemes) depending on context. So you have to do two
things: convert the text to a sequence of phonemes, and do it not by a plain
word-by-word lookup and replace, but by analysing whole sentences.
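
As a rough illustration of the context problem, here is a toy Python sketch.
The lexicon, the ARPAbet-style phoneme strings, and the single "previous word"
rule are all made up for this example; real systems use far larger
dictionaries and proper sentence analysis.

```python
# Toy lexicon: hand-written ARPAbet-style phonemes (illustrative only).
LEXICON = {
    "i": ["AY"],
    "will": ["W", "IH", "L"],
    "have": ["HH", "AE", "V"],
    "read": ["R", "IY", "D"],  # default entry: present tense, sounds like "reed"
    "the": ["DH", "AH"],
    "book": ["B", "UH", "K"],
}

# Context-dependent alternative: past participle "read" sounds like "red".
READ_PAST = ["R", "EH", "D"]


def word_by_word(sentence):
    """Naive approach: look each word up in isolation."""
    return [LEXICON[w] for w in sentence.lower().split()]


def with_context(sentence):
    """Same lookup, plus one hand-written rule that checks the previous word."""
    words = sentence.lower().split()
    out = []
    for i, w in enumerate(words):
        if w == "read" and i > 0 and words[i - 1] in {"have", "has", "had"}:
            out.append(READ_PAST)
        else:
            out.append(LEXICON[w])
    return out


if __name__ == "__main__":
    print(word_by_word("I have read the book"))
    print(with_context("I have read the book"))
```

The word-by-word version gives "read" the same phonemes in "I will read" and
"I have read"; the second version only gets it right because it looks at the
neighbouring word.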

The "Stephen Hawking" (aka "Speak And Spell") style of "Text to Speech"
systems either do a direct Latin alphabet => phoneme translation or a simple
word => phoneme sequence lookup. While both of these do produce generally
intelligible speech, it sounds "strange" and "unnatural" (classic 1950s
SciFi movie evil robot).
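
A minimal sketch of why the direct letter => phoneme approach struggles with
English spelling. The letter table below is invented for this example and only
covers the letters needed; it is not taken from any real synthesizer.

```python
# Hypothetical one-letter-one-sound table (ARPAbet-style labels).
LETTER_SOUNDS = {
    "t": "T",
    "h": "HH",
    "o": "AA",
    "u": "AH",
    "g": "G",
    "r": "R",
}


def spell_out(word):
    """Translate each letter to a fixed sound, ignoring the rest of the word."""
    return [LETTER_SOUNDS[ch] for ch in word.lower()]


if __name__ == "__main__":
    for word in ("tough", "though", "through"):
        print(word, spell_out(word))
```

All three words end in the same four sounds here, because "ough" is always
translated letter by letter, while in spoken English "tough", "though" and
"through" each end differently.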

Note: most of the speech oriented assistant systems are NOT generally using
true "Text to Speech", but are mostly using recorded voice. This is generally
also true for phone answering systems. For some things the recordings are
single words (like digits, letters, months, state names, etc), but often they
are whole phrases. I suppose some AI systems might be doing "intelligent"
phoneme generation and might be "modulating" the phonemes to produce a
"natural" human voice (as opposed to gender neutral "Stephen Hawking" /
"Speak And Spell" phonemes).
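
The recorded-voice approach can be sketched roughly as below: nothing is
synthesized, the system just picks pre-recorded clips and plays them in order.
The file names and the phrase table are hypothetical, and actual playback is
left out.

```python
# Hypothetical pre-recorded whole phrases, keyed by a short name.
PHRASE_CLIPS = {
    "you_entered": "prompts/you_entered.wav",
    "goodbye": "prompts/goodbye.wav",
}


def clips_for_digits(text):
    """Return one recorded clip per digit, skipping any other characters."""
    return [f"prompts/digit_{ch}.wav" for ch in text if ch in "0123456789"]


def build_playlist(digits):
    """Whole-phrase clip first, then the per-digit clips, then a closing clip."""
    return (
        [PHRASE_CLIPS["you_entered"]]
        + clips_for_digits(digits)
        + [PHRASE_CLIPS["goodbye"]]
    )


if __name__ == "__main__":
    for clip in build_playlist("20-24"):
        print(clip)
```

This is why such systems sound natural for a fixed set of prompts but cannot
read out arbitrary text.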

> 
> thanks
> 

-- 
Robert Heller             -- Cell: 413-658-7953 GV: 978-633-5364
Deepwoods Software        -- Custom Software Services
http://www.deepsoft.com/  -- Linux Administration Services
heller at deepsoft.com       -- Webhosting Services