Speech recognition

Fri Jul 4 17:23:46 UTC 2008

On Thursday 03 July 2008 16:44, Derek Broughton wrote:
> Dotan Cohen wrote:
> > 2008/7/3 David Fletcher <kubuntu-users at thefletchers.net>:
> >> I suspect this is a long shot:-
> >>
> >> Does anybody know of any speech recognition software that can be used
> >> with Kubuntu?
> >>
> >> I have an acquaintance who has never owned a computer, but wants to use
> >> one to write a book. He can be nudged in the direction of using Kubuntu
> >> and LaTeX I think, but he says he's not got time to type it. He wants to
> >> dictate it directly into the text editor.
> >>
> >> Does anybody know of any applications that can do this please? I've
> >> tried all the search strings I can think of in the Adept installer, and
> >> Google has turned up nothing for me.
> >
> > He wants a secretary, not a computer. Speech recognition is used for
> > sending preconfigured commands to the computer, not for dictating
> > arbitrary text.
>
> With current state of the art, it _should_ be possible to dictate
> reasonably arbitrary text, but I really can't see how anybody who can't
> take the time to _write_ a book, will ever get a book written.
>
> > For instance, while your friend might dictate one
> > character discussing a new display with another character, the reader
> > might be surprised to see the characters discussing a nudist play.
>
> Right.  Just like OCR, however accurate it is, you'll spend a lot of time
> fixing the transcription errors.
> --
> derek

I somehow think that we're still a long way from getting accurate text from
speech. It works well on Star Trek, where you can have a conversation with the
computer, and Capt. Picard can submit his log. I remember one Star Trek film,
where they went back in time, and Scotty tried to talk to an ancient
computer, then realised that you had to type your request on a keyboard "of
all things".

Text to speech works OK, because there you have a synthesizer (or is that
synthesiser?) which turns the text into speech. Words like "there", "their",
and "they're" all sound the same but have different meanings, and from the
listener's viewpoint the context tells you which is which.

Going the other way, speech to text, is a whole different ballgame. Taking
the example above, and assuming that everyone spoke exactly the same way
(no problems with different dialects), the computer would still need to
understand the context of what was being dictated in order to print "there",
"their", or "they're". Of course, when you bring different dialects into the
equation, it gets really complex.
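
To make the point about context a bit more concrete, here is a toy sketch in
Python. It is not how any real dictation package works; the cue word lists and
the THEIR_SOUND placeholder are assumptions made up for this example. It
guesses the spelling purely from the word that follows, and falls back to
"there" when that gives no hint.

```python
# Toy illustration only: choose between homophones using the next word.
# The cue lists are invented for this example and do not come from any
# real speech recognition package.

# Hypothetical cue words that tend to follow each spelling.
NEXT_WORD_CUES = {
    "their": {"book", "computer", "dialect", "parcel", "log"},
    "they're": {"going", "coming", "not", "still", "using"},
    "there": {"is", "are", "was", "were", "and"},
}


def pick_spelling(next_word):
    """Return the spelling whose cue list contains next_word.

    Falls back to "there" when the next word gives no hint, which is
    exactly where a real system would need far more context.
    """
    word = next_word.lower()
    for spelling, cues in NEXT_WORD_CUES.items():
        if word in cues:
            return spelling
    return "there"


def transcribe(heard):
    """Replace each THEIR_SOUND placeholder with a guessed spelling."""
    words = heard.split()
    out = []
    for i, w in enumerate(words):
        if w == "THEIR_SOUND":
            # Look one word ahead; use an empty string at the end of input.
            nxt = words[i + 1] if i + 1 < len(words) else ""
            out.append(pick_spelling(nxt))
        else:
            out.append(w)
    return " ".join(out)


print(transcribe("THEIR_SOUND going to read THEIR_SOUND book over THEIR_SOUND"))
```

Anything outside the tiny cue lists falls through to the default, which shows
how quickly such a simple approach runs out of context.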

The differences I've found are usually in the pronunciation of vowels,
which can very often make it difficult, at first, to understand what someone
is saying. You sort of get tuned in after a while, but we are human, and
not computers.

In the UK there are some strong dialects: Geordie, Glaswegian (in Scotland),
and many others. Looking at two examples from Wales and Northern Ireland, the
word "tongue" in Wales is pronounced as "tong", and the word "film" in
Northern Ireland is pronounced as "filim". Likewise, the first name of the
actor who plays Chief O'Brien in Star Trek, Colm, is pronounced "Colim".

At this point in time, I personally see a problem in computers converting 
speech to text.

I recently listened to "Digital Planet" on the BBC World Service, and it
seems Amtrak in the US is using speech communication with a computer so that
callers can get train times, etc.

I recently had a problem with a parcel not being delivered in France and,
on contacting Chronopost by telephone, was asked to speak my parcel reference
number into the machine. On the premise that you ask, so I do, I spoke each
letter and number into the phone. Nothing. Then I was asked to repeat the
parcel reference, which I did, but still nothing. To be fair, I'm English, and
perhaps the computer had some problem with my pronunciation. Now I appreciate
that this was direct communication by speech with a machine rather than
dictation, but I still believe that accurate speech to text is going to take
quite some time to achieve.

Just some observations, and comments.

Nigel.