Voice Recognition for Linux

Marvin Raaijmakers marvin.nospam at gmail.com
Tue Feb 20 13:12:30 GMT 2007


Well I have a little experience with Sphinx2. A few years ago I played a
bit with perlbox voice (http://www.perlbox.org/). This application uses
sphinx2 for launching applications by voice commands. That worked quite
well, but it isn't the thing you're looking for (= using voice
recognition for writing texts). But maybe such an application can be
built by using sphinx.

- Marvin Raaijmakers

On Tue, 2007-02-20 at 12:38 +0000, Chris Hayes wrote:
> Thanks for the feedback Eric. Is it really this hopeless? You talked
> about the Sphinx projects being okay - but not ready for normal users.
> To what extent are they capable? I'd really love to know if you or
> anyone else has tried them. 
> 
> I have looked into them but haven't had the time (and not being a very
> capable technical user) to get them going, or to get them going nicely.
> If I knew how well they worked, I'd probably be more inclined to use
> the time I don't have getting them working. 
> 
> Chris Hayes
> 
> 
> On 19/02/07, Eric S. Johansson <esj at harvee.org> wrote:
>         Chris Hayes wrote:
>         > Hi - I was wondering whether anyone here might know about
>         what voice
>         > recognition software is currently available for Linux.
>         
>         (warning, I am an unrepentant curmudgeon and negative
>         filter.  Interpret 
>         the following accordingly.  If I'm wrong on any points, and
>         someone
>         wants to correct me, I will gladly learn.)
>         
>         In a nutshell, not much.  In Sphinx 4, and others of its family,
>         you have
>         some fairly decent recognition systems.  However, they are not
>         ready for 
>         prime time because if they were, people would be using them
>         for desktop
>         recognition.  While the recognition engines may work well, a
>         lot of the
>         ancillary pieces such as training, dealing with microphone
>         switching, 
>         dictionary management etc. are not quite there yet.  On the
>         other hand,
>         the same shortcomings can be laid at the feet of Linux and
>         Windows audio
>         subsystems.
>         
>         From my perspective, the only usable speech recognition for
>         end users is 
>         NaturallySpeaking.  There may be something on a Macintosh but
>         I don't
>         have any experience there.  The reason I say NaturallySpeaking
>         is the
>         only usable one is because it's a large vocabulary continuous
>         speech 
>         recognition system that people use to get work done.  Recognition
>         engine,
>         language model, sound system interface, etc. have had
>         many years
>         to evolve.  Nuance has had a couple of years to screw it up
>         and they've 
>         done a wonderful job at it.  I think the only positive
>         contribution they
>         have made during their stewardship of the product is the
>         addition of a
>         Bluetooth microphone audio model.
>         
>         The only way to get good speech recognition on Linux is for
>         someone to 
>         drop a small number of millions of dollars into Nuance's lap
>         and pray.
>         Not a good solution.
>         
>         I've been thinking about an alternative model for a couple of
>         years in
>         between other projects but I do believe the best solution
>         (best defined 
>         as getting handicapped people working), would be to make use
>         of Windows
>         and Linux via virtual machines.  Since virtual machines do
>         horrible
>         things to sound systems, I would recommend using Windows as a
>         host OS
>         with speech recognition, a mediator to transfer
>         characters/commands/keystrokes to the Linux environment and a
>         mediator
>         to return window state information (such as screen content,
>         application
>         running, etc.).
>         
>         There has been a primitive instance (which has since been taken
>         off the
>         net) to show the technique is fundamentally sound.  A
>         full-function
>         mediator, while difficult, is a couple orders of magnitude or
>         more
>         easier to build than moving a large and complicated Windows
>         application 
>         to Linux.
>         
>         In the short term, run Linux on a virtual machine, display
>         apps via X11
>         server, and use something like natpython and one of its macro
>         packages
>         to build commands for Linux applications.  nattext will still bite
>         you in the 
>         ass with all the random characters and inserts in
>         applications but,
>         that's Nuance's contribution.
>         
>         ---eric
>         
>         --
>         Speech-recognition in use.  It makes mistakes, I correct some.
> 




More information about the Ubuntu-accessibility mailing list