Voice Recognition for Linux
Henrik Nilsen Omma
henrik at ubuntu.com
Thu Feb 22 20:25:53 GMT 2007
Eric S. Johansson wrote:
> (warning, I am an unrepentant curmudgeon and negative filter. Interpret
> the following accordingly. If I'm wrong on any points, and someone
> wants to correct me, I will gladly learn.)
>
> In a nutshell, not much.
I agree that it does look limited at the moment and that
NaturallySpeaking is the only viable path. ViaVoice is outdated on Linux
and the Windows version of NS is better anyway.
As an unrepentant optimist though I can see the following path forward:
In short: Create a copyleft (GPL) tool to transfer text from
NaturallySpeaking on Windows to Linux.
A few starts have been made on this, but it needs to be organised as a
proper community project and driven forward by several people. The user
interface should aim to be better than what the native Windows NS
version has. It should be speech-engine and OS agnostic. That way you'll
get people using it to transfer speech between all sorts of different
systems, and it will get more use and development. You should be able to
easily plug in a free engine like Sphinx (so these will be encouraged to
improve) or even Vista's native system, which will be very widespread.
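To make the "engine agnostic" idea concrete, here is a minimal sketch of what a pluggable engine interface could look like. Everything in it is hypothetical: the names SpeechEngine, register and CannedEngine are made up for illustration, and the stand-in engine just replays fixed text rather than talking to NS, Sphinx or anything real.

```python
from abc import ABC, abstractmethod

# Registry of available engines, keyed by a short name (hypothetical design).
ENGINES = {}


def register(name):
    """Class decorator that adds an engine class to the registry."""
    def wrap(cls):
        ENGINES[name] = cls
        return cls
    return wrap


class SpeechEngine(ABC):
    """Anything that can produce recognised text, one chunk at a time."""

    @abstractmethod
    def utterances(self):
        """Yield recognised text chunks as strings."""


@register("canned")
class CannedEngine(SpeechEngine):
    """Stand-in engine that replays fixed chunks; useful for testing only."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def utterances(self):
        yield from self.chunks


def collect(engine_name, *args):
    """Build the named engine and join everything it recognises."""
    engine = ENGINES[engine_name](*args)
    return "".join(engine.utterances())
```

A real back end would be registered the same way under its own name, and the rest of the tool would only ever see the utterances() stream.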
My biggest gripe with NS is the editing interface. The actual
recognition is quite good IMO, but when you do make a mistake it is very
awkward to fix it without using the keyboard. If you give an edit
command and that is not understood correctly either then you get a
meaningless sentence and you are no longer able to easily correct the
one you originally wanted to fix. The end result is that you totally lose
the flow of what you were trying to express.
The user interface is what we would have to reconstruct in whole or in
part anyway, so it's no big loss. We should make it much more
configurable so you can work around whatever shortcomings it has and
encourage community contributions to improving usability. Use the NS
macro system to send custom commands and use scripting on the receiving
end to allow it to adapt to applications.
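As a rough sketch of the "adapt to applications" part, the receiving end could keep a default set of actions and let each application override only the ones it needs. The command names, action names and the "terminal" profile below are all invented for illustration; nothing here comes from NS or from any existing tool.

```python
from collections import ChainMap

# Hypothetical default mapping from spoken command to an abstract action.
DEFAULT_ACTIONS = {
    "delete sentence": "remove-last-sentence",
    "new paragraph": "insert-blank-line",
}

# Hypothetical per-application overrides; anything missing falls back
# to DEFAULT_ACTIONS.
APP_OVERRIDES = {
    "terminal": {"delete sentence": "clear-input-line"},
}


def actions_for(app_name):
    """Return the effective command table for one application."""
    return ChainMap(APP_OVERRIDES.get(app_name, {}), DEFAULT_ACTIONS)


def resolve(app_name, command):
    """Look up the action for a command, or None if it is unknown."""
    return actions_for(app_name).get(command)
```

With a layout like this a user could ship a small override table for one application without touching the defaults.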
I presume the macro functionality in NS is configured so that the
pattern recognition is quite good on the macros you define yourself. So
when you say 'Paste in my address' it generally works. We can (ab)use
this macro facility for our own editing needs. We would define a set of
macros that would be processed by the NS engine and would give us a known
and parseable string.
So saying 'Macro: delete sentence' would actually insert the text
**MACRO-DELETE-SENTENCE** into the text stream. If you were watching the
text on the Windows system the real text would be interspersed with such
commands, but on the Linux system receiving the stream it would just Do
the Right Thing. The big advantage is that it's very configurable this
way so we can make it do what we want.
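A minimal sketch of the receiving side, assuming the markers look exactly like the **MACRO-DELETE-SENTENCE** example above. The function names are hypothetical, and the sentence deletion is deliberately naive (it only looks for '.', '!' and '?'); a real tool would act on the target application rather than on a plain string.

```python
import re

# Assumed marker format, taken from the example: **MACRO-DELETE-SENTENCE**
MACRO_RE = re.compile(r"\*\*MACRO-([A-Z]+(?:-[A-Z]+)*)\*\*")


def split_stream(text):
    """Split the stream into ("text", str) and ("command", name) events."""
    events = []
    pos = 0
    for match in MACRO_RE.finditer(text):
        if match.start() > pos:
            events.append(("text", text[pos:match.start()]))
        name = match.group(1).lower().replace("-", "_")
        events.append(("command", name))
        pos = match.end()
    if pos < len(text):
        events.append(("text", text[pos:]))
    return events


def delete_sentence(buffer):
    """Drop the last sentence from the buffer (naive punctuation search)."""
    stripped = buffer.rstrip()
    # Skip the terminator that ends the last sentence itself.
    body = stripped[:-1] if stripped.endswith((".", "!", "?")) else stripped
    cut = max(body.rfind("."), body.rfind("!"), body.rfind("?"))
    return buffer[:cut + 1] + " " if cut >= 0 else ""


# Hypothetical command table; more macros would be added here.
COMMANDS = {"delete_sentence": delete_sentence}


def apply_stream(text):
    """Replay the stream, typing text and executing embedded commands."""
    buffer = ""
    for kind, value in split_stream(text):
        if kind == "text":
            buffer += value
        elif value in COMMANDS:
            buffer = COMMANDS[value](buffer)
        # Unknown commands are dropped rather than typed into the document.
    return buffer
```

For example, feeding it "Hello there. This is wrong. **MACRO-DELETE-SENTENCE**This is right." leaves "Hello there. This is right." in the buffer.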
We might eventually be able to get the engine running in Wine. Frankly
I'm not too interested in having the whole NS run in Wine because of the
interface. If we can make a better interface and can demonstrate a need
for speech recognition (a commercial need) then we may well see the
owners of the code port the speech engine to Linux. Low-latency kernels
should be a big draw for them as well.
Now we just need someone willing to go on the barricades and front such
a project :)
Perhaps we can start this off as a Google Summer of Code project.
Henrik
More information about the Ubuntu-accessibility mailing list