Acoustic Models for HUD

Steve Langasek steve.langasek at
Mon Feb 25 19:41:02 UTC 2013

On Mon, Feb 25, 2013 at 07:13:43PM +0100, Siegfried-Angel Gevatter Pujals wrote:
> It's great to hear that voice recognition in Ubuntu is finally getting some
> love :).

> The English Voxforge models are currently packaged in julius-voxforge.
> There I went with the nightly builds, since in addition to the time
> and disk size (which IMHO is already enough of a reason), it needed HTK to
> build, which is not redistributable. I'd also be interested in more
> opinions though.

In terms of freeness of the OS, depending on non-redistributable tools for
building the data files is more of an issue than whether we actually process
them at package build time.  Is this a julius-specific requirement, or does
it also affect sphinx?

Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                          
slangasek at                                     vorlon at

> On Monday, 25 February 2013, Ted Gould wrote:
> > Howdy,
> >
> > As some folks may have noticed we're working on a voice input feature in
> > HUD.  Part of what that requires is acoustic models to be available to
> > understand the speech coming in.  Currently in Ubuntu there are a couple of
> > these, but we need to get to the point of providing for various languages
> > and having a way to update these continuously as the data gets better.
> >
> > So that leads to the question: How do we want these to look in Ubuntu?
> >
> > The best open source of training data appears to be Voxforge,
> > a collection of samples based on known text.  These samples can then be
> > used to compile the acoustical model that the various libraries need.  This
> > takes significant amounts of CPU time.  Their most complete language is
> > English, which has about 100 hours of audio, and takes about 10 CPU hours
> > to compile the models that Sphinx needs.  While English is the most
> > complete, I think it's important to realize that the best/worst case
> > scenario that supports all languages well could result in easily over a
> > thousand hours of CPU time.
> >
> > So if we think of things in the classic source vs. binary split, it seems
> > like the Voxforge data is the source and we should make a source package
> > that then builds these binary models.  But, at some level, we're just
> > exchanging binary data (sound files) for different binary files (acoustic
> > models).  Would it make more sense to package something like the Voxforge
> > nightly builds for use in Ubuntu?
> >
> > I'd love to hear people's thoughts on this.  I'm leaning towards putting
> > the Voxforge data as a source package, as it is our source, but I'm worried
> > about the impact it may have on rebuilding the archive.

More information about the ubuntu-devel mailing list