Acoustic Models for HUD
Steve Langasek
steve.langasek at ubuntu.com
Mon Feb 25 19:41:02 UTC 2013
On Mon, Feb 25, 2013 at 07:13:43PM +0100, Siegfried-Angel Gevatter Pujals wrote:
> It's great to hear that voice recognition in Ubuntu is finally getting some
> love :).
> The English Voxforge models are currently packaged in julius-voxforge.
> There I went with the nightly builds, since in addition to the time
> and disk size (which IMHO is already enough of a reason), it needed HTK to
> build, which is not redistributable. I'd also be interested in more
> opinions though.
In terms of freeness of the OS, depending on non-redistributable tools for
building the data files is more of an issue than whether we actually process
them at package build time. Is this a julius-specific requirement, or does
it also affect sphinx?
--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
slangasek at ubuntu.com vorlon at debian.org
> On Monday, 25 February 2013, Ted Gould wrote:
>
> > Howdy,
> >
> > As some folks may have noticed we're working on a voice input feature in
> > HUD. Part of what that requires is acoustic models to be available to
> > understand the speech coming in. Currently in Ubuntu there are a couple of
> > these, but we need to get to the point of providing for various languages
> > and having a way to update these continuously as the data gets better.
> >
> > So that leads to the question: How do we want these to look in Ubuntu?
> >
> > The best open-source training data appears to be Voxforge <http://www.voxforge.org>,
> > a collection of samples based on known text. These samples can then be
> > used to compile the acoustical model that the various libraries need. This
> > takes significant amounts of CPU time. Their most complete language is
> > English, which has about 100 hours of audio, and takes about 10 CPU hours
> > to compile the models that Sphinx needs. While English is the most
> > complete, I think it's important to realize that the best/worst case
> > scenario, one that supports all languages well, could easily require over
> > a thousand hours of CPU time.
> >
> > So if we think of things in the classic source vs. binary split, it seems
> > like the Voxforge data is the source and we should make a source package
> > that then builds these binary models. But, at some level, we're just
> > exchanging binary data (sound files) for different binary files (acoustic
> > models). Would it make more sense to package something like the Voxforge
> > nightly builds <http://www.repository.voxforge1.org/downloads/Nightly_Builds/> for use in Ubuntu?
> >
> > I'd love to hear people's thoughts on this. I'm leaning towards putting
> > the Voxforge data as a source package, as it is our source, but I'm worried
> > about the impact it may have on rebuilding the archive.
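A rough back-of-envelope check of the build-cost figures above. This is a sketch under one stated assumption: CPU time scales linearly with hours of training audio, using the English figure quoted (about 10 CPU hours per 100 hours of audio). The function name and the 100-language scenario are hypothetical, chosen only to show how "over a thousand hours" falls out of that ratio.

```python
# Back-of-envelope estimate of acoustic-model build cost.
# ASSUMPTION: CPU time scales linearly with hours of training audio,
# using the English Voxforge figure (~10 CPU hours per ~100 hours of audio).
CPU_HOURS = 10
AUDIO_HOURS = 100


def build_cost_cpu_hours(audio_hours_per_language):
    """Estimated total CPU hours to build models for every language given.

    audio_hours_per_language: hypothetical mapping of language -> hours of audio.
    """
    total_audio = sum(audio_hours_per_language.values())
    # Multiply before dividing so whole-number inputs stay exact.
    return total_audio * CPU_HOURS / AUDIO_HOURS


# Hypothetical scenario: 100 languages, each as complete as English is today.
languages = {f"lang{i}": 100 for i in range(100)}
print(build_cost_cpu_hours(languages))
```

Under the linear assumption, 100 English-sized corpora already cost about 1000 CPU hours per full rebuild, before any growth in the data.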
More information about the ubuntu-devel mailing list