Hi Ted,<div><br></div><div>It's great to hear that voice recognition in Ubuntu is finally getting some love :).</div><div><br></div><div>The English Voxforge models are currently packaged in julius-voxforge. I went with the nightly builds there: in addition to the build time and disk size (which IMHO are already reason enough), building the models needed HTK, which is not redistributable. I'd also be interested in more opinions, though.</div>
<div><br></div><div>Out of curiosity, what's the plan for voice recognition in Ubuntu? Sphinx/Julius/Kaldi?</div><div><br></div>Regards,<div><br></div><div>Siegfried<span></span><br>
<div><br></div><div>On Monday, 25 February 2013, Ted Gould wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><u></u>
<div>
Howdy,<br>
<br>
As some folks may have noticed, we're working on a voice input feature in HUD. Part of what that requires is acoustic models to understand the incoming speech. Currently in Ubuntu there are a couple of these, but we need to get to the point of providing them for various languages and having a way to update them continuously as the data gets better.<br>
<br>
So that leads to the question: How do we want these to look in Ubuntu?<br>
<br>
The best open source of training data appears to be <a href="http://www.voxforge.org" target="_blank">Voxforge</a>, a collection of samples based on known text. These samples can then be used to compile the acoustic models that the various libraries need. This takes significant amounts of CPU time. Their most complete language is English, which has about 100 hours of audio and takes about 10 CPU hours to compile the models that Sphinx needs. While English is the most complete, I think it's important to realize that the best/worst case scenario, one that supports all languages well, could easily result in over a thousand hours of CPU time.<br>
<br>
So if we think of things in the classic source vs. binary split, it seems like the Voxforge data is the source and we should make a source package that then builds these binary models. But, at some level, we're just exchanging binary data (sound files) for different binary files (acoustic models). Would it make more sense to package something like the <a href="http://www.repository.voxforge1.org/downloads/Nightly_Builds/" target="_blank">Voxforge nightly builds</a> for use in Ubuntu?<br>
<br>
I'd love to hear people's thoughts on this. I'm leaning towards shipping the Voxforge data as a source package, since it is our source, but I'm worried about the impact it may have on rebuilding the archive.<br>
<br>
Thanks,<br>
Ted<br>
<br>
</div>
</blockquote></div>
</div><br><br>-- <br>Siegfried<br>