eSpeak in Norwegian, part 1

Wed Dec 6 18:37:35 GMT 2006

Hi all,

Jonathan kindly generated some basic Norwegian voice files for eSpeak so 
I could start testing and giving feedback. He and I have exchanged a few 
emails about these files, but I'll take it to the open list now so that 
others can follow the process of figuring out how to do this.

While working on the Norwegian voice I'm also trying to work out a more 
streamlined procedure for doing this so that it will be easier for 
others to contribute to other languages later.

First, I think it's good to have a standard reference text for each 
language. I've selected the Wikipedia article on language 
(http://en.wikipedia.org/wiki/Language), which is itself available in 
many languages. It's far from identical in all languages, but it only 
needs to be an internal reference for each language. Ideally when a text 
file is made from that page it should be frozen so that it will be a 
reliable reference for discussion.

I started by cleaning up the text a bit, removing some mark-up, the table 
of contents and bullet points (it even helps to check the spelling in the 
original text ...). I then split the article up into individual files, one 
for each paragraph, and stored them in my home directory in the following 
structure:

~/espeak/text/no/no-lang01.txt -02, etc.
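
For anyone who wants to automate the splitting step, here is a minimal 
sketch in Python. It assumes paragraphs in the cleaned-up text are separated 
by blank lines; the function names and the default 'no' prefix are just for 
illustration and are not taken from the script in my tarball.

```python
import re
from pathlib import Path


def split_paragraphs(text):
    """Split text on blank lines and return the non-empty paragraphs.

    Line breaks inside a paragraph are collapsed to single spaces.
    """
    parts = re.split(r"\n\s*\n", text)
    return [" ".join(p.split()) for p in parts if p.strip()]


def write_paragraph_files(text, out_dir, lang="no"):
    """Write each paragraph to <out_dir>/<lang>-langNN.txt; return the paths."""
    out_dir = Path(out_dir).expanduser()
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, para in enumerate(split_paragraphs(text), start=1):
        path = out_dir / f"{lang}-lang{i:02d}.txt"
        path.write_text(para + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Calling write_paragraph_files(text, "~/espeak/text/no") would then produce 
no-lang01.txt, no-lang02.txt and so on.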

I then generated .wav files of each text file using the initial voice 
files. I first made files at the standard speed of -s160, but found that 
slower files were easier to analyse and settled on -s100. It may be 
different for other languages or listeners though (and most people will 
set a much higher speed when actually using the voices).
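
The .wav generation can be sketched roughly like this. It is not the script 
from my tarball, just an illustration. It assumes the espeak binary is on 
the PATH, that the voice is selected with '-v no' (the voice name is an 
assumption and depends on how the voice file is named), and it uses the 
-s (speed), -f (input text file) and -w (output wav file) options.

```python
import subprocess
from pathlib import Path


def espeak_command(text_file, voice="no", speed=100):
    """Build the espeak command line that renders one text file to .wav."""
    text_file = Path(text_file)
    wav_file = text_file.with_suffix(".wav")
    return [
        "espeak",
        "-v", voice,          # voice name: an assumption, adjust to your voice file
        "-s", str(speed),     # speed setting, 100 as used in this test
        "-f", str(text_file),
        "-w", str(wav_file),
    ]


def generate_wavs(text_dir, voice="no", speed=100):
    """Run espeak once for every .txt file in text_dir."""
    for text_file in sorted(Path(text_dir).expanduser().glob("*.txt")):
        subprocess.run(espeak_command(text_file, voice, speed), check=True)
```

Keeping the command construction separate from the loop makes it easy to 
print the commands first and check them before running anything.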

While I was at it I did the same for Spanish, Polish and Swedish (it's 
good to have some competition among neighbours!). The .wav files are 
rather large so I ran 'oggenc *' to compress them to ogg.

I should also say a few words about setting up eSpeak at this point. I 
downloaded the latest version which Jonathan provided from here: 
http://espeak.sourceforge.net/test/espeak-1.17k.zip

eSpeak was already installed on my system and I didn't want to play too 
much with that. I just put in a quick hack to use the old application 
with the new data files. I unzipped the new espeak in my home directory 
and placed the data files in ~/espeak/

In /usr/share I did:
sudo mv espeak-data/ espeak-data-orig
sudo ln -s /home/henrik/espeak/espeak-data/ espeak-data

I'm sure someone can come up with a better way to do this :)

So, after generating the .ogg files it's time to start debugging them. I 
found using pre-generated sound files to be quite handy because then you 
can pause and rewind (unfortunately seeking is degraded in the ogg 
compression step). We could also get native speakers who are not yet 
using Linux to listen to the files and report back.

Which raises the next question: What is the most useful form I can 
provide feedback in? I've made some comments on individual words below, 
mostly vowel sounds, but I suspect a more informed comment about the 
phonemes might be better. I guess having the native listener tweak the 
language files directly would be ideal but I'll need to grok more of the 
eSpeak toolchain to do that.

I've tarred up that directory I was working on and uploaded it here:
http://people.ubuntu.com/~henrik/espeak/espeak-files-heno.tar.gz

The tarball does not include the ogg files, which I've placed separately here:
http://people.ubuntu.com/~henrik/espeak/ogg/

There is also a simple python script in there to help with the .wav 
generation, though that could be much improved.

The results from my first listening test:

--------------

Språk (ubestemt) betegner menneskenes[1] måter[2] å kommunisere[3] på. 
Bevisst[4] kommunikasjon skjer[5] ved hjelp av lydspråk[6], tegnspråk og 
skriftspråk, ubevisst kommunikasjon for eksempel ved kroppsspråk. 
Språkvitenskap[7] betegnes som lingvistikk[8].

[1] The 3rd 'e' is too long
[2] 'r' needs to be more pronounced
[3] The last 'e' has the wrong tone/flavour. Sounds like an æ, should be 
like 'Long E' on [*]
[4] The 'e' is too long/too much emphasis, and the i should be very 
short (double consonant rule)
[5] needs a longer 'e', more like 'Long E' on [*]
[6] the 'y' sounds like the 'ee' in Leeds, but should be like 'Long Y' 
on [*]
[7] 'å' needs to be longer like 'Long Å' on [*]
[8] needs a shorter 'i'

[*] http://frodo.bruderhof.com/norskklassen/sounds-g.htm

-------------
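
If feedback is written in the numbered-footnote form used above, it could 
also be collected mechanically. The following is only a sketch of one 
possible approach, not something eSpeak provides: it assumes words are 
marked as word[N] in the text and that each comment starts with [N] at the 
beginning of a line. The function name is hypothetical.

```python
import re

# A word immediately followed by a numeric marker, e.g. "måter[2]"
MARKED_WORD = re.compile(r"(\w+)\[(\d+)\]")
# A comment line starting with a numeric marker, e.g. "[2] 'r' needs ..."
NOTE_LINE = re.compile(r"^\[(\d+)\]\s*(.*)$")


def parse_feedback(annotated_text, notes_text):
    """Return (number, word, comment) tuples, sorted by marker number."""
    words = {int(n): w for w, n in MARKED_WORD.findall(annotated_text)}
    notes = {}
    current = None
    for line in notes_text.splitlines():
        stripped = line.strip()
        if not stripped:
            current = None  # a blank line ends the current comment
            continue
        match = NOTE_LINE.match(stripped)
        if match:
            current = int(match.group(1))
            notes[current] = match.group(2).strip()
        elif current is not None:
            # continuation line of a wrapped comment
            notes[current] += " " + stripped
    return [(n, words[n], notes.get(n, "")) for n in sorted(words)]
```

Lines such as the [*] link reference are ignored because their marker is 
not a number and they follow a blank line.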

Please try this approach if there is some basic support for your native 
language in eSpeak, so we can streamline the process further. Thanks!

Henrik