[i18n] Input Method and Fonts improvements for Gutsy

Arne Goetje arne.goetje at canonical.com
Mon Aug 6 18:15:23 BST 2007

Dear all,

I have taken a first look at the current font and input method situation
in Gutsy (Tribe 3 Live CD and up to date installation on HDD) and have a
few suggestions to make.

1. Input Method (SCIM):
Both Live CD and default installation come with the SCIM package
installed, however it is not properly set up, so that the user actually
cannot use it.
SCIM depends on some environment variables and on the SCIM daemon running in
the background. There is a nice tool, called im-switch, which takes care
of this.
The purpose of im-switch is to give the user a simple frontend to choose
which input method program (s)he wants to use. For most users with
non-Latin based alphabets, this should be SCIM, as it clearly supports
most languages and scripts. However, some Asian users might prefer a
different application, like IIIMF or gcin (especially in Taiwan).
im-switch takes a parameter that says whether the setting should apply
system wide or in the user scope only, and the name of the input method.

So, for making SCIM the system wide default, the following should be
done on the Live CD and in the default installation:
1. install and configure scim and its modules
2. install im-switch
3. run as root:
	im-switch -s scim
After a relogin, the user can toggle SCIM on/off by pressing CTRL+SPACE.
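
The environment that has to be in place for SCIM can be sketched roughly
as below. This is only an illustration in Python, not what im-switch
literally runs. The variable values (XMODIFIERS=@im=SCIM,
GTK_IM_MODULE=scim, QT_IM_MODULE=scim) are the ones commonly used with
SCIM and are assumptions here, as is the helper name scim_environment.

```python
import os


def scim_environment(base=None):
    """Return a copy of an environment with the variables SCIM commonly needs.

    The values are assumptions based on common SCIM setups, not taken
    from the im-switch sources.
    """
    env = dict(os.environ if base is None else base)
    env["XMODIFIERS"] = "@im=SCIM"   # for applications using XIM
    env["GTK_IM_MODULE"] = "scim"    # for GTK applications
    env["QT_IM_MODULE"] = "scim"     # for Qt applications
    return env


if __name__ == "__main__":
    # Print the settings in shell syntax, starting from an empty environment.
    env = scim_environment({})
    for name in sorted(env):
        print(f"export {name}={env[name]}")
    # The daemon itself would then be started in the background,
    # typically with "scim -d" (assumed here, not run by this sketch).
```

Passing an explicit base dictionary keeps the sketch free of side
effects: nothing in the calling process is changed.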

2. SCIM modules:
The default installed scim module packages are:
 * scim-modules-table
 * scim-tables-additional (Russian and Indic IMs)
I highly recommend that we put the following packages and their
dependencies onto the Live CD and into the default installation to make
SCIM more useful:
 * scim-anthy or scim-prime: Japanese input methods. scim-prime is a
dictionary based IM, which has a great advantage over anthy, although
both are widely used in Japan.
 * scim-chewing: Traditional Chinese phonetic IM, widely used in Taiwan
 * scim-pinyin: Simplified and Traditional Chinese Pinyin IM, widely
used in China and by foreigners in Taiwan. ;)
 * scim-hangul: As the name says it - Korean.
 * scim-tables-zh: additional table based IMs for Simplified and
Traditional Chinese, many of them are popular in China, Hong Kong and
 * scim-thai: well, Thai. :)
 * scim-m17n: bridge to the m17n library, which adds a lot of additional
 IMs, including Latin based ones for the European languages with
diacritics. (not everyone likes to fiddle with XKB settings. ;) )

The following packages must NOT be installed:
 * scim-uim: BROKEN, will trash the SCIM setup tool. Don't install it.
 * scim-chinese: old version of scim-pinyin, not compatible with the
current scim package; breaks dependency handling.

3. Fonts:
 a) language selector:
The idea with the language selector handling the fontconfig
configuration is nice, however, it needs some tweaking:
 * more languages: I will add more config files for more locales; needs
some testing and probably some community feedback.
 * Question: how to handle those config files which come with the font
packages? Font preference handling should be done by language selector,
while font specific options can remain in the config files installed by
the font packages? If that's the case, we need to check all the font
packages and tweak those where it's not the case.

 b) Font packages:
I see a problem: space on the Live CD is a bit "restricted"... but some
font packages come with multiple fonts inside and install them all, even
if we don't need them. This wastes precious space.
I'm still trying to get an overview about which fonts cover which
Unicode ranges and which fonts should be taken into account for the
three Alias fonts "sans-serif", "serif" and "monospace".
Bottom line: Some font packages come with fonts we don't need for this
purpose. Question: how to handle this?

Option 1: We craft a separate package, just for the Live CD, and put
selected fonts from the other font packages together for this single
purpose.
Caveat: might conflict with the other font packages (duplicate font
files), so it should probably not be used in the default installation on
the users' hard disks.

Option 2: We split the font packages into 2: a "base" package with the
fonts we need for the Live CD and an "extra" package, where the rest of
the fonts are in.
Caveat: it's not always easy to draw the line which font should be in
base and which ones in extra. Users might get confused.

I would probably prefer option 1... less work, if we can restrict it to
the Live CD only.

 c) rendering issue in Chinese locale environment:
This might be a bug in my Chinese font package. I will take care of this
and provide a new package for Gutsy.

4. Improvements for Gutsy+1
I expect that we don't have enough time to implement these improvements
in Gutsy, therefore we should probably postpone them to the next release:
 a) Language selector:
It would be useful, if the user could have an Advanced button in the
language selector, where (s)he can adjust his/her preferred fonts and
translation order. Just like you have a list of available fonts and you
move them up or down according to your own preference. And the same
should be possible for translations:

There are users who live in a foreign country and whose language ability
is not good enough to use that country's locale settings, but use their
native language instead. However, they need to use their host country's
writing system.

Take me as an example: I'm from Germany, but live in Taiwan. On my
computer I prefer en_US as my default locale, but need to display Chinese
characters properly. Therefore I prefer the Arphic font over the Baekmuk
or Kochi fonts.
Another foreigner living in Japan might have the same issue but prefer
the Kochi font over the Arphic and Baekmuk ones.
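
Such a per-user font order maps naturally onto a fontconfig
alias/prefer block, which the Advanced dialog could write out. The
following Python sketch only renders that block as text. The function
name prefer_block is hypothetical, and "Arphic", "Kochi" and "Baekmuk"
are placeholders for the real family names, which would have to be
looked up in the font packages. The order of the second and third
entries is arbitrary.

```python
from xml.sax.saxutils import escape


def prefer_block(alias, families):
    """Render a fontconfig <alias> element listing families in order of preference."""
    lines = ["<alias>", f"  <family>{escape(alias)}</family>", "  <prefer>"]
    lines += [f"    <family>{escape(name)}</family>" for name in families]
    lines += ["  </prefer>", "</alias>"]
    return "\n".join(lines)


# Placeholder family names, see the note above.
TAIWAN_ORDER = ["Arphic", "Kochi", "Baekmuk"]
JAPAN_ORDER = ["Kochi", "Arphic", "Baekmuk"]

if __name__ == "__main__":
    print(prefer_block("sans-serif", TAIWAN_ORDER))
    print(prefer_block("sans-serif", JAPAN_ORDER))
```

Both users keep en_US as their locale; only the order inside the
prefer element differs.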

There are also users who depend on translations, but sometimes meet the
situation, that a translation is not available in their native language.
The default fallback is English. But maybe that user is not very good at
understanding English and prefers a different fallback language, or set
of languages: For example, a Taiwanese user who uses Traditional
Chinese, might prefer Simplified Chinese and then Japanese as fallback
and not English.
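
A fallback order like this can be expressed as a priority list. GNU
gettext already reads such a list from the LANGUAGE environment
variable, with the entries separated by colons. The Python sketch below
shows both the variable value and the lookup logic. The function names
and the set of available translations are made up for the example.

```python
def language_variable(preferences):
    """Join locale codes into the colon-separated form used by LANGUAGE."""
    return ":".join(preferences)


def pick_translation(preferences, available, default="en"):
    """Return the first preferred language that has a translation, else the default."""
    for code in preferences:
        if code in available:
            return code
    return default


if __name__ == "__main__":
    # The Taiwanese user from the example above.
    wanted = ["zh_TW", "zh_CN", "ja"]
    print("LANGUAGE=" + language_variable(wanted))
    # Hypothetical program that ships only Japanese and German translations.
    print(pick_translation(wanted, {"ja", "de"}))
```

English stays the last resort, but only after the whole list has been
tried.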

So: have an Advanced button in the language selector, which pops up a new
window with two Tabs: one for setting the preferred fonts and one for
translation fallbacks.

 b) CJK fonts:
This topic really is... erm... difficult.
For the Arphic fonts (and probably also a Heiti (sans-serif, like DejaVu
Sans) and Yuanti (rounded, like Kochi Gothic) font) I have the following
in mind:
The problem is, that many characters share the same codepoint in
Unicode, but have a different shape (number of strokes and stroke order)
in the different CJK regions (China, Hong Kong / Macao, Taiwan, Japan,
Korea). This is one of the main reasons why users in these regions
prefer different fonts.
My approach would be to put all character shape variants into a single
TTC (TrueType Collection) and use a different glyph ID to Unicode
codepoint mapping for each "virtual font".
Instead of having 5 separate TTF files, each about 25MB in size, we
would end up with only one TTC file (about 30 MB in size), which
produces 5 "virtual fonts". Saves a lot of space. ;)
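
The idea can be modelled in a few lines of Python: one shared pool of
glyphs, plus one codepoint-to-glyph-ID map per virtual font. This is a
toy model, not real font data. The glyph names, the choice of U+9AA8 and
U+4E00 as examples, and the assignment of shapes to regions are all
assumptions made for illustration. The size figures are the approximate
ones given above.

```python
# Shared glyph pool: each outline is stored once in the collection.
# Illustrative only, the names and mappings are made up.
GLYPHS = [
    "U+9AA8, mainland China shape",  # glyph ID 0
    "U+9AA8, Taiwan shape",          # glyph ID 1
    "U+9AA8, Japan shape",           # glyph ID 2
    "U+4E00, shared shape",          # glyph ID 3
]

# One codepoint -> glyph ID map per "virtual font".
CMAPS = {
    "CN": {0x9AA8: 0, 0x4E00: 3},
    "TW": {0x9AA8: 1, 0x4E00: 3},
    "HK": {0x9AA8: 1, 0x4E00: 3},
    "JP": {0x9AA8: 2, 0x4E00: 3},
    "KR": {0x9AA8: 2, 0x4E00: 3},
}


def glyph_for(region, codepoint):
    """Look up the glyph a given virtual font shows for a codepoint."""
    return GLYPHS[CMAPS[region][codepoint]]


# Rough size comparison using the approximate figures from the text.
SEPARATE_MB = 5 * 25      # five separate TTF files
COLLECTION_MB = 30        # one TTC file
SAVED_MB = SEPARATE_MB - COLLECTION_MB

if __name__ == "__main__":
    for region in CMAPS:
        print(region, "->", glyph_for(region, 0x9AA8))
    print(f"approximate saving: {SAVED_MB} MB")
```

Characters whose shape is the same everywhere are stored once and
referenced by every map, which is where the saving comes from.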

(If you need more details about this technology, I can elaborate on it
in a follow-up mail.)

Caveat: Qt3 does not support TTC fonts. GTK2, however, has no problem
with them. Qt4 >= 4.3 is also able to use them.
So I will basically wait until KDE4 is released and adopted into Ubuntu;
otherwise KDE users can't use the TTC fonts.

That's it for the moment. If you have an opinion on any of these
issues, please speak up. :)



More information about the ubuntu-devel mailing list