[i18n] Input Method and Fonts improvements for Gutsy

Matt Zimmerman mdz at ubuntu.com
Tue Aug 7 15:30:37 BST 2007

Thanks for starting the discussion with this overview.

On Tue, Aug 07, 2007 at 01:15:23AM +0800, Arne Goetje wrote:
> 1. Input Method (SCIM):
> Both Live CD and default installation come with the SCIM package
> installed; however, it is not properly set up, so the user cannot
> actually use it.

This was working at one point; Michael Vogt was involved with it.  CCing him.

> SCIM depends on some environment variables and on the SCIM daemon running in
> the background. There is a nice tool, called im-switch, which takes care
> of this.

im-switch is installed by language-support packages corresponding to
languages which require it.  The trouble, of course, is that none of these
are installed on the live CD due to space constraints.
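
For reference, the setup im-switch performs for SCIM boils down to exporting a few variables before the session starts and launching the daemon. The minimal sketch below expresses that in Python; the variable values are the ones conventionally used with SCIM's XIM server and its GTK and Qt input-method modules, while the helper names (`scim_environment`, `start_scim_daemon`) are hypothetical and not part of im-switch.

```python
import os
import subprocess

# Variables conventionally exported for SCIM (im-switch's SCIM profile sets
# equivalents). XMODIFIERS selects the XIM server; the *_IM_MODULE variables
# select the toolkit input-method plugins (scim-gtk2-immodule, scim-qtimm).
SCIM_ENV = {
    "XMODIFIERS": "@im=SCIM",
    "GTK_IM_MODULE": "scim",
    "QT_IM_MODULE": "scim",
}


def scim_environment(base=None):
    """Return a copy of `base` (default: os.environ) with the SCIM variables set.

    Hypothetical helper, for illustration only.
    """
    env = dict(os.environ if base is None else base)
    env.update(SCIM_ENV)
    return env


def start_scim_daemon(env):
    """Launch the SCIM daemon in the background (`scim -d`)."""
    return subprocess.Popen(["scim", "-d"], env=env)


if __name__ == "__main__":
    env = scim_environment()
    for name in SCIM_ENV:
        print(f"{name}={env[name]}")
```

Keeping the variables separate from the daemon launch mirrors the problem described above: the package can be installed everywhere while the environment is only set for languages that need it.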

So we may need to find a way to get scim installed, but selectively enabled
depending on the language, or perhaps rethink the way we handle

> I highly recommend that we put the following packages and their
> dependencies on the Live CD and in the default installation to make it
> more useful:
>  * scim-anthy or scim-prime: Japanese input methods. scim-prime is a
> dictionary-based IM, which has a great advantage over anthy, although
> both are widely used in Japan.
>  * scim-chewing: Traditional Chinese phonetic IM, widely used in Taiwan
>  * scim-pinyin: Simplified and Traditional Chinese Pinyin IM, widely
> used in China and by foreigners in Taiwan. ;)
>  * scim-hangul: As the name says it - Korean.
>  * scim-tables-zh: additional table based IMs for Simplified and
> Traditional Chinese, many of them are popular in China, Hong Kong and
> Taiwan.
>  * scim-thai: well, Thai. :)
>  * scim-m17n: bridge to the m17n library, which adds a lot of additional
>  IMs, including Latin based ones for the European languages with
> diacritics. (not everyone likes to fiddle with XKB settings. ;) )

As with im-switch, these modules are installed by the relevant
language-support packages.  It would be useful for you to review their
dependencies and establish whether they are correct.  We can then make
decisions on language support simply by selecting the relevant
language-support package, which will conveniently keep track of which
packages are relevant for which languages.
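
One way to make that review mechanical is to write the recommendations from the quoted list down as data and diff each language-support package's dependencies against them. In this minimal sketch the locale keys, the grouping of alternatives and the function name are assumptions; the dependency list itself would come from the package's control data (e.g. `apt-cache depends`), which is not shown.

```python
# Hypothetical requirement table derived from the recommendations quoted
# above. Each inner set is a group of alternatives: any one member satisfies
# it (like "a | b" in a Debian Depends field).
RECOMMENDED_IM = {
    "ja": [{"scim-anthy", "scim-prime"}],
    "zh_TW": [{"scim-chewing"}, {"scim-tables-zh"}],
    "zh_CN": [{"scim-pinyin"}, {"scim-tables-zh"}],
    "ko": [{"scim-hangul"}],
    "th": [{"scim-thai"}],
}

# Packages the quoted message says must not be installed.
FORBIDDEN = {"scim-uim", "scim-chinese"}


def review_dependencies(lang, depends):
    """Compare a language-support package's dependency list with the table.

    Returns (missing, forbidden): the unsatisfied alternative groups, and the
    sorted dependencies that appear in FORBIDDEN.
    """
    deps = set(depends)
    missing = [group for group in RECOMMENDED_IM.get(lang, []) if not group & deps]
    forbidden = sorted(deps & FORBIDDEN)
    return missing, forbidden


if __name__ == "__main__":
    print(review_dependencies("ja", ["im-switch", "scim-prime", "scim-uim"]))
```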

> The following packages must NOT be installed:
>  * scim-uim: BROKEN, will trash the SCIM setup tool. Don't install it.
>  * scim-chinese: old version of scim-pinyin, not compatible with the
> current scim package; breaks dependency handling.

scim-uim seems to be installed with Edubuntu only.  What is the trouble with
it?  Can it be fixed?  If not, should it be removed entirely?

Likewise for scim-chinese.  We don't seem to be using it, so if it isn't
needed, it should probably be removed to reduce confusion.

> 3. Fonts:
>  a) language selector:
> The idea of the language selector handling the fontconfig
> configuration is nice; however, it needs some tweaking:
>  * more languages: I will add more config files for more locales; needs
> some testing and probably some community feedback.

Michael should be able to guide you on how to contribute these changes.
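
A per-locale file of the kind discussed here could look like the output of the sketch below. It uses fontconfig's `<match>`/`<test>`/`<edit>` syntax to prepend preferred families to a generic alias only when the requested language matches. The function name and the family name in the usage line are placeholders, and this is not necessarily how the language selector's existing files are written.

```python
import xml.etree.ElementTree as ET


def locale_font_config(lang, preferences):
    """Build a fontconfig document that, for requests tagged with `lang`,
    prepends preferred families to each generic alias.

    lang: fontconfig language tag, e.g. "zh-tw".
    preferences: mapping of generic alias ("sans-serif", "serif",
    "monospace") to an ordered list of family names, best first.
    """
    root = ET.Element("fontconfig")
    for alias, families in preferences.items():
        match = ET.SubElement(root, "match", target="pattern")
        # Only apply when the requested language matches...
        test_lang = ET.SubElement(match, "test", name="lang", compare="contains")
        ET.SubElement(test_lang, "string").text = lang
        # ...and the application asked for this generic alias.
        test_family = ET.SubElement(match, "test", qual="any", name="family")
        ET.SubElement(test_family, "string").text = alias
        edit = ET.SubElement(match, "edit", name="family", mode="prepend", binding="strong")
        for family in families:
            ET.SubElement(edit, "string").text = family
    body = ET.tostring(root, encoding="unicode")
    return ('<?xml version="1.0"?>\n'
            '<!DOCTYPE fontconfig SYSTEM "fonts.dtd">\n' + body + "\n")


if __name__ == "__main__":
    print(locale_font_config("zh-tw", {"sans-serif": ["Example Sans TC", "DejaVu Sans"]}))
```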

>  * Question: how to handle those config files which come with the font
> packages? Should font preference handling be done by the language selector,
> while font-specific options remain in the config files installed by
> the font packages? If that's the case, we need to check all the font
> packages and tweak those where it's not the case.

A good question, and one which I believe Michael struggled with.  I think
further communication with fontconfig upstream is required in order to
understand all of the issues and get a consensus about how this should work.

>  b) Font packages:
> I see a problem: space on the Live CD is a bit "restricted"... but some
> font packages come with multiple fonts inside and install them all, even
> if we don't need them. This wastes precious space.
> I'm still trying to get an overview of which fonts cover which
> Unicode ranges and which fonts should be taken into account for the
> three Alias fonts "sans-serif", "serif" and "monospace".
> Bottom line: Some font packages come with fonts we don't need for this
> purpose. Question: how to handle this?

Our vision for Ubuntu to date has been as follows:

- The default ISO boots in English, but may contain additional language
  support where space is available.  The installer contains all available
  language support.

- Users who do a default installation should be able to properly display a
  variety of scripts; therefore, corresponding fonts are installed by default.

- Users may use the language selector or the installer to enable full
  support for a variety of languages, including both program translations
  and input methods.

This attempts to cover the following use cases:

- The user uses their computer in English, but is able to read one or more
  other languages (e.g., via web pages)

- The user is able to install Ubuntu using the translated installer with
  Latin input, and is connected to the Internet to download translations and
  input methods for their language

We believe this covers a fairly wide range of users, but there are certainly
cases which are not covered.  Some of the problems with the current approach
are:

- The live CD environment is not localized, so the user may not realize that
  full localization is available for their language until they have
  committed to a full installation.  A common solution proposed for this is
  that of localized CDs, but the main blocker for this is the need to
  organize testing by people with local language skills.  This is not
  impossible, but requires a substantial amount of work and hasn't been a
  high priority.

- Users who only care about their primary language are distracted by a large
  font selection.  This is most visible in OpenOffice.org.  However, if
  these fonts are limited to users who localize their desktop, it means that
  many web pages will be displayed with ugly boxes, etc., so there is a
  certain tradeoff here.  We're quite open to changing it, but it should be
  considered carefully.  It may be that we want to provide a simple way to
  install foreign language fonts if we remove them from the default
  installation.

- There is a general lack of clarity about the different ways in which the
  user might want to use a language: display of documents (fonts),
  translation of programs (gettext), text input (input methods), etc.  The
  language support selector handles the latter two, while we attempt to
  handle the first by default for many languages.  The distinction between
  language-pack and language-support is awkward (though hidden from the user
  by the language selector), and I'm not convinced that there are cases
  where one is desirable, but not the other.

- Several font packages contain multiple fonts, and support multiple
  languages, and we don't have a good idea of how everything fits.

- There are a variety of input methods, and we don't know for sure which
  ones are right for a particular locale.
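
Getting a better idea of how the font packages fit together is largely a matter of reading each font's character map. A rough, hypothetical audit helper is sketched below: it reports, per font, what fraction of a few Unicode blocks is covered. Only `load_cmap` needs a third-party library (fontTools, whose `TTFont.getBestCmap()` returns the font's code-point-to-glyph mapping); the block table is a small hand-picked subset and the function names are illustrative.

```python
# Unicode block ranges (inclusive) for a few scripts discussed in this thread.
BLOCKS = {
    "Thai": (0x0E00, 0x0E7F),
    "Hiragana": (0x3040, 0x309F),
    "Katakana": (0x30A0, 0x30FF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "Hangul Syllables": (0xAC00, 0xD7AF),
}


def coverage(codepoints, block):
    """Fraction of the block's code points present in a font's cmap.

    Counts every code point in the block, assigned or not, so it is only a
    rough measure.
    """
    start, end = BLOCKS[block]
    present = sum(1 for cp in codepoints if start <= cp <= end)
    return present / (end - start + 1)


def coverage_report(fonts):
    """fonts: mapping of font name -> set of code points in its cmap."""
    return {
        name: {block: round(coverage(cps, block), 3) for block in BLOCKS}
        for name, cps in fonts.items()
    }


def load_cmap(path, font_number=0):
    """Read the code points a font file maps (requires the fontTools package).

    font_number selects a face inside a .ttc collection; it is ignored for
    single-font files.
    """
    from fontTools.ttLib import TTFont

    return set(TTFont(path, fontNumber=font_number).getBestCmap())
```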

> Option 1: We craft a separate package just for the Live CD and put
> selected fonts from the other font packages together, just for this
> single purpose.
> Caveat: might conflict with the other font packages (duplicate font
> files), and should probably not be used for the default installation on
> users' hard disks.

This is an interesting idea, as it would allow us to continue to provide
legible fonts for many languages without creating so much confusion with a
huge number of default fonts.
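
Choosing the contents of such a package is a small set-cover problem: cover every script the CD must display while keeping the total size down. The sketch below uses the standard greedy heuristic (best ratio of newly covered scripts to file size first). That heuristic, the input format and the font names and sizes in the usage are assumptions, not something proposed in this thread, and greedy selection is not guaranteed to find the smallest set.

```python
def pick_fonts(fonts, required):
    """Greedy weighted set cover.

    fonts: mapping name -> (size_mb, set of script/block names it covers)
    required: set of script/block names the Live CD must display
    Returns (chosen font names in pick order, names still uncovered).
    """
    uncovered = set(required)
    candidates = dict(fonts)
    chosen = []
    while uncovered and candidates:
        # Pick the font covering the most still-uncovered scripts per megabyte.
        name, (size_mb, blocks) = max(
            candidates.items(),
            key=lambda item: len(item[1][1] & uncovered) / item[1][0],
        )
        if not blocks & uncovered:
            break  # no remaining font helps
        chosen.append(name)
        uncovered -= blocks
        del candidates[name]
    return chosen, uncovered


if __name__ == "__main__":
    # Hypothetical fonts and sizes, for illustration only.
    fonts = {
        "Thai Sans": (1.0, {"Thai"}),
        "Big CJK": (20.0, {"Hiragana", "Katakana", "CJK"}),
        "Kana Only": (2.0, {"Hiragana", "Katakana"}),
        "Han Only": (25.0, {"CJK"}),
    }
    print(pick_fonts(fonts, {"Thai", "Hiragana", "Katakana", "CJK"}))
```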

> 4. Improvements for Gutsy+1
> I expect that we won't have enough time to implement these improvements
> in Gutsy, so we should probably postpone them to the next release:
>  a) Language selector:
> It would be useful if the user had an Advanced button in the
> language selector, where (s)he can adjust his/her preferred fonts and
> translation order: for example, a list of available fonts that can be
> moved up or down according to personal preference. And the same
> should be possible for translations:

This would probably be a good way to expose the difference between display
of documents, translation of programs, and text input.  There is a great
deal of complexity in managing the relevant settings for all of those,
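
The translation half of that already has a mechanism underneath: GNU gettext reads the `LANGUAGE` environment variable as a colon-separated list of languages in priority order (it is ignored when the locale is `C`). The sketch below shows how an ordered list from such a dialog could be turned into that value, plus a simplified model of the fallback lookup; both function names are hypothetical.

```python
def language_variable(preferred):
    """Build a GNU gettext LANGUAGE value from an ordered list of locales.

    Most preferred first; duplicates and empty entries are dropped.
    """
    ordered = []
    for loc in preferred:
        if loc and loc not in ordered:
            ordered.append(loc)
    return ":".join(ordered)


def pick_catalog(preferred, available):
    """Return the first preferred locale that has a translation catalog.

    Simplified fallback: "zh_TW" also tries plain "zh". Real gettext lookup
    also normalizes codesets and modifiers, which is omitted here.
    """
    for loc in preferred:
        base = loc.split("_")[0]
        for candidate in (loc, base):
            if candidate in available:
                return candidate
    return None


if __name__ == "__main__":
    print(language_variable(["zh_TW", "zh_CN", "en"]))
```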

>  b) CJK fonts:
> This topic really is... erm... difficult.
> For the Arphic fonts (and probably also a Heiti (sans-serif, like DejaVu
> Sans) and Yuanti (rounded, like Kochi Gothic) font) I have the following
> in mind:
> The problem is that many characters share the same codepoint in
> Unicode but have different shapes (number of strokes and stroke order)
> in the different CJK regions (China, Hong Kong / Macao, Taiwan, Japan,
> Korea). This is one of the main reasons why users in these regions
> prefer different fonts.
> My approach would be to put all character shape variants into a single
> TTC (TrueType Collection) and use a different glyph ID to Unicode
> codepoint mapping for each "virtual font".
> Instead of having 5 separate TTF files, each about 25MB in size, we
> would end up with only one TTC file (about 30 MB in size), which
> produces 5 "virtual fonts". Saves a lot of space. ;)
> (If you need more details about this technology, I can elaborate on
> it in a follow-up mail)

This is a key problem, and an interesting proposed solution.  Would this
require any changes outside of the fonts themselves?
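
To make the proposal concrete: the collection format allows the faces in one file to share tables, so a single glyph store can be reached through a different character map per face. The toy model below (plain Python, not a real font API; `GlyphCollection` and its methods are hypothetical) shows the idea for one character drawn identically everywhere and one with regional forms. The outlines are placeholder strings.

```python
from dataclasses import dataclass, field


@dataclass
class GlyphCollection:
    """Toy model of a TrueType Collection whose faces share one glyph store.

    Each face ("virtual font") has its own cmap (code point -> glyph ID)
    into the shared glyph list, so identical shapes are stored once.
    """

    glyphs: list = field(default_factory=list)
    cmaps: dict = field(default_factory=dict)

    def add_glyph(self, outline):
        self.glyphs.append(outline)
        return len(self.glyphs) - 1

    def assign(self, face, codepoint, glyph_id):
        self.cmaps.setdefault(face, {})[codepoint] = glyph_id

    def glyph_for(self, face, codepoint):
        return self.glyphs[self.cmaps[face][codepoint]]


def build_example():
    c = GlyphCollection()
    # U+4E00 looks the same in every region: one glyph shared by all faces.
    shared = c.add_glyph("U+4E00 outline")
    # U+9AA8 is drawn differently in mainland China and Taiwan.
    cn = c.add_glyph("U+9AA8 outline, mainland form")
    tw = c.add_glyph("U+9AA8 outline, Taiwan form")
    for face in ("CN", "TW"):
        c.assign(face, 0x4E00, shared)
    c.assign("CN", 0x9AA8, cn)
    c.assign("TW", 0x9AA8, tw)
    return c
```

With the figures quoted above, five separate 25 MB files total 125 MB, so a single collection of about 30 MB would save roughly 95 MB.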

 - mdz
