[i18n] Input Method and Fonts improvements for Gutsy
Arne Goetje
arne at ubuntu.com
Fri Aug 10 11:35:34 BST 2007
Matt Zimmerman wrote:
> On Tue, Aug 07, 2007 at 01:15:23AM +0800, Arne Goetje wrote:
>> 1. Input Method (SCIM):
>> Both Live CD and default installation come with the SCIM package
>> installed; however, it is not properly set up, so the user cannot
>> actually use it.
>
> This was working at one point; Michael Vogt was involved with it. CCing
> him.
>
>> SCIM depends on some environment variables and on the SCIM daemon
>> running in the background. There is a nice tool called im-switch, which
>> takes care of this.
>
> im-switch is installed by language-support packages corresponding to
> languages which require it. The trouble, of course, is that none of these
> are installed on the live CD due to space constraints.
>
> So we may need to find a way to get scim installed, but selectively enabled
> depending on the language, or perhaps rethink the way we handle
> language-support.
OK, I did some more tests...
SCIM does work if the user right-clicks in the application window and
selects "Input Method -> SCIM". This works in all UTF-8 locales.
But as this step is not obvious to the average novice user, I recommend
setting the environment variables GTK_IM_MODULE=scim (and
QT_IM_MODULE=scim). That way SCIM works as expected.
For the Live CD, this approach is enough, but for the default
installation the scim-bridge-* packages should also be installed. I'll
have to dig a bit further into how they need to be configured, but they
are supposed to solve a few problems with 3rd-party applications (Acrobat
Reader, Skype, etc.).
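For a whole session, im-switch takes care of exporting these variables. To show their effect on a single program, here is a minimal Python sketch that launches one application with both variables set. The helper names are hypothetical, and the SCIM daemon still has to be running; the variables only tell the GTK 2 and Qt 4 toolkits which input-method module to load.

```python
import os
import subprocess
from typing import List, Mapping, Optional

# The two variables recommended above; both select SCIM as the IM module.
SCIM_IM_VARS = {
    "GTK_IM_MODULE": "scim",  # read by GTK 2 applications
    "QT_IM_MODULE": "scim",   # read by Qt 4 applications
}


def scim_environment(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: os.environ) with SCIM selected.

    The input mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env.update(SCIM_IM_VARS)
    return env


def launch_with_scim(argv: List[str]) -> subprocess.Popen:
    """Start one application whose toolkit will load the SCIM IM module."""
    return subprocess.Popen(argv, env=scim_environment())
```

For example, `launch_with_scim(["gedit"])` would start gedit with SCIM active without touching the rest of the session.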
>> I highly recommend that we put the following packages and their
>> dependencies into the Live CD and the default installation to make it
>> more useful:
>> * scim-anthy or scim-prime: Japanese input methods. scim-prime is a
>> dictionary-based IM, which has a great advantage over anthy, although
>> both are widely used in Japan.
>> * scim-chewing: Traditional Chinese phonetic IM, widely used in Taiwan
>> * scim-pinyin: Simplified and Traditional Chinese Pinyin IM, widely
>> used in China and by foreigners in Taiwan. ;)
>> * scim-hangul: As the name says it - Korean.
>> * scim-tables-zh: additional table based IMs for Simplified and
>> Traditional Chinese, many of them are popular in China, Hong Kong and
>> Taiwan.
>> * scim-thai: well, Thai. :)
>> * scim-m17n: bridge to the m17n library, which adds a lot of additional
>> IMs, including Latin based ones for the European languages with
>> diacritics. (not everyone likes to fiddle with XKB settings. ;) )
>
> As with im-switch, these modules are installed by the relevant
> language-support packages. It would be useful for you to review their
> dependencies and establish whether they are correct. We can then make
> decisions on language support simply by selecting the relevant
> language-support package, which will conveniently keep track of which
> packages are relevant for which languages.
Well, if I need to input Chinese and Japanese on an English system, I
don't want to install a few dozen files from the language packs,
especially if the translations are useless to me anyway. ;)
Installing all the above-mentioned modules with their dependencies on
the Live CD needs about 48 MB of additional space. (I selected
scim-anthy over scim-prime here.)
If we remove some font packages and create a core-fonts package instead,
we can save about 30 MB or more (see below).
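The alternative Matt raised, enabling SCIM modules selectively per language, could start from a lookup like the one below. This is a hypothetical Python sketch: the locale-to-package mapping is derived from the module list quoted above (scim-anthy chosen over scim-prime, as in the size estimate), and scim-m17n is added everywhere as the catch-all bridge. A per-locale lookup alone would still not help someone who needs Chinese or Japanese input on an English system, which is the case argued above.

```python
# Hypothetical mapping from a locale to SCIM IM packages, based on the
# module list above. More specific keys ("zh_TW") are tried before the
# bare language code ("zh").
SCIM_MODULES = {
    "ja": ["scim-anthy"],
    "zh_TW": ["scim-chewing", "scim-pinyin", "scim-tables-zh"],
    "zh_HK": ["scim-tables-zh"],
    "zh_CN": ["scim-pinyin", "scim-tables-zh"],
    "zh": ["scim-pinyin", "scim-tables-zh"],
    "ko": ["scim-hangul"],
    "th": ["scim-thai"],
}
# scim-m17n adds many further IMs, including Latin-script ones.
FALLBACK = ["scim-m17n"]


def modules_for_locale(locale: str) -> list:
    """Return the SCIM packages to enable for e.g. 'zh_TW.UTF-8'."""
    # Drop encoding and modifier: 'zh_TW.UTF-8@foo' -> 'zh_TW'
    name = locale.split(".", 1)[0].split("@", 1)[0]
    language = name.split("_", 1)[0]
    for key in (name, language):
        if key in SCIM_MODULES:
            return SCIM_MODULES[key] + FALLBACK
    return list(FALLBACK)
```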
>> The following packages may NOT be installed:
>> * scim-uim: BROKEN, will trash the SCIM setup tool. Don't install it.
>> * scim-chinese: old version of scim-pinyin, not compatible with the
>> current scim package; breaks dependency handling.
>
> scim-uim seems to be installed with Edubuntu only. What is the trouble with
> it? Can it be fixed? If not, should it be removed entirely?
scim-uim is not actively maintained. When this package is installed, the
SCIM setup tool (GUI) always crashes with a segfault. Removing the
package solves the issue.
> Likewise for scim-chinese. We don't seem to be using it, so if it isn't
> needed, it should probably be removed to reduce confusion.
scim-chinese is the old version of scim-pinyin. The package was renamed
with the SCIM API change between 1.2.x and 1.4.0. scim-chinese does not
work with the current scim version and actually conflicts with it.
Therefore it should be removed.
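A build script could guard against these two packages slipping back in. The Python sketch below assumes a Live CD package manifest with one whitespace-separated "package version" entry per line; both that format and the function name are assumptions for illustration, not an existing Ubuntu tool.

```python
# Packages that must not end up on the Live CD or in the default install,
# with the reasons given in this thread.
FORBIDDEN = {
    "scim-uim": "unmaintained; makes the SCIM setup tool segfault",
    "scim-chinese": "pre-1.4 SCIM API; conflicts with current scim",
}


def forbidden_packages(manifest_text: str) -> dict:
    """Return {package: reason} for every forbidden package in the manifest.

    Assumes one 'package version' entry per line (whitespace-separated).
    """
    found = {}
    for line in manifest_text.splitlines():
        fields = line.split()
        if fields and fields[0] in FORBIDDEN:
            found[fields[0]] = FORBIDDEN[fields[0]]
    return found
```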
>> 3. Fonts:
>> b) Font packages:
>> Option 1: We craft a separate package, just for the Live CD, and put
>> selected fonts from the other font packages together, just for this
>> single purpose.
>> Caveat: might conflict with the other font packages (duplicate font
>> files); should probably not be used on the default installation on the
>> users' hard disks.
>
> This is an interesting idea, as it would allow us to continue to provide
> legible fonts for many languages without creating so much confusion with a
> huge number of default fonts.
I have spent some time comparing the fonts installed by default on the
Live CD with the additional fonts available in the repositories.
Currently the /usr/share/fonts/truetype/ directory uses about 94 MB of
space.
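To reproduce a figure like the 94 MB above on another image, a small Python function can total the file sizes under the directory. It is only a sketch: it skips symlinks (so aliased font files are not counted twice) and counts bytes rather than disk blocks, so its result may differ slightly from what `du` reports.

```python
import os


def tree_size(root: str) -> int:
    """Sum the sizes in bytes of all regular files below `root`.

    os.walk does not descend into symlinked directories by default, and
    symlinked files are skipped explicitly, so nothing is counted twice.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    size = tree_size("/usr/share/fonts/truetype")
    print(f"{size} bytes ({size / 1e6:.1f} MB)")
```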
Below is a list of fonts that I consider necessary as core fonts to
display all kinds of scripts. I made the selection with screen
readability and complex script requirements in mind.
-----------------------------------------------------------------------------
Font Name               Package              Scripts             Size (bytes)
-----------------------------------------------------------------------------
DejaVu Sans             ttf-dejavu           Multiple                  519412
DejaVu Sans Bold        ttf-dejavu           Multiple                  493320
DejaVu Sans Mono        ttf-dejavu           Multiple                  289712
DejaVu Sans Mono Bold   ttf-dejavu           Multiple                  278376
DejaVu Serif            ttf-dejavu           Multiple                  213360
DejaVu Serif Bold       ttf-dejavu           Multiple                  204988
MgOpenCanonica          ttf-mgopen           Greek                     281580
MgOpenCanonica Bold     ttf-mgopen           Greek                     284968
MgOpenModerna           ttf-mgopen           Greek                      60404
MgOpenModerna Bold      ttf-mgopen           Greek                      57592
Abyssinica SIL          ttf-sil-abyssinica   Ethiopian (Amharic)       619012
Ezra SIL                ttf-sil-ezra         Hebrew                    153392
PakType Tehreer         ttf-paktype          Arabic, Farsi, Urdu       308756
Scheherazade            ttf-scheherazade     Arabic, Farsi, Urdu       260392
Lohit Bengali           ttf-bengali-fonts    Bengali                   138536
Chandas                 ttf-devanagari-fonts Devanagari               2584956
Lohit Gujarati          ttf-gujarati-fonts   Gujarati                   79168
Lohit Kannada           ttf-kannada-fonts    Kannada                   186364
AnjaliOldLipi           ttf-malayalam-fonts  Malayalam                 433556
Lohit Oriya             ttf-oriya-fonts      Oriya                      93140
Saab                    ttf-punjabi-fonts    Punjabi                   114092
Lohit Tamil             ttf-tamil-fonts      Tamil                      64760
Pothana2000             ttf-telugu-fonts     Telugu                    194268
Padauk                  ttf-sil-padauk       Myanmar                   146104
Padauk Bold             ttf-sil-padauk       Myanmar                   148632
Khmer OS System         ttf-khmeros          Khmer                     265624
PhetsarathOT            ttf-lao              Lao                        92828
Loma                    ttf-thai-tlwg        Thai                       37140
Loma-Bold               ttf-thai-tlwg        Thai                       37964
AR PL ShanHeiSun Uni    ttf-arphic-uming     Han                     20890468
UnBatang                ttf-unfonts          Hangul                   3678974
UnBatangBold            ttf-unfonts          Hangul                   4070868
UnDotum                 ttf-unfonts          Hangul                   2209390
UnDotumBold             ttf-unfonts          Hangul                   2808360
Sazanami Mincho         ttf-sazanami-mincho  Japanese                10554196
Sazanami Gothic         ttf-sazanami-gothic  Japanese                 7690324
SIL Yi                  ttf-sil-yi           Yi                        463336
TibetianMachineUniAlpha ttf-tmuni            Tibetan, Dzongkha        1355768
-----------------------------------------------------------------------------
Total                                                                62364080
-----------------------------------------------------------------------------
* The file sizes for the DejaVu and AR PL ShanHeiSun Uni fonts are those
from the current packages; newer versions will differ.
* DejaVu should be upgraded to 2.18 to include Georgian script.
* PakType Tehreer and Scheherazade contain almost the same glyphs and
have a very similar design, so I think only one of them is needed. They
are supposed to replace the ttf-arabeyes fonts, because those lack Farsi
and Urdu support.
* The question is whether we need to keep the Bold versions; dropping
them could save some additional space.
* The UnFonts fonts are supposed to replace the Baekmuk fonts.
* The Sazanami fonts are supposed to replace the Kochi fonts.
* For their respective scripts, all these fonts are supposed to be used
instead of the DejaVu fonts, because their complex script support and/or
glyph shapes are better than DejaVu's.
* These fonts are supposed to be taken out of their packages and put
together into a new core-fonts package. Installing their original
packages would waste a lot of space.
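As a cross-check of the table, and of the open question about the Bold faces, the sizes can be summed in a few lines of Python. The data is copied from the table; reading "about 94 MB" as 94 × 10^6 bytes is an assumption made only for the rough savings estimate.

```python
# (font name, size in bytes), copied from the core-fonts table above.
CORE_FONTS = [
    ("DejaVu Sans", 519412), ("DejaVu Sans Bold", 493320),
    ("DejaVu Sans Mono", 289712), ("DejaVu Sans Mono Bold", 278376),
    ("DejaVu Serif", 213360), ("DejaVu Serif Bold", 204988),
    ("MgOpenCanonica", 281580), ("MgOpenCanonica Bold", 284968),
    ("MgOpenModerna", 60404), ("MgOpenModerna Bold", 57592),
    ("Abyssinica SIL", 619012), ("Ezra SIL", 153392),
    ("PakType Tehreer", 308756), ("Scheherazade", 260392),
    ("Lohit Bengali", 138536), ("Chandas", 2584956),
    ("Lohit Gujarati", 79168), ("Lohit Kannada", 186364),
    ("AnjaliOldLipi", 433556), ("Lohit Oriya", 93140),
    ("Saab", 114092), ("Lohit Tamil", 64760),
    ("Pothana2000", 194268), ("Padauk", 146104),
    ("Padauk Bold", 148632), ("Khmer OS System", 265624),
    ("PhetsarathOT", 92828), ("Loma", 37140),
    ("Loma-Bold", 37964), ("AR PL ShanHeiSun Uni", 20890468),
    ("UnBatang", 3678974), ("UnBatangBold", 4070868),
    ("UnDotum", 2209390), ("UnDotumBold", 2808360),
    ("Sazanami Mincho", 10554196), ("Sazanami Gothic", 7690324),
    ("SIL Yi", 463336), ("TibetianMachineUniAlpha", 1355768),
]

# Assumption: "about 94 MB" for /usr/share/fonts/truetype/ = 94 * 10**6 bytes.
CURRENT_TRUETYPE_BYTES = 94 * 10**6


def total_size(fonts) -> int:
    """Sum the byte sizes of (name, size) pairs."""
    return sum(size for _name, size in fonts)


def is_bold(name: str) -> bool:
    """Bold faces carry "Bold" in their names ("Loma-Bold", "UnDotumBold", ...)."""
    return "Bold" in name


if __name__ == "__main__":
    total = total_size(CORE_FONTS)
    bold = total_size(f for f in CORE_FONTS if is_bold(f[0]))
    saving = CURRENT_TRUETYPE_BYTES - total
    print(f"core fonts:  {total} bytes ({total / 1e6:.1f} MB)")
    print(f"bold faces:  {bold} bytes ({bold / 1e6:.1f} MB)")
    print(f"est. saving: {saving / 1e6:.1f} MB")
```

Run as a script, it reports 62364080 bytes in total, matching the table, of which 8385068 bytes (about 8.4 MB) are Bold faces; against 94 MB that is a saving of roughly 31.6 MB, in line with the "about 30 MB or more" estimate.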
>> b) CJK fonts:
>> This topic really is... erm... difficult.
>> For the Arphic fonts (and probably also a Heiti (sans-serif, like DejaVu
>> Sans) and Yuanti (rounded, like Kochi Gothic) font) I have the following
>> in mind:
>> The problem is that many characters share the same codepoint in
>> Unicode, but have a different shape (number of strokes and stroke order)
>> in the different CJK regions (China, Hong Kong / Macao, Taiwan, Japan,
>> Korea). This is one of the main reasons why users in these regions
>> prefer different fonts.
>> My approach would be to put all character shape variants into a single
>> TTC (TrueType Collection) and use a different glyph ID to Unicode
>> codepoint mapping for each "virtual font".
>> Instead of having 5 separate TTF files, each about 25MB in size, we
>> would end up with only one TTC file (about 30 MB in size), which
>> produces 5 "virtual fonts". Saves a lot of space. ;)
>>
>> (If you need more details about this technology, I can elaborate on
>> it in a follow-up mail.)
>
> This is a key problem, and an interesting proposed solution. Would this
> require any changes outside of the fonts themselves?
No. TTC already works with GTK2 and Qt4 >= 4.3. OpenOffice.org is
supposed to work; at least it does on SuSE Linux. The Debian package,
however, seems to have a bug: it cannot use TTC correctly.
Qt3, GTK1 and other legacy software cannot use TTC at all.
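The "virtual font" idea quoted above can be modelled in a few lines. The toy Python sketch below is not a font parser: it assumes a collection is just one shared list of glyph outlines plus one codepoint-to-glyph-ID map (cmap) per face, which mirrors how the fonts inside a real TTC can share tables such as glyf while each face carries its own cmap. Class, method and face names are hypothetical. U+9AA8 (骨, "bone") is used because its preferred shape differs between mainland China and Taiwan.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCollection:
    """Toy model of a TrueType Collection (TTC).

    glyphs: shared glyph store indexed by glyph ID (outlines are labels here)
    faces:  per-face cmap, face name -> {codepoint: glyph ID}
    """
    glyphs: list = field(default_factory=list)
    faces: dict = field(default_factory=dict)

    def add_glyph(self, outline: str) -> int:
        """Store an outline once and return its glyph ID."""
        self.glyphs.append(outline)
        return len(self.glyphs) - 1

    def add_face(self, name: str, cmap: dict) -> None:
        """Register a "virtual font": only the codepoint mapping is new."""
        self.faces[name] = cmap

    def glyph_for(self, face: str, char: str) -> str:
        """Resolve a character through the given face's own cmap."""
        return self.glyphs[self.faces[face][ord(char)]]


if __name__ == "__main__":
    ttc = ToyCollection()
    one = ttc.add_glyph("U+4E00 (one), same shape everywhere")
    bone_cn = ttc.add_glyph("U+9AA8 (bone), mainland China shape")
    bone_tw = ttc.add_glyph("U+9AA8 (bone), Taiwan shape")

    # Both faces point U+4E00 at the same stored glyph; only U+9AA8 differs.
    ttc.add_face("Face-CN", {0x4E00: one, 0x9AA8: bone_cn})
    ttc.add_face("Face-TW", {0x4E00: one, 0x9AA8: bone_tw})

    for face in ttc.faces:
        print(face, "->", ttc.glyph_for(face, "\u9aa8"))
    print("glyphs stored:", len(ttc.glyphs), "for", len(ttc.faces), "faces")
```

Scaled up to five regional faces, this sharing is what would turn five files of about 25 MB each (about 125 MB) into a single file of about 30 MB: the bulk of the ideographs is stored once, and only the regional variants are stored more than once.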
Cheers
Arne