[Pkg-fonts-devel] Fwd: Draining the font swamp

"Arne Götje (高盛華)" arne at linux.org.tw
Tue May 29 03:30:10 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Matt Zimmerman wrote:
>> * GTK2 apps use fontconfig
>> * QT3 apps do not use fontconfig, at least not for the alias fonts
>> serif, sans and monospace
>> * Legacy X apps and GTK1 also cannot use fontconfig. They only use xft
>> to select the fonts. For these, defoma is still needed.
> 
> What does QT3 use?

Ok, it seems QT3 can use fontconfig to see the available fonts, but the
alias fonts sans, serif and monospace don't work. At least selecting
glyphs from multiple fonts doesn't work.
QT3 comes with qtconfig-qt3, where the user can define substitute fonts.
Those have to be set up manually. That is supposed to work, but I
haven't tried it yet. (I'm a GTK2 user, because of certain QT3 limitations)

> I am not very concerned for legacy X apps or GTK1; I think we are
> approaching a time when such packages can be expected to declare an explicit
> dependency relationship to obtain their fonts if necessary, rather than
> assuming that they are available.

hmm... I don't know the details how X handles fonts... need to ask
someone else...

>>> - Which fonts are any good, and for which languages (no easy answer here)
>> IMHO the following needs to be done:
>>
>> 1. classify the available fonts into "Decorative" and "Desktop" fonts.
>> By "Decorative" I mean fonts that are nice for *printing*; "Desktop"
>> refers to fonts which are suitable for *screen* display.
>> Example for Decorative fonts: AR PL ZenKai family, this is a brushstroke
>> CJK font, which is nice for printing documents, but horrible for screen
>> display.
>> Example for Desktop fonts: DejaVu Sans. It's a smart font and a very
>> simple stroked font, which makes it perfect for screen display... no issues
>> with hinting and so on...
> 
> Given that this needs input from users of many character sets, perhaps
> creating a wiki page would be a good start.

yes... first a list of all available fonts in Debian/Ubuntu, then
classify them by whether they are suitable for screen display and, if
so, for which Unicode regions (some are fine for regions other than Latin ;) ).

>> 2. A list of default fonts should be made for certain languages (this is
>> only interesting for screen display):
>>  * For Latin script DejaVu Sans and the SIL fonts for sans and serif
>> respectively should be on the top of the list. Both are smart fonts
>> which can compose almost any diacritical combination, as for Vietnamese,
>> European languages and African languages based on Latin script. Also
>> both fonts include the full list of IPA characters...
> 
> Which one of the SIL fonts do you mean?  Is it packaged and available in
> Debian and Ubuntu?

ttf-sil-doulos
ttf-sil-charis

both are smart serif fonts. Doulos is the more popular one.

> In Ubuntu, we currently use DejaVu for both sans and serif.

the DejaVu Serif fonts are not smart enough... :( DejaVu Sans is better
though, but not optimal.

>>  * For CJK this is difficult. Here we have several issues to take care of:
>>   a) embedded bitmap glyphs are needed to render acceptable glyphs in
>> small fontsizes (12 pt and below)
> 
> Is this an inherent limitation, or one which is specific to the free
> rendering engines currently available?  Is it possible to overcome this
> problem without creating so many bitmap glyphs?  Who creates them?

The problem is stroke hinting in small fontsizes (< 16pt). The
autohinting feature in freetype does not produce good enough glyphs,
especially for the more complex glyphs with many strokes. It would be
possible to use binary (bytecode) hinting, but a) bytecode hinting is
patented by Apple and b) this is a whole lot of work for the font
maintainer, you can spend years until you have hinted all the CJK glyphs.
This is Sisyphus work...
So, CJK font maintainers include bitmaps to bypass the rendering engine
when producing glyphs on the screen. Many glyphs have many more strokes than
fit into a 16x16 bitmap grid (12 pt). So the art of embedding bitmaps
is to leave detail out while keeping the glyph distinguishable for the user.
(It doesn't always work, though...)
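
As a rough illustration of where the 16x16 figure comes from: a point is
1/72 inch, so the pixel size depends on the screen resolution. The sketch
below assumes a 96 dpi screen (that value is an assumption, it is not
stated above); the function name is made up for the example.

```python
def pt_to_px(points: float, dpi: float = 96.0) -> float:
    """Convert a font size in points (1/72 inch) to pixels at the given dpi."""
    return points * dpi / 72.0


# Print the pixel grid available to a glyph at a few common sizes.
for pt in (9, 10, 12, 16):
    print(f"{pt} pt -> {pt_to_px(pt):.1f} px")
```

At the assumed 96 dpi, 12 pt comes out at 16 pixels, which is the grid a
complex CJK glyph has to fit into.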

>>   b) there is currently no font available which covers all CJK glyphs in
>> Unicode
>>   c) we don't have any acceptable sans-serif font for CJK
>>   d) currently we can only use the ming/mincho style for screen display,
>> the Kochi Mincho and AR PL ShanHeiSun Uni fonts contain embedded bitmaps
>> already.
>>   e) CJK glyphs in China, Hong Kong, Taiwan, Japan and Korea have
>> different shapes which share the same Unicode codepoints. The available
>> fonts (ttf-arphic-uming, ttf-kochi-mincho and ttf-unfonts) overlap each
>> other in the CJK range, which confuses fontconfig! Chinese users usually
>> prefer the ttf-arphic-uming package, while Japanese users might prefer
>> the ttf-kochi-mincho or ttf-sazanami-mincho fonts and Korean users stick
>> with the ttf-unfonts package. What makes it worse is that fontconfig
>> comes with a predefined list of fonts which should be preferred. This
>> does not suit all CJK users, as they have different preferences.
> 
> This seems like a real mess.  In Ubuntu, we try to work around this by
> changing font preferences depending on which supported languages are
> selected by the user, but it is not ideal.  What if the system needs to
> support more than one of these?

That's exactly the problem. It is a real mess.
See below for the explanation of what we need to change in fontconfig
(or better, upstream should change it!)
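
To make the overlap problem concrete, here is a deliberately simplified
model: each font is reduced to a set of code points, and the first font in
a preference list that covers a code point wins. This is not fontconfig's
actual matching algorithm, and the coverage sets are hypothetical toy data.

```python
# Hypothetical coverage data: each font is reduced to the set of code
# points it covers (real fonts cover thousands of them).
FONTS = {
    "Kochi Mincho": {0x76F4, 0x3042},          # Han + hiragana
    "AR PL ShanHeiSun Uni": {0x76F4, 0x3105},  # Han + bopomofo
    "UnBatang": {0x76F4, 0xAC00},              # Han + hangul
}


def pick_font(codepoint, preference):
    """Return the first font in `preference` that covers `codepoint`."""
    for family in preference:
        if codepoint in FONTS.get(family, set()):
            return family
    return None


japanese_first = ["Kochi Mincho", "AR PL ShanHeiSun Uni", "UnBatang"]
chinese_first = ["AR PL ShanHeiSun Uni", "Kochi Mincho", "UnBatang"]

han = 0x76F4  # one unified code point, drawn differently per region
print(pick_font(han, japanese_first))
print(pick_font(han, chinese_first))
```

With a single global ordering, one region's glyph shapes win for every
shared code point, whatever the user's language is.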

>> For all these CJK issues I'm working on a solution. But it takes time
>> until it's ready.
> 
> Where can we learn more about your work?

I didn't put any web page up for that. But I can tell you now. :)
I'm the font maintainer of the ttf-arphic-{ukai|uming} packages, my
project is CJK-Unifonts and it aims to provide a free set of fonts
covering all CJK glyphs currently in Unicode for all CJK regions (China,
Hong Kong/Macao, Taiwan, Japan, Korea).
Currently it's still a mess, but I'm getting somewhere... slowly...
For now, the fonts support Big5, GB2312 and HKSCS (Hong Kong
supplemental charset), but the glyph shapes are those which came with
the font and do not follow any standard... this is a problem many users
have with the fonts.
Currently the fonts come in two styles, Unicode and MBE (MBE is only of
interest for Taiwanese users and even then optional).
For the next release I plan to distribute the fonts as ttc (truetype
collection), which allows me to shrink the font size dramatically.
For now, each font is about 20 MB in size, but the difference between the
Unicode and MBE styles is only 12 glyphs. So it's actually a waste of
space to have two full-size fonts around. A single TTC file would save
about 50% of the space in this case.
For the future I plan to include glyphs for the different CJK regions
(where they differ). Providing separate fonts for each region (China,
Hong Kong/Macao, Taiwan, Japan, Korea, Taiwan MBE) would take at least
6 times the current 20 MB, while a single TTC would be only around
22 MB in total or so...
The user however still sees 6 different fonts in his system and would
have to choose which style he wants to use as default for CJK glyphs.
That one would have to be configured in fontconfig.
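
The space figures above can be checked with a few lines of arithmetic. The
sizes are the rough numbers from this mail; treating the two-face TTC as
about the size of one font (the 12 differing glyphs being negligible) is an
assumption made for the sketch.

```python
FONT_MB = 20.0  # approximate size of one standalone font
TTC_MB = 22.0   # rough estimate for the planned six-face collection


def saving(separate_mb, combined_mb):
    """Fraction of disk space saved by shipping one combined file."""
    return 1.0 - combined_mb / separate_mb


# Today: Unicode + MBE as two files vs. one TTC of roughly one font's size.
print(f"two faces: {saving(2 * FONT_MB, FONT_MB):.0%} saved")
# Planned: six regional faces as separate files vs. one TTC.
print(f"six faces: {saving(6 * FONT_MB, TTC_MB):.0%} saved")
```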

I'm also working on a sans-serif CJK font (DejaVu style) for screen display.

>>> - Which criteria are important for selecting which font to use in which
>>>   context (language, character set, ...)
>> 1. locale setting is probably best to determine the default preferred
>> fonts, but the user should have a possibility to change it.
> 
> As I understand it, this is something that fontconfig does not consider.  I
> think Michael (CCed) has some background from discussions about this.

yes, fontconfig currently does not support this.
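
Just to sketch what "locale determines the default" could look like outside
of fontconfig: parse the locale string and look up a preference list by
language. The table reuses the serif lists proposed further down in this
mail; the function names and the lookup itself are hypothetical.

```python
import os

# Serif preference per language, taken from the lists proposed in this mail.
ZH_KO_SERIF = ["AR PL ShanHeiSun Uni MBE", "AR PL ShanHeiSun Uni",
               "Kochi Mincho", "Sazanami Mincho", "UnBatang"]
JA_SERIF = ["Kochi Mincho", "Sazanami Mincho", "AR PL ShanHeiSun Uni MBE",
            "AR PL ShanHeiSun Uni", "UnBatang"]
CJK_SERIF = {"zh": ZH_KO_SERIF, "ko": ZH_KO_SERIF, "ja": JA_SERIF}


def parse_locale(value):
    """Split a locale such as 'zh_TW.UTF-8@modifier' into ('zh', 'TW')."""
    base = value.split(".", 1)[0].split("@", 1)[0]
    lang, _, territory = base.partition("_")
    return lang, territory


def preferred_cjk_serif(value):
    """Return the CJK serif preference for a locale, or [] if none applies."""
    lang, _territory = parse_locale(value)
    return CJK_SERIF.get(lang, [])


current = (os.environ.get("LC_ALL") or os.environ.get("LC_CTYPE")
           or os.environ.get("LANG", "C"))
print(current, "->", preferred_cjk_serif(current))
```

The user override would then simply be a per-user table consulted before
this one.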

>> 2. documents which use the ODF format can have paragraphs or even single
>> letters marked with a language tag. Those can override the default
>> system font settings.
>> 3. fontconfig should take care of selecting the proper font for each
>> script. But it needs some tuning IMHO... see below!
>>
>>> - Whether fontconfig requires adjustments in order to respect those criteria
>> YES! IMHO fontconfig needs some big improvement.
>> Currently fontconfig's decision on which font to use for a specific
>> glyph is influenced by the following:
>>  * which charsets the font package has registered in defoma. If the font
>> has registered itself for ISO8859-1, fontconfig will consider the font
>> suitable for that codepoint range. Same is true for all other charsets.
>>  * how many glyphs are covered by the font and in which codepoint
>> regions (determined by fontconfig)
>>  * the config files provided by fontconfig contain a default font
>> preference ordering. This might not suit all users... and usually does not.
>>  * config files provided by font packages, which can label their own
>> fonts as preferred ones and override the system wide setting by this.
> 
> Right, there doesn't seem to be a way to tell fontconfig which language is
> in use in a particular context, or even good defaults for the system as a
> whole.
> 
>> The following needs to be done:
>>  * the default config files which specify any font preference should be
>> removed!
> 
> Why?  Surely we need to have a globally consistent view of which fonts are
> preferred under which circumstances.  The user should of course be able to
> override this, but we need to have a common starting point.
> 
> If it is because the existing preferences are not flexible enough, then we
> should try to fix that instead.

They are not flexible enough.
The default installation comes with the following config files enabled:
40-generic.conf
- ---------------------------------
<!--
  Serif faces
 -->
        <alias>
                <family>Bitstream Vera Serif</family>
                <family>DejaVu Serif</family>
                <family>Times New Roman</family>
                <family>Times</family>
                <family>Nimbus Roman No9 L</family>
                <family>Luxi Serif</family>
                <family>Kochi Mincho</family>
                <family>AR PL SungtiL GB</family>
                <family>AR PL Mingti2L Big5</family>
                <family>MS 明朝</family>
                <family>Baekmuk Batang</family>
                <family>FreeSerif</family>
                <family>MgOpen Canonica</family>
                <default><family>serif</family></default>
        </alias>
<!--
  Sans-serif faces
 -->
        <alias>
                <family>Bitstream Vera Sans</family>
                <family>DejaVu Sans</family>
                <family>Helvetica</family>
                <family>Arial</family>
                <family>Verdana</family>
                <family>Albany AMT</family>
                <family>Nimbus Sans L</family>
                <family>Luxi Sans</family>
                <family>Kochi Gothic</family>
                <family>AR PL KaitiM GB</family>
                <family>AR PL KaitiM Big5</family>
                <family>MS ゴシック</family>
                <family>Baekmuk Dotum</family>
                <family>SimSun</family>
                <family>FreeSans</family>
                <family>MgOpen Modata</family>
                <default><family>sans-serif</family></default>
        </alias>
<!--
  Monospace faces
 -->
        <alias>
                <family>Bitstream Vera Sans Mono</family>
                <family>DejaVu Sans Mono</family>
                <family>Courier</family>
                <family>Courier New</family>
                <family>Andale Mono</family>
                <family>Luxi Mono</family>
                <family>Cumberland AMT</family>
                <family>Nimbus Mono L</family>
                <family>NSimSun</family>
                <family>FreeMono</family>
                <default><family>monospace</family></default>
        </alias>
- ---------------------------------
65-nonlatin.conf
- --------------------------------
        <alias>
                <family>serif</family>
                <prefer>
                        <family>Frank Ruehl</family>
                        <family>MgOpen Canonica</family>
                        <family>Kochi Mincho</family>
                        <family>AR PL SungtiL GB</family>
                        <family>AR PL Mingti2L Big5</family>
                        <family>MS 明朝</family>
                        <family>Baekmuk Batang</family>
                </prefer>
        </alias>
        <alias>
                <family>sans-serif</family>
                <prefer>
                        <family>Nachlieli</family>
                        <family>MgOpen Modata</family>
                        <family>Kochi Gothic</family>
                        <family>AR PL KaitiM GB</family>
                        <family>AR PL KaitiM Big5</family>
                        <family>MS ゴシック</family>
                        <family>Baekmuk Dotum</family>
                        <family>SimSun</family>
                </prefer>
        </alias>
        <alias>
                <family>monospace</family>
                <prefer>
                        <family>Miriam Mono</family>
                        <family>Kochi Gothic</family>
                        <family>AR PL KaitiM GB</family>
                        <family>Baekmuk Dotum</family>
                </prefer>
        </alias>
- ---------------------------
These defaults do more harm than good.
For Latin script a good default would be:
 * serif: Doulos SIL, Charis SIL, DejaVu Serif, Bitstream Vera Serif
 * sans: DejaVu Sans, Bitstream Vera Sans
 * monospace: DejaVu Sans Mono, Bitstream Vera Sans Mono

For CJK (KR and ZH locales):
 Latin plus the following:
 * serif: AR PL ShanHeiSun Uni MBE, AR PL ShanHeiSun Uni, Kochi Mincho,
Sazanami Mincho, UnBatang
 * sans: AR PL ShanHeiSun Uni MBE, AR PL ShanHeiSun Uni, Kochi Mincho,
Sazanami Mincho, UnDotum
 * monospace: AR PL ShanHeiSun Uni MBE, AR PL ShanHeiSun Uni, Kochi
Mincho, Sazanami Mincho, UnDotum

For CJK (JP locale):
 Latin plus the following:
 * serif: Kochi Mincho, Sazanami Mincho, AR PL ShanHeiSun Uni MBE, AR PL
ShanHeiSun Uni, UnBatang
 * sans: Kochi Gothic, Sazanami Gothic, AR PL ShanHeiSun Uni MBE, AR PL
ShanHeiSun Uni, UnDotum
 * monospace: Kochi Mincho, Sazanami Mincho, AR PL ShanHeiSun Uni MBE,
AR PL ShanHeiSun Uni, UnDotum
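
A small sketch of how such per-locale lists could be turned into
<alias>/<prefer> blocks of the kind shown in 65-nonlatin.conf above. It only
generates XML text for the JP case; where such a file would be installed,
and how the locale would select it, is left open. The function names are
made up for the example.

```python
import xml.etree.ElementTree as ET

LATIN = {
    "serif": ["Doulos SIL", "Charis SIL", "DejaVu Serif",
              "Bitstream Vera Serif"],
    "sans-serif": ["DejaVu Sans", "Bitstream Vera Sans"],
    "monospace": ["DejaVu Sans Mono", "Bitstream Vera Sans Mono"],
}
CJK_JA = {
    "serif": ["Kochi Mincho", "Sazanami Mincho", "AR PL ShanHeiSun Uni MBE",
              "AR PL ShanHeiSun Uni", "UnBatang"],
    "sans-serif": ["Kochi Gothic", "Sazanami Gothic",
                   "AR PL ShanHeiSun Uni MBE", "AR PL ShanHeiSun Uni",
                   "UnDotum"],
    "monospace": ["Kochi Mincho", "Sazanami Mincho",
                  "AR PL ShanHeiSun Uni MBE", "AR PL ShanHeiSun Uni",
                  "UnDotum"],
}


def alias_element(generic, families):
    """Build <alias><family>generic</family><prefer>...</prefer></alias>."""
    alias = ET.Element("alias")
    ET.SubElement(alias, "family").text = generic
    prefer = ET.SubElement(alias, "prefer")
    for name in families:
        ET.SubElement(prefer, "family").text = name
    return alias


def build_config(cjk):
    """Latin families first, then the locale's CJK families ("Latin plus")."""
    root = ET.Element("fontconfig")
    for generic, latin in LATIN.items():
        root.append(alias_element(generic, latin + cjk.get(generic, [])))
    return root


root = build_config(CJK_JA)
if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9 and later
    ET.indent(root)
print(ET.tostring(root, encoding="unicode"))
```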

The following entries should be removed altogether, as the fonts are
either non-free (and not available in Debian), or outdated and not
preferred:
 MS 明朝, Baekmuk *, AR PL KaitiM *, MS ゴシック, SimSun, NSimSun,
AR PL SungtiL GB, AR PL Mingti2L Big5
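
The removal list can be applied mechanically. A minimal sketch, treating
the "*" entries above as shell-style wildcards and using the serif list
from 65-nonlatin.conf as input:

```python
from fnmatch import fnmatchcase

# Entries proposed for removal; "*" is read as a shell-style wildcard.
REMOVE_PATTERNS = [
    "MS 明朝", "Baekmuk *", "AR PL KaitiM *", "MS ゴシック",
    "SimSun", "NSimSun", "AR PL SungtiL GB", "AR PL Mingti2L Big5",
]


def keep(family):
    """True if the family matches none of the removal patterns."""
    return not any(fnmatchcase(family, p) for p in REMOVE_PATTERNS)


nonlatin_serif = [
    "Frank Ruehl", "MgOpen Canonica", "Kochi Mincho", "AR PL SungtiL GB",
    "AR PL Mingti2L Big5", "MS 明朝", "Baekmuk Batang",
]
print([f for f in nonlatin_serif if keep(f)])
```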

However, fontconfig still seems to get confused with the default
configuration.
For example: Latin A-Z glyphs are taken from Bitstream Vera, but 0-9 and
other Latin chars are taken from the AR PL * fonts. :(
At least that's how it appears to me in gucharmap.

>>  * font packages should not label their own fonts as preferred, but
>> should rather indicate for which locales and which codepoint ranges the
>> fonts should be used. (Latin fonts can label the locales as "any" and
>> set the Unicode range to the Latin ranges they support; see next item.)
>>  * fontconfig should provide a method to specify for which locales which
>> fonts are suitable and which codepoint ranges they should be preferred
>> for. For example: Chinese users might want the AR PL ShanHeiSun Uni font
>> to be preferred for the CJK ranges, but the Doulos SIL or DejaVu Sans
>> fonts for Latin glyphs. So, I would configure fontconfig for the zh_*
>> locales to prefer the SIL fonts for Unicode ranges U+0000 ~ U+007F, etc.
>> and the AR PL ShanHeiSun Uni font for the range U+2E80 ~ U+9FFF, etc.
> 
> This sounds reasonable.  I would think that we may also need a way for the
> application to override fontconfig's idea of which language is relevant,
> when matching a font pattern.  As you point out, applications may display
> text in multiple locales, and so in order to match correctly in all cases,
> fontconfig must receive that information, and not only the current locale
> setting.

yes. the user should have a way to override the default config. A GUI in
the system settings area would be nice. :) (hint, hint)
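
For the record, the per-locale, per-range idea discussed above could look
roughly like this. It is only a sketch of the proposal, not anything
fontconfig offers; the rule table, the "any" key and the function name are
hypothetical, and the ranges are illustrative (Basic Latin plus the CJK
range mentioned above).

```python
# Hypothetical rule table: language -> [(first, last, family), ...].
# "any" holds rules that apply to every locale, as suggested for Latin fonts.
RULES = {
    "zh": [
        (0x0000, 0x007F, "Doulos SIL"),
        (0x2E80, 0x9FFF, "AR PL ShanHeiSun Uni"),
    ],
    "any": [
        (0x0000, 0x007F, "DejaVu Sans"),
    ],
}


def font_for(char, locale):
    """Pick a family for one character: locale rules first, then 'any'."""
    codepoint = ord(char)
    lang = locale.split("_", 1)[0]
    for first, last, family in RULES.get(lang, []) + RULES["any"]:
        if first <= codepoint <= last:
            return family
    return None


print(font_for("A", "zh_TW"))
print(font_for("\u76f4", "zh_TW"))
print(font_for("A", "de_DE"))
```

An application that knows the language of a text run could pass that
instead of the system locale, which would cover the override case.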

>>> - Whether we still need all these horrible bitmap fonts
>> As separate fonts, probably only for X. CJK users however need fonts
>> with embedded bitmaps.
> 
> This is my thinking as well; the bitmap font packages (xfonts-*) should only
> be needed for legacy apps at this point.

I'm not sure what a default X installation needs... need to check with
the x.org folks.

>>> - Whether we still need server-side fonts for anything
>>>
>>> - Whether we need DeFoMa
>> Yes, as legacy X, GTK1 and QT apps depend on it.
> 
> I should have been clearer; I am trying to establish (among other things)
> what is needed by default in Ubuntu.  The rest will still be available, but
> depending on what's currently shipped, we may be able to remove some things
> from the default install, which would be a win for us.
> 
>> Fontconfig also uses the settings to determine the character sets
>> provided by the fonts.
>> AFAIK, it is also used to register fonts for ghostscript... however I'm
>> not an expert on this and it seems to me that it does not always work as
>> planned.
> 
> I'm CCing Till Kamppeter in hopes that he can enlighten us regarding font
> selection for Ghostscript.
> 

IMHO the ghostscript fonts are only needed for TeX... but I might be
wrong...

Cheers
Arne
- --
Arne Götje (高盛華) <arne at linux.org.tw>
PGP/GnuPG key: 1024D/685D1E8C
Fingerprint: 2056 F6B7 DEA8 B478 311F  1C34 6E9F D06E 685D 1E8C
Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGW5Axbp/QbmhdHowRAtsrAKDdFpO/UbNvTC/GTImB3OaXNVDttACdEMwe
LQAYGAk8J8iyIpjiQrOpB24=
=VCEK
-----END PGP SIGNATURE-----