[ubuntu-in] SCIM keymap: Unicode and TTF

Dinbandhu dinbandhu at sprynet.com
Thu Aug 23 14:42:14 BST 2007


Let me first thank you for an extremely lucid exposition of the current
issues involving TTF and Unicode. It has been highly educational and
has given me an overall picture of what is going on, the challenges we
face, and how to move ahead.

On Thu, 2007-08-23 at 10:44 +0530, Gora Mohanty wrote:
> On Wed, 2007-08-15 at 20:19 -0400, Dinbandhu wrote:
> > [...]in your experience how common is it
> > for Hindi readers to be accessing computers running on for example Win
> > 98 which can only recognize TTF fonts? 
> 
> Not so sure about Hindi, but in Orissa there were quite a few people
> using not only Windows 98, but also Windows 95, to an extent such
> that regional newspapers were unwilling to shift to Unicode web pages.

Yes, I have experienced the same sorts of issues. I think the matter
remains quite pervasive. 

> This will probably change within the next couple of years, in that
> most people will be forced to shift to at least Windows XP. 

While my hope is that this will resolve quickly over the coming few
years, economics is one strong rate-limiting factor that may continue
to prevent many from converting over to newer technologies in the
immediate future. For those running Win95 or Win98, converting over to
at least WinXP likely involves the purchase of a new computer.

One good way around the economics challenge is to educate people about
Linux. Most people running older computers could run the latest version
of a Linux distribution without problem, either as a full convert or as
a dual boot, and thereby gain access to the most modern computer
technologies and Unicode fonts.

So mass public education about the ease of use and efficiency of Linux
OS's could make a big difference in the topography of this issue,
without putting a dent in people's pocketbooks. 

I myself use an old laptop, and having set up a dual boot with Ubuntu,
have personal experience that new Linux distros on old laptops can run
extremely well.

> While XP
> can be enabled to support Hindi and a few other Indian languages, it
> does not work with many others.

As noted above, there is also the economic difficulty. Since XP is
limiting in both financial and technical realms, Linux, along with
massive public education campaigns about it (for example, in the Indian
public and private school systems), can provide a meaningful and
realistic solution.
 
> > Is there any facility in the current Linux system for accommodating
> > communication with non-Unicode users?
> 
> Given that the number and aesthetics of old-style, 8-bit fonts for
> Indian languages is better than Unicode equivalents, I agree that
> they need to be supported at least for the next few years. 

Good point. (I had noticed that many of the fonts I tried in Unicode did
not seem quite as well developed or aesthetically attractive. Some were,
I would say, actually unclear.)

> There is
> also a wealth of existing content made using these fonts, that needs
> to be converted into Unicode. 

Yes, I see. That need is also there.

> However, as most of these fonts are
> proprietary, and many are not even available free of cost for
> non-commercial use, I do have misgivings about encouraging people to
> continue using them. 

Your misgiving is well based and reasonable. I concur in full. 

> Thus, convertibility to Unicode must be part and
> parcel of any such scheme, to prevent lock-in to a particular
> font/vendor.

That makes sense. 

One question: when we talk about "convertibility", does that mean
converting from that TTF font to the same font only in Unicode, or to a
different Unicode font? I would hope to see us adding these fonts to the
fund of what is available to Unicode. Perhaps that is what you address
just below.

For example, the default TTF font for Hindi (BRH Hindi) in Baraha
Windows was, I felt, better than the fonts I have tried so far in
Unicode. I would very much like to be able to use that one in hi-baraha,
or really any such font that is well established and traditional,
clear and easy to read, as opposed to some of the "modernistic" ones
that are sometimes not so easy to read. Perhaps the Shiva font you are
currently working on, which is used by publishing houses, will fulfill
that need.

> As TTF fonts are well-supported under Linux, what is needed is a way
> to allow text entry using these fonts, and means to convert the
> content to and from Unicode. Here is what I would propose be done:

I am very inspired to read your three-fold solution to this issue. It
appears to be an effective way of addressing a problem that is a real
one for Indian language communication on computers today.

> (a) Build keymaps for various fonts: As the encoding of the fonts is
>     non-standardised, this has to be done separately for each font.
>     The saving grace is that many of these fonts fall into identical
>     families, and general principles of making the conversion maps
>     can be elucidated.

Great. It seems like that will facilitate the work a lot. I had also
picked up some hints about family trends in key-mapping from reading
the Baraha and ITRANS literature.

This idea of (1) building keymaps for the various dominant TTF fonts,
along with (2) at the same time providing tools for their Unicode
conversion, seems a foundation stone for the solution to this issue.
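
To make the keymap idea concrete for myself, here is a minimal Python
sketch of how I picture a keymap being applied: typed key sequences are
looked up in a table, trying the longest sequence first. The table
entries and the function name apply_keymap are my own assumptions for
illustration only; they are not the actual Baraha, ITRANS, or SCIM
tables.

```python
# Hypothetical key sequences mapped to Unicode Devanagari vowels.
# These four entries are illustrative and not taken from any real keymap.
KEYMAP = {
    "a": "\u0905",   # DEVANAGARI LETTER A
    "aa": "\u0906",  # DEVANAGARI LETTER AA
    "i": "\u0907",   # DEVANAGARI LETTER I
    "ii": "\u0908",  # DEVANAGARI LETTER II
}


def apply_keymap(keys, keymap=KEYMAP):
    """Convert typed keys to output text, preferring the longest match."""
    longest = max(len(k) for k in keymap)
    out = []
    pos = 0
    while pos < len(keys):
        # Try the longest candidate sequence first, then shorter ones.
        for size in range(min(longest, len(keys) - pos), 0, -1):
            chunk = keys[pos:pos + size]
            if chunk in keymap:
                out.append(keymap[chunk])
                pos += size
                break
        else:
            # No entry matched: pass the key through unchanged.
            out.append(keys[pos])
            pos += 1
    return "".join(out)
```

A real keymap would of course need many more entries, plus rules for
consonants, the virama, and conjuncts, which this sketch leaves out.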

>       Growing out of the work on the Unicode Baraha maps, I have
>     almost completed a keymap for the Devanagari Shiva font, using the
>     same Baraha layout. Shiva is an 8-bit, TTF font widely used by
>     printing houses, and this was needed for Sarai publications. 

Great. Looking forward to trying it out.

>     I
>     will also make the maps for Inscript, ITRANS, Bolnagri, and
>     Phonetic. Once these are done, I plan to write up a short report
>     on how to prepare conversion tables for such fonts, so that other
>     people can work on preparing tables.

Great. This will be invaluable for getting more and more people
involved. 

As noted, I would like to see the BRH Hindi TTF font available in
hi-baraha, and with instruction would be willing to do the work of
getting it working. As you know, I already made the keymap for it in the
beginning, before switching over to making the Unicode version.

> (b) As mentioned earlier, the problem with the above is that it
>     encourages content creation in old technology, using
>     non-standardised encodings. Thus, I think that such maps should be
>     released only along with converters to/from Unicode. I have long
>     been planning for making a general-purpose library for Indian
>     language character processing, which would also include such
>     converters.

That makes a lot of sense. Wonderful work. It will help promulgate the
use of Indian languages in computers throughout the world.
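
As I understand it, such a converter is at heart a per-font lookup table
plus some reordering. Below is a rough Python sketch under stated
assumptions: the byte values in the table are invented and do not
correspond to Shiva, BRH Hindi, or any real font, and the function name
legacy_to_unicode is hypothetical. The one fact it relies on is that
Unicode stores the vowel sign i (U+093F) after its consonant, whereas
8-bit fonts typically store that glyph before the consonant, in visual
order.

```python
# Invented byte-to-Unicode table for an imaginary 8-bit Devanagari font.
LEGACY_TO_UNICODE = {
    0x6B: "\u0915",  # DEVANAGARI LETTER KA
    0x6D: "\u092E",  # DEVANAGARI LETTER MA
    0x61: "\u093E",  # DEVANAGARI VOWEL SIGN AA
    0x69: "\u093F",  # DEVANAGARI VOWEL SIGN I (stored before the consonant)
}

PRE_BASE_SIGN = "\u093F"


def legacy_to_unicode(data: bytes) -> str:
    """Map legacy bytes to Unicode, moving vowel sign i after the next character."""
    out = []
    pending = None  # a vowel sign i waiting for the character that follows it
    for byte in data:
        # Unmapped bytes are passed through as the code point of the same value.
        ch = LEGACY_TO_UNICODE.get(byte, chr(byte))
        if ch == PRE_BASE_SIGN:
            pending = ch
            continue
        out.append(ch)
        if pending is not None:
            out.append(pending)
            pending = None
    if pending is not None:
        # A trailing sign with nothing after it is emitted as-is.
        out.append(pending)
    return "".join(out)
```

This sketch handles only a single following character; conjuncts and
half-forms would need more careful handling, and the reverse direction
(Unicode to the 8-bit font) would need its own table.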

> (c) Minor improvements to the rendering of such fonts might be made by
>     adding hints for anti-aliasing, something which I understand that
>     most Indian language fonts do not do. However, I have no
>     experience in this area.

I do not know the meaning of "anti-aliasing".

So in brief, the solutions are:

1. For the future: widespread public education about Linux so people
can access Unicode fonts inexpensively, thus ultimately eliminating the
need for TTF fonts.

2. For the present: making keymaps for the dominant TTF fonts for use in
Linux-based typing systems (like SCIM Baraha, ITRANS, Inscript,
Phonetic, and Bolnagri), along with tools for their conversion to
Unicode.

Many thanks for your guidance, Gora, in "mapping out" what seems to me
an excellent approach to solving this important challenge.

Regards,
Swarup
