Ubuntu encodings

Simos Xenitellis simos.lists at googlemail.com
Tue Aug 15 11:08:13 UTC 2006


On Tue, 15-08-2006 at 05:37 -0300, Felipe Figueiredo wrote:
> On Tuesday 15 August 2006 05:15, Dieter Schicker wrote:
> > Hi,
> >
> > maybe I don't exactly get what you mean but your view seems somehow
> > western centristic to me. You might not believe it but there _are_
> > actually people out there who need other encodings than latin1. For me
> > it's logical that Ubuntu uses UTF-8 as default encoding because it's the
> > lowest common denominator (also lowest seems the wrong term for
> > UTF-8 :-)). And since we are on Linux you can easily change the default
> > encoding  (and also the encoding of special applications) according to
> > your needs.
> 
> I agree with Joe. I see the same bad behaviour in kubuntu's firefox (I also 
> use debian etch with latin1) when seeing some portuguese sites. It chooses 
> UTF8 wrongly (instead of iso88591) very often.
> 
> I wouldn't mind using UTF8 locally, if it weren't for this. From where I 
> stand, UTF8 is not common denominator, it's a plain wrong denominator, since 
> it scrambles all accentuated characters.
> 
> And explicitly disagreeing with you, not far away in time, iso88591 was the 
> default encoding in linux. Ok, maybe quite some time, but I still see no 
> reason why this should be changed _as a default_.

GNOME is used by people who speak many different languages, so a
common encoding is needed to keep maintenance simple. A string encoded
in iso-8859-1, iso-8859-5 or iso-8859-7 carries no marker that
identifies its encoding, so you can never tell which one was used.
Choosing UTF-8 for everything solves this problem.
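
To illustrate the ambiguity, here is a minimal Python sketch (the
three bytes and the helper name decode_all are made up for the
example). The same bytes decode without error under all three
encodings, giving three different texts, and nothing in the data says
which one was meant:

```python
# Three bytes that are valid in every ISO-8859 variant.
raw = bytes([0xE1, 0xE2, 0xE3])

def decode_all(data):
    """Decode the same bytes under three legacy encodings."""
    encodings = ("iso-8859-1", "iso-8859-5", "iso-8859-7")
    return {enc: data.decode(enc) for enc in encodings}

results = decode_all(raw)
for enc, text in results.items():
    # Latin, Cyrillic and Greek letters respectively, all from the same bytes.
    print(enc, text)

# UTF-8, by contrast, rejects byte sequences that are not valid UTF-8,
# so at least some wrong guesses are detected instead of silently accepted.
try:
    raw.decode("utf-8")
    utf8_rejected = False
except UnicodeDecodeError:
    utf8_rejected = True
```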

For your specific needs, using only iso-8859-1 could be ideal and would
probably save some space in the strings; however, it makes maintenance
hell.

Take, for example, popular webmail services such as Hotmail and Yahoo
Mail. Their regional websites use legacy encodings. As long as people
from the same region communicate with each other, the system appears
to work. However, when someone at Yahoo Arabic sends an e-mail to
someone at Hotmail Japan, all sorts of funny things happen. Is this
case common enough to justify forcing the use of UTF-8? Well, it turns
out that it is.

There is indeed the issue of training people to make the transition.
You can convert your text and database material from legacy encodings
to UTF-8, though this should be publicised more.
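
For plain text, the conversion itself is short. A minimal Python
sketch, assuming you already know the source encoding (the function
name legacy_to_utf8 and the sample word are only for illustration):

```python
def legacy_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known legacy encoding to UTF-8."""
    # Strict decoding: raises an error on bytes the source encoding does
    # not define, rather than silently producing corrupted text.
    text = data.decode(source_encoding)
    return text.encode("utf-8")

# A Greek word as it would be stored in iso-8859-7 (one byte per letter).
legacy = "καλημέρα".encode("iso-8859-7")
converted = legacy_to_utf8(legacy, "iso-8859-7")

# The UTF-8 form is larger for non-ASCII text; that is the space cost.
print(len(legacy), len(converted))
```

The hard part is the step the sketch takes for granted: knowing which
legacy encoding each file or column actually uses.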

The conversion from legacy to UTF-8 should be like diving into deep
water; if you don't go all the way, you end up with corrupted data.
For example, there are databases with legacy encodings that accept
UTF-8 text but store it as HTML entities. In addition, in some
distributions the default encoding of the database (MySQL) is
iso-8859-2 (or -1?) and the default sort order is Swedish
(case-insensitive), which is lots of fun for users who speak something
else.
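
The HTML-entity case can be imitated in a few lines of Python. This is
only a sketch of the effect, not of how any particular database or web
application does it; the variable names are invented:

```python
import html

original = "καλημέρα"

# Imitate a system that cannot store these letters: every character
# outside ASCII is replaced by a numeric character reference.
stored = original.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# Lengths, searching and sorting now operate on the entity text,
# not on the letters the user typed.
print(stored)
print(len(original), len(stored))

# After a full move to UTF-8, the entities can be turned back into text.
recovered = html.unescape(stored)
print(recovered == original)
```

Note that html.unescape also converts entities a user typed on purpose
(such as &amp;), so real data would need checking before a bulk repair.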

Make the move to UTF-8. If there are any specific issues or queries,
there are people who are happy to help out.

Simos






More information about the ubuntu-users mailing list