Character set issues

David M lists2006 at viewport.ukfsn.org
Wed Apr 26 17:42:22 UTC 2006


Karl Auer wrote in gmane.linux.ubuntu.user 
 about: Re: Macros in OpenOffice 2 

> On Sat, 2006-04-22 at 01:05 -0500, Tommy Trussell wrote:
>> > > Viel Spaß!

[The above line is readable as expected here, fwiw]

>> I'm butting in here to say the special characters showed up fine here
>> -- I'm using gmail. Whatever email program you are using may be set to
>> a different character set by default. I just looked at the headers of
>> his message, and his character set is charset=iso-8859-1
>
> Selecting iso-8859-1 didn't help, but selecting UTF-8 did.

I have a hunch that my newsreader (slrn) is actually lying about my 
articles being ISO-8859-1, but unfortunately, as far as I'm aware, 
that's as advanced as it gets (at least configuration-wise). The choice
is between this and some other now-deprecated character sets; Unicode
isn't an option. The slrn FAQ suggests that slrn will support Unicode
when slang 2.0 is released, which implies that it can't yet.


I had previously dist-upgraded to Breezy and had no problems reading
European characters in news with slrn, but then I suffered a disk
problem that meant that I had to reinstall the OS from scratch.

Strangely, since then, all non-ASCII characters in ISO-8859-1 articles 
have been replaced by hexadecimal character codes. I've also noticed
that, as UTF-8 posting becomes more commonplace, characters in articles 
posted in *UTF-8* *are* readable in slrn (including not just European 
letters, but Japanese/Chinese as well). However, this only seems to work 
if articles are sent as raw text, unencoded: articles which are 
base64-encoded display as the raw base64 text, as slrn doesn't decode
that (presumably because it's a text newsreader, and news wasn't
designed for non-text articles).
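
For anyone wondering what "decoding" a base64 article involves, here is
a rough Python sketch. The article text is made up for illustration (it
is not a real post), and this says nothing about how slrn itself works;
it only shows that the readable text is recoverable once the transfer
encoding is undone and the declared charset is applied.

```python
import base64
import email

# Build a hypothetical article whose body is base64-encoded UTF-8.
encoded_body = base64.b64encode("Viel Spaß!".encode("utf-8")).decode("ascii")
raw_article = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + encoded_body + "\n"
)

msg = email.message_from_string(raw_article)

# What a reader that ignores the transfer encoding would show.
undecoded = msg.get_payload()

# Undo the base64 layer, then decode the bytes with the declared charset.
body_bytes = msg.get_payload(decode=True)
text = body_bytes.decode(msg.get_content_charset())

print(undecoded.strip())
print(text)
```

The first line printed is the base64 text, which is what I see in slrn;
the second is the original message.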

I guess the full reinstall of Breezy must have kicked in some Unicode 
support somewhere in the system that a normal upgrade hadn't previously 
included. I don't understand why this is screwing with 
correctly-identified ISO-8859-1 articles, though: presumably my terminal
is treating _everything_ as UTF-8 whether the source likes it or not. I
guess this is a risk of decoupling character display (terminal) from
file reading (newsreader), until everything becomes Unicode-aware and
compliant?
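
To illustrate the guess above, here is a small Python sketch of what
happens when genuine ISO-8859-1 bytes are read as if they were UTF-8.
The "backslashreplace" error handler is my own choice to mimic the hex
codes I'm seeing; I don't know that slrn or the terminal does anything
of the sort internally.

```python
# "ß" is the single byte 0xDF in ISO-8859-1.
latin1_bytes = "Viel Spaß!".encode("iso-8859-1")

# Read as UTF-8, 0xDF announces a two-byte sequence, but the next byte
# ("!") is not a continuation byte, so strict decoding fails.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient reader might show the offending byte as a hex escape instead.
shown = latin1_bytes.decode("utf-8", errors="backslashreplace")

print(strict_ok)
print(shown)
```

The point is only that a lone high byte from ISO-8859-1 is not valid
UTF-8, so something UTF-8-only has to substitute or escape it.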

I'm therefore surmising that my terminal (GNOME Terminal) is handling
UTF-8 OK, and that my editor (vim) is handling UTF-8 OK (and, I guess,
must be saving files as such, otherwise the article would have been
correctly readable by others as ISO-8859-1?). Because UTF-8 is
backwards-compatible with ASCII, even if slrn can't handle UTF-8 itself,
the bytes that it 'thinks' it 'displays' as ISO-8859-1 are passed
through untouched and correctly decoded "at display time" by my
terminal. Newsreaders or mailers with fuller character set support see
the article claiming to be ISO-8859-1 and try to display it as such,
resulting in mangled characters, as that is not what the bytes are?
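
If that theory is right, the mangling other people saw can be
reproduced in a few lines of Python. This is a sketch of the suspected
mechanism only, using the greeting quoted at the top as sample text; I
haven't confirmed that this is what my setup actually did.

```python
# The editor saves UTF-8: "ß" becomes the two bytes 0xC3 0x9F.
utf8_bytes = "Viel Spaß!".encode("utf-8")

# A reader that trusts the (wrong) ISO-8859-1 label turns each byte into
# one character, so "ß" appears as "Ã" followed by a control character.
mangled = utf8_bytes.decode("iso-8859-1")

# Overriding the label and selecting UTF-8 undoes the damage.
recovered = mangled.encode("iso-8859-1").decode("utf-8")

print(repr(mangled))
print(recovered)
```

That would match Karl's report that selecting ISO-8859-1 didn't help
but selecting UTF-8 did.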


-- 
| David M,    __________| replyto email valid <365 days | en, fr, (de) |
| Edinburgh, Scotland.  | but on-list replies preferred |   ________   |
> Please trim quoted text & interleave reply comments for readability. <





More information about the ubuntu-users mailing list