Character set issues
David M
lists2006 at viewport.ukfsn.org
Wed Apr 26 17:42:22 UTC 2006
Karl Auer wrote in gmane.linux.ubuntu.user
about: Re: Macros in OpenOffice 2
> On Sat, 2006-04-22 at 01:05 -0500, Tommy Trussell wrote:
>> > > Viel Spaß!
[The above line is readable as expected here, fwiw]
>> I'm butting in here to say the special characters showed up fine here
>> -- I'm using gmail. Whatever email program you are using may be set to
>> a different character set by default. I just looked at the headers of
>> his message, and his character set is charset=iso-8859-1
>
> Selecting iso-8859-1 didn't help, but selecting UTF-8 did.
I have a hunch that my newsreader (slrn) is actually lying about my
articles being ISO-8859-1, but unfortunately, as far as I'm aware,
that's as advanced as it gets (at least configuration-wise). It's a case
of choosing either this or some other now-deprecated character set;
Unicode isn't an option. The slrn FAQ suggests that slrn will support
Unicode when slang 2.0 is released, which implies it can't yet.
I had previously dist-upgraded to Breezy and had no problems reading
European characters in news with slrn, but then I suffered a disk
problem that meant that I had to reinstall the OS from scratch.
Strangely, since then, all non-ASCII characters in ISO-8859-1 articles
have been replaced by hexadecimal character codes, and I've also noticed
that, as UTF-8 posting becomes more commonplace, characters in articles
posted in *UTF-8* *are* readable in slrn (including not just European
letters, but Japanese/Chinese as well). However, this only seems to work
if articles are sent as raw text, unencoded: articles which are
base64-encoded display as the base64-encoding, as slrn can't decode that
(because it's a text newsreader, and news wasn't designed for non-text
articles).
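To illustrate the base64 point, here is a minimal Python sketch. It assumes the article body was UTF-8 text sent with a base64 transfer encoding; the sample text and the variable names are made up for the example and have nothing to do with slrn's internals.

```python
import base64

# Hypothetical article text (an assumption for illustration only).
original = "Viel Spaß!"

# What travels on the wire when the body is base64-encoded UTF-8.
wire_body = base64.b64encode(original.encode("utf-8")).decode("ascii")

# A reader that does not undo the transfer encoding shows this as-is.
print(wire_body)

# A MIME-aware reader decodes base64 first, then interprets the bytes as UTF-8.
decoded = base64.b64decode(wire_body).decode("utf-8")
print(decoded)
```

The first print is roughly what I see in slrn for such articles; the second is what a reader with transfer-encoding support would presumably show.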
I guess the full reinstall of Breezy must have kicked in some Unicode
support somewhere in the system that a normal upgrade hadn't previously
included. I don't understand why this is screwing with
correctly-identified ISO-8859-1 articles, though: presumably my terminal
is treating _everything_ as UTF-8 whether the source likes it or not. I
guess this is a risk of decoupling character display (terminal) from
file reading (newsreader), until everything becomes Unicode-aware and
compliant?
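A rough Python sketch of what I suspect is happening to the ISO-8859-1 articles. It assumes the display side insists on UTF-8 and falls back to hex codes for bytes it can't decode; the fallback shown here is Python's own and only stands in for whatever slrn or the terminal really does.

```python
# "ß" in ISO-8859-1 is the single byte 0xDF.
latin1_bytes = "Viel Spaß!".encode("iso-8859-1")

# In UTF-8, 0xDF is a lead byte that must be followed by a continuation
# byte (0x80-0xBF). Here it is followed by "!" (0x21), so a strict
# decoder rejects the sequence.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# One possible fallback: show the undecodable byte as a hex escape.
as_hex = latin1_bytes.decode("utf-8", errors="backslashreplace")

print(valid_utf8)  # False
print(as_hex)      # Viel Spa\xdf!
```

If that guess is right, it would explain why plain ASCII text is unaffected: bytes below 0x80 mean the same thing in both encodings.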
I'm therefore surmising that my terminal (GNOME terminal) is handling
UTF-8 OK, that my editor (vim) is handling UTF-8 OK (and, I guess, must
be saving files as such, otherwise the article would have been correctly
readable by others as ISO-8859-1?), and because Unicode is
backwards-compatible, even if slrn can't handle UTF-8 itself, the
characters that it 'thinks' it 'displays' as ISO-8859-1 are being
correctly recognised and converted "at display time" by my terminal.
Newsreaders or mailers with fuller character set support see the article
claiming to be ISO-8859-1, and try to display it as such, resulting in
mangled characters as that is not what they are?
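The reverse case can be sketched the same way in Python. It assumes the article really was saved as UTF-8 while its header claims ISO-8859-1; the variable names are only illustrative.

```python
# UTF-8 encodes "ß" as two bytes: 0xC3 0x9F.
utf8_bytes = "Viel Spaß!".encode("utf-8")

# A reader that trusts the "charset=iso-8859-1" header decodes each byte
# as one character: 0xC3 becomes "Ã" and 0x9F becomes an invisible
# control character (U+009F).
mislabelled = utf8_bytes.decode("iso-8859-1")

# A UTF-8 terminal handed the same raw bytes shows the intended text.
utf8_aware = utf8_bytes.decode("utf-8")

print(repr(mislabelled))
print(utf8_aware)
```

That would match what Karl reported: selecting ISO-8859-1 shows the mangled form, and selecting UTF-8 by hand fixes it.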
--
| David M, __________| replyto email valid <365 days | en, fr, (de) |
| Edinburgh, Scotland. | but on-list replies preferred | ________ |
> Please trim quoted text & interleave reply comments for readability. <