My strategy for implementing line-ending (eol) support

Alexander Belchenko bialix at ukr.net
Fri Apr 4 18:55:17 BST 2008


Stefan Monnier wrote:
> Just a plea here: please don't assume Unicode = utf-16.
> E.g., in the unix world, utf-16 basically doesn't exist and most uses of
> Unicode are with a utf-8 encoding instead.

utf-16 and utf-8 are encodings, not Unicode in its raw form. I've seen a lot
of confusion from people who write Python programs and don't understand the
difference between utf-8 strings and unicode strings. They are not the same!
Still, utf-16 is closer to the internal format of unicode strings than the
other encodings, which is why I slipped and wrote "unicode" instead of "utf-16".
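
To illustrate the distinction, here is a minimal Python 3 sketch (not from the original thread): one unicode string, two different byte encodings of it. The sample word is arbitrary.

```python
s = "na\u00efve"                  # a str: a sequence of 5 code points ("naïve")
as_utf8 = s.encode("utf-8")       # bytes; U+00EF takes 2 bytes in utf-8
as_utf16 = s.encode("utf-16-le")  # bytes; every code point here takes 2 bytes

print(len(s), len(as_utf8), len(as_utf16))  # 5 6 10
print(as_utf8 == as_utf16)                  # False

# Both byte strings decode back to the same unicode string.
assert as_utf8.decode("utf-8") == as_utf16.decode("utf-16-le") == s
```

The string is the abstract text; utf-8 and utf-16 are just two ways of storing it as bytes.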

> 
> The problems you're talking about have nothing to do with Unicode, but
> with the utf-16 encoding instead (tho it probably affects utf-32 as
> well).  I wouldn't be surprised if other (non-Unicode) encodings suffer
> from similar problems.  So please say "utf-16" rather than "Unicode".

OK, let's call it the "utf-16 problem". Other non-Unicode encodings should not
suffer from this problem, because most existing encodings represent a subset
of Unicode using 8-bit characters. The utf-16 problem comes from its use of
16-bit characters.
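
A small Python 3 sketch of why byte-level eol conversion breaks utf-16 but not 8-bit encodings or utf-8. The helper names `naive_lf_to_crlf` and `decoded_lf_to_crlf` are hypothetical, chosen for illustration; they are not bzr functions.

```python
def naive_lf_to_crlf(data: bytes) -> bytes:
    # Byte-level LF -> CRLF, the way a tool might treat 8-bit text.
    # (Assumes the input has no existing CRLF, for brevity.)
    return data.replace(b"\n", b"\r\n")


def decoded_lf_to_crlf(data: bytes, encoding: str) -> bytes:
    # Decode to characters, convert line endings, then re-encode.
    text = data.decode(encoding)
    return text.replace("\n", "\r\n").encode(encoding)


# 8-bit encodings and utf-8: byte 0x0A only ever means LF, so this is safe.
latin = "a\nb".encode("latin-1")
assert naive_lf_to_crlf(latin) == b"a\r\nb"
utf8 = "\u00e9\n".encode("utf-8")
assert naive_lf_to_crlf(utf8).decode("utf-8") == "\u00e9\r\n"

# utf-16-le: LF is the two bytes 0A 00. Inserting a single 0D byte shifts
# every following 16-bit unit, and the result has an odd length.
utf16 = "a\nb".encode("utf-16-le")
broken = naive_lf_to_crlf(utf16)
print(len(broken) % 2)  # 1
try:
    broken.decode("utf-16-le")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)

# Byte 0x0A also occurs inside characters that are not newlines:
# U+010A (C with dot above) is encoded as 0A 01.
print("\u010a".encode("utf-16-le"))  # b'\n\x01'

# Converting at the character level keeps the data valid.
fixed = decoded_lf_to_crlf(utf16, "utf-16-le")
print(fixed.decode("utf-16-le") == "a\r\nb")  # True
```

utf-8 stays safe because every byte of a multibyte utf-8 sequence is 0x80 or above, so 0x0A and 0x0D never appear inside other characters.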

> Also I wouldn't worry too much about this problem: utf-16 being almost
> exclusively used under Windows, I suspect that all the tools that can
> handle UTF-16 can also perfectly deal with CRLF line endings, so there's
> probably never any need for any form of EOL conversion on those files.

I think it's safe enough to relax about utf-16 and simply prevent any eol
conversion for such files.
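
One way such a guard could look, as a minimal Python 3 sketch. Everything here is an assumption for illustration: `skip_eol_conversion`, `convert_to_crlf` and `SNIFF_LEN` are hypothetical names, not bzr's API, and treating any NUL byte in the first 8000 bytes as "do not convert" is a common heuristic rather than something settled in this thread.

```python
import codecs

# Hypothetical policy sketch, not bzr's actual API.
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
SNIFF_LEN = 8000  # assumed: only inspect the start of the file


def skip_eol_conversion(data: bytes) -> bool:
    """Return True if the content should be left byte-for-byte untouched."""
    head = data[:SNIFF_LEN]
    if head.startswith(UTF16_BOMS):
        return True  # utf-16 (or utf-32-le, whose BOM starts the same way)
    # utf-16 text without a BOM usually contains NUL bytes when it holds
    # ASCII characters; NUL bytes also suggest a binary file.
    return b"\x00" in head


def convert_to_crlf(data: bytes) -> bytes:
    if skip_eol_conversion(data):
        return data
    # Normalise to LF first so existing CRLF pairs are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


text8 = b"one\ntwo\r\nthree\n"
text16 = codecs.BOM_UTF16_LE + "one\ntwo\n".encode("utf-16-le")
print(convert_to_crlf(text8))             # b'one\r\ntwo\r\nthree\r\n'
print(convert_to_crlf(text16) == text16)  # True
```

Skipping is deliberately conservative: a utf-16 file is passed through untouched rather than risk corrupting it, which matches the expectation that tools handling utf-16 already cope with CRLF.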

> 
> 
>         Stefan "who's been confused for the Nth time by this mixup in
>                 this thread."