my strategy on implementing line-endings (eol) support
Michael Haggerty
mhagger at alum.mit.edu
Fri Apr 4 10:04:13 BST 2008
Mark Hammond wrote:
> If we are assuming UCS2 (ie, UTF16 without surrogates) and know the
> byte order, then I see no reason you couldn't also optimize this
> encoding by replacing with a blind "s.replace("\n\0", "\r\0\n\0")".
I don't think this would be correct, because you don't know that the
match occurs at a character-aligned boundary. For example, the
characters "ਧĀ" = '\u0A27\u0100' have the little-endian encoding
'\x27\n\0\x01', so your replace would screw it up: the middle two bytes
are '\n\0', but they straddle the two characters rather than forming a
newline.
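
A minimal sketch of the problem in Python, assuming little-endian UTF-16
without a BOM. The function names naive_crlf and aligned_crlf are made up
for illustration and are not part of any existing code. The first does the
blind byte-level replace; the second only looks at even offsets, so it
never matches across two code units. Neither handles input that already
contains '\r\n'.

```python
# Two characters, neither of which is a newline.
text = "\u0a27\u0100"
encoded = text.encode("utf-16-le")  # bytes: 27 0a 00 01


def naive_crlf(data: bytes) -> bytes:
    """Blind byte-level replace, ignoring code unit alignment."""
    return data.replace(b"\n\x00", b"\r\x00\n\x00")


def aligned_crlf(data: bytes) -> bytes:
    """Insert CR before LF, checking only 2-byte-aligned code units."""
    out = bytearray()
    for i in range(0, len(data), 2):
        unit = data[i:i + 2]
        if unit == b"\n\x00":
            out += b"\r\x00"
        out += unit
    return bytes(out)


# The naive version matches the 0a 00 that straddles both characters
# and changes the data, even though the text contains no newline.
print(naive_crlf(encoded) == encoded)    # False
print(aligned_crlf(encoded) == encoded)  # True
```

The aligned version still converts a real newline: "a\nb" encoded as
UTF-16-LE comes back as the encoding of "a\r\nb".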
Michael