my strategy on implementing line-endings (eol) support

Michael Haggerty mhagger at alum.mit.edu
Fri Apr 4 10:04:13 BST 2008


Mark Hammond wrote:
> If we are assuming UCS2 (ie, UTF16 without surrogates) and know the
> byte order, then I see no reason you couldn't also optimize this
> encoding by replacing with a blind "s.replace("\n\0", "\r\0\n\0")".

I don't think this would be correct, because you don't know that the
replace is occurring at a character-aligned boundary.  For example, the
characters "ਧĀ" = '\u0A27\u0100' have the little-endian encoding '\x27\n\0\x01', so
your replace would screw it up because the middle two bytes are '\n\0'.
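A minimal Python sketch reproduces the problem and shows one code-unit-aligned alternative. It assumes little-endian UTF-16, as in the example above. `crlf_utf16le` is a hypothetical helper written for illustration, not a bzrlib function.

```python
def crlf_utf16le(data: bytes) -> bytes:
    """Return UTF-16-LE `data` with a CR code unit inserted before every LF.

    Only even byte offsets are examined, so matches always fall on
    code-unit boundaries (hypothetical helper, for illustration only).
    """
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    out = bytearray()
    for i in range(0, len(data), 2):
        unit = data[i:i + 2]
        if unit == b"\n\x00":   # LF (U+000A) as a little-endian code unit
            out += b"\r\x00"    # emit CR (U+000D) before it
        out += unit
    return bytes(out)


text = "\u0a27\u0100"             # "ਧĀ"
data = text.encode("utf-16-le")   # b"'\n\x00\x01"

# The blind byte-level replace matches b"\n\x00" at odd offset 1,
# straddling the two characters.
naive = data.replace(b"\n\x00", b"\r\x00\n\x00")
print(naive.decode("utf-16-le") == text)   # False: the text is corrupted
print(crlf_utf16le(data) == data)          # True: there is no real LF here

# Simpler alternative: decode, replace on str, re-encode.
print(data.decode("utf-16-le").replace("\n", "\r\n").encode("utf-16-le") == data)  # True
```

Both surrogate halves lie in 0xD800-0xDFFF, so no code unit of a surrogate pair can equal 0x000A. An aligned scan like this should therefore also be safe for full UTF-16, not only for UCS-2.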

Michael
