my strategy on implementing line-endings (eol) support

Alexander Belchenko bialix at ukr.net
Fri Apr 4 06:21:38 BST 2008


Mark Hammond writes:
>> Alexander Belchenko writes:
> I'm just trying to ensure we are on the same page here.  I hope I don't sound pedantic or argumentative...

No, of course not. Thank you for your patience in explaining it again and again.
I'm just stuck and can't figure out how to implement your suggestion correctly.
I'm inclined to finish the first stage of my eol work without looking at encodings,
and then think about it more.

Thank you.

> 
>> I don't see any efficient way to handle eol in unicode files without
>> hurting performance, so it's better to follow the hg model and disable
>> eol-conversion for them, even if the user sets the 'eol' property to some
>> value different from 'exact'.
> 
> This isn't my conclusion.  I would suggest there is no efficient way to perform EOL conversion on an ASCII file many megabytes in size.  The solution in that case is to not enable EOL conversion for such files, and I think the same applies here.  There is a threshold where you can't do it effectively for ASCII files - there is a similar threshold for Unicode files, it's just a little lower.  Conversely, I would say that the "average" source file, in any encoding supported by Python, could probably take the hit of the encode and decode without significantly degrading performance.

Hmm, good point about files many megabytes in size. Currently bzr tries to hold the entire file in
memory for some operations, and this leads to a MemoryError on very big files.
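Mark's streaming point and the MemoryError problem both suggest processing the file as an iterable of chunks rather than one big string. A minimal sketch of that idea (the function name and chunk handling are illustrative, not bzr's actual code) which stays O(N) and copes with a CRLF pair split across a chunk boundary:

```python
def iter_normalized(chunks, eol=b"\n"):
    """Normalize CRLF/CR line endings to `eol` over an iterable of
    byte chunks, without holding the whole file in memory."""
    pending_cr = False
    for chunk in chunks:
        if pending_cr:
            # Re-attach the CR held back from the previous chunk.
            chunk = b"\r" + chunk
        pending_cr = chunk.endswith(b"\r")
        if pending_cr:
            # A CRLF pair may straddle the chunk boundary; defer the CR.
            chunk = chunk[:-1]
        chunk = chunk.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
        if eol != b"\n":
            chunk = chunk.replace(b"\n", eol)
        yield chunk
    if pending_cr:
        # File ended with a bare CR.
        yield eol
```

For example, `b"".join(iter_normalized([b"one\r", b"\ntwo"]))` treats the split `\r` + `\n` as a single CRLF.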

> 
> It seems to me that the EOL conversions are always O(N) - and I think we can remain very close to O(N) for most encodings, especially if we have to process the file as a stream anyway (ie, s.replace() can't really be the impl unless we slurp it entirely into memory, which doesn't seem like an ideal implementation...)
>  
> Cheers,
> 
> Mark
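On the encoding point: for a multi-byte encoding such as UTF-16, a newline is not a bare 0x0A byte, so byte-level replacement is unsafe and the decode/encode round-trip Mark describes becomes necessary. It adds only a constant factor per byte, so the whole operation stays O(N). A hedged sketch (the function name and defaults are illustrative):

```python
def normalize_encoded(data, encoding, eol="\n"):
    """Decode bytes, normalize line endings on the unicode text,
    and re-encode.  Both codec passes and the replacements are
    linear in the input, so the operation remains O(N)."""
    text = data.decode(encoding)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if eol != "\n":
        text = text.replace("\n", eol)
    return text.encode(encoding)
```

For very large files the same idea could be applied incrementally with `codecs.iterdecode()`/`codecs.iterencode()` instead of a single slurp.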