my strategy on implementing line-endings (eol) support

Alexander Belchenko bialix at ukr.net
Wed Apr 2 18:15:37 BST 2008


Nicholas Allen wrote:
> 
> |
> | In my opinion there are 4 types of files:
> |
> | 1) binary files
> | 2) text files with exact line-endings
> | 3) text files with native/LF/CRLF/CR line-endings
> | 4) unicode text files similar to 3.
> Aren't there just 2 types of files (binary and text)? Type 4 above is just a
> text file with its encoding set to unicode. So I think file encoding needs
> to be another property (UTF8, ASCII, unicode etc).

From the eol-conversion point of view it's not:

In [1]: u'\n'.encode('utf-16-le')
Out[1]: '\n\x00'

In [2]: u'\n'.encode('utf-16-be')
Out[2]: '\x00\n'

In [3]: u'\n'.encode('utf-16')
Out[3]: '\xff\xfe\n\x00'

By 'unicode text files' I actually mean 'utf-16'-encoded files.
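
To illustrate why such files need separate handling, here is a minimal sketch of my own (not code from bzr). It assumes Python 3 bytes/str semantics rather than the Python 2 session above, and input that uses bare LF line endings. The helper names convert_eol_naive and convert_eol_utf16 are hypothetical. A byte-level LF to CRLF replacement works for ASCII or UTF-8 text, but on utf-16 data it inserts a single '\r' byte in front of the '\n' byte and shifts every following 2-byte code unit:

```python
def convert_eol_naive(data: bytes) -> bytes:
    # Byte-level LF -> CRLF replacement: fine for ASCII/UTF-8, wrong for UTF-16.
    return data.replace(b"\n", b"\r\n")


def convert_eol_utf16(data: bytes, encoding: str = "utf-16-le") -> bytes:
    # Decode first, convert line endings on the text, then encode back.
    # Assumes the input uses bare LF; existing CRLF would become CR CR LF.
    text = data.decode(encoding)
    return text.replace("\n", "\r\n").encode(encoding)


sample = "a\nb".encode("utf-16-le")   # b'a\x00\n\x00b\x00'

naive = convert_eol_naive(sample)     # one extra byte: code units are now misaligned
correct = convert_eol_utf16(sample)   # '\r' is inserted as a full 2-byte code unit

try:
    naive.decode("utf-16-le")
    naive_is_valid = True
except UnicodeDecodeError:
    # Odd number of bytes: the data is no longer valid utf-16.
    naive_is_valid = False
```

With this sample the naive result no longer decodes as utf-16-le at all, while the decode/convert/encode path round-trips to 'a\r\nb'. So the eol converter has to know the encoding before it touches the bytes, which is why I count these files as a separate type.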


