my strategy on implementing line-endings (eol) support

Thu Apr 3 13:26:36 BST 2008

> I'm not sure I understand your point completely.
> 
> Even if we force the user to set encoding property correctly we anyway
> can't be sure that this property is correct.

But we *can* be sure this property is correct as part of performing EOL translations, right?  If we detect that the file is not in the encoding we think it is (and therefore risk creating a file which is invalid in whatever encoding *is* correct), then we can simply refuse to perform any EOL translation at all.
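
A rough sketch of that idea, purely for illustration: the function name `translate_eol` and its signature are hypothetical, not anything that exists in the codebase. It decodes with the declared encoding using strict error handling and refuses (returns None) if the bytes do not decode.

```python
def translate_eol(data, encoding, new_eol):
    """Convert line endings in `data` (bytes) to `new_eol` (a str).

    Hypothetical helper: returns the translated bytes, or None if
    `data` is not valid in `encoding`, in which case the caller
    should leave the file untouched.
    """
    try:
        # Strict error handling is the default, so any byte sequence
        # that is invalid in `encoding` raises instead of being guessed at.
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return None  # refuse to translate rather than risk corrupting the file

    # Normalise CRLF and lone CR to LF, then apply the requested ending.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = text.replace("\n", new_eol)
    return text.encode(encoding)
```

The check only costs anything when a translation is actually requested, so files that are never translated are never decoded.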

> I called it "paranoid" mode. So if we really want to be paranoid, we
> should check that the encoding property is actually right. I.e. we
> should check every file by decoding its bytestream to unicode. It
> means we should spend a big amount of time just to check that the
> user's settings are correct.

I think it is only necessary to check that the setting is correct when we *assume* it is correct (i.e., when we will do the wrong thing if it is incorrect).  I agree that most of the time we don't care whether it is correct, as we don't attempt to interpret the file in that encoding.

> As a Cyrillic-minded man I can't agree with you that default encoding
> should be 'ascii'. It's the safe variant, it's true, but it does not
> help to speed things up.

It's not about speeding things up, it is about resisting the temptation to guess.  To my mind, it's very similar to Python choosing to use "ascii" as the default encoding with 'strict' as the default error handling - it is somewhat frustrating at times, but the least error-prone decision.
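
To illustrate what I mean by not guessing, here is a small sketch of how the "ascii" codec behaves on Cyrillic text. The helper name `decode_attempts` is made up for this example. With the default 'strict' handling the decode fails loudly; the lenient 'replace' handler "succeeds" but silently throws the content away.

```python
def decode_attempts(raw):
    """Try to decode `raw` as ASCII, first strictly and then leniently.

    Returns (strict_ok, lossy_text). Hypothetical helper for illustration.
    """
    try:
        raw.decode("ascii")  # errors="strict" is the default
        strict_ok = True
    except UnicodeDecodeError:
        strict_ok = False
    # errors="replace" never raises, but each undecodable byte
    # becomes U+FFFD, so the original text is lost.
    lossy_text = raw.decode("ascii", errors="replace")
    return strict_ok, lossy_text


strict_ok, lossy_text = decode_attempts("привет".encode("utf-8"))
```

Here `strict_ok` is False, and `lossy_text` contains only replacement characters, which is exactly the kind of quiet damage I would rather refuse to risk.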

However, it is a matter of philosophy - I always take the approach that it's easier to make a correct program fast than it is to make a fast program correct :)  I think I've made my point, so I'm happy to let things rest there...

Cheers,

Mark