[MERGE] Make annotate behave in a non-ASCII world

Wed Jul 11 00:55:08 BST 2007

On 7/7/07, Adeodato Simó <dato at net.com.org.es> wrote:
> * Aaron Bentley [Fri, 06 Jul 2007 13:37:39 -0400]:
>
> > Adeodato Simó wrote:
>
> > > I think this, or some other solution, is a must have, even if everybody
> > > prefers gannotate these days. ;-)
>
> > > +        try:
> > > +            to_file.write(anno)
> > > +        except UnicodeEncodeError:
> > > +            to_file.write(anno.encode(to_file.encoding, 'replace'))
>
> > Could you say why you're trying and catching the exception here?  I
> > think it would be better to encode the string unconditionally.
>
> Because we want the 'replace' encoding to happen *only* if the to_file
> object can't handle the characters in anno (which happens if for example
> a user with LANG=C annotates a file where one committer had non-ascii
> characters).
>
> Unconditionally encoding is not desirable (as John also pointed out to
> me the first time) because users with the appropriate $LANG would see an
> unnecessarily mutilated string (which in the case of non-latin scripts
> would be the whole string).
>
> Hope the explanation was clear.

Sorry, I still don't understand it.  I believe that writing a unicode
string to a file is the same as encoding it in the file's encoding and
then writing that byte string.  If all the unicode characters are
representable in that encoding, then the first attempt will succeed.
If any of them are not representable then it will fail and we'll redo
it and replace those characters.  How is this different to just
passing errors=replace in the first place?
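
To make the question concrete, here is a minimal sketch of the two
strategies.  It works on the encoding step alone, not on the real
to_file object, and it is written for Python 3 so it runs as-is.  The
helper names are made up for illustration, and it assumes that
to_file.write() does a strict encode with to_file.encoding, which is
exactly the assumption I am asking about.

```python
def encode_with_fallback(text, encoding):
    """Mimic the patch: strict encode first, 'replace' only on failure."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return text.encode(encoding, 'replace')


def encode_unconditionally(text, encoding):
    """The alternative: always encode with errors='replace'."""
    return text.encode(encoding, 'replace')


# Hypothetical sample data: a name with one non-ASCII character,
# a pure ASCII string, and a string in a non-latin script.
samples = ["Adeodato Simó", "plain ascii", "日本語"]
encodings = ["ascii", "latin-1", "utf-8"]

for text in samples:
    for enc in encodings:
        # 'replace' only touches characters the codec cannot represent,
        # so both strategies should produce the same bytes.
        assert encode_with_fallback(text, enc) == encode_unconditionally(text, enc)
```

If that holds, a user with a UTF-8 locale gets the name intact either
way, and only a user with LANG=C sees '?' substituted.  The two could
still differ if to_file.write() does something other than a strict
encode with to_file.encoding, which this sketch does not model.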

-- 
Martin