[MERGE] Readable and properly encoded diff headers

Aaron Bentley aaron.bentley at utoronto.ca
Tue Aug 15 17:28:27 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Adeodato Simó wrote:
> * Aaron Bentley [Tue, 15 Aug 2006 10:31:40 -0400]:
>>I don't think we should assume that the destination is a terminal. 
> 
> 
> Well, I was initially using bzrlib.user_encoding, and John recommended
> terminal_encoding() instead. In any case, this is not about the output
> being a terminal or not, but about having the headers of diff's output
> in the proper encoding.

Right.  And if we can't assume that the output is a terminal, we can't
assume that terminal_encoding is the proper encoding.  It is in the
common case, but not always.

>>For example, I believe this functionality is invoked by bzr-gtk's
>>gdiff, and so changing it to something other than utf-8 might be a
>>regression.

> And as for being a regression, current show_diff_trees() always returns
> headers in *ASCII*, never in 8bit (because of %r). So between shoving
> 8bit down applications always in UTF-8, and shoving it in the user's
> encoding, I think the latter is preferable

But that's not the choice.  ASCII is a 7-bit subset of utf-8, not an
8-bit binary representation.  And unicode strings represented using %r
can be decoded if that is desired.
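
To make that concrete, here is a minimal sketch in Python 3 (an assumption; the thread predates it). The built-in ascii() stands in for what %r did to unicode objects, and the filename is made up for illustration:

```python
import ast

# Hypothetical filename containing a non-ASCII character.
path = "caf\u00e9.txt"

# ascii() yields a pure-ASCII, escaped representation, much as %r did
# for unicode objects in Python 2.
escaped = ascii(path)

# Pure-ASCII text produces identical bytes under ASCII and UTF-8,
# so a UTF-8 consumer can read it unchanged.
assert escaped.encode("ascii") == escaped.encode("utf-8")

# A client that wants the real filename can decode the escaped form.
recovered = ast.literal_eval(escaped)
assert recovered == path
```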

>, unless somebody can explain
> me this would break stuff.

This will break stuff, because it will make it impossible for clients
that understand unicode filenames to get a diff that represents unicode
filenames.

Technically, %r is not lossy, while encoding to terminal_encoding can
be.  But more importantly, if we're going to fix this at all, we should
fix it correctly.
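
A rough illustration of the lossiness, again in Python 3 and assuming a Latin-1 terminal plus a made-up filename that Latin-1 cannot represent:

```python
# Hypothetical filename with characters outside Latin-1.
path = "\u65e5\u672c.txt"

# Encoding to a limited terminal encoding (Latin-1 is assumed here)
# replaces the unrepresentable characters, so the name is lost.
lossy = path.encode("latin-1", errors="replace")
assert lossy.decode("latin-1") != path

# An escaped ASCII form round-trips exactly.
escaped = path.encode("unicode_escape")
assert escaped.decode("unicode_escape") == path
```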

> * John Arbash Meinel [Tue, 15 Aug 2006 09:37:41 -0500]:

>>And then 'cmd_diff' can set path_encoding = osutils.terminal_encoding().
> 
> 
> I don't fancy this much, having the encoding travelling all over the
> place.

Only the caller can know the appropriate encoding for filenames, by
definition.  I don't have a preference for a particular way of allowing
the caller to specify.
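
One possible shape, sketched in Python 3. The function name format_diff_header and its signature are hypothetical, not bzrlib's API; only the path_encoding parameter name comes from John's suggestion:

```python
def format_diff_header(old_path, new_path, path_encoding="utf-8"):
    """Return diff header lines as bytes, with paths in the caller's encoding.

    Hypothetical helper for illustration only; not part of bzrlib.
    """
    old = old_path.encode(path_encoding)
    new = new_path.encode(path_encoding)
    return b"--- " + old + b"\n" + b"+++ " + new + b"\n"


# A GUI client that understands unicode filenames might keep UTF-8,
utf8_header = format_diff_header("a/caf\u00e9.txt", "b/caf\u00e9.txt")
# while a command-line front end might pass its terminal's encoding.
latin1_header = format_diff_header(
    "a/caf\u00e9.txt", "b/caf\u00e9.txt", path_encoding="latin-1"
)
```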

>>'self.outf' for cmd_diff is setup *without* encoding, because we don't
>>want to silently transcode the contents of the file. What I would really
>>like is to have a 'do not accept unicode' file-like object when
>>encoding_type='exact'.

Heh, okay, scratch the idea of using outf encoding directly.
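
For reference, the 'do not accept unicode' object John describes could look roughly like this in Python 3. The class name BytesOnlyFile is made up, and this is a sketch rather than anything in bzrlib:

```python
import io


class BytesOnlyFile:
    """Hypothetical wrapper that refuses text, so nothing is transcoded silently."""

    def __init__(self, raw):
        self.raw = raw

    def write(self, data):
        if not isinstance(data, (bytes, bytearray)):
            raise TypeError("expected bytes, got %s" % type(data).__name__)
        self.raw.write(data)


out = BytesOnlyFile(io.BytesIO())
out.write(b"+++ b/file.txt\n")  # bytes are accepted

try:
    out.write("+++ b/caf\u00e9.txt\n")  # text is rejected
    rejected = False
except TypeError:
    rejected = True
```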

> Wouldn't it be possible to have a file-like object that does: "if I
> receive unicode, I encode it to $user_encoding; if I get strings, I pass
> them unmodified"? Would it make sense?

The subtleties of unicode/string conversion are such that I'd prefer it
if the formatter were very explicit, e.g.

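class FileWithEncoding:
    def __init__(self, file, encoding):
        self.file = file
        self.encoding = encoding

    def write_bytes(self, input):
        # Already-encoded bytes pass through unmodified.
        self.file.write(input)

    def write_text(self, input):
        # Unicode text is encoded explicitly, never implicitly.
        self.file.write(input.encode(self.encoding))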
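
A usage sketch in Python 3, restating the class so the snippet runs on its own. The constructor arguments, the io.BytesIO target, and the sample header are assumptions for illustration:

```python
import io


class FileWithEncoding:
    def __init__(self, file, encoding):
        self.file = file
        self.encoding = encoding

    def write_bytes(self, data):
        # Already-encoded bytes pass through unmodified.
        self.file.write(data)

    def write_text(self, text):
        # Text is encoded explicitly with the configured encoding.
        self.file.write(text.encode(self.encoding))


buf = io.BytesIO()
out = FileWithEncoding(buf, "utf-8")
# Header containing a filename: text, encoded on the way out.
out.write_text("=== modified file 'caf\u00e9.txt'\n")
# File contents: bytes, written exactly as given.
out.write_bytes(b"-old line\n+new line\n")
```

The point of the two method names is that the caller has to say which kind of data it is writing, so no conversion happens by accident.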

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFE4fYr0F+nu1YWqI0RAv5ZAJoCVFYWLvInwxkOOMbahq0NoE/2YwCeIBE8
dxogpUPVhMiuw1sRaEqHtoE=
=fSMS
-----END PGP SIGNATURE-----



