[MERGE] Readable and properly encoded diff headers
Aaron Bentley
aaron.bentley at utoronto.ca
Tue Aug 15 17:28:27 BST 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Adeodato Simó wrote:
> * Aaron Bentley [Tue, 15 Aug 2006 10:31:40 -0400]:
>>I don't think we should assume that the destination is a terminal.
>
>
> Well, I was initially using bzrlib.user_encoding, and John recommended
> terminal_encoding() instead. In any case, this is not about the output
> being a terminal or not, but about having the headers of diff's output
> in the proper encoding.
Right. And if we can't assume that the output is a terminal, we can't
assume that terminal_encoding is the proper encoding. It is in the
common case, but not always.
>>For example, I believe this functionality is invoked by bzr-gtk's
>>gdiff, and so changing it to something other than utf-8 might be a
>>regression.
> And as for being a regression, current show_diff_trees() always returns
> headers in *ASCII*, never in 8bit (because of %r). So between shoving
> 8bit down applications always in UTF-8, and shoving it in the user's
> encoding, I think the latter is preferable
But that's not the choice. ASCII is a 7-bit subset of utf-8, not an
8-bit binary representation. And unicode strings represented using %r
can be decoded if that is desired.
>, unless somebody can explain
> me this would break stuff.
This will break stuff, because it will make it impossible for clients
that understand unicode filenames to get a diff that represents unicode
filenames.
Technically, %r is not lossy, while encoding to terminal_encoding can
be. But more importantly, if we're going to fix this at all, we should
fix it correctly.
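
To illustrate the lossiness point, here is a minimal sketch in modern Python 3 (not the Python 2 code under discussion). The filename is made up, and ascii() stands in for what %r does to a unicode string in Python 2; a strict ASCII encoding with errors="replace" stands in for a terminal encoding that cannot represent the character.

```python
import ast

# Hypothetical filename containing a non-ASCII character.
name = "Sim\u00f3.txt"

# An escaped, pure-ASCII representation (the Python 3 analogue of
# formatting a unicode string with %r in Python 2).
escaped = ascii(name)

# The escaped form can be turned back into the original string.
round_tripped = ast.literal_eval(escaped)

# Encoding to a codec that lacks the character, replacing what does
# not fit, throws information away.
lossy = name.encode("ascii", errors="replace").decode("ascii")

print(escaped)
print(round_tripped == name)
print(lossy == name)
```

The escaped form is ugly but recoverable; the replaced form is not.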
> * John Arbash Meinel [Tue, 15 Aug 2006 09:37:41 -0500]:
>>And then 'cmd_diff' can set path_encoding = osutils.terminal_encoding().
>
>
> I don't fancy this much, having the encoding travelling all over the
> place.
Only the caller can know the appropriate encoding for filenames, by
definition. I don't have a preference for a particular way of allowing
the caller to specify.
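
One possible shape for this, sketched in Python 3 purely for illustration: the function name diff_header_lines and its path_encoding parameter are hypothetical, not existing bzrlib API. The only point is that the caller chooses the encoding and the formatter never guesses.

```python
def diff_header_lines(old_path, new_path, path_encoding="utf-8"):
    """Return the two diff header lines as bytes.

    Hypothetical helper: paths arrive as text and are encoded with
    whatever encoding the caller asked for.
    """
    old = old_path.encode(path_encoding)
    new = new_path.encode(path_encoding)
    return [b"--- " + old + b"\n", b"+++ " + new + b"\n"]


# A GUI client that understands unicode filenames keeps the default.
gui_lines = diff_header_lines("a/Sim\u00f3.txt", "b/Sim\u00f3.txt")

# A command-line caller passes the encoding it knows its output needs
# (latin-1 here is only an example value).
term_lines = diff_header_lines(
    "a/Sim\u00f3.txt", "b/Sim\u00f3.txt", path_encoding="latin-1"
)
```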
>>'self.outf' for cmd_diff is setup *without* encoding, because we don't
>>want to silently transcode the contents of the file. What I would really
>>like is to have a 'do not accept unicode' file-like object when
>>encoding_type='exact'.
Heh, okay, scratch the idea of using outf encoding directly.
> Wouldn't it be possible to have a file-like object that does: "if I
> receive unicode, I encode it to $user_encoding; if I get strings, I pass
> them unmodified"? Would it make sense?
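The subtleties of unicode/string conversion are such that I'd prefer if
the formatter was very explicit, e.g.

class FileWithEncoding:
    def __init__(self, file, encoding):
        self.file = file
        self.encoding = encoding
    def write_bytes(self, input):
        # Already-encoded data (e.g. file contents) passes through as-is.
        self.file.write(input)
    def write_text(self, input):
        # Unicode text (e.g. paths) is encoded explicitly.
        self.file.write(input.encode(self.encoding))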
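
For a concrete picture of how such a wrapper might be used, here is a self-contained sketch in Python 3, where the bytes/text split is enforced by the type system. The class name ExplicitEncodingFile and the type checks are assumptions added for illustration; they are not part of bzrlib.

```python
import io


class ExplicitEncodingFile:
    """Hypothetical wrapper: the caller states whether it writes bytes or text."""

    def __init__(self, raw, encoding):
        self.raw = raw            # underlying binary file-like object
        self.encoding = encoding  # encoding used for text only

    def write_bytes(self, data):
        # File contents go through untouched; text is rejected outright.
        if not isinstance(data, bytes):
            raise TypeError("write_bytes() requires bytes")
        self.raw.write(data)

    def write_text(self, text):
        # Headers and paths are text and are encoded exactly once, here.
        if not isinstance(text, str):
            raise TypeError("write_text() requires str")
        self.raw.write(text.encode(self.encoding))


buf = io.BytesIO()
out = ExplicitEncodingFile(buf, "utf-8")
out.write_text("=== modified file 'Sim\u00f3.txt'\n")
out.write_bytes(b"-old line \xff\n")  # arbitrary bytes, never transcoded
result = buf.getvalue()
```

Passing the wrong type to either method fails loudly instead of being silently converted.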
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFE4fYr0F+nu1YWqI0RAv5ZAJoCVFYWLvInwxkOOMbahq0NoE/2YwCeIBE8
dxogpUPVhMiuw1sRaEqHtoE=
=fSMS
-----END PGP SIGNATURE-----