[MERGE] UTF-8 encoding in binary diffs

Goffredo Baroncelli kreijack at tiscalinet.it
Tue Jul 10 19:57:13 BST 2007

On Tuesday 10 July 2007, Martin Pool wrote:
> Martin Pool has voted -0.
> Status is now: Waiting
> Comment:
> I think changing to %s rather than %r is definitely right.
> And I think encoding just the filename, rather than the whole stream, is 
> also probably right, as it will give better results if we don't know the 
> encoding of the contents of the file.
> My only query is whether we should be hardcoding utf-8 here.  Shouldn't 
> we be putting the filenames into the user's encoding?

Internally, paths in bazaar are unicode (not __encoded__). 
The problem in the function _show_diff_trees() is that it mixes user data 
(which is already __encoded__) with the paths, which aren't __encoded__. I 
don't know whether this is the only such case.
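
To make the mismatch concrete, here is a minimal sketch in present-day Python 3 
(not the actual bzrlib code; the path, the contents and the header text are 
made up for illustration). Already-encoded file contents cannot be joined 
directly with a unicode path; only the path needs encoding before both go to 
the same byte stream:

```python
import io

path = "caf\u00e9.txt"                       # unicode path, not yet encoded
contents = "na\u00efve\n".encode("latin-1")  # user data, already encoded bytes

out = io.BytesIO()

# Mixing the two directly fails: str and bytes cannot be concatenated.
try:
    out.write("=== modified file '%s'\n" % path + contents)
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True

# Encoding only the path (here as utf-8) leaves the file contents untouched,
# whatever their real encoding is.
out.write(("=== modified file '%s'\n" % path).encode("utf-8"))
out.write(contents)
result = out.getvalue()
```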

Because I use this function in my web interface (webserve), I suggest 
passing another parameter to the function: the encoding to use for the unicode 
data (the file path).

So, for internal use (such as webserve), this function would encode the data 
as utf8; for external use (the bzr diff command, for example), the encoding 
would be the "user's encoding".
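
As a rough sketch of what I mean (Python 3; write_diff_header, path_encoding 
and user_encoding are hypothetical names, not the real signature of 
_show_diff_trees()), the caller would choose the encoding applied to the path:

```python
import io
import locale


def user_encoding():
    """Assumption: the user's encoding is taken from the current locale."""
    return locale.getpreferredencoding(False)


def write_diff_header(out, path, path_encoding="utf-8"):
    """Hypothetical helper: write a header line for a unicode path.

    Only the path is encoded here; file contents would be written
    separately, as the bytes they already are.
    """
    line = "=== modified file '%s'\n" % path
    # "replace" avoids an exception if the path has characters that the
    # chosen encoding cannot represent.
    out.write(line.encode(path_encoding, "replace"))


# Internal use (a web interface): always utf-8.
web = io.BytesIO()
write_diff_header(web, "caf\u00e9.txt")

# External use (a command line tool): the user's encoding.
term = io.BytesIO()
write_diff_header(term, "caf\u00e9.txt", path_encoding=user_encoding())
```

Defaulting to utf-8 here is only one possible choice; the default could just as 
well be the user's encoding, with webserve passing utf-8 explicitly.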

I wonder how this function can _now_ work in a non-utf8 environment (i.e. 
Windows). I have two hypotheses:
1) it doesn't work, and displays a utf8-encoded stream in a non-utf8 environment
2) it works, because on stdout the utf8-encoded stream is converted to the 
local encoding
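
The two hypotheses can be told apart with a small experiment. The sketch below 
(Python 3; cp1252 is only an assumed example of a Windows console encoding, and 
the path is made up) shows what each case would produce:

```python
path = "caf\u00e9.txt"
utf8_bytes = path.encode("utf-8")

# Hypothesis 1: the utf-8 bytes reach a cp1252 console unchanged, so the
# two bytes of the accented letter are shown as two separate characters.
shown = utf8_bytes.decode("cp1252")

# Hypothesis 2: something decodes the utf-8 stream and re-encodes it in the
# local encoding before it is displayed.
converted = utf8_bytes.decode("utf-8").encode("cp1252")
```

In the first case the name comes out as "cafÃ©.txt"; in the second the console 
receives the single byte 0xE9 for the accented letter and shows the name 
correctly.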

> For details, see: 

gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack at inwind.it>
Key fingerprint = CE3C 7E01 6782 30A3 5B87  87C0 BB86 505C 6B2A CFF9
