[BUG] bzr changeset generation fails with non-ascii characters

Aaron Bentley aaron.bentley at utoronto.ca
Sat Jul 16 16:53:20 BST 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John A Meinel wrote:
> Robey Pointer wrote:

>>FWIW, I agree that the cset should be treated as being in no encoding
>>(using whatever encoding is used for each file), and that means being
>>8-bit clean with no codec.

I think this is the best option, but it may not be a great one for files
in 16-bit encodings.  The resulting patch would be hard to read, since
it would mix ASCII with, say, UCS-2.  AIUI, diff will just say 'binary
files differ', because 16-bit files treated as 8-bit files have NULs
everywhere.  I don't know what difflib does with binaries.

Just thought that point was worth mentioning.
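
To make the NUL point concrete, here is a rough sketch, not a statement of
what bzr does. It assumes Python 3, whose difflib.diff_bytes did not exist
when this was written, and uses UTF-16-LE as a stand-in for UCS-2. The
helper names are made up for illustration.

```python
import difflib

# ASCII-range text in a 16-bit encoding: every other byte is NUL.
old = "hello\nworld\n".encode("utf-16-le")
new = "hello\nthere\n".encode("utf-16-le")

def nul_fraction(data):
    """Fraction of bytes in data that are NUL (hypothetical helper)."""
    return data.count(b"\x00") / len(data)

def byte_diff(a, b):
    """Unified diff of two byte strings, treated as 8-bit data with no codec."""
    return list(difflib.diff_bytes(
        difflib.unified_diff,
        a.splitlines(keepends=True),
        b.splitlines(keepends=True),
        b"old", b"new",
    ))

if __name__ == "__main__":
    print(nul_fraction(old))
    for line in byte_diff(old, new):
        print(repr(line))
```

The diff is produced, but its lines are split on the 0x0a byte alone, so
each 16-bit line ends half-way through a character and the output is full
of NULs.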


> I'm thinking that probably we can just standardize on "meta-information
> is utf-8 encoded", and "patches are untranslated".
> 
> Does that seem reasonable? The current method would try to translate
> meta information into the user's local preferred encoding, but since it
> is a format that is meant to be given to someone else, it seems that
> utf-8 encoding might be best.

I think it's the best available answer.  This will work best when the
files are UTF-8 or ASCII, but ISO-8859-* files will be tolerable.

It does mean that changesets are a mixed-encoding format: part UTF-8,
part binary.  I don't see a lot of alternatives.  I suppose one would be
to work out a Unicode-compliant way of encoding binary data, but it
wouldn't be very readable.
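
A minimal sketch of what "part UTF-8, part binary" could look like, in
Python 3.  The field names (committer, message), the blank-line separator,
and all function names are assumptions for illustration, not the actual
changeset format.  It assumes single-line metadata values.  The armor()
helper shows one Unicode-safe alternative (base64) and why it hurts
readability.

```python
import base64

def build_changeset(committer, message, patch):
    """Join UTF-8 metadata and an untranslated patch (bytes) into one blob."""
    header = f"committer: {committer}\nmessage: {message}\n".encode("utf-8")
    # A blank line separates the metadata from the raw patch bytes.
    return header + b"\n" + patch

def split_changeset(data):
    """Split a blob back into (metadata dict, raw patch bytes)."""
    header, _, patch = data.partition(b"\n\n")
    meta = dict(line.split(": ", 1)
                for line in header.decode("utf-8").splitlines())
    return meta, patch

def armor(patch):
    """Alternative: encode the patch as base64 text (safe, but unreadable)."""
    return base64.b64encode(patch).decode("ascii")

if __name__ == "__main__":
    # The patch comes from an ISO-8859-1 file and is left untranslated.
    patch = "-caf\u00e9\n+cafe\n".encode("latin-1")
    blob = build_changeset("J\u00fcrgen", "fix caf\u00e9", patch)
    print(split_changeset(blob))
    print(armor(patch))
```

The blob as a whole is not valid UTF-8 here, so a reader has to decode the
header and leave the patch alone, which is the trade-off described above.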

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFC2S1w0F+nu1YWqI0RAncvAKCE2xCRiNVwoVwr3ktUGwp0fxjwtQCbB1kG
jaJjTLpbgZfJ4JxY0YTAERM=
=axxb
-----END PGP SIGNATURE-----



