[BUG] bzr changeset generation fails with non-ascii characters

John A Meinel john at arbash-meinel.com
Sat Jul 16 18:42:34 BST 2005


Aaron Bentley wrote:
> John A Meinel wrote:
>
>>>Robey Pointer wrote:
>
>
>>>>FWIW, I agree that the cset should be treated as being in no encoding
>>>>(using whatever encoding is used for each file), and that means being
>>>>8-bit clean with no codec.
>
>
> I think this is the best option, but it may not be a great one for files
> in 16-bit encodings.  The resulting patch would be hard to read, since
> it would mix ASCII with, say, UCS-2.  AIUI, diff will just say 'binary
> files differ', because 16-bit files treated as 8-bit files have NULs
> everywhere.  I don't know what difflib does with binaries.
>
> Just thought that point was worth mentioning.
>

Yeah, well, the changeset format isn't really designed to handle binary
files. I'm not sure what we should do about it.
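
To make the NUL issue concrete, here is a rough sketch in modern
Python 3 (not bzr code). The looks_binary helper is hypothetical and
only approximates the kind of check a diff tool might make; the
probe size is an assumption. It also shows that difflib does not
refuse such input: it diffs whatever lines it is handed.

```python
import difflib

def looks_binary(data: bytes, probe_size: int = 8192) -> bool:
    """Hypothetical heuristic: a NUL byte in the first chunk means 'binary'."""
    return b"\x00" in data[:probe_size]

old_text = "hello\nworld\n"
new_text = "hello\nthere\n"

old_utf8 = old_text.encode("utf-8")
# UTF-16-LE is used here as a stand-in for a 16-bit encoding such as UCS-2.
old_utf16 = old_text.encode("utf-16-le")
new_utf16 = new_text.encode("utf-16-le")

utf8_is_binary = looks_binary(old_utf8)    # ASCII-range text has no NULs
utf16_is_binary = looks_binary(old_utf16)  # every other byte is a NUL

# difflib treats the bytes as 8-bit data and diffs them line by line,
# so the output mixes ASCII control lines with NUL-laden content lines.
diff_lines = list(
    difflib.diff_bytes(
        difflib.unified_diff,
        old_utf16.splitlines(keepends=True),
        new_utf16.splitlines(keepends=True),
        b"old",
        b"new",
    )
)
```

The diff that results is valid, but the content lines are interleaved
with NUL bytes, which is the readability problem Aaron describes.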

We also have the problem that some portions of a file might use UNIX
line endings while other portions use DOS or Mac line endings. This
matters because diff reproduces the exact line ending of each line,
while the changeset itself is generated with UNIX line endings.
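
A small Python 3 sketch of the effect, assuming the patch is produced
with difflib on raw bytes (the file names and contents are made up for
illustration). The control lines written by the diff end in "\n",
while each content line keeps whatever terminator it had in the file.

```python
import difflib

# One file with mixed terminators: UNIX (\n), DOS (\r\n), old Mac (\r).
old = b"one\ntwo\r\nthree\rfour\n"
new = b"one\ntwo\r\nTHREE\rfour\n"

# keepends=True preserves each line's original terminator.
old_lines = old.splitlines(keepends=True)
new_lines = new.splitlines(keepends=True)

patch = list(
    difflib.diff_bytes(
        difflib.unified_diff,
        old_lines,
        new_lines,
        b"a/file",
        b"b/file",
    )
)

# Header and hunk lines use "\n"; content lines carry "\r\n" or "\r" through.
for line in patch:
    print(repr(line))
```

Anything that later normalizes the line endings of the whole changeset
would therefore alter the patch content as well.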

>
>
>>>I'm thinking that probably we can just standardize on "meta-information
>>>is utf-8 encoded", and "patches are untranslated".
>>>
>>>Does that seem reasonable? The current method would try to translate
>>>meta information into the user's local preferred encoding, but since it
>>>is a format that is meant to be given to someone else, it seems that
>>>utf-8 encoding might be best.
>
>
> I think it's the best-available answer.  This will work best when the
> files are utf-8 or ascii, but ISO-8859-* files will be tolerable.
>
> It does mean that changesets are a mixed encoding format-- part utf8,
> part-binary.  I don't see a lot of alternatives.  I suppose one would be
> to work out a unicode-compliant way of encoding binary data, but it
> wouldn't be very readable.

Yeah, I'm thinking something like base64 would be necessary.

If I can get it to upload from this hotel, check out revision 88. I
fixed a bug in the rename code, and updated to handle non-ascii
characters in filenames and commit messages.
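
As a sketch of the "meta-information is utf-8, patches are
untranslated" idea, with base64 as the fallback for binary content:
the header layout and the build_changeset function below are
hypothetical, written in modern Python 3, and are not the actual
changeset format or bzr code.

```python
import base64

BASE64_MARKER = b"# encoding: base64\n"  # hypothetical marker line

def build_changeset(committer: str, message: str, patch: bytes,
                    binary: bool = False) -> bytes:
    """Hypothetical layout: UTF-8 header lines followed by the patch body.

    The message is assumed to be a single line in this sketch.
    """
    header = f"# committer: {committer}\n# message: {message}\n".encode("utf-8")
    if binary:
        # base64 output is pure ASCII, so the whole changeset stays valid UTF-8.
        body = BASE64_MARKER + base64.encodebytes(patch)
    else:
        # Patch bytes are passed through untouched, in the file's own encoding.
        body = patch
    return header + body

# Text case: UTF-8 metadata, ISO-8859-1 patch content left as-is.
latin1_patch = "+café\n".encode("iso-8859-1")
text_cs = build_changeset("Jürgen", "fix café spelling", latin1_patch)

# Binary case: arbitrary bytes survive a round trip through base64.
payload = bytes(range(256))
binary_cs = build_changeset("Jürgen", "add icon", payload, binary=True)
encoded = binary_cs.split(BASE64_MARKER, 1)[1]
round_trip = base64.decodebytes(encoded)
```

The text case is the mixed-encoding result Aaron mentions (a UTF-8
header followed by 8-bit patch data); the base64 case is fully
decodable as UTF-8 but is not readable as a patch.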

>
> Aaron

John
=:->