problems with encodings for signed commits

Dafydd Harries daf at muse.19inch.net
Thu Dec 29 03:23:42 GMT 2005


The explanation here is rather lengthy, in the hope that it might save people
time in working out what the correct behaviour should be. I think this ties in
with the recent discussion about how bzr should deal with encodings, and
perhaps it might be illuminating in that regard as well.

While importing a baz branch into bzr recently, I discovered that the
testament code fails if a commit message contains non-ASCII characters. The
error occurred when the code attempted to get an SHA of a unicode object
returned by Testament.as_text_lines, which proclaims:

    def as_text_lines(self):
        """Yield text form as a sequence of lines.

        The result is returned in utf-8, because it should be signed or
        hashed in that encoding.
        """

There is code that checks that the method doesn't return a unicode object, but
it's guarded by an "if __debug__", which I consider to be a bit odd.
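
For context, __debug__ is False when Python is run with -O, so a check guarded
that way silently disappears in optimised runs. A minimal sketch of an
unconditional check follows. The helper name check_utf8_lines is hypothetical
and not part of bzrlib, and the sketch is written for Python 3, where encoded
text is the bytes type:

```python
def check_utf8_lines(lines):
    """Hypothetical helper: reject text objects whether or not -O is in effect."""
    checked = []
    for line in lines:
        # An explicit raise is never stripped, unlike code under "if __debug__".
        if not isinstance(line, bytes):
            raise TypeError("expected UTF-8 encoded bytes, got %r" % (type(line),))
        checked.append(line)
    return checked
```

With this shape, a stray unicode line is reported at the point where it is
produced rather than later, when something tries to hash it.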

The unicode object in question originates in Commit.commit:

        if isinstance(message, str):
            message = message.decode(bzrlib.user_encoding)

This object ends up in the new Revision object, which is used by
Testament.as_text_lines, which is used by Testament.as_short_text, which tries
to SHA the message whether it's unicode or not. Boom.
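
The failure and one possible remedy can be sketched as below. This is an
illustration under assumptions, not the bzrlib code and not necessarily the fix
in my branch: the function names and the simplified "message:" layout are made
up for the example, and it is written for Python 3, where hashing a text object
raises TypeError (under Python 2 the implicit ASCII conversion is what fails).
The idea is simply to encode every line to UTF-8 before it reaches the hash:

```python
import hashlib


def as_text_lines(message):
    """Hypothetical stand-in for Testament.as_text_lines.

    Yields each line already encoded as UTF-8 bytes, as the docstring promises.
    """
    lines = ["message:"] + ["  " + part for part in message.splitlines()]
    for line in lines:
        # Encode explicitly so callers never receive a text (unicode) object.
        yield (line + "\n").encode("utf-8")


def short_sha1(message):
    """Hash the UTF-8 encoded lines; safe for non-ASCII commit messages."""
    return hashlib.sha1(b"".join(as_text_lines(message))).hexdigest()


def hashing_text_fails(message):
    """Return True if hashing the undecoded text object raises TypeError."""
    try:
        hashlib.sha1(message)
    except TypeError:
        return True
    return False
```

Here short_sha1 accepts a non-ASCII message without complaint, while
hashing_text_fails shows that passing the text object straight to the hash is
rejected.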

I have a branch with a test case, and a potential fix. No idea if the fix is a
good one, but it was enough to get me the import I wanted.

http://muse.19inch.net/~daf/bzr/bzr/devel/

-- 
Dafydd



