problems with encodings for signed commits

Dafydd Harries daf at muse.19inch.net
Thu Dec 29 17:02:17 GMT 2005


On 29/12/2005 at 00:24, Aaron Bentley wrote:
> Dafydd Harries wrote:
> | While importing a baz branch into bzr recently, I discovered that the
> | testament code fails if a commit message contains non-ascii characters.
> 
> | There is code that checks that the method doesn't return a unicode object,
> | but it's guarded by an "if __debug__", which I consider to be a bit odd.
> |
> | The unicode object in question originates in Commit.commit:
> 
> No, it doesn't.  It originates in baz_import.iter_import_version:
> 
> ~        commitobj.commit(branch, log_message.decode('ascii', 'replace'),
> ~                         verbose=False, committer=log_creator,
> ~                         timestamp=timestamp, timezone=0, rev_id=rev_id)

I think we're using different versions of bzrtools. Mine uses this to do the
commit:

            wt.commit(log.summary, verbose=False, committer=log.creator,
                      timestamp=timestamp, timezone=0, rev_id=rev_id)

If .decode('ascii', 'replace') had been used, it would have mangled the log
(replacing every non-ascii byte with \ufffd), which didn't happen. In this
case, at least, decoding from UTF-8 was the right thing to do, even if it was
being done in the wrong place.
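
To illustrate the difference, here is a minimal sketch in Python 3 syntax
(where str is text and bytes is the bytestring). The sample message is made
up; it is not taken from the branch I imported.

```python
# Hypothetical log message containing one non-ascii character (U+0177),
# stored as UTF-8 bytes the way an archive might hold it.
raw = "Diweddaru'r t\u0177".encode("utf-8")

# Decoding as ascii with 'replace' turns every non-ascii byte into U+FFFD,
# so the two-byte UTF-8 sequence becomes two replacement characters.
mangled = raw.decode("ascii", "replace")

# Decoding with the encoding the bytes were actually written in is lossless.
decoded = raw.decode("utf-8")

print(repr(mangled))
print(repr(decoded))
```

The log I ended up with looked like the second result, not the first, which
is why I don't think the 'replace' code path was involved.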

> |
> |         if isinstance(message, str):
> |             message = message.decode(bzrlib.user_encoding)
> 
> This is bogus.  If Commit.commit gets a bytestring, it should treat it
> as ascii-- there's no defined encoding.  Assuming that this bytestring
> is in the user encoding is not right.  This should be done in
> cmd_commit, where we know that the bytestring came from the user, and
> therefore the user's encoding applies.

That will work if I'm importing from a baz archive that's using the same
encoding as I am, but there's no way of knowing that that's the case.

I think using unicode internally everywhere is a fine principle, but the
nettle of doing the decoding a) in the right place and b) from the right
encoding has to be grasped.
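
As a rough sketch of what I mean, again in Python 3 syntax: the caller that
knows where the bytes came from does the decoding, and the core routine
refuses bytestrings outright. decode_message and commit are hypothetical
names for illustration, not bzrlib functions, and the source encoding is
assumed to be supplied by whoever knows it (the importer, or cmd_commit).

```python
def decode_message(raw, source_encoding):
    """Decode a log message at the boundary, where its encoding is known."""
    if isinstance(raw, str):
        # Already text; nothing to decode.
        return raw
    return raw.decode(source_encoding)


def commit(message):
    """Stand-in for a core commit routine that only accepts text."""
    if not isinstance(message, str):
        # No guessing here: the core has no way to know the encoding.
        raise TypeError("commit message must be text, not bytes")
    return message


# An importer that knows (or has been told) the archive is UTF-8:
stored = commit(decode_message(b"caf\xc3\xa9", "utf-8"))
```

With this split the check in the core is unconditional rather than hidden
behind "if __debug__", and the question of which encoding applies is pushed
out to the only code that can answer it.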

-- 
Dafydd



