problems with encodings for signed commits
Dafydd Harries
daf at muse.19inch.net
Thu Dec 29 17:02:17 GMT 2005
On 29/12/2005 at 00:24, Aaron Bentley wrote:
> Dafydd Harries wrote:
> | While importing a baz branch into bzr recently, I discovered that the
> | testament code fails if a commit message contains non-ascii characters.
>
> | There is code that checks that the method doesn't return a unicode object, but
> | it's guarded by an "if __debug__", which I consider to be a bit odd.
> |
> | The unicode object in question originates in Commit.commit:
>
> No, it doesn't. It originates in baz_import.iter_import_version:
>
>     commitobj.commit(branch, log_message.decode('ascii', 'replace'),
>                      verbose=False, committer=log_creator,
>                      timestamp=timestamp, timezone=0, rev_id=rev_id)
I think we're using different versions of bzrtools. Mine uses this to do the
commit:

    wt.commit(log.summary, verbose=False, committer=log.creator,
              timestamp=timestamp, timezone=0, rev_id=rev_id)

If .decode('ascii', 'replace') had been used, it would have mangled the log
message (replacing every non-ASCII byte with U+FFFD), which didn't happen. In
this case, at least, decoding from UTF-8 was the right thing to do, even if it
was being done in the wrong place.
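To make the difference concrete, here is a small Python 3 sketch (Python 3's `bytes` and `str` play the roles of the Python 2 `str` and `unicode` types discussed in this thread). The Welsh sample message is an assumption chosen for illustration; the point is that an ASCII decode with `'replace'` substitutes U+FFFD for each undecodable byte, while decoding with the encoding the bytes were actually written in recovers the text.

```python
# Python 3 sketch: bytes corresponds to Python 2's str, and str to Python 2's unicode.
message = "Hel\u00f4 byd"          # hypothetical log message, "Helô byd"
raw = message.encode("utf-8")      # the bytes as they might be stored in an archive

# ASCII decoding with errors="replace" turns each undecodable byte into U+FFFD,
# so the two-byte UTF-8 sequence for "ô" becomes two replacement characters.
mangled = raw.decode("ascii", "replace")

# Decoding with the encoding the bytes were actually written in round-trips the text.
recovered = raw.decode("utf-8")

assert mangled == "Hel\ufffd\ufffd byd"
assert recovered == message
```

A mangled import would therefore show two replacement characters where the single accented letter should be, which is easy to spot in the resulting log.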
> |
> |     if isinstance(message, str):
> |         message = message.decode(bzrlib.user_encoding)
>
> This is bogus. If Commit.commit gets a bytestring, it should treat it
> as ASCII; there's no defined encoding. Assuming that this bytestring
> is in the user encoding is not right. This should be done in
> cmd_commit, where we know that the bytestring came from the user, and
> therefore the user's encoding applies.
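Aaron's proposal, decoding once at the command-line boundary and treating bare bytestrings in the core as ASCII, might be sketched in Python 3 roughly like this. The functions `commit` and `cmd_commit` here are hypothetical, simplified stand-ins rather than bzrlib's real signatures, and `locale.getpreferredencoding` stands in for `bzrlib.user_encoding`.

```python
import locale


def commit(message):
    """Hypothetical core entry point: expects text; bytes are accepted only if pure ASCII."""
    if isinstance(message, bytes):
        # A bare bytestring carries no encoding, so refuse anything that isn't ASCII.
        message = message.decode("ascii")  # raises UnicodeDecodeError for non-ASCII bytes
    # A real implementation would record the commit; this sketch returns the normalized text.
    return message


def cmd_commit(raw_message, user_encoding=None):
    """Hypothetical command-line layer: bytes typed by the user are in the user's encoding."""
    if user_encoding is None:
        # Assumption: the locale's preferred encoding approximates bzrlib.user_encoding.
        user_encoding = locale.getpreferredencoding(False)
    return commit(raw_message.decode(user_encoding))
```

With this split, only the layer that knows where the bytes came from chooses an encoding; the core either receives text or fails loudly on non-ASCII bytes.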
That only works if the baz archive I'm importing from uses the same encoding
as I do, and there's no way of knowing whether that's the case.
I think using Unicode internally everywhere is a fine principle, but the
nettle of doing the decoding a) in the right place and b) from the right
encoding has to be grasped.
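For an importer, one possible way to handle both points is to decode log messages as they are read from the foreign archive, using that archive's encoding rather than the local user's. The Python 3 helper below is a hypothetical sketch, not part of bzrtools: the name `decode_imported_log`, the `archive_encoding` parameter (imagined as a user-supplied import option), and the UTF-8-then-Latin-1 fallback are all assumptions. The fallback order relies on the fact that accented non-UTF-8 text is usually not valid UTF-8, while Latin-1 accepts any byte sequence, so it guesses rather than knows.

```python
def decode_imported_log(raw, archive_encoding=None):
    """Hypothetical importer helper: decode a log message read from a foreign archive.

    If the archive's encoding is known (assumed here to be supplied as an import
    option), decode strictly with it. Otherwise try UTF-8 first and fall back to
    Latin-1, which never fails but may produce the wrong characters.
    """
    if archive_encoding is not None:
        # Known encoding: strict decoding, so a wrong guess raises instead of mangling.
        return raw.decode(archive_encoding)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Heuristic fallback: every byte maps to some code point in Latin-1.
        return raw.decode("latin-1")
```

For example, `decode_imported_log(b"caf\xe9", "cp1252")` decodes the message as Windows-1252 text, while omitting the encoding makes the helper try UTF-8 and then Latin-1.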
--
Dafydd
More information about the bazaar mailing list