problems with encodings for signed commits

Thu Dec 29 05:24:42 GMT 2005

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dafydd Harries wrote:
| While importing a baz branch into bzr recently, I discovered that the
| testament code fails if a commit message contains non-ascii characters.

| There is code that checks that the method doesn't return a unicode
| object, but it's guarded by an "if __debug__", which I consider to be
| a bit odd.
|
| The unicode object in question originates in Commit.commit:

No, it doesn't.  It originates in baz_import.iter_import_version:

        commitobj.commit(branch, log_message.decode('ascii', 'replace'),
                         verbose=False, committer=log_creator,
                         timestamp=timestamp, timezone=0, rev_id=rev_id)

|
|         if isinstance(message, str):
|             message = message.decode(bzrlib.user_encoding)

This is bogus.  If Commit.commit gets a bytestring, it should treat it
as ASCII, since there's no defined encoding.  Assuming that this
bytestring is in the user encoding is not right.  The decoding should be
done in cmd_commit, where we know that the bytestring came from the
user, and therefore the user's encoding applies.
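
A minimal sketch of that split, written for Python 3 so it runs as-is
(bzrlib at the time was Python 2).  The names decode_user_input and
commit are hypothetical stand-ins for cmd_commit and Commit.commit, not
the real bzrlib signatures:

```python
import locale


def decode_user_input(raw, user_encoding=None):
    """UI layer (stand-in for cmd_commit): bytes here came from the user,
    so the user's encoding is the right one to apply."""
    if isinstance(raw, str):
        return raw
    # Assumption: fall back to the locale's preferred encoding when the
    # caller does not pass one explicitly.
    encoding = user_encoding or locale.getpreferredencoding(False)
    return raw.decode(encoding)


def commit(message):
    """Core layer (stand-in for Commit.commit): no encoding is defined for
    bytes that reach this point, so only pure ASCII is accepted."""
    if isinstance(message, bytes):
        # Raises UnicodeDecodeError on any non-ASCII byte instead of guessing.
        message = message.decode('ascii')
    return message
```

With this arrangement a caller such as baz_import has to decide on an
encoding itself before calling the core layer, rather than silently
inheriting the user's.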

| http://muse.19inch.net/~daf/bzr/bzr/devel/

I don't think this is right.  The testament should be built assuming its
contents are unicode, or else all fields should be automatically
converted to utf-8.  No conditionals.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDs3Ma0F+nu1YWqI0RAiRaAKCJC81rkQyTcEzzbLiUJWLpv+jEswCfc9d9
OI4gAYlFusAb3t8ygo5YdWU=
=zhZS
-----END PGP SIGNATURE-----