Bug: Commit message containing control characters

Harald Meland harald.meland at usit.uio.no
Mon Sep 5 08:01:17 BST 2005


[Martin Pool]

> On 05/09/05, Harald Meland <harald.meland at usit.uio.no> wrote:
>> > Am I saying anything different? But the context before was about not being
>> > able to represent \x01 in XML -- which is possible using &#1;.

I think it was Jan Hudec who said the above; it certainly wasn't
me. :-)

> I guess we have the option of storing literally "\x01", ie the four
> characters BACKSLASH X ZERO ONE, or something along those lines.

That's what the patch I posted a few days ago does
(http://patchwork.ozlabs.org/bazaar-ng/patch?id=2236).  The patch is a
little ugly as it stands, because I wasn't sure whether this was the
right way to attack the problem -- but I'd be happy to clean it up.

> It is a bit like inventing our own syntax.

Yeah, and it is a non-reversible transformation of the message text.
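
To illustrate the point about reversibility, here is a minimal sketch of
this kind of escaping.  It is not the posted patch: the function name
escape_control_chars and the regular expression are assumptions made for
the example, and only the C0 control characters that XML 1.0 forbids are
handled.

```python
import re

# C0 control characters that XML 1.0 cannot carry: everything below
# U+0020 except TAB (\x09), LF (\x0a) and CR (\x0d).
_CONTROL_RE = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def escape_control_chars(text):
    """Replace each forbidden control character with literal \\xNN text.

    Hypothetical helper for illustration only.
    """
    return _CONTROL_RE.sub(lambda m: '\\x%02x' % ord(m.group()), text)


# A real control character and the four characters BACKSLASH X ZERO ONE
# end up as the same stored text, so the original cannot be recovered.
with_control = escape_control_chars('a\x01b')
already_literal = escape_control_chars('a\\x01b')
```

Both results are the same string, which is why the transformation
cannot be undone without also escaping the backslash itself.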

> I really think of the commit message as text, not binary data, and
> so not something that should be containing non-whitespace control
> characters.

I agree.

> Perhaps we should just do this when taking the commit message in,
> and not worry about unescaping, or even squash them to '?' (as
> Python can do with unrepresentable characters).  That would at least
> stop the exception.

My patch does this -- but for anything that uses
bzrlib.xml.pack_xml(), not just commit messages.

Maybe it would be better to do the message escaping in
bzrlib.revision.Revision.to_element(), and patch pack_xml() so that it
raises an error if it finds any non-escaped characters?
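
A rough sketch of that split, under stated assumptions: the names
sanitize_message and check_xml_text are hypothetical and are not part of
bzrlib.  The first stands in for what Revision.to_element() might do to
the message, the second for the check pack_xml() might run on any text
it is about to serialize.  Squashing to '?' is used here only because it
was one of the options mentioned above.

```python
import re

# C0 control characters that XML 1.0 forbids (TAB, LF and CR are allowed).
_FORBIDDEN_RE = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def sanitize_message(message):
    """Squash forbidden control characters to '?' when taking the message in."""
    return _FORBIDDEN_RE.sub('?', message)


def check_xml_text(text):
    """Return text unchanged, or raise ValueError if it cannot go into XML."""
    match = _FORBIDDEN_RE.search(text)
    if match is not None:
        raise ValueError('unescaped control character %r at offset %d'
                         % (match.group(), match.start()))
    return text


# The message is cleaned once at the edge; the serializer only verifies.
clean = check_xml_text(sanitize_message('fix\x01bug\n'))
```

With this arrangement, any other caller that hands unclean text to the
serializer gets an explicit error instead of silently altered data.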
-- 
Harald