Bug: Commit message containing control characters

Harald Meland harald.meland at usit.uio.no
Fri Sep 2 06:23:38 BST 2005

[Martin Pool]

> On 9/2/05, Harald Meland <harald.meland at usit.uio.no> wrote:
>> [Robert Collins]
>> >> The root of the problem is that the XML 1.0 specification doesn't seem
>> >> to allow encoding of such "control characters" as e.g. "\x01", if I
>> >> understand the well-formedness constraint here correctly:
>> >>
>> >>   http://www.w3.org/TR/REC-xml/#NT-Char
> Yes, so it would seem.
> I'm not sure that supporting control characters is really a good idea;
> it seems pretty problematic when the rest of the application wants to
> treat it as "normal" unicode text.  I can see the attraction for being
> able to do lossless imports of existing data.
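For reference, the `Char` production at the URL above admits only #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF]. Below is a minimal Python sketch of that check, assuming Python 3 `str` input. `is_xml_char` and `find_invalid_xml_chars` are hypothetical helper names, not existing bzr functions.

```python
from typing import List, Tuple


def is_xml_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, LF, CR are the only allowed C0 controls
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD      # excludes surrogates and U+FFFE/U+FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_xml_chars(text: str) -> List[Tuple[int, str]]:
    """List (index, char) pairs that cannot appear in an XML 1.0 document."""
    return [(i, ch) for i, ch in enumerate(text) if not is_xml_char(ch)]


print(find_invalid_xml_chars("a\x01b\tc"))  # -> [(1, '\x01')]
```

Under this rule, a commit message containing "\x1c" cannot be stored verbatim in XML 1.0, even as a character reference.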

This pretty much sums up my own sentiments on the issue. :-)

> I'm inclined to say that tailor (or maybe bzr?) should just strip
> out those characters.

As my recipe demonstrated, the bug isn't specific to tailor, so I
think the responsibility for sanitizing the commit message input
lies with bzr.

> Would that be a problem for you?  How did it get in there?

If the control character in question were just stripped, it would
render the original commit message somewhat meaningless.  Translated,
it says something like:

  Changed default field separator from ':' to '', in the hope that
  this is sufficient to avoid problems with field contents containing
  the separator character.

I suggest that, instead of just stripping the control character, bzr
replace it with a common string escape syntax for that character,
e.g. '\x1c' or '\034'.
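A minimal sketch of the proposed escaping, assuming Python 3 strings. `escape_invalid_xml_chars` is a hypothetical helper, not an existing bzr function. It rewrites each character outside the XML 1.0 `Char` production as a '\xNN' (or '\uNNNN') escape and leaves everything else untouched:

```python
def _is_xml_char(cp: int) -> bool:
    # XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_invalid_xml_chars(text: str) -> str:
    """Replace characters XML 1.0 cannot carry with backslash escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if _is_xml_char(cp):
            out.append(ch)
        elif cp < 0x100:
            out.append("\\x%02x" % cp)   # e.g. U+001C -> \x1c
        else:
            out.append("\\u%04x" % cp)   # e.g. U+FFFE -> \ufffe
    return "".join(out)


msg = "Changed default field separator from ':' to '\x1c'"
print(escape_invalid_xml_chars(msg))
# -> Changed default field separator from ':' to '\x1c'
```

Unlike stripping, this keeps the message meaningful. Note that backslashes already present in the message are not escaped, so the mapping is not reversible; a truly lossless import would also need to escape '\' itself.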
