Bug: Commit message containing control characters

Harald Meland harald.meland at usit.uio.no
Fri Sep 2 00:56:38 BST 2005


[Robert Collins]

>> The root of the problem is that the XML 1.0 specification doesn't seem
>> to allow encoding of such "control characters" as e.g. "\x01", if I
>> understand the well-formedness constraint here correctly:
>> 
>>   http://www.w3.org/TR/REC-xml/#NT-Char
>
>  should work.

I don't think so; the XML 1.0 specification's section "Character and
Entity References" (http://www.w3.org/TR/REC-xml/#sec-references)
says:

  Well-formedness constraint: Legal Character

  Characters referred to using character references MUST match the
  production for Char.

... where the final Char is hyperlinked to this definition:

  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
           [#x10000-#x10FFFF]

and hence does not include e.g. #x1.
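
As a minimal sketch (not part of bzr; the function names below are
made up for illustration), the XML 1.0 Char production quoted above
can be written out in Python and applied to the commit message from
my recipe:

```python
def is_xml10_char(ch):
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def illegal_xml10_chars(text):
    """Hypothetical helper: list (index, char) pairs that fall outside Char."""
    return [(i, ch) for i, ch in enumerate(text) if not is_xml10_char(ch)]


print(illegal_xml10_chars("foo\x01bar"))  # [(3, '\x01')]
```

Tab, line feed and carriage return pass this check; every other code
point below #x20 is reported.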

[ Note that the XML 1.1 specification has a different definition for
  Char (http://www.w3.org/TR/xml11/#NT-Char):

    Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

  which *does* include most control characters.  However, I'm not
  aware of anyone actually using the 1.1 specification. ]
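
To make the difference between the two definitions concrete, here is
a small sketch that encodes both Char productions as range tables and
lists the code points below #x20 that only XML 1.1 accepts.  The
table and function names are my own, chosen for illustration:

```python
# Inclusive (low, high) code point ranges, copied from the two productions.
XML10_CHAR = ((0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
              (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF))
XML11_CHAR = ((0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF))


def in_ranges(cp, ranges):
    """Return True if code point cp falls inside any of the ranges."""
    return any(low <= cp <= high for low, high in ranges)


# Control characters below #x20 allowed by XML 1.1 but not by XML 1.0.
only_in_11 = [cp for cp in range(0x20)
              if in_ranges(cp, XML11_CHAR) and not in_ranges(cp, XML10_CHAR)]

print(len(only_in_11))      # 28
print(0x1 in only_in_11)    # True
```

Note that #x0 is excluded by both definitions.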

> What does the revision object on disk have ?  (look in
> .bzr/revision-store).

(Does this mean that my recipe for reproducing the problem doesn't
work for you?)

It contains an unescaped octet with the \x01 value:

  >>> import gzip
  >>> f = gzip.open(".bzr/revision-store/hmeland@twoflower.uio.no-20050901232805-c694ca13e56cc690.gz")
  >>> f.read()
  '<revision committer="Harald Meland &lt;hmeland@twoflower.uio.no&gt;" inventory_id="hmeland@twoflower.uio.no-20050901232805-c694ca13e56cc690" inventory_sha1="8814ec17a37373296e6d558ac56a724307a8a327" revision_id="hmeland@twoflower.uio.no-20050901232805-c694ca13e56cc690" timestamp="1125617285.161967993" timezone="7200">\n<message>foo\x01bar</message>\n</revision>\n'

I should also note that the traceback I get in ~/.bzr.log indicates
that the problem occurs inside the ElementTree module that resides
inside my copy of the bzr.dev branch, and that I'm using bleeding-edge
bzr.dev with no local changes.
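
The rejection can probably be reproduced outside bzr as well.  The
sketch below assumes the ElementTree that ships in a current Python
standard library (xml.etree.ElementTree, backed by expat), not the
copy bundled with bzr.dev, and the helper name is hypothetical:

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if ElementTree accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw = "<message>foo\x01bar</message>"    # unescaped octet, as stored on disk
ref = "<message>foo&#x1;bar</message>"   # same character as a reference
tab = "<message>foo&#x9;bar</message>"   # a control character Char allows

print(parses(raw), parses(ref), parses(tab))  # False False True
```

Both the raw octet and the character reference are refused, which is
consistent with the Legal Character constraint quoted above.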
-- 
Harald



