[MERGE] push handles file-ids containing quotes correctly

John Arbash Meinel john at arbash-meinel.com
Tue Jul 11 00:10:46 BST 2006


Aaron Bentley wrote:
> John Arbash Meinel wrote:
>>>>> Done, since you insist.  I still think it's pointless to test the
>>>>> unescaper, since if the XML text is not a valid inventory, (even if it's
>>>>> well-formed XML) file_ids_affected_by is probably broken.
>>>>> Aaron
>>> Well, it may be valid escaped XML.
> Valid escaped XML doesn't make it a valid inventory.  Our inventories
> are an XML subset that has line breaks after each entry, no comments, no
> CDATA, no processing instructions, no numeric entity references...
>>> We obviously missed ', so it
>>> seems possible that we are missing others.
> Well, yeah.  Anything that's not ASCII is going to be serialized as
> numeric entity references by ElementTree, because ElementTree defaults
> to ASCII, not utf-8.  And we don't decode numeric entity references.

Well, it wouldn't be that hard to change the unescaper to try int() on
the returned value. Or even change the regex to:

and then have the unescaper do something like:

numeric = match.group('numeric')
if numeric is not None:
  # group() returns a string, so convert it before calling unichr()
  return unichr(int(numeric))
return _map[match.group('text')]

That assumes that numeric references match the python unicode codepoint,
but I would guess that they do.
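
To make the idea above concrete, here is a minimal sketch of such an unescaper. The pattern, the contents of _map, and the names _UNESCAPE_RE and unescape are assumptions for illustration, not the code in the tree. It is written for Python 3, where chr() plays the role of unichr(), and it only handles decimal references.

```python
import re

# Hypothetical table of named entities; the real _map may differ.
_map = {
    'apos': "'",
    'quot': '"',
    'amp': '&',
    'lt': '<',
    'gt': '>',
}

# Assumed pattern: either a decimal numeric reference such as &#4660;
# or a named entity such as &amp;.
_UNESCAPE_RE = re.compile(r'&(?:#(?P<numeric>[0-9]+)|(?P<text>[A-Za-z]+));')


def _unescaper(match):
    """Return the character that a single matched reference stands for."""
    numeric = match.group('numeric')
    if numeric is not None:
        # Numeric references are code points, so convert the digits directly.
        return chr(int(numeric))
    # Unknown entity names raise KeyError instead of passing silently.
    return _map[match.group('text')]


def unescape(text):
    """Replace every entity or decimal character reference in text."""
    return _UNESCAPE_RE.sub(_unescaper, text)
```

An unknown entity name raises KeyError here rather than being left in place, which would make a missed case show up early.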

>>> Which we won't find until a
>>> bug surfaces again. And I'd rather it surface early rather than later.
>>> I don't know of anything we are missing. But I know that as of right
>>> now, we don't have a lot of testing for extended unicode file ids.
> Oh, you know that thing Robert says that "untested code is broken code"?
>  We can't even commit unicode file-ids.


>     return "%02x/" % (adler32(fileid) & 0xff)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u1234' in
> position 18: ordinal not in range(128)
> ----------------------------------------------------------------------
> Ran 4 tests in 0.364s

Sure. adler32() for hash prefixes only works on byte strings, so
we'd need to encode the file id as UTF-8 first, or some such.
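
A rough sketch of what that could look like, assuming UTF-8 is the encoding we want. The function name hash_prefix is made up for illustration; only the format expression comes from the traceback. It is written for Python 3, where str is the unicode type.

```python
from zlib import adler32


def hash_prefix(fileid):
    """Return a two-hex-digit directory prefix for a file id.

    Unicode file ids are encoded to UTF-8 first, because adler32()
    only accepts bytes.
    """
    if isinstance(fileid, str):
        fileid = fileid.encode('utf-8')
    return "%02x/" % (adler32(fileid) & 0xff)
```

With this, an id containing u'\u1234' hashes without raising UnicodeEncodeError, and ASCII-only ids give the same prefix whether they arrive as text or as bytes.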

Certainly I found in my encoding work that saying you support unicode
and actually supporting it are a little bit different. :)


