[MERGE] push handles file-ids containing quotes correctly

John Arbash Meinel john at arbash-meinel.com
Mon Jul 10 17:20:42 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> Hi all,
> 
> xml.sax.saxutils.unescape is horribly, horribly broken.  Sure, it does
> exactly what it claims, but no one should ever want what it claims to
> do: convert &amp;, &lt;, and &gt; but not &apos; or &quot; into their
> corresponding characters.
> 
> The result of using a copy of xml.sax.saxutils.unescape in
> Repository.fileids_altered_by_revision_ids is that we fail to decode
> file-ids containing single- or double-quotes correctly.
> 
> This bundle fixes our local _unescape_xml function to unescape all 5 of
> the predefined entities described in XML 1.0 section 4.6, and also to
> raise an exception if it encounters an XML entity that it cannot decode
> (rather than simply ignoring it).
> 
> Aaron
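To illustrate the behaviour Aaron describes, here is a small sketch
against the standard library. It is written for current Python, and the
sample string is made up for the example. By default
xml.sax.saxutils.unescape leaves &apos; and &quot; alone unless they are
passed in through its optional entities argument.

```python
from xml.sax.saxutils import unescape

# Hypothetical escaped file-id fragment containing all five entities.
escaped = 'file&apos;s &quot;id&quot; &lt;x&gt; &amp; more'

# Default call: only &lt;, &gt; and &amp; are converted.
default_result = unescape(escaped)

# Passing the two quote entities explicitly converts them as well.
full_result = unescape(escaped, {'&apos;': "'", '&quot;': '"'})

print(default_result)
print(full_result)
```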

+1 on having our own _unescape that does the right thing. +0 on having
the tests all the way up at 'push'. Also +0 on having to pass over the
string N times for N replacements.


I think a better implementation would look like:

import re

_unescape_map = {
    'apos': "'",
    'quot': '"',
    'amp': '&',
...
}
_unescape_re = None

def _unescaper(match, _map=_unescape_map):
    # group(1) is the entity name without the surrounding '&' and ';'
    return _map[match.group(1)]

def _unescape_xml(data):
    global _unescape_re
    if _unescape_re is None:
        # compile lazily, on first use
        _unescape_re = re.compile(r'&([^;]*);')
    return _unescape_re.sub(_unescaper, data)
...

This means that we will raise a KeyError if an entity name isn't in the
map, which is better than an assert (asserts are stripped when running
python -O). It also means a single pass over the data rather than one
pass per replacement.

Also, this should be tested directly (by calling _unescape_xml directly
with several different inputs). If there is a problem higher up, then
maybe we should also have a test for 'push'. But to start with, we
should have direct tests of '_unescape_xml'.
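
A rough sketch of what such direct tests could look like, using plain
unittest. The class name, the run_tests helper and the inputs are all
hypothetical, and _unescape_xml is redefined compactly here (with all
five predefined entities) only so the example stands on its own.

```python
import re
import unittest

# Compact stand-in for the function under test (assumed behaviour).
_unescape_map = {'apos': "'", 'quot': '"', 'amp': '&', 'lt': '<', 'gt': '>'}
_unescape_re = re.compile(r'&([^;]*);')


def _unescape_xml(data):
    return _unescape_re.sub(lambda m: _unescape_map[m.group(1)], data)


class TestUnescapeXml(unittest.TestCase):

    def test_plain_string_unchanged(self):
        self.assertEqual('plain-file-id', _unescape_xml('plain-file-id'))

    def test_quotes(self):
        self.assertEqual('a\'b"c', _unescape_xml('a&apos;b&quot;c'))

    def test_amp_lt_gt(self):
        self.assertEqual('<a&b>', _unescape_xml('&lt;a&amp;b&gt;'))

    def test_unknown_entity_raises(self):
        self.assertRaises(KeyError, _unescape_xml, '&bogus;')


def run_tests():
    # Hypothetical helper: run the cases above and return the result object.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestUnescapeXml)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests() runs the four cases; in the real tree these would
live in the normal test suite rather than in a standalone helper.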

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEsn5aJdeBCYSNAAMRAlH+AKC3rlZ67i9kC5NRidv0DQH+zMRLzACfbviQ
7KHunRvZJ4ZunP8OzqdFWBQ=
=8qfo
-----END PGP SIGNATURE-----



