[MERGE] push handles file-ids containing quotes correctly
John Arbash Meinel
john at arbash-meinel.com
Mon Jul 10 17:20:42 BST 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Aaron Bentley wrote:
> Hi all,
>
> xml.sax.saxutils.unescape is horribly, horribly broken. Sure, it does
> exactly what it claims, but no one should ever want what it claims to
> do: convert &amp;, &lt;, and &gt; but not &apos; or &quot; into their
> corresponding characters.
>
> The result of using a copy of xml.sax.saxutils.unescape in
> Repository.fileids_altered_by_revision_ids is that we fail to decode
> file-ids containing single- or double-quotes correctly.
>
> This bundle fixes our local _unescape_xml function to unescape all 5 of
> the predefined entities described in XML 1.0 section 4.6, and also to
> raise an exception if it encounters an XML entity that it cannot decode
> (rather than simply ignoring it).
>
> Aaron
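For reference, the stdlib behaviour described above can be checked with a short snippet. This is only a sketch: the sample string and the variable names are made up for illustration, and the second call relies on the optional entities argument that xml.sax.saxutils.unescape accepts.

```python
from xml.sax.saxutils import unescape

# Hypothetical sample containing all five predefined entities.
raw = "&lt;a&gt; &amp; &quot;b&quot; &apos;c&apos;"

# Default behaviour: only &lt;, &gt; and &amp; are decoded.
default_result = unescape(raw)

# The quote entities are decoded only if the caller supplies them.
extra = {"&quot;": '"', "&apos;": "'"}
full_result = unescape(raw, extra)

print(default_result)
print(full_result)
```

The first line printed still contains &quot; and &apos;, which is the failure mode for file-ids with quotes in them.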
+1 on having our own _unescape that does the right thing. +0 on having
the tests all the way up at 'push'. Also +0 on having to pass over the
string N times for N replacements.
I think a better implementation would look like:
import re

_unescape_map = {
    'apos':"'",
    'quot':'"',
    'amp':'&',
    ...
}
_unescape_re = None

def _unescaper(match, _map=_unescape_map):
    # group(1) is the entity name without the surrounding '&' and ';'
    return _map[match.group(1)]

def _unescape_xml(data):
    global _unescape_re
    if _unescape_re is None:
        # compiled lazily, on first use
        _unescape_re = re.compile('&([^;]*);')
    return _unescape_re.sub(_unescaper, data)
...
This means that we will raise a KeyError if an entity isn't in the map,
but that is better than an assert (which is skipped under python -O).
It also means a single pass over the data rather than multiple passes.
Also, this should be tested directly (by calling _unescape_xml directly
with several different inputs). If there is a problem higher up, then
maybe we should also have a test for 'push'. But to start with, we
should have direct tests of '_unescape_xml'.
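A direct test could be as small as the following sketch. The _unescape_xml defined here is a compact stand-in so the example runs on its own; in the real tree the test would import the actual function instead. The class name, test names and inputs are hypothetical.

```python
import re
import unittest

# Stand-in for the function under test (assumed five-entity map).
_ENTITIES = {'apos': "'", 'quot': '"', 'amp': '&', 'lt': '<', 'gt': '>'}


def _unescape_xml(data):
    return re.sub(r'&([^;]*);', lambda m: _ENTITIES[m.group(1)], data)


class TestUnescapeXml(unittest.TestCase):

    def test_plain_text_unchanged(self):
        self.assertEqual('file-id-1', _unescape_xml('file-id-1'))

    def test_quotes(self):
        self.assertEqual('a"b\'c', _unescape_xml('a&quot;b&apos;c'))

    def test_all_five_entities(self):
        self.assertEqual('&<>\'"',
                         _unescape_xml('&amp;&lt;&gt;&apos;&quot;'))

    def test_no_double_decoding(self):
        self.assertEqual('&lt;', _unescape_xml('&amp;lt;'))

    def test_unknown_entity_raises(self):
        self.assertRaises(KeyError, _unescape_xml, '&nbsp;')
```

Saved to a file, this can be run with python -m unittest.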
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFEsn5aJdeBCYSNAAMRAlH+AKC3rlZ67i9kC5NRidv0DQH+zMRLzACfbviQ
7KHunRvZJ4ZunP8OzqdFWBQ=
=8qfo
-----END PGP SIGNATURE-----