[MERGE] push handles file-ids containing quotes correctly

John Arbash Meinel john at arbash-meinel.com
Tue Jul 11 17:33:26 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jan Hudec wrote:
> On Mon, Jul 10, 2006 at 18:10:46 -0500, John Arbash Meinel wrote:

...

>> numeric = match.group('numeric')
>> if numeric is not None:
>>   return unichr(int(numeric))
>> else:
>>   return _map[match.group('text')]
>>
>> That assumes that numeric references match the python unicode codepoint,
>> but I would guess that they do.
> 
> Pardon me, but I think the numeric entity reference regexp is wrong.
> Numeric entities match r'&#\d+;'.
> 
> I would probably use the lastindex trick and do:
> 
> _unescape_re = re.compile(r'&(?:#(\d+)|(amp)|(gt)|(lt)|(apos)|(quot));')
> _unescape_list = u"&><'\""
> 
> def _unescaper(match):
> 	if match.lastindex == 1:
> 		return unichr(int(match.group(1)))
> 	else:
> 		return _unescape_list[match.lastindex - 2]
> 
> I am using indices instead of texts because the comparison and array
> lookup should be slightly faster (the strings are not interned).
> 

Well, my numeric-reference code was never meant as an actual
implementation, just a suggestion. But thanks for fleshing it out.
I think it would be better to match anything between & and ;, though,
because we want to know if there are escapes we aren't handling.
I suppose you could add a catch-all group at the end of the regex, and
then you would at least get an IndexError when the index runs past the
end of the _unescape_list string.
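
A minimal sketch of that catch-all idea, not a tested patch. It assumes
Python 3 (so chr() stands in for unichr()), and the seventh group and
the unescape() wrapper are hypothetical additions to the code quoted
above:

```python
import re

# Groups 1-6 are the same as in the quoted regex; group 7 is the
# assumed catch-all for any other text between '&' and ';'.
_unescape_re = re.compile(
    r'&(?:#(\d+)|(amp)|(gt)|(lt)|(apos)|(quot)|([^;]*));')
_unescape_list = "&><'\""


def _unescaper(match):
    if match.lastindex == 1:
        # Decimal character reference, e.g. &#34;
        return chr(int(match.group(1)))
    # Named entities are groups 2-6, i.e. indices 0-4. The catch-all is
    # group 7, so index 5 falls off the end and raises IndexError.
    return _unescape_list[match.lastindex - 2]


def unescape(text):
    # Hypothetical helper: replace every escape found in text.
    return _unescape_re.sub(_unescaper, text)
```

With this, unescape('&quot;x&quot;') gives '"x"', while an unhandled
escape such as '&nbsp;' raises IndexError instead of passing through
silently. Note that hexadecimal references (&#x22;) and a bare '&'
followed later by ';' would also land in the catch-all.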

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEs9LWJdeBCYSNAAMRAoQlAKDPRh4MOO675LcOFsxUN9/Cvugt0ACg07y4
Nnj+aTB8tTam3vpMvBWTVHo=
=2gHp
-----END PGP SIGNATURE-----