"Using Saved Location: foo"
John Arbash Meinel
john at arbash-meinel.com
Tue May 2 15:44:01 BST 2006
Martin Pool wrote:
> On 02/05/2006, at 3:18 PM, Jan Hudec wrote:
>>
>
>> Well, with a twist. The input url is actually 'encoded', so urlescape
>> must not escape % in this case.
>
> Right - actually I think doing it chunk-by-chunk is just going to
> complicate things. I think the other algorithm I quoted would be better
> for handling these pseudo-URLs, since it doesn't touch things that might
> already be escaped:
>
>     for c in pseudo_url:
>         if c in url_safe_characters:
>             r += c
>         else:
>             if isinstance(c, unicode):
>                 r += urlescape(c.encode('utf-8'))
>
> --Martin
I just wanted to mention that the isinstance() check will always return
the same thing for every character. So really you only need to do:
    if not isinstance(pseudo_url, unicode):
        return pseudo_url
    r = []
    for c in pseudo_url:
        if c in url_safe_characters:
            r.append(c)
        else:
            r.append(urlescape(c)) # urlescape does utf-8 encoding :)
    return ''.join(r)
You have to be a little bit careful, since usually ":" is not url safe,
but it will occur in all fully qualified urls.
(By default, urllib.quote escapes everything that isn't A-Za-z0-9_.- or "/")
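
To illustrate the point about ":", here is a minimal sketch. It is written for Python 3, where urllib.quote lives at urllib.parse.quote, and the safe set and the name escape_pseudo_url are assumptions made for the example, not anything bzr defines:

```python
from urllib.parse import quote

# Hypothetical safe set (an assumption for this sketch): '/' and ':' so the
# scheme and path separators survive, plus '%' so sequences that are already
# escaped are not escaped a second time.
URL_SAFE = "/:%"

def escape_pseudo_url(pseudo_url):
    """Escape only characters outside the safe set; quote() encodes
    non-ASCII text as UTF-8 before escaping it."""
    return quote(pseudo_url, safe=URL_SAFE)

print(quote("http://host/a b"))              # default safe set: ':' is escaped
print(escape_pseudo_url("http://host/a b"))  # ':' is left alone
```

With the default safe set the scheme separator is turned into %3A, which is why ":" (and "%" for already-encoded input) has to be listed explicitly.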
Also, the 'isinstance(...,unicode)' isn't really useful, because all
user input comes in as unicode. Which is why I was proposing to do:
    try:
        return pseudo_url.encode('ascii')
    except UnicodeEncodeError:
        pass
That should work for a large portion of URLs, and then we don't have to
do all of the per-character checking.
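
Put together, the fast path plus the fallback might look roughly like this. It is a sketch only, in Python 3 (so it returns the original str rather than the encoded bytes), and the function name and the safe set are assumptions for illustration:

```python
from urllib.parse import quote

def normalize_pseudo_url(pseudo_url, safe="/:%"):
    """Return pure-ASCII input untouched; escape everything else.

    The name and the default safe set are hypothetical, chosen for this
    example only.
    """
    try:
        # Fast path: if the whole string is ASCII, nothing needs escaping.
        pseudo_url.encode("ascii")
    except UnicodeEncodeError:
        # Slow path: quote() UTF-8 encodes and escapes the unsafe characters.
        return quote(pseudo_url, safe=safe)
    return pseudo_url
```

Note that the fast path returns ASCII input exactly as given, so an ASCII character such as a space is not escaped by this version.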
John
=:->