"Using Saved Location: foo"

Tue May 2 15:44:01 BST 2006

Martin Pool wrote:
> On 02/05/2006, at 3:18 PM, Jan Hudec wrote:
>>
> 
>> Well, with a twist. The input url is actually 'encoded', so urlescape
>> must not escape % in this case.
> 
> Right - actually I think doing it chunk-by-chunk is just going to
> complicate things.  I think the other algorithm I quoted would be better
> for handling these pseudo-URLs, since it doesn't touch things that might
> already be escaped:
> 
>   for c in pseudo_url:
>     if c in url_safe_characters:
>       r += c
>     else:
>       if isinstance(c, unicode):
>         r += urlescape(c.encode('utf-8'))
> 
> --Martin

I just wanted to mention that the isinstance() check will always return
the same thing for every character, since each character of a unicode
string is itself unicode. So really you only need to do:

if not isinstance(pseudo_url, unicode):
  return pseudo_url

r = []
for c in pseudo_url:
  if c in url_safe_characters:
    r.append(c)
  else:
    r.append(urlescape(c)) # urlescape does utf-8 encoding :)
return ''.join(r)
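
For anyone who wants to try the loop above, here is a rough, self-contained
sketch written for Python 3, where str is already unicode and the
isinstance() check goes away. The name escape_pseudo_url and the contents
of URL_SAFE_CHARACTERS are my assumptions, not anything in bzr, and
urllib.parse.quote stands in for urlescape. The set includes '%' so that
already-escaped sequences pass through untouched.

```python
from urllib.parse import quote

# Assumed safe set: unreserved characters, the usual URL delimiters,
# and '%' so that existing escapes such as %20 are left alone.
URL_SAFE_CHARACTERS = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-._~:/?#[]@!$&'()*+,;=%"
)


def escape_pseudo_url(pseudo_url):
    """Escape only the characters that are not in the assumed safe set."""
    r = []
    for c in pseudo_url:
        if c in URL_SAFE_CHARACTERS:
            r.append(c)
        else:
            # quote() UTF-8 encodes the character and percent-escapes
            # every resulting byte.
            r.append(quote(c, safe=""))
    return "".join(r)
```

With this, a raw space or a non-ASCII character gets escaped, while an
existing %20 in the input is not escaped a second time.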

You have to be a little bit careful, since ":" is not normally treated
as url safe, but it occurs in every fully qualified url.
(urllib.quote escapes everything that isn't A-Za-z0-9_.- or, by
default, "/")
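
A quick illustration of the ":" problem, using Python 3's
urllib.parse.quote as a stand-in for the Python 2 urllib.quote. The
example URL and variable names are made up for the demonstration.

```python
from urllib.parse import quote

url = "http://example.com/a b"

# With the default safe="/" the colon after the scheme is escaped,
# which breaks a fully qualified URL.
default_quoted = quote(url)

# Adding ":" to the safe set keeps the scheme separator intact.
colon_safe_quoted = quote(url, safe="/:")
```

Here default_quoted starts with "http%3A//" while colon_safe_quoted keeps
"http://"; the space is escaped as %20 in both.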

Also, the 'isinstance(...,unicode)' isn't really useful, because all
user input comes in as unicode. Which is why I was proposing to do:

try:
  return pseudo_url.encode('ascii')
except UnicodeEncodeError:
  pass

That should work for a large portion of URLs, and then we don't have to
do all of the per-character checking.
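
Put together, the fast path might look roughly like this in Python 3. This
is only a sketch: escape_if_needed and ASSUMED_SAFE are hypothetical names,
urllib.parse.quote stands in for urlescape, and the function returns str
rather than the bytes that .encode('ascii') would give.

```python
from urllib.parse import quote

# Assumed safe set for the slow path; '%' is included so that existing
# escapes are not escaped again.
ASSUMED_SAFE = "-._~:/?#[]@!$&'()*+,;=%"


def escape_if_needed(pseudo_url):
    """Return pure-ASCII input unchanged, otherwise percent-escape it."""
    try:
        pseudo_url.encode("ascii")
    except UnicodeEncodeError:
        pass
    else:
        # Fast path: nothing outside ASCII, so skip the per-character work.
        return pseudo_url
    # Slow path: UTF-8 encode and escape everything outside the safe set.
    return quote(pseudo_url, safe=ASSUMED_SAFE)
```

One consequence of the shortcut is that a pure-ASCII input containing an
unsafe character such as a space is returned as-is, so it only helps if
ASCII input can be assumed to be escaped already.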

John
=:->
