John Arbash Meinel john at arbash-meinel.com
Tue May 2 14:09:32 BST 2006

Martin Pool wrote:
> On 02/05/2006, at 3:18 PM, Jan Hudec wrote:
>> Well, with a twist. The input url is actually 'encoded', so urlescape
>> must not escape % in this case.
> Right - actually I think doing it chunk-by-chunk is just going to
> complicate things.  I think the other algorithm I quoted would be better
> for handling these pseudo-URLs, since it doesn't touch things that might
> already be escaped:
>   for c in pseudo_url:
>     if c in url_safe_characters:
>       r += c
>     else:
>       if isinstance(c, unicode):
>         r += urlescape(c.encode('utf-8'))
> --Martin

Well, I think we need to decode for display chunk by chunk, because if a
chunk doesn't decode as UTF-8, we should leave it escaped.
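A minimal sketch of that chunk-by-chunk display decoding follows, assuming Python 3, that "chunk" means a '/'-separated path segment, and a hypothetical helper name `display_url`. Each segment is unescaped on its own, and any segment whose bytes are not valid UTF-8 is shown in its original escaped form:

```python
from urllib.parse import unquote_to_bytes


def display_url(escaped_url):
    """Return a human-readable form of an escaped URL (hypothetical helper).

    Each '/'-separated chunk is unescaped independently; a chunk whose
    unescaped bytes are not valid UTF-8 is kept exactly as it was.
    """
    chunks = []
    for chunk in escaped_url.split('/'):
        raw = unquote_to_bytes(chunk)  # '%C3%A9' -> b'\xc3\xa9'
        try:
            chunks.append(raw.decode('utf-8'))
        except UnicodeDecodeError:
            chunks.append(chunk)  # not UTF-8: leave this chunk escaped
    return '/'.join(chunks)
```

For example, `display_url('http://example.com/caf%C3%A9/%FF')` gives `'http://example.com/café/%FF'`: the first path chunk decodes cleanly, while the lone `%FF` byte is not valid UTF-8 and stays escaped. This is for display only; the escaped string remains the form that is stored and sent to the host.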

Also, I think internally we should use the escaped form, and that is
what we save and send to the host. That way, if the host uses a
different escaping, the user can just escape the URL by hand rather
than typing it as Unicode.

But I agree: if the user types Unicode, we can encode it character by
character.
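A minimal runnable sketch of the per-character approach from the quoted algorithm follows, assuming Python 3. The safe-character set `URL_SAFE_CHARACTERS` and the function name `escape_pseudo_url` are hypothetical. '%' is treated as safe so that sequences the user already escaped are left alone. Unlike the quoted sketch, every unsafe character is escaped rather than dropped. In Python 3 every `str` character is Unicode, so the `isinstance` check is not needed:

```python
import string
from urllib.parse import quote

# Hypothetical set of characters passed through untouched: RFC 3986
# unreserved and reserved characters, plus '%' so that existing
# escapes such as '%20' are not escaped a second time.
URL_SAFE_CHARACTERS = frozenset(
    string.ascii_letters + string.digits + "-._~"
    + ":/?#[]@!$&'()*+,;="
    + "%"
)


def escape_pseudo_url(pseudo_url):
    """Escape a user-typed URL one character at a time (hypothetical helper)."""
    parts = []
    for c in pseudo_url:
        if c in URL_SAFE_CHARACTERS:
            parts.append(c)
        else:
            # Encode the character as UTF-8, then percent-escape every byte.
            parts.append(quote(c.encode('utf-8'), safe=''))
    return ''.join(parts)
```

For instance, `escape_pseudo_url('http://example.com/café/a%20b c')` yields `'http://example.com/caf%C3%A9/a%20b%20c'`. The 'é' becomes its two UTF-8 bytes, the user's existing `%20` is untouched, and the literal space is escaped.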


