"Using Saved Location: foo"
Jan Hudec
bulb at ucw.cz
Wed May 3 06:20:33 BST 2006
On Tue, May 02, 2006 at 09:44:01 -0500, John Arbash Meinel wrote:
> Martin Pool wrote:
> > On 02/05/2006, at 3:18 PM, Jan Hudec wrote:
> >>
> >
> >> Well, with a twist. The input url is actually 'encoded', so urlescape
> >> must not escape % in this case.
> >
> > Right - actually I think doing it chunk-by-chunk is just going to
> > complicate things. I think the other algorithm I quoted would be better
> > for handling these pseudo-URLs, since it doesn't touch things that might
> > already be escaped:
> >
> > for c in pseudo_url:
> >     if c in url_safe_characters:
> >         r += c
> >     else:
> >         if isinstance(c, unicode):
> >             r += urlescape(c.encode('utf-8'))
> >
> > --Martin
>
> I just wanted to mention that the isinstance() check will always return
> the same thing for every character. So really you only need to do:
>
> if not isinstance(psuedo_url, unicode):
>     return psuedo_url
>
> r = []
> for c in psuedo_url:
>     if c in url_safe_characters:
>         r.append(c)
>     else:
>         r.append(urlescape(c)) # urlescape does utf-8 encoding :)
> return ''.join(r)
>
> You have to be a little bit careful, since usually ":" is not url safe,
> but it will occur in all fully qualified urls.
> (Urllib quotes everything that isn't A-Za-z0-9_.-)
Bzr must *NOT* quote any of the special characters -- %/:?&=;,#$
Otherwise there would be no way to enter their unescaped form.
(Note: according to the RFC, $ is special, though I don't really have any
idea when it's used.)
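
A minimal sketch of that rule, not bzr's actual code. It is written for
Python 3, where str plays the role of unicode. The name escape_pseudo_url
and the exact safe set (letters, digits, "_.-" plus the specials listed
above) are assumptions made for illustration:

```python
import string

# Assumption: the specials listed above pass through untouched, along with
# ASCII letters, digits and "_.-".
SPECIAL = "%/:?&=;,#$"
SAFE = set(string.ascii_letters + string.digits + "_.-" + SPECIAL)

def escape_pseudo_url(pseudo_url):
    """Percent-encode every character outside SAFE via its UTF-8 bytes."""
    out = []
    for ch in pseudo_url:
        if ch in SAFE:
            out.append(ch)
        else:
            # One %XX group per UTF-8 byte of the character.
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
    return "".join(out)

print(escape_pseudo_url("http://example.com/a b?x=1"))
print(escape_pseudo_url("http://example.com/a%20b"))
```

Because % is in the safe set, an already escaped %20 is left alone, so the
user can still type either form.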
> Also, the 'isinstance(...,unicode)' isn't really useful, because all
> user input comes in as unicode. Which is why I was proposing to do:
>
> try:
>     return psuedo_url.encode('ascii')
> except UnicodeEncodeError:
>     pass
>
> That should work for a large portion of URLs, and then we don't have to
> do all of the per-character checking.
But .encode('ascii') alone won't do. This needs quoting as well:
http://www.ucw.cz/~bulb/{archives}/
None of '~', '{' and '}' is allowed literally.
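
To illustrate the point, here is a small Python 3 sketch. The safe set is
an assumption for the sake of the example (letters, digits, "_.-" and the
special characters), and the variable names are made up:

```python
import string

# Assumed safe set: letters, digits, "_.-" and the URL special characters.
SAFE = set(string.ascii_letters + string.digits + "_.-" + "%/:?&=;,#$")

url = "http://www.ucw.cz/~bulb/{archives}/"

# The ASCII encode succeeds, so a try/except shortcut would return here...
as_ascii = url.encode("ascii")

# ...even though some characters would still need quoting.
still_unquoted = [c for c in url if c not in SAFE]
print(still_unquoted)
```

The encode raises no UnicodeEncodeError, so the shortcut would hand the
string back with those characters still unquoted.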
The point of isinstance(...,unicode) is that user input is always
unicode while the encoded form is not. User input always needs exactly
one round of encoding.
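
A rough sketch of using the type to guarantee exactly one round, again in
Python 3 terms (str for user input, bytes for the encoded form). The
function name to_encoded_url is hypothetical, and urllib.parse.quote
stands in for whatever escaping routine is finally chosen:

```python
from urllib.parse import quote

# Assumption: these specials are never quoted, so the user can enter them.
SPECIAL = "%/:?&=;,#$"

def to_encoded_url(url):
    """Encode user input once; pass already encoded values through."""
    if isinstance(url, bytes):
        # Already in encoded form, so do not escape a second time.
        return url
    # str means fresh user input: escape it and switch to the byte type.
    return quote(url, safe=SPECIAL).encode("ascii")

once = to_encoded_url("http://example.com/a b")
twice = to_encoded_url(once)
print(once, twice)
```

Calling it a second time on its own result is a no-op, because the result
is no longer of the user-input type.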
--
Jan 'Bulb' Hudec <bulb at ucw.cz>