"Using Saved Location: foo"

Wed May 3 06:20:33 BST 2006

On Tue, May 02, 2006 at 09:44:01 -0500, John Arbash Meinel wrote:
> Martin Pool wrote:
> > On 02/05/2006, at 3:18 PM, Jan Hudec wrote:
> >>
> > 
> >> Well, with a twist. The input url is actually 'encoded', so urlescape
> >> must not escape % in this case.
> > 
> > Right - actually I think doing it chunk-by-chunk is just going to
> > complicate things.  I think the other algorithm I quoted would be better
> > for handling these pseudo-URLs, since it doesn't touch things that might
> > already be escaped:
> > 
> >   for c in pseudo_url:
> >     if c in url_safe_characters:
> >       r += c
> >     else:
> >       if isinstance(c, unicode):
> >         r += urlescape(c.encode('utf-8'))
> > 
> > --Martin
> 
> I just wanted to mention that the isinstance() check will always return
> the same thing for every character. So really you only need to do:
> 
> if not isinstance(pseudo_url, unicode):
>   return pseudo_url
> 
> r = []
> for c in pseudo_url:
>   if c in url_safe_characters:
>     r.append(c)
>   else:
>     r.append(urlescape(c)) # urlescape does utf-8 encoding :)
> return ''.join(r)
> 
> You have to be a little bit careful, since usually ":" is not url safe,
> but it will occur in all fully qualified urls.
> (Urllib quotes everything that isn't A-Za-z0-9_.-)

Bzr must *NOT* quote any of the special characters -- %/:?&=;,#$
Otherwise there would be no way to enter their unescaped form.
(Note: according to the RFC, $ is special, though I don't really know
when it is used.)
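
A minimal sketch of that rule, written for present-day Python 3 (where
str plays the role of the unicode type discussed here). The function
name escape_pseudo_url and the exact safe set are assumptions for
illustration: the special characters are the ones listed above, and the
letters, digits and "_.-" are taken from the urllib remark quoted
earlier. It is not bzr's actual implementation.

```python
import string

# Assumption: the special characters listed in this message are passed through.
SPECIAL = "%/:?&=;,#$"
# Assumption: the always-safe set mentioned for urllib (A-Za-z0-9_.-).
URL_SAFE = set(string.ascii_letters + string.digits + "_.-" + SPECIAL)


def escape_pseudo_url(pseudo_url):
    """Percent-escape everything outside URL_SAFE, one character at a time."""
    out = []
    for ch in pseudo_url:
        if ch in URL_SAFE:
            # Safe or special characters (including an existing '%') stay as typed.
            out.append(ch)
        else:
            # Everything else is UTF-8 encoded, then each byte becomes %XX.
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
    return "".join(out)


print(escape_pseudo_url("http://host/a b?x=1&y=%20#frag"))
# → http://host/a%20b?x=1&y=%20#frag
```

Because '%' is in the pass-through set, a user who types %20 gets %20,
not %2520; the price is that a literal percent sign has to be entered
as %25.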

> Also, the 'isinstance(...,unicode)' isn't really useful, because all
> user input comes in as unicode. Which is why I was proposing to do:
> 
> try:
>   return pseudo_url.encode('ascii')
> except UnicodeEncodeError:
>   pass
> 
> That should work for a large portion of URLs, and then we don't have to
> do all of the per-character checking.

The .encode('ascii') shortcut won't do. This needs quoting as well:

http://www.ucw.cz/~bulb/{archives}/

None of '~', '{' or '}' is allowed literally.
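
To illustrate, here is a small Python 3 check, under the assumption
that the allowed set is letters, digits, "_.-" and the special
characters %/:?&=;,#$ (the helper name needs_quoting is hypothetical).
It shows that the ASCII encode succeeds on this URL even though some
characters would still have to be escaped.

```python
import string

# Assumption: same allowed set as discussed in this thread.
ALLOWED = set(string.ascii_letters + string.digits + "_.-" + "%/:?&=;,#$")


def needs_quoting(url):
    """Return the distinct characters in url that fall outside ALLOWED."""
    return sorted({ch for ch in url if ch not in ALLOWED})


url = "http://www.ucw.cz/~bulb/{archives}/"

# The ASCII shortcut raises nothing here, so it would return the URL untouched.
ascii_ok = True
try:
    url.encode("ascii")
except UnicodeEncodeError:
    ascii_ok = False

leftover = needs_quoting(url)
print(ascii_ok, leftover)
# → True ['{', '}', '~']
```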

The point of isinstance(...,unicode) is that user input is always
unicode, while the encoded form is not. User input always needs exactly
one round of encoding.
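
The "exactly one round" idea can be sketched by letting the type carry
the state. This is an illustration in Python 3, where str stands in for
unicode and bytes for the already-encoded form; the function name
to_encoded_url is hypothetical. It uses the standard urllib.parse.quote,
which in recent Python 3 versions also leaves '~' unescaped, so it does
not match the stricter rule above in that one respect.

```python
from urllib.parse import quote

# Assumption: the special characters from this thread are never escaped.
SPECIAL = "%/:?&=;,#$"


def to_encoded_url(value):
    """Encode user input (str) once; pass already-encoded bytes through."""
    if isinstance(value, bytes):
        # Already in encoded form: a second round would corrupt it.
        return value
    # quote() UTF-8 encodes non-ASCII text and percent-escapes what is not safe.
    return quote(value, safe=SPECIAL).encode("ascii")


once = to_encoded_url("http://host/a b/\u00e9?x=%41")
twice = to_encoded_url(once)
print(once)
# → b'http://host/a%20b/%C3%A9?x=%41'
```

Calling the function again on its own result returns it unchanged, so
callers do not need to track whether a value has been encoded yet.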

-- 
						 Jan 'Bulb' Hudec <bulb at ucw.cz>