"Using Saved Location: foo"

Martin Pool mbp at sourcefrog.net
Thu May 4 07:40:36 BST 2006


On 3 May 2006, John Arbash Meinel <john at arbash-meinel.com> wrote:
> You make a good point. There are characters that are valid ASCII that
> still aren't valid in URLs. I was concerned about double escaping the %
> characters, but we certainly could use this heuristic:
> 
> 1) If it doesn't have :// it is a local path. Convert it into a file:///
>    URL by escaping everything except A-Za-z0-9_.-/ (and on Windows
>    convert \ => /, and handle C: properly)
> 2) If it has ://, consider it to be a hybrid URL. That is, a URL which
>    may have some escaping, but might also have some Unicode, or other
>    non-URL characters.
>    Convert hybrid->normalized URL using:
> 	encode non-ascii characters with utf8 + % escaping.
> 	% escape all other characters, except A-Za-z0-9_.-/:?&=;,#$

The + character should also be on that list; that is, if someone puts it
in a hybrid URL, we should assume it's an already-escaped space, not a
plus that still needs to be escaped.  (I guess you meant to include it.)

The basic point is that anything which already matches the strict URL
grammar should not be changed by this normalization.
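As a rough illustration, steps 1 and 2 with + added to the safe set could
be sketched in Python 3 as follows. This is an assumption-laden sketch, not
bzr code: the helper names (local_path_to_url, normalize_hybrid_url, to_url)
are hypothetical, Python 3's str stands in for the unicode type discussed
here, Windows paths are ignored, and % is treated as safe (an assumption,
since the list above does not mention it) so that existing escapes are not
doubled.

```python
import os.path
from urllib.parse import quote

# Step 1 safe set: quote() never escapes A-Za-z0-9 or "_.-~", so only "/"
# has to be added. ("~" is unreserved in RFC 3986; Python 3.7+ keeps it.)
_LOCAL_SAFE = "/"

# Step 2 safe set: the list from the proposal, plus "+" (an already-escaped
# space) and "%" (assumed, so existing %XX escapes are not escaped again).
_HYBRID_SAFE = ":/?&=;,#$+%"


def local_path_to_url(path):
    """Step 1: turn a local POSIX path into a file:/// URL (hypothetical).

    Windows handling (backslashes, drive letters) is deliberately omitted.
    """
    # quote() encodes non-ASCII characters as UTF-8 before %-escaping them.
    return "file://" + quote(os.path.abspath(path), safe=_LOCAL_SAFE)


def normalize_hybrid_url(url):
    """Step 2: normalize a URL that may mix %-escapes with raw characters."""
    return quote(url, safe=_HYBRID_SAFE)


def to_url(location):
    """Dispatch on the '://' heuristic: no scheme means a local path."""
    if "://" not in location:
        return local_path_to_url(location)
    return normalize_hybrid_url(location)
```

Any string made only of the safe characters and well-formed %XX escapes
passes through unchanged, so normalizing an already-normalized URL is a
no-op, which is what step 3 relies on. Characters that the strict grammar
allows but that are missing from this list, such as @ in user@host, would
still be escaped, so the list would need to grow before it fully meets the
rule that strict URLs are never changed.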

> 3) Saved URLs will always be saved as normalized URLs, so should not be
> re-normalized.

That sounds good.

> get_transport(path) can further use this heuristic:
> if isinstance(path, unicode):
> 	path is 1 or 2
> elif isinstance(path, str):
> 	path is 1 or 3

I would say the distinction should be whether the string came from the
user (and so may be a hybrid URL) or from somewhere we can be strict
about the input.  So the choice is not 2 vs 3, but rather 2 vs "raise an
error if there are any characters other than those strictly allowed".

But I suppose some sources where we might want to be strict, like
branch/parent, probably are not strict at the moment, so we need to
treat them the same as user input.
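The user-versus-strict split could look roughly like this. Again this is a
hypothetical sketch: location_to_url and check_strict_url are illustrative
names, the from_user flag is an assumed way for callers to say where a
string came from, the strict character set is simply the list from this
thread plus + and ~, and local paths (step 1) are left out for brevity.

```python
import re
from urllib.parse import quote

# Strictly allowed: the thread's list plus "+", and "~" because quote()
# never escapes it (Python 3.7+). A "%" must start a well-formed %XX escape.
_STRICT_URL = re.compile(r"(?:[A-Za-z0-9_.~\-/:?&=;,#$+]|%[0-9A-Fa-f]{2})*")
# Lenient normalization keeps "%" so existing escapes are not doubled.
_HYBRID_SAFE = ":/?&=;,#$+%"


def check_strict_url(url):
    """Raise ValueError if url contains anything outside the strict set."""
    if _STRICT_URL.fullmatch(url) is None:
        raise ValueError("not a strictly escaped URL: %r" % (url,))
    return url


def location_to_url(location, from_user):
    """Hypothetical entry point for get_transport().

    Input typed by the user may be a hybrid URL and is normalized; input
    from places we control must already be strict, or we raise.
    """
    if from_user:
        return quote(location, safe=_HYBRID_SAFE)
    return check_strict_url(location)
```

Until sources such as branch/parent are known to be strict, their callers
would pass from_user=True and get the lenient behaviour.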

> Functions like Branch.get_parent() should realize that they always
> return URLs and thus should always return a plain str() not a unicode
> string. (Which naively reading .bzr/branch/parent as a utf8 file would do).
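In Python 3, where every text string is str, the nearest equivalent of
"return a plain str()" is to decode the saved file as ASCII rather than
UTF-8, so that a non-ASCII byte is an error instead of quietly producing a
string that is not a valid URL. A minimal sketch, with read_parent_url as a
hypothetical helper name:

```python
def read_parent_url(path):
    """Read a saved parent URL (hypothetical helper, e.g. .bzr/branch/parent).

    Saved URLs are always normalized, so they should be pure ASCII.
    Decoding as ASCII instead of UTF-8 turns any stray non-ASCII byte
    into an error rather than a silently non-URL string.
    """
    with open(path, "rb") as f:
        data = f.read()
    # bytes.decode("ascii") raises UnicodeDecodeError on non-ASCII bytes.
    return data.decode("ascii").rstrip("\r\n")
```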

-- 
Martin
