"Using Saved Location: foo"
Martin Pool
mbp at sourcefrog.net
Thu May 4 07:40:36 BST 2006
On 3 May 2006, John Arbash Meinel <john at arbash-meinel.com> wrote:
> You make a good point. There are characters that are valid ASCII that
> still aren't valid in URLs. I was concerned about double escaping the %
> characters, but we certainly could use this heuristic:
>
> 1) If it doesn't have :// it is a local path. Convert it into a file:///
>    url by escaping everything except A-Za-z0-9_.-/ (and on windows
>    convert \ => /, and handle C: properly)
> 2) If it has ://, consider it to be a hybrid URL. That is, a URL which
>    may have some escaping, but might also have some Unicode, or other
>    non URL characters.
>    Convert hybrid->normalized URL using:
>      encode non-ascii characters with utf8 + % escaping.
>      % escape all other characters, except A-Za-z0-9_.-/:?&=;,#$
+ should also be on that list: if someone puts it in a hybrid URL we
should assume it's an already-escaped space, not a literal plus that
needs escaping. (I guess you meant to include it.)
The basic point is that anything which already matches the strict URL
grammar should not be changed by this normalization.
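A minimal Python 3 sketch of step 2 as amended above. The name normalize_hybrid_url is hypothetical, not bzrlib's API. It assumes the safe set is John's list plus '+', and that an existing %XX sequence is an escape to be kept verbatim (so a URL that is already strict passes through unchanged), while a bare '%' is treated as a literal and becomes %25.

```python
import re
from urllib.parse import quote

# Punctuation left untouched: John's list (/:?&=;,#$) plus "+" (Martin's
# addition). quote() never escapes ASCII letters, digits or "_.-", and on
# Python 3.7+ it also leaves "~" alone, which is not in John's list.
_HYBRID_SAFE = "/:?&=;,#$+"

# An existing %XX escape. Assumption: it is copied through verbatim so an
# already-escaped URL is not escaped a second time.
_ESCAPE_RE = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_hybrid_url(url: str) -> str:
    """Hypothetical helper: turn a 'hybrid' URL into a strict one."""
    out = []
    pos = 0
    for m in _ESCAPE_RE.finditer(url):
        # Escape the text between existing escapes; non-ASCII characters
        # are UTF-8 encoded and then %-escaped by quote().
        out.append(quote(url[pos:m.start()], safe=_HYBRID_SAFE, encoding="utf-8"))
        out.append(m.group(0))  # keep the existing escape as-is
        pos = m.end()
    out.append(quote(url[pos:], safe=_HYBRID_SAFE, encoding="utf-8"))
    return "".join(out)
```

Because existing escapes are copied through and everything else ends up in the safe set, running the function twice gives the same result as running it once. That is the property step 3 relies on when saved URLs are not re-normalized.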
> 3) Saved URLs will always be saved as normalized URLs, so should not be
>    re-normalized.
That sounds good.
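Step 1 could look roughly like this. looks_like_url and local_path_to_url are hypothetical names. The sketch assumes the caller already has an absolute path, and it takes an explicit windows flag instead of inspecting the running OS so that both branches can be shown.

```python
import string
from urllib.parse import quote


def looks_like_url(location: str) -> bool:
    # John's rule: anything containing "://" is treated as a (hybrid) URL.
    return "://" in location


def local_path_to_url(path: str, windows: bool = False) -> str:
    """Hypothetical helper: absolute local path -> file:/// URL."""
    if windows:
        path = path.replace("\\", "/")  # convert \ => /
        if len(path) < 3 or path[0] not in string.ascii_letters or path[1:3] != ":/":
            raise ValueError("expected an absolute path like C:/foo: %r" % (path,))
        # Keep the drive letter and its colon unescaped: C:/foo -> file:///C:/foo
        return "file:///" + path[:2] + quote(path[2:], safe="/")
    if not path.startswith("/"):
        raise ValueError("expected an absolute path: %r" % (path,))
    # quote() escapes everything except ASCII letters, digits, "_.-~" and
    # the "/" passed as safe; non-ASCII text is UTF-8 encoded first.
    return "file://" + quote(path, safe="/")
```

One small deviation from the rule as stated: quote() leaves '~' unescaped on Python 3.7 and later, which is harmless in a URL path.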
> get_transport(path) can further use this heuristic:
> if isinstance(path, unicode):
>     path is 1 or 2
> elif isinstance(path, str):
>     path is 1 or 3
I would say the distinction should be whether the string came from the
user (and so can be a hybrid URL), or whether we should be strict about
the input. So the choice is not 2 vs 3, but rather 2 vs "raise an error
if there are any characters but those strictly allowed".
But I suppose some things where we might want to be strict, like
branch/parent, probably aren't strict at the moment, so we need to treat
them the same as user input.
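Martin's alternative to case 3 is a check that rejects non-strict input instead of repairing it. A rough sketch, assuming a simplified "strict grammar" made of the characters in John's list, '+', and well-formed %XX escapes (the full URL grammar in RFC 3986 is larger). check_strict_url and parse_location are hypothetical names, ValueError stands in for whatever error bzr would actually raise, and the hybrid normalizer for user input is passed in as a parameter rather than assumed to exist.

```python
import re
from typing import Callable

# Simplified strict grammar (an assumption, not RFC 3986): the characters
# A-Za-z0-9_.-/:?&=;,#$ and "+", or a well-formed %XX escape.
_STRICT_URL = re.compile(r"(?:[A-Za-z0-9_.\-/:?&=;,#$+]|%[0-9A-Fa-f]{2})*")


def check_strict_url(url: str) -> str:
    """Hypothetical: return url unchanged, or raise if it is not strict."""
    if _STRICT_URL.fullmatch(url) is None:
        raise ValueError("not a strictly valid URL: %r" % (url,))
    return url


def parse_location(location: str, from_user: bool,
                   normalize_hybrid: Callable[[str], str]) -> str:
    # The decision is based on where the string came from, not on its type:
    # user input may be a hybrid URL and gets normalized (case 2); anything
    # else must already be strict, or it is an error.
    if from_user:
        return normalize_hybrid(location)
    return check_strict_url(location)
```

Following the caveat above, a value such as branch/parent that is not yet guaranteed to be strict would, for now, have to go through the from_user=True path.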
> Functions like Branch.get_parent() should realize that they always
> return URLs and thus should always return a plain str() not a unicode
> string. (Which naively reading .bzr/branch/parent as a utf8 file would do).
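A sketch of reading a saved location under that rule, in Python 3 terms, where the old str/unicode split becomes "decode as ASCII or fail". read_saved_url is a hypothetical name, and the sketch assumes the file holds a single normalized URL, optionally followed by a newline; it is not bzrlib's actual Branch.get_parent().

```python
from typing import Optional


def read_saved_url(path: str) -> Optional[str]:
    """Hypothetical: read a saved location such as .bzr/branch/parent."""
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except FileNotFoundError:
        return None  # no saved location
    # Saved locations are already-normalized URLs, so they are pure ASCII.
    # Decoding as ASCII (instead of UTF-8) makes a non-ASCII byte raise
    # UnicodeDecodeError rather than silently yielding a non-URL string.
    url = raw.decode("ascii").rstrip("\r\n")
    return url or None
```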
--
Martin
More information about the bazaar mailing list