[BUG] Bzr can't pull bundles from http urls
John Arbash Meinel
john at arbash-meinel.com
Thu Dec 7 15:54:13 GMT 2006
Aaron Bentley wrote:
> The http transport requires that relpaths be strings, not unicode. This
> is unlike get_transport, which happily accepts unicode strings. As a
> result,
>
> get_transport(u'http://foo/bar').get('') works, but using urlutils to
> split the url, then doing get_transport(u'http://foo').get(u'bar') does not.
>
> The obvious solution would be to use the normalize_path function, but
> that will also return unicode, even though such unicode consists of
> ascii-range characters.
>
> I believe this is a bug in normalize_path.
>
> Furthermore, normalize_path does not accept urls containing tilde. This
> is a bug IMHO, because tilde is acceptable per RFC 2396. And it would
> be ludicrous to be promoting the use of tilde-containing URLs while
> simultaneously claiming that tildes are illegal in URLs.
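The unicode-versus-bytestring mismatch described above can be sketched in Python 3 terms, where bzr's old `str`/`unicode` split maps onto `bytes`/`str`. This is a minimal sketch under stated assumptions: `strict_get` is a hypothetical stand-in for an HTTP transport's `get()` that accepts only bytestring relpaths, and `normalize_path_bytes` is a hypothetical normalizer, not bzr's actual `normalize_path`. It percent-escapes with the standard `urllib.parse.quote`, keeps `~` literal, and returns ASCII bytes.

```python
from urllib.parse import quote


def strict_get(relpath):
    """Hypothetical stand-in for an HTTP transport's get(): like the
    transport described above, it accepts only bytestring relpaths."""
    if not isinstance(relpath, bytes):
        raise TypeError("relpath must be bytes, got %r" % (relpath,))
    return relpath  # a real transport would issue the request here


def normalize_path_bytes(path):
    """Hypothetical normalizer: percent-escape everything except '/' and
    the always-safe characters, keep '~' literal, and return ASCII bytes."""
    escaped = quote(path, safe="/~")  # non-ASCII text is UTF-8 encoded, then %XX-escaped
    return escaped.encode("ascii")    # escaped output is pure ASCII, so this cannot fail


relpath = u"~bzr/bzr/trunk/.bzr/branch-format"
try:
    strict_get(relpath)  # a text relpath is rejected, as in the report
except TypeError as exc:
    print("rejected:", exc)
print(strict_get(normalize_path_bytes(relpath)))
```

Passing `"~"` explicitly in `safe` keeps the tilde unescaped regardless of which Python 3 version's default safe set is in use.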
We are escaping the character, which, from what I've seen, works just
fine. It has worked over both sftp and http. For example, I can access both:
http://bazaar.launchpad.net/~bzr/bzr/trunk/.bzr/branch-format
and
http://bazaar.launchpad.net/%7Ebzr/bzr/trunk/.bzr/branch-format
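As a quick check that the two URLs above differ only in spelling, Python's standard `urllib.parse.unquote` decodes `%7E` back to `~`. This only shows that the strings are equivalent after decoding; that the server actually serves both is what the manual test above demonstrates. RFC 2396 itself notes that unreserved characters can be escaped without changing the semantics of the URI.

```python
from urllib.parse import unquote

plain = "http://bazaar.launchpad.net/~bzr/bzr/trunk/.bzr/branch-format"
escaped = "http://bazaar.launchpad.net/%7Ebzr/bzr/trunk/.bzr/branch-format"

# Decoding the %7E escape yields a literal '~', so both spellings name
# the same path; the unescaped form contains no escapes to decode.
print(unquote(escaped) == plain)
print(unquote(plain) == plain)
```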
Reading RFC 2396 (section 2.3), it seems to say:

   Data characters that are allowed in a URI but do not have a reserved
   purpose are called unreserved.  These include upper and lower case
   letters, decimal digits, and a limited set of punctuation marks and
   symbols.

      unreserved  = alphanum | mark

      mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
So if we were following that, we would add !, *, and () to the list.
Also, for clarity, I would put the tilde in a different section.
Right now the set is divided into alpha, number, misc, and reserved
sections, so ~ really belongs in the 'misc' section, not the 'reserved' one.
We can make it clearer with:
_url_safe_characters = set(
    "abcdefghijklmnopqrstuvwxyz"  # Lowercase alpha
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # Uppercase alpha
    "0123456789"                  # Numbers
    "_.-!~*'()"                   # Unreserved marks (RFC 2396 "mark")
    "/;?:@&=+$,"                  # Reserved characters
    "%#"                          # Escape (%) and fragment (#) delimiters
)
>
> So this patch changes normalize_path to emit only bytestrings, and to
> accept tilde as a safe character. It also changes read_bundle_from_url
> to normalize the url it's passed, so that it's more consistent with
> Branch.open_containing.
>
> Aaron
+1 whichever way we decide.
John
=:->
More information about the bazaar mailing list