[BUG] Bzr can't pull bundles from http urls

John Arbash Meinel john at arbash-meinel.com
Thu Dec 7 15:54:13 GMT 2006


Aaron Bentley wrote:
> The http transport requires that relpaths be strings, not unicode.  This
> is unlike get_transport, which happily accepts unicode strings.  As a
> result,
> 
> get_transport(u'http://foo/bar').get('') works, but using urlutils to
> split the url, then doing get_transport(u'http://foo').get(u'bar') does not.
> 
> The obvious solution would be to use the normalize_path function, but
> that will also return unicode, even though such unicode consists of
> ascii-range characters.
> 
> I believe this is a bug in normalize_path.
> 
> Furthermore, normalize_path does not accept urls containing tilde.  This
> is a bug IMHO, because tilde is acceptable per RFC 2396.  And it would
> be ludicrous to be promoting the use of tilde-containing URLs while
> simultaneously claiming that tildes are illegal in URLs.

We are escaping the character, which from what I've seen works just
fine. It has worked over sftp and over http. I.e., I can access both:
http://bazaar.launchpad.net/~bzr/bzr/trunk/.bzr/branch-format
and
http://bazaar.launchpad.net/%7Ebzr/bzr/trunk/.bzr/branch-format
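
Both forms reach the same file because %7E is simply the percent-encoded
form of ~. A quick way to check that is sketched below, assuming Python 3's
urllib.parse (not what bzrlib itself uses); the variable names are only
for illustration:

```python
from urllib.parse import unquote

# The two URLs from above: one with a literal tilde, one with it escaped.
plain = "http://bazaar.launchpad.net/~bzr/bzr/trunk/.bzr/branch-format"
escaped = "http://bazaar.launchpad.net/%7Ebzr/bzr/trunk/.bzr/branch-format"

# Decoding the percent-escapes in the second URL should give back the first.
decoded = unquote(escaped)
same_resource = (decoded == plain)
```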

Reading RFC 2396 seems to say:
   Data characters that are allowed in a URI but do not have a reserved
   purpose are called unreserved.  These include upper and lower case
   letters, decimal digits, and a limited set of punctuation marks and
   symbols.

      unreserved  = alphanum | mark

      mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

So if we were following that, we would add !, *, and () to the list.

Also, for clarity I would put the tilde in a different section.
Right now it is divided into alpha, number, misc, and reserved. So
really ~ should be in the 'misc' section, not the 'reserved' section.

We can make it clearer with:

 _url_safe_characters = set(
    "abcdefghijklmnopqrstuvwxyz" # Lowercase alpha
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ" # Uppercase alpha
    "0123456789" # Numbers
    "_.-!~*'()"  # Unreserved characters
    "/;?:@&=+$," # Reserved characters
    "%#"         # Extra reserved characters
 )
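
To show how a set like this might be used, here is a rough sketch of an
escaping helper. It is not the real normalize_path; the name escape_unsafe
is hypothetical, and it assumes Python 3 strings. Anything outside the safe
set is percent-encoded from its UTF-8 bytes, so the result contains only
ascii-range characters:

```python
_url_safe_characters = set(
    "abcdefghijklmnopqrstuvwxyz"  # Lowercase alpha
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # Uppercase alpha
    "0123456789"                  # Numbers
    "_.-!~*'()"                   # Unreserved characters
    "/;?:@&=+$,"                  # Reserved characters
    "%#"                          # Extra reserved characters
)


def escape_unsafe(path):
    """Hypothetical helper: percent-encode characters outside the safe set."""
    out = []
    for ch in path:
        if ch in _url_safe_characters:
            # Safe characters (including ~ and existing %XX escapes) pass through.
            out.append(ch)
        else:
            # Everything else is escaped byte by byte from its UTF-8 encoding.
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
    return "".join(out)
```

With this, "~bzr/trunk" comes back unchanged, while a space or a non-ascii
character is replaced by its %XX escape. Since "%" is itself in the safe
set, a path that is already escaped is not escaped a second time.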

> 
> So this patch changes normalize_path to emit only bytestrings, and to
> accept tilde as a safe character.  It also changes read_bundle_from_url
> to normalize the url it's passed, so that it's more consistent with
> Branch.open_containing.
> 
> Aaron

+1 whichever way we decide.

John
=:->

