[MERGE] Register netloc protocols at 'register_transport_proto' time

John Arbash Meinel john at arbash-meinel.com
Wed Oct 24 18:21:47 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Ladeuil wrote:
>>>>>> "john" == John Arbash Meinel <john at arbash-meinel.com> writes:
> 
>     john> Vincent Ladeuil wrote:
>     >>>>>>> "john" == John Arbash Meinel <john at arbash-meinel.com> writes:
>     >> 
>     john> The attached patch fixes the bug I just mentioned about
>     john> missing netloc protocol registrations when sharing
>     john> transports.
>     >> 
>     >> Nice catch.
>     >> 
>     >> So, you found a code path where we try to split an url before
>     >> calling get_transport...
>     >> 
>     >> But we are still in a grey area, see
>     >> https://bugs.edge.launchpad.net/bzr/+bug/122258.
>     >> 
>     >> Basically, decorators trick urlparse in a way that rarely causes
>     >> problems. I think you run into it because you have switched from
>     >> sftp to bzr+ssh.
>     >> 
>     >> So I think the right solution is either to make urlparse deal
>     >> with decorators or to really filter out the decorators *before*
>     >> calling urlparse (easier said than done ;-)
> 
>     john> Well, we have 2 different "types" of decorators. We have:
>     john> readonly+...
> 
>     john> And we have
>     john> http+urllib
> 
> I'm painfully aware of that ;) And bzr+ssh and bzr+http[s] are
> from yet another type.
> 
>     >> 
>     john> I went ahead and defaulted the parameter to False,
>     john> because there were more protocols that should not be
>     john> registered. (All of the decorators, and memory, etc.)
>     >> 
>     >> Can you elaborate on why you did that ?
>     >> register_urlparse_netloc_protocol ensures that the urls will be
>     >> parsed correctly (most of the time :-/). I'm afraid we are still
>     >> exposed to the underlying problem then.
> 
>     john> Well, it doesn't make much sense to register
>     john> "readonly+" or "log+" or "trace+".  And 'memory:///' or
>     john> 'file:///' doesn't take a host, but probably would
>     john> still be netloc safe.
> 
>     john> 'lp:' is the only really weird one, since we don't
>     john> require the '///' (though we will accept it if they do,
>     john> and we don't accept 'lp:/' or 'lp://' at the moment.)
> 
> I understand, but this applies to the "bug" we just identified
> (optimistically), what if another one occurs ?
> 
> I think you pin-point rightly that it's unsafe to delay the
> registration (until the module is loaded and moreover as a
> side-effect).
> 
> Andrew was surprised to *have* to do that for bzr protocols.
> 
> Overall, it seems weird to even *have* to think about that.
> 
> So, making the registration mandatory, automatic even, seems safer.
> 
> 
> <snip/>
> 
>     john> I'm actually thinking we would be best served by just
>     john> copying out the urlparse code, and just rewriting it so
>     john> that it always had "netloc=True". We could probably
>     john> simplify it, and make our lives easier.
> 
>     john> Thoughts?
> 
> I just checked python 2.4, 2.5 and 2.6 implementations, 2.5 seems
> better (and includes SFTP support :), 2.6 introduces no changes
> (so far).
> 
> So, I'm all for having our own private implementation.
> 
> Cons:
> 
> - we lose support from the python community
> 
> Pro:
> - we can fix bugs :) #122258 comes to mind (and is really two
>   separate bugs as mentioned in the comments),
> - we can take decorators into account,
> - we can simplify the implementation by getting rid of useless
>   constraints (fragments, param) where appropriate.
> 

^- I'm just going to consider it:
https://bugs.launchpad.net/bzr/+bug/122258

And submit my changes.

I'll look at pulling in urlparse. But I also had another thought. What about
this in _split_url:

  (scheme, netloc, path, params,
   query, fragment) = urlparse.urlparse(url, allow_fragments=False)
  if netloc == '':
    # Must not have been registered
    register_urlparse_netloc_protocol(scheme)
    (scheme, netloc, path, params,
     query, fragment) = urlparse.urlparse(url, allow_fragments=False)

Because ConnectedTransport._split_url sort of already knows that everything
needs to have a netloc (since that is the host we are connecting to.)

I don't really like having it as a side effect, but it does mean that all
protocols that get used will get registered, without having to register all
permutations of all decorators ahead of time.

Also, because urlparse.uses_netloc is a list, parsing time will slow down when
we get 100s of them in there.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHH38rJdeBCYSNAAMRAsDbAJ0X8xpxTZoWFzyf3RijP8gnBmfGGQCgr2pT
soPh0qTmokaqMsSLXA0SrUM=
=k6+K
-----END PGP SIGNATURE-----



More information about the bazaar mailing list