[MERGE] Register netloc protocols at 'register_transport_proto' time
John Arbash Meinel
john at arbash-meinel.com
Wed Oct 24 18:21:47 BST 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Vincent Ladeuil wrote:
>>>>>> "john" == John Arbash Meinel <john at arbash-meinel.com> writes:
>
> john> Vincent Ladeuil wrote:
> >>>>>>> "john" == John Arbash Meinel <john at arbash-meinel.com> writes:
> >>
> john> The attached patch fixes the bug I just mentioned about
> john> missing netloc protocol registrations when sharing
> john> transports.
> >>
> >> Nice catch.
> >>
> >> So, you found a code path where we try to split an url before
> >> calling get_transport...
> >>
> >> But we are still in a grey area, see
> >> https://bugs.edge.launchpad.net/bzr/+bug/122258.
> >>
> >> Basically, decorators trick urlparse in a way that rarely causes
> >> problems. I think you run into it because you have switched from
> >> sftp to bzr+ssh.
> >>
> >> So I think the right solution is either to make urlparse deal
> >> with decorators or to really filter out the decorators *before*
> >> calling urlparse (easier said than done ;-)
>
> john> Well, we have 2 different "types" of decorators. We have:
> john> readonly+...
>
> john> And we have
> john> http+urllib
>
> I'm painfully aware of that ;) And bzr+ssh and bzr+http[s] are
> from yet another type.
>
> >>
> john> I went ahead and defaulted the parameter to False,
> john> because there were more protocols that should not be
> john> registered. (All of the decorators, and memory, etc.)
> >>
> >> Can you elaborate on why you did that ?
> >> register_urlparse_netloc_protocol ensures that the urls will be
> >> parsed correctly (most of the time :-/). I'm afraid we are still
> >> exposed to the underlying problem then.
>
> john> Well, it doesn't make much sense to register
> john> "readonly+" or "log+" or "trace+". And 'memory:///' or
> john> 'file:///' doesn't take a host, but probably would
> john> still be netloc safe.
>
> john> 'lp:' is the only really weird one, since we don't
> john> require the '///' (though we will accept it if they do,
> john> and we don't accept 'lp:/' or 'lp://' at the moment.)
>
> I understand, but this applies to the "bug" we just identified
> (optimistically), what if another one occurs ?
>
> I think you pin-point rightly that it's unsafe to delay the
> registration (until the module is loaded and moreover as a
> side-effect).
>
> Andrew was surprised to *have* to do that for bzr protocols.
>
> Overall, it seems weird to even *have* to think about that.
>
> So, making the registration mandatory, automatic even, seems safer.
>
>
> <snip/>
>
> john> I'm actually thinking we would be best served by just
> john> copying out the urlparse code, and just rewriting it so
> john> that it always had "netloc=True". We could probably
> john> simplify it, and make our lives easier.
>
> john> Thoughts?
>
> I just checked python 2.4, 2.5 and 2.6 implementations, 2.5 seems
> better (and includes SFTP support :), 2.6 introduces no changes
> (so far).
>
> So, I'm all for having our own private implementation.
>
> Cons:
>
> - we lose support from the python community
>
> Pro:
> - we can fix bugs :) #122258 comes to mind (and is really two
> separate bugs as mentioned in the comments),
> - we can take decorators into account,
> - we can simplify the implementation by getting rid of useless
> constraints (fragments, param) where appropriate.
>
^- I'm just going to consider it:
https://bugs.launchpad.net/bzr/+bug/122258
And submit my changes.
I'll look at pulling in urlparse. But I also had another thought. What about
this in _split_url:
(scheme, netloc, path, params,
query, fragment) = urlparse.urlparse(url, allow_fragments=False)
if netloc == '':
# Must not have been registered
register_urlparse_netloc_protocol(scheme)
(scheme, netloc, path, params,
query, fragment) = urlparse.urlparse(url, allow_fragments=False)
Because ConnectedTransport._split_url sort of already knows that everything
needs to have a netloc (since that is the host we are connecting to.)
I don't really like having it as a side effect, but it does mean that all
protocols that get used will get registered, without having to register all
permutations of all decorators ahead of time.
Also, because urlparse.uses_netloc is a list, parsing time will slow down when
we get 100s of them in there.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFHH38rJdeBCYSNAAMRAsDbAJ0X8xpxTZoWFzyf3RijP8gnBmfGGQCgr2pT
soPh0qTmokaqMsSLXA0SrUM=
=k6+K
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list