[MERGE][bug #111702] bzr init <remote_branch> connects multiple times

Sat Jun 2 14:20:22 BST 2007

>>>>> "john" == John Arbash Meinel <john at arbash-meinel.com> writes:

    john> John Arbash Meinel has voted +1 (conditional).
    john> Status is now: Conditionally approved

Thanks.

    john> Comment:
    john> I wish there were a better way than having to pass possible
    john> transports around through all the different api layers,

I don't know yet what layers will be traversed but I guess there
will not be that much, BzrDir, Branch and Repository (via BzrDir)
already have their associated transport[s] so the impact should
not be that high.

    john> but the only obvious one is to use a cache, and that
    john> seems to be very frowned upon.

... and their implementation is incomplete: only ftp and sftp
have a global cache, http and remote (in its TCP, ssh and http
based implementations) don't. One can argue that pycurl have one
though. Also webdav, relying on http, doesn't have one either.

By "doesn't have one", I mean even the connection sharing which
is implemented at the transport level is not enough to guarantee
a single connection for bzr commands.

I think my proposal is between the two already used mechanisms:
- a bit more sharing than at the transport level only,
- a bit less sharing than at the global level.
- a caching policy adjustable by callers.

    john> This looks incorrectly indented:
    john> +    * ``bzr init`` should only connect to the remote location one time.
    john> +      We have been connecting several times because we forget to pass
    john> +      around the Transport object. This modifies
    john> +      ``BzrDir.create_branch_convenience``, so that we can pass in the
    john> +      Transport that we already have.
    john> +     (John Arbash Meinel, Vincent Ladeuil, #111702)
    john> +

What ? You mean that we will never put that on punched cards so
limiting lines to 72 chars (even 65 in that case) is not really a
requirement ?

Well, your call :)

Humorous note: If you look at the cited lines above, you will
notice that the largest one ends up in column 82, which means
that my default setting of 65 (whose intent was to take this kind
of case into account) is already too high !

    john> I realize this isn't your code but this is incorrect:

Fire, fire ! I'm refactoring :-)

    john> 
    john> - t = get_transport(safe_unicode(base))
    john> +            t = get_transport(safe_unicode(base), possible_transports)

    john> get_transport() isn't defined anymore as strictly requiring a
    john> Unicode path. It is actually specifically defined as taking a URL
    john> or trying to be smart if it gets a unicode path. (If it thinks it
    john> is a URL it tries to encode it, if it is a local path it treats
    john> it appropriately).

Yes, I saw that. 

It perfectly fit my purpose in fact ; as "local" and "connected"
transports already have a distinct treatment, handling connection
sharing will be a natural consequence.

    john> So I would just remove the "safe_unicode()" calls. There are
    john> other places as well.

Will do.

    john> This loop could be clearer:
    john> +    transport = None
    john> +    if possible_transports:
    john> +        for t in possible_transports:
    john> +            if t.base == base:
    john> +                transport = t
    john> +                break
    john> +    if transport is None:
    john> ...
    john> +    if transport is None:
    john> ...
    john> +    return transport

    john> Instead, you could just do:
    john> +    if possible_transports:
    john> +        for t in possible_transports:
    john> +            if t.base == base:
    john> +                return t
    john> +                break

    john> Then you don't need the 'if transport is None' checks.

Ok. This will be refactored anyway.

    john> Also, the way you have written this, the transport will
    john> only be re-used if it is identically the same. I
    john> thought we were trying to also handle cases when it is
    john> a parent/child. Or is that something you are still
    john> working on?

Still under work. Specifically I needed the scheme, host, user
and password to be accessible attributes of the transport, so i
began refactoring all "connected" transports to make them
accessible. Every implementation did that in different ways and
many comments indicated that some sharing was more than expected
:-)

This work is done now, I will now begin to address the multiple
connections bugs. So far, using ftp for the tests and disabling
the cache reveals ('b' is the actual number of connections):

FAIL: commands.test_branch.TestBranch.test_branch_remote_local
    not equal:
a = 1
b = 2

FAIL: commands.test_cat.TestCat.test_cat
    not equal:
a = 1
b = 2

FAIL: commands.test_checkout.TestCheckout.test_checkout
    not equal:
a = 1
b = 2

FAIL: commands.test_merge.TestMerge.test_merge
    not equal:
a = 1
b = 5

FAIL: commands.test_missing.TestMissing.test_missing
    not equal:
a = 1
b = 3

FAIL: commands.test_pull.TestPull.test_pull
    not equal:
a = 1
b = 4

FAIL: commands.test_push.TestPush.test_push
    not equal:
a = 1
b = 2

What that means is that writable transports don't share
connections without a global cache.

    john> And I don't think this works for local relative paths. Because
    john> T.base is expanded to the absolute path, rather than the relative
    john> path that was passed in. (unless I misread it and the path has
    john> already been converted to a full url).

We don't care about reusing connections for local transports nor
memory transports ;-)

       Vincent