[rfc] Tracking multiple connections

Mon May 7 10:26:21 BST 2007

Hi,

Following a discussion on IRC with John, I want to make the
following proposal.

We have already encountered several bugs around multiple
connections to servers for a given bzr command (bug 75721 for
push and bug 111702 for init).

Several others are also waiting:

- 31140 ftplib uses more than two connections, can exceed limit

- 43731 SFTP connection cache should be removed (fixing this one
  should reveal some lack of connection sharing)

Finally some tests revealed that bzr branch can also connect
multiple times and surely some others are waiting.

The implied cost, here, is that a new connection should be
established because we fail to share the transport (and its
associated connection).

So far, we reuse a connection by using transport.clone(), which
implies we have the already connected transport available.

Bug 111702 fix (pending review) implements a basic reuse
mechanism in Transport.get_transport() by allowing a
'possible_transports' parameter to be provided.

This parameter is a list of known transports provided by the
caller.

I chose a list instead of a single transport because:

- Robert mentioned that somewhere :) (can't find the reference
  sorry),

- at least bzr branch can deal with two transports pointing to
  different transports.

- push and pull for bound branches are likely good candidates as
  they will deal with transports that may be hidden from higher
  levels.

So basically we have two ways to get a new transport:

- call Transport.get_transport(url).

- clone an already available transport with a relative url.

Both are used throughout bzrlib and one possible code path is:

t = get_transport(url)

<some calls>

newurl = t.base +/- relpath

<some calls>

t2 = get_transport(newurl)

At the point where we build 't2', 't' is not in scope anymore. So
we have lost the opportunity to share the connection.

To achieve connection sharing on all the code paths where we want
it, we can:

1 - handle a global transport cache (joke, we already want to get
    rid of the sftp one: bug 43731),

2 - replace the url parameter by a transport parameter in all the
    code paths,

3 - add an optional 'transport' whose base is url in all the code
    paths,

4 - add an optional 'possible_transports' in all the code paths.

2 smells like a compatibility nightmare (a regression in the API
ease of use even), 3 may fail to address the uses of concurrent
multiple transports, so I'm more in favor of 4 :-)

Now get_transport will be called with possible_transports and
from that list can build a new transport that will share the
connection.

The idea is that high-level commands will build the list of
reusable transports (based on the command line parameters) and
from there the transports will be cloned preserving the
connection sharing.

For the lower levels, providing such a list allows each function
to decide if she wants to just forward it, add some transports to
it or just ignore it.

Using an optional parameter also allows callers to ignore it so
that we can deploy its use progressively.

Related bugs:
https://bugs.launchpad.net/bzr/+bug/110448
https://bugs.launchpad.net/bzr/+bug/36294
https://bugs.launchpad.net/bzr/+bug/55461

I would have liked to go further and propose an API associated
with get_transport refactoring along the lines of (pseudo
incomplete code) :

def get_transport(url, possible transports):

  url = normalized_url(url)
  for reusable in possible_transports
      transport = reusable.reuse_connection_for(url)
      if transport is not None:
         break

  if transport is None:
      transport = find one in transport factories

def reuse_connection_for(url):
   scheme, host, user, password, path = split_url(url)
   transport = None
   if scheme, host, user, password == self.scheme, host, user, password:
      transport = self.clone_or_whatever(path)
   return transport

But, looking at the code :

- LocalTransport can short-circuit reuse_connection_for, no
  connection to reuse,

- all the others duplicate the url splitting and connection
  attributes handling using different names.

So I think a first step is to define a new
ConnectedTransport(Transport) class that will handle the
attributes extraction and define an interface to
connect/disconnect. Then we can refactor all the transport
classes to inherit from that class.

Once this refactoring is done, it will be far easier to track the
spurious connections and to deploy this proposal.

All comments welcome,

    Vincent