Transport connection caches must go.

Robert Collins robertc at robertcollins.net
Mon May 29 23:22:52 BST 2006


On Wed, 2006-05-24 at 10:59 +1000, Martin Pool wrote:
> On 23 May 2006, Robey Pointer <robey at lag.net> wrote:

> > 
> > new_transport = old_transport.try_clone(url)
> > 
> > If the URL is incompatible (you asked an sftp transport to clone an  
> > http url), it returns None.
> 
> That might be reasonable; so then if you were doing something on a
> remote bound branch whose master is on the same server it could discover
> that they should use the same connection.
> 
> (But possibly this should be at a higher level, because this needs to
> deal with locks as well.)
> 
> Incidentally I don't think 'clone' is such a great name, maybe "chdir"
> would be better, or something that conveys that we're creating a new
> related object.
> 
> This is fundamentally not a cache - the term implies that if you
> happen to not find something in the cache you can get it from somewhere
> else, whereas here it's about more than just performance because
> reconnecting might need the user to reauthenticate.

This is a promising approach. To sum up, we have the following
constraints:
 * Connecting to multiple resources at different URLs should only
initiate new connections, prompt for authentication, etc., if actually
required - i.e. if the resources are not on the same host.
 * Two common operations that may lead us to connect to the same host
twice, with no a priori knowledge that we are doing so, are 'bzr branch
A B' and 'bzr checkout A B'. In the code for these operations we open a
new bzrdir at a URL, and there is currently no way to pass the transport
already in use by the bzrdir for A to the call that opens B, *as a
possible transport to clone/chdir from*.

And the following requirements:
 * We do not want a global cache: separate code paths - e.g. different
connections initiated by a webserver - should not interact with each
other. (In that example the credentials for the transport would be
coming from the end user, so having two requests share a transport
would be a security hole.) Tests have also had nasty interactions with
cached transports.
 * We want something that is easy to use.

So here's a concrete proposal:

We add a new clone_url() method to the transport API. It is called
clone_url because, like clone(), it makes a new transport, but unlike
clone() it takes a URL rather than a relative reference. We could name
it chdir or something similar, but then we should rename clone() as
well. (Also, clone() is well known in some object patterns as giving a
new instance of the same type, so clone is IMO still appropriate.) This
new method will raise an error like 'RequiresNewTransport' when the URL
it is being asked to clone to is one it cannot reach by any pattern of
clone() calls.
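
To make the shape of this concrete, here is a minimal sketch - not
actual bzrlib code; the class name, attribute names and constructor are
all illustrative assumptions - of how clone_url() might look on an
sftp-style transport:

from urllib.parse import urlparse


class RequiresNewTransport(Exception):
    """Raised when a transport cannot reach the given URL via clone()."""


class SFTPTransportSketch(object):
    """Illustrative stand-in for an sftp transport."""

    def __init__(self, url, connection=None):
        parsed = urlparse(url)
        self.scheme = parsed.scheme
        self.host = parsed.hostname
        self.user = parsed.username
        self.path = parsed.path or '/'
        # e.g. an already-authenticated ssh session; None means
        # "connect (and possibly prompt for auth) on first use".
        self._connection = connection

    def clone_url(self, url):
        """Return a new transport for url that shares this connection,
        or raise RequiresNewTransport if that is impossible."""
        parsed = urlparse(url)
        if (parsed.scheme != self.scheme
                or parsed.hostname != self.host
                or parsed.username != self.user):
            # Different scheme, host or user: a fresh connection (and
            # possibly a fresh auth prompt) is unavoidable.
            raise RequiresNewTransport(url)
        # Same server and user: hand the live connection to the clone.
        return SFTPTransportSketch(url, connection=self._connection)

The key property is that the decision to reuse or refuse is made
entirely from URL components the transport already knows, so no new
connection or auth prompt is needed just to find out.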

We add to get_transport a keyword parameter 'reuse_connections' which
accepts an iterable of transports. clone_url() will be called on these
transports preferentially; the first one to return a Transport short
circuits the whole call. We probably need to be careful to ensure that
local transports don't inappropriately accept things - having
LocalTransport raise RequiresNewTransport unconditionally is probably
the right thing to do.
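
Building on the sketch above, roughly how get_transport() could wire
this parameter in - again an illustrative sketch only, with
_new_transport_for standing in for the existing per-scheme transport
construction:

def _new_transport_for(url):
    # Stand-in for the existing per-scheme transport lookup.
    return SFTPTransportSketch(url)


def get_transport(url, reuse_connections=None):
    """Return a transport for url, preferring to share a connection
    with one of the transports in reuse_connections."""
    for transport in (reuse_connections or []):
        try:
            # The first candidate that can reach url wins.
            return transport.clone_url(url)
        except RequiresNewTransport:
            continue
    # No candidate matched (or none were supplied): fall back to
    # opening a brand new connection.
    return _new_transport_for(url)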

Now, to reuse a connection in our examples above, it's just:

b_bzrdir = bzrlib.bzrdir.BzrDir.open_containing_with_transport(
    get_transport(b_url, reuse_connections=[a_bzrdir.root_transport]))


Thoughts?

Rob


-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.