Transport connection caches must go.

Wed May 31 14:06:46 BST 2006

Robert Collins wrote:

...

>> This is fundamentally not a cache - the term implies that if you
>> happen to not find something in the cache you can get it from somewhere
>> else, whereas here it's about more than just performance because
>> reconnecting might need the user to reauthenticate.
> 
> This is a promising approach. To sum up, we have the following
> constraints:
>  * Connections to multiple resources on different urls should only
> initiate new connections, prompt for auth etc, if actually required -
> i.e.  if they are not on the same host.
>  * Two common operations that may lead us to connect to the same host
> twice with no a-priori knowledge we are doing so are 'bzr branch A B',
> and 'bzr checkout A B'. In the code for these operations we are opening
> a new bzrdir at a URL and there is currently no way to provide the
> transport in use by the bzrdir for A, to the call to open B, *as a
> possible transport to clone/chdir from*.
> requirements
>  * We do not want a global cache: separate code paths, i.e. different
> connections initiated by a webserver should not interact with each
> other. (In this example, the credentials for the transport would be
> coming from the end user, so having two requests share a transport would
> be a security hole). Tests also have had nasty interactions with cached
> transports.
>  * We want something that is easy to use.

I guess I never realized about point 3. But it makes sense, and means
that I'm happy enough with this proposal.

> 
> So heres a concrete proposal:
> 
> We add to the transport api a new clone_url() method. This is called
> clone_url because like clone() it makes a new transport, but unlike
> clone it requires a url, not a relative reference. We could name it
> chdir or something, but we should rename clone at that point too. (Also,
> clone() is well known in some object patterns as giving a new instance
> of the same type, so clone is IMO still appropriate). This new method
> will raise an error like 'RequiresNewTransport' when the url its being
> asked to clone to is one it cannot reach by any pattern of clone()
> calls.
> 
> We add to get_transport a keyword parameter 'reuse_connections' which
> will accept an iterable that yields transports. These connections will
> have clone_url() called on them preferentially. The first to return a
> Transport will short circuit the whole call. We probably need to be
> careful to ensure that local transports dont inappropriately accept
> things - probably having LocalTransport raise RequiresNewTransport
> unconditionally is the right thing to do.
> 
> Now, to reuse a connection in our examples above, its just:
> b_bzrdir =
> bzrlib.bzrdir.BzrDir.open_containing_with_transport(get_transport(b_url,
> reuse_connections=[a_bzrdir.root_transport]))
> 
> 
> Thoughts ?

It doesn't help our current "branch = Branch.open(passed_in_url)", so
all of our open and open_containing_calls would need a new parameter.
Which is a little bit ugly, but I can understand why you want to do it
as a parameter rather than a global in the library.

Maybe if we are going to start doing stuff like this, we should have
some sort of Session object (sort of like Transactions, only more at the
user level), which could be a user cache.

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 249 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060531/8c0c5768/attachment.pgp