Transport connection caches must go.

Tue May 23 15:51:39 BST 2006

Robert Collins wrote:
> On Mon, 2006-05-22 at 19:43 -0500, John Arbash Meinel wrote:
>> Robert Collins wrote:
>>> On Mon, 2006-05-22 at 08:32 -0500, John Arbash Meinel wrote:
>>>> Michael Ellerman wrote:
>>>
>>>> That is the way the current code is. But both ftp and sftp cache their
>>>> connections. It would seem reasonable that HTTP could do the same thing.
>>> please no. The SFTP connection cache is an ongoing burden and I'm about
>>> +5 for any patch that removes it with extreme prejuidice. 
>>>
>>> There was a previous thread on this a few weeks back, but the title
>>> wasn't changed unfortunately.
>>>
>>> -Rob
>> What would you prefer? I think we should have some sort of keep-alive.
>>
>> One could argue for a single BzrDir connection. But I think it would be
>> nice if doing "bzr branch sftp://foo/one sftp://foo/two" didn't require
>> you to login twice. And remembering the connection at the Transport
>> level seems the right place to do it.
>>
>> Right now the sftp cache is just a weakref dictionary, so it should
>> expire when nobody is using it anyway.
> 
> The intent of the cache is to avoid expensive operations like
> connecting, or passphrase entry, occurring more than once. However, a
> cache is not a guarantee of this. We need to structure our code such
> that it can never require a new connection, and if we do that the cache
> is no longer of benefit at all.
> 
> Having the cache makes tests more complex, has in fact caused bugs in
> tests, and generally adds complexity thats IMO unneeded.
> 
> In the specific case case you mention, its possible that a cache is the
> best way, but I'll think about this in more detail next week. 
> 
> Rob

The reason we have per-transport caches, is because we cache different
things for each transport. And the key for the cache may be different. I
personally don't like that the ftp cache includes the password in its
key, as I don't like to remember passwords.

We could cache the Transport objects themselves (say based on the netloc
portion of the URL), and then have get_transport() do a clone() if there
is a matching netloc.
And then we would add a transport.flush_cache() function, which could be
called by the test suite.

We could add a 'get_cacheable_object' to the Transport interface, and
then get_transport() would use that to figure out what needed to be
cached. Because we really want to cache the connection, not the Transport.

I personally prefer a "clear cache X seconds after the last one goes
away", rather than a strict release exactly after the last one, or hang
on until they are all gone.

We could move the weakref cache into transport/__init__.py, and each
Transport could use it however it wanted. Then we could still provide a
flush_cache() function.

Think about it some more, and we'll work something out. I *do* think we
want a cache, because there are too many ways to get to the same
location for us to restructure so that we 'can never require a new
connection'.

Maybe we could restructure Transport so that they all have some sort of
Connection, which could be maintained in a global state dictionary
rather than as a member of each Transport. I personally would like it if
connections expired eventually.

I look forward to whatever solutions you can think of.
John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 249 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060523/66978686/attachment.pgp