[MERGE] Add an InventoryEntry cache for xml deserialization

Sat Dec 13 16:00:11 GMT 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Ladeuil wrote:
...

> By making it global you cheaply maximize the reuse rate at the
> expense of memory consumption, but it doesn't really address the
> lifetime problem. Who will clear the cache ? When ?
> 
> My vote would go to either 3) or 4) depending on who is really
> responsible for the lifetime and what makes managing (clearing,
> resizing, sharing) the cache the most natural.

I didn't make this clear earlier, but (3) is not sufficient. The
serializers themselves are global objects, so adding a cache to each
one just gives you 4 global caches.

Technically the serializers are also shared fixtures, but they are
defined as not having any state, so it hasn't been a problem. (__slots__
= []).
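
A minimal sketch of why an empty __slots__ keeps a shared serializer
stateless. The class name StatelessSerializer and the entry_cache
attribute are made up for illustration; this is not bzrlib's actual
serializer code.

```python
class StatelessSerializer(object):
    """Hypothetical stand-in for a serializer; not bzrlib's real class."""

    # An empty __slots__ means instances get no __dict__ and no named
    # slots, so nothing can be stored on them.
    __slots__ = []

    def format_name(self):
        return "example"


# A module-level instance, shared by every caller (i.e. a global).
shared_serializer = StatelessSerializer()


def try_attach_cache(serializer):
    """Return True if a per-instance cache attribute could be attached."""
    try:
        serializer.entry_cache = {}
    except AttributeError:
        # Raised because the instance has no place to store attributes.
        return False
    return True
```

Giving such a class a cache means adding a slot for it, at which point
the shared instance is no longer stateless.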

We do have other bits of global state. Most notably the ui_factory,
combined with nested-progress-bars.

At least, when I tried to run bzrlib with threads, the first part that
started failing was the progress bar code. (When .finished() is called,
a bar assumes that it is the last one on the stack.)
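
A rough sketch of the kind of failure described above, assuming a single
shared stack of nested bars. NestedBarStack and NestedBar are
hypothetical names, and the interleaving is simulated in one thread so
the result is deterministic; this is a guess at the shape of the
problem, not bzrlib's progress code.

```python
class NestedBarStack(object):
    """Hypothetical shared stack of nested progress bars."""

    def __init__(self):
        self._stack = []

    def nested_progress_bar(self):
        bar = NestedBar(self)
        self._stack.append(bar)
        return bar

    def _pop(self, bar):
        # The assumption: the bar being finished is the innermost one.
        if self._stack[-1] is not bar:
            raise AssertionError("finished() called out of order")
        self._stack.pop()


class NestedBar(object):
    def __init__(self, owner):
        self._owner = owner

    def finished(self):
        self._owner._pop(self)


def interleaved_finish_fails():
    """Simulate two workers sharing one stack; return True on failure."""
    stack = NestedBarStack()
    bar_a = stack.nested_progress_bar()  # worker A starts
    bar_b = stack.nested_progress_bar()  # worker B starts
    try:
        bar_a.finished()                 # A finishes before B
    except AssertionError:
        return True
    return False
```

Strictly nested use (finish B, then A) works; the trouble only appears
once two independent callers share the same stack.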

Anyway, I'm guessing you missed my other email, where I implemented (4).

> 
>     jam> Which helps for us to know that the caching rules won't
>     jam> be violated. The main downside is that something that
>     jam> ends up dealing with two mostly-identical repositories
>     jam> will not benefit.
> 
> Should be rare enough to neglect. At worst, and only if really
> worth it, the two repos can negotiate to share their caches.
> 
>     jam>    The original use-case I was trying to handle is the
>     jam>    "extract all revision trees from the repository"
>     jam>    which works just fine here, but there are other use
>     jam>    cases where an entry cache would be helpful.
> 
> I'd love to have one for "bzr log -v"... but I'm pretty sure that
> in that case I'd love to be able to clear it (or at least purge
> it more aggressively), at various points.

There is always entry_cache.clear() for all of the caches I've written.
But I'm wondering what your specific use case is. Or are you thinking
this is more something that is being written to disk?

> 
> Don't we have cases where we'd want two (or more) different
> caches targeted at different history points ?
> 
> In that case maybe we'd want to be able to attach/detach caches
> to the serializers themselves ?
> 
> And what if bzrlib is used for several repositories ? 
> 
> And what if we want to take the working tree size into account
> when dimensioning the cache ?

Actually we already do. At the end of deserializing an inventory, I make
sure to resize() to 2x the size of the inventory. Otherwise you pretty
much just thrash the cache.

It is still a reasonable upper bound, as we can't do much without
holding 2 inventories in memory (at least not yet :).
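
The resize-to-2x rule and the clear() call mentioned earlier can be
sketched as follows. EntryCache and deserialize_inventory are
hypothetical names and the "entries" are placeholder tuples; this is an
illustration under those assumptions, not the code from the patch.

```python
from collections import OrderedDict


class EntryCache(object):
    """Hypothetical LRU cache keyed by (file_id, revision)."""

    def __init__(self, max_size):
        self._max_size = max_size
        self._entries = OrderedDict()

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def add(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        self._trim()

    def resize(self, max_size):
        self._max_size = max_size
        self._trim()

    def clear(self):
        self._entries.clear()

    def _trim(self):
        # Evict least recently used entries until we fit.
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)

    def __len__(self):
        return len(self._entries)


def deserialize_inventory(entry_keys, cache):
    """Build a fake inventory, reusing cached entries where possible."""
    inventory = {}
    hits = 0
    for key in entry_keys:
        entry = cache.get(key)
        if entry is None:
            entry = ("entry",) + key  # stand-in for a parsed entry
            cache.add(key, entry)
        else:
            hits += 1
        inventory[key] = entry
    # Leave room for this inventory plus the next one.
    cache.resize(2 * len(inventory))
    return inventory, hits
```

With the cache sized at twice the inventory, the entries of one
inventory are still present while the next, mostly identical one is
read, which is where the reuse comes from.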

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAklD3AsACgkQJdeBCYSNAANBTwCfbSEGgwmYr6ZHLU8WqpmRpXcb
U4gAniYJPmUTrkoOJFWhV+/2IRa0yJQK
=SZ/r
-----END PGP SIGNATURE-----