[MERGE] Add an InventoryEntry cache for xml deserialization
Vincent Ladeuil
v.ladeuil+lp at free.fr
Sat Dec 13 09:36:35 GMT 2008
>>>>> "jam" == John Arbash Meinel <john at arbash-meinel.com> writes:
<snip/>
jam> Anyway, I ended up running into some cache-lifetime
jam> issues, and figured it would be reasonable to open it up
jam> for discussion how we want to handle it.
jam> 1) Many, many, many tests re-use the same revision ids
jam> and file ids with different actual values.
Because we rely on the assumption that we don't use any shared
fixture in TDD parlance.
jam> This is pretty trivial to fix by adding a
jam> "_entry_cache.clear()" as part of TestCase.setUp()
Deadly sin (I'm sure you know that or you wouldn't bring that up
to discussion, don't you ? :-).
Are you really suggesting that all global caches should be
cleared there ? TestCase.setUp() should remain minimal IMHO.
The last time we introduced a global cache we end up getting rid
of it (sftp connections).
And we get rid of it mostly for test purposes (if I remember
right) on the basis that global caches are evil.
jam> 2) There are a couple of tests that re-use file_ids and
jam> revision_ids *within* the test, but in a different
jam> tree.
And a different repository ?
jam> I can fix this by clearing the cache in the inner loop
jam> before the next permutation is run, by changing it to
jam> not re-use the revision ids, etc. But it did make me
jam> wonder if we would do better with a different fix.
And you're right, this is clearly an indication that our policy
to *avoid* global caches has merits.
And it also explain why so many tests were failing. I wanted to
review your patch yesterday and was scared by the number of test
failures... I'm far less scared now that you mention the root
cause.
And to me, the fact that introducing a *cache* breaks so many
tests is a clear indication that something is wrong in the cache
implementation or usage, *not* in the test suite.
jam> 3) At one point I switched to having each XMLSerializer
jam> have its own entry cache, in case that was actually
jam> necessary. ATM I think it is actually a YAGNI.
jam> 4) I've been considering restructuring things so that
jam> the repository passes the entry_cache to
jam> self._serializer.deserialize_inventory(), as this makes
jam> it obvious that the lifetime of the cache is bound to
jam> the Repository.
The heart of the problem as you rightly point it out here is the
lifetime of the cache.
By making it global you cheaply maximize the reuse rate at the
expense of memory consumption but doesn't really address the
lifetime problem. Who will clear the cache ? When ?
My vote would go to either 3) or 4) depending on who is really
responsible for the lifetime and what make managing (clearing,
resizing, sharing) the cache the most natural
jam> Which helps for us to know that the caching rules won't
jam> be violated. The main downside is that something that
jam> ends up dealing with two mostly-identical repositories
jam> will not benefit.
Should be rare enough to neglect. At worst, and only if really
worth it, the two repos can negotiate to share their caches.
jam> The original use-case I was trying to handle is the
jam> "extract all revision trees from the repository"
jam> which works just fine here, but there are other use
jam> cases where an entry cache would be helpful.
I'd love to have one for "bzr log -v"... but i"m pretty sure that
in that case I'd love to be able to clear it (or at least purge
it more aggressively), at various points.
Don't we have cases where we'd want two (or more) different
caches targeted at different history points ?
In that case maybe we'd want to be able to attach/detach caches
to the serializers themselves ?
And what if bzrlib is used for several repositories ?
And what if we want to take the working tree size into account
when dimensioning the cache ?
Said otherwise: you just find a case when a change in the cache
policy was a net improvement, it's obvious to me that more cases
will appear in the future. The solution then will not be to
multiply global caches.
I want your patch to land quickly so I'm not asking for a perfect
solution *now*, use any solution you want without falling into
YAGNIs, but please avoid the global cache approach.
jam> I've gone down a couple different routes, and right now
jam> I'm leaning towards having a single entry cache, and
jam> just clearing it out at appropriate times during the
jam> test suite.
jam> In the "real-world" if we had revision_id collisions, we
jam> would have much more serious problems than just
jam> inventory entry caching,
Sure.
jam> so I'm considering the problem as mostly just a test
jam> suite not being properly isolated between test cases
jam> issue.
I disagree there, the test suite relies on the assumption that we
don't use shared fixtures[1], let's keep that assumption.
Vincent
[1] http://xunitpatterns.com/Shared Fixture.html
More information about the bazaar
mailing list