[MERGE] Add an InventoryEntry cache for xml deserialization

Sat Dec 13 00:52:58 GMT 2008

On 12 Dec 2008, John Arbash Meinel <john at arbash-meinel.com> wrote:
> Anyway, I ended up running into some cache-lifetime issues, and figured
> it would be reasonable to open it up for discussion how we want to
> handle it.
> 
> 1) Many, many, many tests re-use the same revision ids and file ids with
>    different actual values. This is pretty trivial to fix by adding a
>    "_entry_cache.clear()" as part of TestCase.setUp()
> 
> 2) There are a couple of tests that re-use file_ids and revision_ids
>    *within* the test, but in a different tree. I can fix this by
>    clearing the cache in the inner loop before the next permutation is
>    run, by changing it to not re-use the revision ids, etc.
>    But it did make me wonder if we would do better with a different fix.
> 
> 3) At one point I switched to having each XMLSerializer have its own
>    entry cache, in case that was actually necessary. ATM I think it is
>    actually a YAGNI.
> 
> 4) I've been considering restructuring things so that the repository
>    passes the entry_cache to self._serializer.deserialize_inventory(),
>    as this makes it obvious that the lifetime of the cache is bound to
>    the Repository. Which helps for us to know that the caching rules
>    won't be violated. The main downside is that something that ends up
>    dealing with two mostly-identical repositories will not benefit.
> 
>    The original use-case I was trying to handle is the "extract all
>    revision trees from the repository" which works just fine here, but
>    there are other use cases where an entry cache would be helpful.
> 
> I've gone down a couple different routes, and right now I'm leaning
> towards having a single entry cache, and just clearing it out at
> appropriate times during the test suite. In the "real-world" if we had
> revision_id collisions, we would have much more serious problems than
> just inventory entry caching, so I'm considering the problem as mostly
> just a test suite not being properly isolated between test cases issue.

In general it is nicer to have the cache owned by an object whose
lifetime is explicitly controlled by the caller, rather than held in a
global variable.  As you've encountered in this particular case it is
better for testing, not just for correctness but also for isolation.

I'd be inclined to try something between 3 and 4 where the repository
owns it but passes it to the serializer.  You're right though that there
is the issue that it won't be shared when that would be useful.

There may be other things that could usefully be cached across
repositories too.

-- 
Martin      <http://launchpad.net/~mbp>