[MERGE] Add an InventoryEntry cache for xml deserialization

Fri Dec 12 21:16:05 GMT 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> Robert Collins has voted approve.
> Status is now: Approved
> Comment:
>> 1) First an assumption. For a given (file_id, revision_id) pair, the
>> InventoryEntry must be the same.
> 
> This is only guaranteed true for the same model repo - E.g. tree roots
> are different between plain and rich-root. Also there was a very short
> lived bug back around 0.0.8 that misrecorded the exec bit for about 2
> commits; but I think this is ignorable.
> 
> split-inv might benefit from a cache too; and for split-inv the source
> string is a safe cache key (it is safe in xml inventories too).
> 

"source string" ? You mean the 'value' for split-inv, or the raw line of
xml for others?

Anyway, I ended up running into some cache-lifetime issues, and figured
it would be reasonable to open it up for discussion how we want to
handle it.

1) Many, many, many tests re-use the same revision ids and file ids with
   different actual values. This is pretty trivial to fix by adding a
   "_entry_cache.clear()" as part of TestCase.setUp()

2) There are a couple of tests that re-use file_ids and revision_ids
   *within* the test, but in a different tree. I can fix this by
   clearing the cache in the inner loop before the next permutation is
   run, by changing it to not re-use the revision ids, etc.
   But it did make me wonder if we would do better with a different fix.

3) At one point I switched to having each XMLSerializer have its own
   entry cache, in case that was actually necessary. ATM I think it is
   actually a YAGNI.

4) I've been considering restructuring things so that the repository
   passes the entry_cache to self._serializer.deserialize_inventory(),
   as this makes it obvious that the lifetime of the cache is bound to
   the Repository. Which helps for us to know that the caching rules
   won't be violated. The main downside is that something that ends up
   dealing with two mostly-identical repositories will not benefit.

   The original use-case I was trying to handle is the "extract all
   revision trees from the repository" which works just fine here, but
   there are other use cases where an entry cache would be helpful.

I've gone down a couple different routes, and right now I'm leaning
towards having a single entry cache, and just clearing it out at
appropriate times during the test suite. In the "real-world" if we had
revision_id collisions, we would have much more serious problems than
just inventory entry caching, so I'm considering the problem as mostly
just a test suite not being properly isolated between test cases issue.

Thoughts?

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAklC1JUACgkQJdeBCYSNAAPWRQCfTcOoqF//Tz7GVzKD7UOLUfWz
hJwAoMKKACE9oJkXWbPngJTPbnTlgV6v
=WtcL
-----END PGP SIGNATURE-----