[MERGE] Move RemoteRepository caching logic to CachedParentsProvider
Aaron Bentley
aaron at aaronbentley.com
Wed Dec 10 20:26:32 GMT 2008
Andrew Bennetts wrote:
> Aaron Bentley wrote:
>> Andrew Bennetts wrote:
>>> I think this is because it unfortunately isn't safe to cache misses in
>>> RemoteRepository, because the cache of misses doesn't get invalidated when
>>> revisions are added to its _real_repository.
>> I have made miss-caching optional, and disabled it for repository caches.
>>
>> All tests pass with these changes.
>
> bb:tweak
>
> Looks good, although I spotted a bug. And I'm feeling a bit fussy about
> tests today... See comments inline.
>
> [...]
>> + self._parents_map = {}
>
> I think I preferred the old variable name. It's not a big deal though, so
> if you prefer it this way I don't mind.
Hey, it's your variable name. Changed.
> The comment about using an
> LRUCache still seems applicable, though.
I'm not sure whether an LRUCache allows caching misses. But that
comment's been there for ages, no one's changed it, so it feels like litter.
>> + self._cache_misses = True
>> + self._debug = debug
>> + if self._debug:
>> + self._requested_parents = None
>>
>> def __repr__(self):
>> return "%s(%r)" % (self.__class__.__name__, self._real_provider)
>>
>> + def enable_cache(self, cache_misses=True):
>> + """Enable cache."""
>> + self._parents_map = {}
>> + self._cache_misses = cache_misses
>> + if self._debug:
>> + self._requested_parents = set()
>
> enable_cache not only enables caching, it also clears an already enabled
> cache. I think that's a bit dangerous.
How? It's a cache. It shouldn't contain anything we can't get again.
> How about making it raise
> AssertionError if self._parents_map is not None?
I don't think that the current policy against assertions is sensible.
But since there is one, I don't think I should be raising
AssertionErrors, either.
> If caching is optional, then the constructor's docstring should explicitly
> state whether caching is initially enabled or disabled.
Done.
>> + def disable_cache(self):
>> + """Disable cache."""
>> + self._parents_map = None
>> + if self._debug:
>> + self._requested_parents = None
>
> This method also clears the cache. I think that's reasonable behaviour, but
> you should mention it in the docstring.
Done.
>
>> def get_parent_map(self, keys):
>> - """See _StackedParentsProvider.get_parent_map"""
> [...]
>> + """See RemoteRepository.get_parent_map."""
>
> It's not a big deal, but I think this docstring should refer to something
> other than RemoteRepository, probably _StackedParentsProvider.
Done.
> If only we
> had a ParentProvider abstract base class/interface class, so there was a
> clearly canonical place to document get_parent_map. Oh well.
Duck typing, man. It's the way of the future.
>> + # Hack to build up the caching logic.
>> + ancestry = self._parents_map
>> + if ancestry is None:
>> + # Caching is disabled.
>> + missing_revisions = set(keys)
>> + ancestry = {}
>> + else:
>> + missing_revisions = set(key for key in keys if key not in ancestry)
>> + if missing_revisions:
>> + parent_map = self._get_parent_map(missing_revisions)
>> + if self._debug:
>> + mutter('re-retrieved revisions: %d of %d',
>> + len(set(ancestry).intersection(parent_map)),
>> + len(parent_map))
>> + ancestry.update(parent_map)
>> + if self._cache_misses:
>> + ancestry.update(dict((k, None) for k in missing_revisions
>> + if k not in parent_map))
>
> Add a comment here to say that None is never a valid parents list, so None
> is a safe value to use to record a miss.
Done.
>> === modified file 'bzrlib/remote.py'
>> --- bzrlib/remote.py 2008-12-01 23:50:52 +0000
>> +++ bzrlib/remote.py 2008-12-03 05:09:44 +0000
>> @@ -287,17 +287,6 @@
> [...]
>> @@ -481,7 +470,8 @@
>> other_repository.bzrdir.transport.base !=
>> self.bzrdir.transport.base):
>> parents_provider = graph._StackedParentsProvider(
>> - [parents_provider, other_repository._make_parents_provider()])
>> + [self._unstacked_provider,
>> + other_repository._make_parents_provider()])
>> return graph.Graph(parents_provider)
>
> Is this change correct?
No, it appears not.
> === modified file 'bzrlib/tests/per_repository/test_add_fallback_repository.py'
> --- bzrlib/tests/per_repository/test_add_fallback_repository.py 2008-09-04 20:32:04 +0000
> +++ bzrlib/tests/per_repository/test_add_fallback_repository.py 2008-12-10 02:31:19 +0000
> @@ -16,7 +16,11 @@
>
> """Tests for Repository.add_fallback_repository."""
>
> -from bzrlib import errors
> +from bzrlib import (
> + bzrdir,
> + errors,
> + remote,
> + )
> from bzrlib.revision import NULL_REVISION
> from bzrlib.tests import TestNotApplicable
> from bzrlib.tests.per_repository import TestCaseWithRepository
> @@ -25,7 +29,14 @@
> class TestAddFallbackRepository(TestCaseWithRepository):
>
> def test_add_fallback_repository(self):
> - repo = self.make_repository('repo')
> + if isinstance(self.repository_format, remote.RemoteRepositoryFormat):
> + # RemoteRepository by default builds a default format real
> + # repository, but the default format is unstackble. So explicitly
> + # make a stackable real repository and use that.
> + repo = self.make_repository('repo', format='1.9')
> + repo = bzrdir.BzrDir.open(self.get_url('repo')).open_repository()
> + else:
> + repo = self.make_repository('repo')
> tree = self.make_branch_and_tree('branch')
> if not repo._format.supports_external_lookups:
> self.assertRaises(errors.UnstackableRepositoryFormat,
> @@ -45,3 +56,10 @@
> # ... or on the repository directly...
> self.assertEqual({revision_id: (NULL_REVISION,)},
> repo.get_parent_map([revision_id]))
> + # ... or on the repository's graph.
> + self.assertEqual({revision_id: (NULL_REVISION,)},
> + repo.get_graph().get_parent_map([revision_id]))
> + # ... or on the repository's graph, when there is an other repository.
> + other = self.make_repository('other')
> + self.assertEqual({revision_id: (NULL_REVISION,)},
> + repo.get_graph(other).get_parent_map([revision_id]))
>
> (As you can see, this patch also fixes this test so that it actually applies
> to RemoteRepository! And ideally I think there should be multiple test methods
> rather than piggy-backing lots of tests on the end of one method, but I
> don't think that's worth stalling this branch for.)
>
> AFAICT, RemoteRepository.get_graph should just be identical to
> Repository.get_graph.
I disagree. It doesn't make much sense to put a StackedParentsProvider
inside a StackedParentsProvider. We should just create a
StackedParentsProvider that includes:
- self._unstacked_parents_provider
- all fallback providers
- other_repository's parents provider
RemoteRepository.get_graph should assume that access to self is slower
than access to other_repository, and so it should put other_repository
first in the stacking order.
>
>> === modified file 'bzrlib/tests/test_graph.py'
>> --- bzrlib/tests/test_graph.py 2008-10-30 15:20:00 +0000
>> +++ bzrlib/tests/test_graph.py 2008-12-03 05:09:44 +0000
>> @@ -1379,13 +1379,13 @@
>>
>> def test_get_parent_map(self):
>> """Requesting the same revision should be returned from cache"""
>> - self.assertEqual({}, self.caching_pp._cache)
>> + self.assertEqual({}, self.caching_pp.get_cached_map())
>> self.assertEqual({'a':('b',)}, self.caching_pp.get_parent_map(['a']))
>> self.assertEqual(['a'], self.inst_pp.calls)
>> self.assertEqual({'a':('b',)}, self.caching_pp.get_parent_map(['a']))
>> # No new call, as it should have been returned from the cache
>> self.assertEqual(['a'], self.inst_pp.calls)
>> - self.assertEqual({'a':('b',)}, self.caching_pp._cache)
>> + self.assertEqual({'a':('b',)}, self.caching_pp.get_cached_map())
>>
>> def test_get_parent_map_not_present(self):
>> """The cache should also track when a revision doesn't exist"""
>> @@ -1394,7 +1394,7 @@
>> self.assertEqual({}, self.caching_pp.get_parent_map(['b']))
>> # No new calls
>> self.assertEqual(['b'], self.inst_pp.calls)
>> - self.assertEqual({'b':None}, self.caching_pp._cache)
>> + self.assertEqual({'b':None}, self.caching_pp._parents_map)
>
> Why aren't these tests consistent about whether they use get_cached_map() or
> _parents_map?
get_cached_map does not include cache misses.
> I think they should be consistent, probably in favour of
> get_cached_map().
Can't do it in favour of get_cached_map, so _parents_map (now _cache) it is.
>> +class TestCachingParentsProviderExtras(tests.TestCaseWithTransport):
>
> “Extras” What are “Extras” here? A docstring is called for.
Done.
> This appears to be a couple of unit tests in a single method. The first
> line is an assertion, and is repeated as the first line of another test.
> Make a new test method for it and delete it from test_cached and
> test_cache_misses.
>
> def test_cache_initially_empty(self):
> self.assertEqual({}, self.caching_pp.get_cached_map())
Done.
> Similarly, there's a test about the behaviour of disable_cache() clearing
> the cache tacked on the end. Make an explicit test for that, too.
Done.
Aaron
More information about the bazaar
mailing list