[MERGE] Re-order the packs during fetch

Wed Nov 26 11:12:42 GMT 2008

John Arbash Meinel wrote:
> The attached patch changes the fetch logic slightly. After we have
> determined/transmitted the revisions, we know what pack files those
> revisions are in. Which means we also know what pack files the
> inventories and texts are in. The existing code keeps looking in all
> pack indexes for the inventories and texts.
> 
> This patch just changes it, so that after we read revisions, we sort the
> packs so that the interesting ones are put at the beginning of the list.
> 
> In my testing, this helps a lot when we are transmitting just a small
> amount of changes, that tend to be localized in certain packs. For
> example, updating my bzr.1.9 branch with 1 new revision. The new code
> still reads a lot of .rix info to find what revisions to transmit, but
> then reads very little .iix and .tix because everything is in the first
> pack it looks for.

bb:tweak

These changes look good to me.  I have some cosmetic nitpicks, but nothing
substantial.

> === modified file 'bzrlib/repofmt/pack_repo.py'
> --- bzrlib/repofmt/pack_repo.py	2008-11-03 04:12:11 +0000
> +++ bzrlib/repofmt/pack_repo.py	2008-11-10 00:17:40 +0000
> @@ -568,6 +568,35 @@
>      def _extra_init(self):
>          """A template hook to allow extending the constructor trivially."""
>  
> +    def _pack_map_and_index_list(self, index_attribute, include_pack_obj=False):
> +        """Convert a list of packs to an index pack map and index list.
> +
> +        :param index_attribute: The attribute that the desired index is found
> +            on.
> +        :return: A tuple (map, list) where map contains the dict from
> +            index:pack_tuple, and lsit contains the indices in the preferred
> +            access order.
> +        """

“lsit” -> “list”

What's the include_pack_obj parameter for?  It seems to be unused by both the
implementation and the callers, so I guess it's a leftover of some earlier
hacking.

> @@ -616,6 +645,39 @@
>              index_builder_class=self._pack_collection._index_builder_class,
>              index_class=self._pack_collection._index_class)
>  
> +    def _update_pack_order(self, entries, index_to_pack_map):
> +        """Determine how we want our packs to be ordered.
> +
> +        This changes the sort order of self.packs, based on the order found in
> +        'entries'. This is mostly used to move unused packs to the end of the
> +        list, so that future requests can avoid probing them.
> +
> +        :param entries: A list of (index, ...) tuples
> +        :param index_to_pack_map: A mapping from index objects to pack objects.
> +        """

The docstring says this “changes the sort order ... based on the order found
in 'entries'”, which I guess is technically true, but it seems to me that
the more significant change is that the order of self.packs is changed based
on which packs are and aren't referenced (via an index) in 'entries'.  The
ordering of the entries seems to be a secondary factor (it determines the
secondary sort key, if you want to phrase it that way).

So the docstring isn't really wrong, but it does seem to emphasise the less
important aspect.

[...]
> @@ -1219,9 +1285,6 @@
>              return False
>          # XXX: the following may want to be a class, to pack with a given
>          # policy.
> -        mutter('Auto-packing repository %s, which has %d pack files, '
> -            'containing %d revisions into %d packs.', self, total_packs,
> -            total_revisions, self._max_pack_count(total_revisions))
>          # determine which packs need changing
>          pack_distribution = self.pack_distribution(total_revisions)
>          existing_packs = []
> @@ -1242,6 +1305,13 @@
>              existing_packs.append((revision_count, pack))
>          pack_operations = self.plan_autopack_combinations(
>              existing_packs, pack_distribution)
> +        num_new_packs = len(pack_operations)
> +        num_old_packs = sum([len(po[1]) for po in pack_operations])
> +        num_revs_effected = sum([po[0] for po in pack_operations])
> +        mutter('Auto-packing repository %s, which has %d pack files, '
> +            'containing %d revisions. Packing %d files into %d effecting %d'
> +            ' revisions', self, total_packs, total_revisions, num_old_packs,
> +            num_new_packs, num_revs_effected)
>          self._execute_pack_operations(pack_operations)
>          return True

“effecting” should be “affecting” I think.

I think this change isn't strictly related to the rest of this patch, but it
seems reasonable.

> === modified file 'bzrlib/tests/test_repository.py'
> --- bzrlib/tests/test_repository.py	2008-10-29 21:39:27 +0000
> +++ bzrlib/tests/test_repository.py	2008-11-10 01:45:49 +0000
> @@ -1049,8 +1049,36 @@
>  class TestPacker(TestCaseWithTransport):
>      """Tests for the packs repository Packer class."""
>  
> -    # To date, this class has been factored out and nothing new added to it;
> -    # thus there are not yet any tests.
> +    def test_pack_optimizes_pack_order(self):
[...]
> +        # Now, when we are copying the B & C revisions, their pack files should
> +        # be moved to the front of the stack
> +        # The new ordering moves B & C to the front, and leaves the others in
> +        # the original order.

I'd elaborate this comment slightly to say “to the front of the .packs
attribute”, to be clear that that's all it does.  When I first saw this test
I thought it might be testing something about the pack-names file.  Or
perhaps add an introductory comment/docstring to the test method to make it
clear that this is a test about the contents of the .packs attribute.

-Andrew.