[MERGE] Make push/pull/merge faster.

John Arbash Meinel john at arbash-meinel.com
Tue Oct 30 18:15:59 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> Merge with long histories is very slow - 90% of the time is in loading
> indices from both repositories.
> 
> This patch fixes this, but means that packs won't clone ghosts by
> default. I've filed https://bugs.edge.launchpad.net/bzr/+bug/158774 to
> address this in the future.
> 
> I think this is good for 0.92 - it makes packs look even nicer and
> doesn't affect knits.
> 
> Cheers,
> Rob
> 

You seem to set "supports_ghosts = True" for KnitRepo, but not for PackRepo. It
may be inherited, but setting it explicitly would be good.

+        if not find_ghosts and revision_id is not None:
+            graph = self.source.get_graph()
+            missing_revs = set()
+            searcher = graph._make_breadth_first_searcher([revision_id])
+            target_index = \
+                self.target._pack_collection.revision_index.combined_index
+            null_set = frozenset([_mod_revision.NULL_REVISION])
+            while True:
+                try:
+                    next_revs = set(searcher.next())
+                except StopIteration:
+                    break
+                next_revs.difference_update(null_set)

^- Why do you need to remove the null revisions? Just so you don't query the
target index for them?

+                target_keys =list ((key,) for key in next_revs)

^- This has a PEP8 typo, but more to the point, it seems odd to call list()
on a generator expression rather than just using a list comprehension:

target_keys = [(key,) for key in next_revs]


+                have_revs = frozenset(node[1][0] for node in target_index.iter_entries(target_keys))

^- This is an overly long line.

have_revs = frozenset(node[1][0] for node
                      in target_index.iter_entries(target_keys))
or something along those lines. (we each seem to have a slightly different
preference when wrapping)


+                missing_revs.update(next_revs - have_revs)
+                searcher.stop_searching_any(have_revs)
+            return missing_revs
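For anyone following along, the hunk above can be sketched with plain dicts and sets standing in for bzrlib's Graph, _make_breadth_first_searcher, and combined_index (the helper function and its names here are hypothetical, not from the patch):

```python
# Toy sketch of the patch's search loop: walk the source graph
# breadth-first from the tip, collect revisions the target lacks, and
# stop searching through any revision the target already has.

NULL_REVISION = "null:"

def missing_revisions(source_graph, target_keys, tip):
    """source_graph maps revision -> list of parents; target_keys is the
    set of revisions already in the target's revision index."""
    missing = set()
    seen = set()
    pending = set([tip])
    while pending:
        next_revs = pending
        pending = set()
        # Drop the null-revision sentinel: it is never stored in the
        # target's revision index, so querying for it is pointless.
        next_revs.discard(NULL_REVISION)
        # Stand-in for target_index.iter_entries(target_keys):
        have_revs = next_revs & target_keys
        missing.update(next_revs - have_revs)
        seen.update(next_revs)
        # Equivalent of searcher.stop_searching_any(have_revs): only
        # expand parents of revisions the target is missing.
        for rev in next_revs - have_revs:
            for parent in source_graph.get(rev, ()):
                if parent not in seen:
                    pending.add(parent)
    return missing
```

For a history C -> B -> A -> null: where the target already holds A, this reports B and C as missing and never walks past A.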


The doc string for "iter_entries" has:
        :return: An iterable as per iter_all_entries, but restricted to the
            keys supplied. No additional keys will be returned, and every
            key supplied that is in the index will be returned.

It isn't perfectly clear from this that requested keys which are not present
will simply be skipped. Or maybe it is, but it takes a while to get there.
Something along these lines would be more direct:

        :return: Iterate over tuples (as per iter_all_entries) for all keys
            which are present. Keys not present will be skipped; keys present
            but not requested will not be returned.
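That skip-missing-keys contract can be exercised against a toy dict-backed index (ToyIndex is a made-up stand-in for illustration, not the real file-backed index class):

```python
# Minimal dict-backed index demonstrating the contract: requested keys
# that are absent are silently skipped, and no unrequested key is ever
# yielded. (ToyIndex is hypothetical, not a bzrlib class.)

class ToyIndex(object):
    def __init__(self, entries):
        self._entries = dict(entries)  # key tuple -> value

    def iter_entries(self, keys):
        for key in keys:
            if key in self._entries:
                yield (self, key, self._entries[key])

index = ToyIndex({("rev-a",): "line-a", ("rev-b",): "line-b"})
# ("rev-x",) is absent: it is skipped rather than raising an error.
found = [node[1] for node in index.iter_entries([("rev-a",), ("rev-x",)])]
```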


It also seems weird to have the returned tuples be 3-tuples or 4-tuples
depending on what is present in the index:
        :return: An iterable of (index, key, value) or (index, key, value,
            reference_lists).
            The former tuple is used when there are no reference lists in the
            index, making the API compatible with simple key:value index types.
            There is no defined order for the result iteration - it will be in
            the most efficient order for the index.

Since the returned tuples still include the "index" object, it isn't like you
can do dict(foo.iter_all_entries()) anyway. So having a fixed return shape
seems better. (Yes, this is late to realize, but it still seems worthwhile.)

Otherwise having a "fetch_ghosts=True/False" is fine with me.

BB:tweak

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHJ3TeJdeBCYSNAAMRAh8gAJ4mIOe0MZ2Nsi414VoK1VEQKYa4ygCdE4FM
WXpG/M/c1/TtVUuiapC8ujM=
=ZeXK
-----END PGP SIGNATURE-----



More information about the bazaar mailing list