[MERGE] Make merge more than 2x faster

Thu Jul 12 22:41:06 BST 2007

Aaron Bentley wrote:
> Hi all,
> 
> This patch improves the speed of merge by
> - avoiding a pointless round of conflict detection
> - avoiding duplicate_entry detection when no files have been added
> - avoiding checking whether the base revision is an ancestor if it was
>   selected via least-common-ancestor
> - using set_parent_trees instead of add_parent
> - using dirstate revision trees where possible
> - caching revision trees for later use
> 
> Combined, for my example, they reduce total execution time from ~3.2
> seconds to ~1.5 seconds.
> 
> Aaron

+    def _add_parent(self):

This seems like it should be called '_add_parents()' since it adds any new
parents that it finds.

As for performance improvements, one issue with DirState as it stands is that
it requires 2 passes to update the working inventory and the parent trees,
rather than a single call that says "here is all your new information". I'm
guessing text extraction is much more of a problem, but it might be something
to think about. (You've done more profiling in this area than I have.)


Have you tried it with my knit_index_pyrex patch? It should help merge a lot
when it is reading data from different file knits (for example, merging 100
revisions of bzr.dev into an old branch).


+        try:
+            pb = ui.ui_factory.nested_progress_bar()
+            try:
+                this_repo = self.this_branch.repository
+                graph = this_repo.get_graph()
+                revisions = [ensure_null(self.this_basis),
+                             ensure_null(self.other_basis)]
+                if NULL_REVISION in revisions:
+                    self.base_rev_id = NULL_REVISION
+                else:
+                    self.base_rev_id = graph.find_unique_lca(*revisions)
+                    if self.base_rev_id == NULL_REVISION:
+                        raise UnrelatedBranches()
+            finally:
+                pb.finished()
+        except NoCommonAncestor:
+            raise UnrelatedBranches()

^- It seems like your try/finally for the pb should be outside your try/except
NoCommonAncestor, though the except NoCommonAncestor seems completely
unnecessary at this point, since it looks like find_unique_lca returns
NULL_REVISION rather than raising NoCommonAncestor.
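
A rough sketch of that restructuring, with the progress bar cleanup as the only
wrapper and the except clause dropped. It uses stand-in classes rather than
real bzrlib objects: FakeProgressBar, FakeGraph and find_base_rev_id are
hypothetical names, NULL_REVISION and UnrelatedBranches are redefined locally,
and it assumes both that find_unique_lca returns NULL_REVISION instead of
raising and that the caller has already applied ensure_null to the basis ids.

```python
# Stand-ins for bzrlib names; everything defined here is illustrative only.
NULL_REVISION = 'null:'


class UnrelatedBranches(Exception):
    """Local stand-in for the bzrlib error of the same name."""


class FakeProgressBar(object):
    """Hypothetical progress bar that records whether finished() ran."""

    def __init__(self):
        self.finished_called = False

    def finished(self):
        self.finished_called = True


class FakeGraph(object):
    """Hypothetical graph whose find_unique_lca returns a canned answer."""

    def __init__(self, lca):
        self._lca = lca

    def find_unique_lca(self, left, right):
        return self._lca


def find_base_rev_id(graph, pb, this_basis, other_basis):
    """Pick the merge base; pb.finished() runs on every exit path."""
    revisions = [this_basis, other_basis]
    try:
        if NULL_REVISION in revisions:
            return NULL_REVISION
        base_rev_id = graph.find_unique_lca(*revisions)
        # Assumption: no common ancestor is signalled by NULL_REVISION,
        # so no "except NoCommonAncestor" clause is needed.
        if base_rev_id == NULL_REVISION:
            raise UnrelatedBranches()
        return base_rev_id
    finally:
        pb.finished()
```

If find_unique_lca can in fact raise NoCommonAncestor, the except clause would
need to stay, wrapped around the whole try/finally.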


+    def _maybe_fetch(self, source, target, revision_id):
+        if (source.repository.bzrdir.root_transport.base !=
+            target.repository.bzrdir.root_transport.base):
+            target.fetch(source, revision_id)
+

^- It is a shame that you have to do this sort of work. It would probably be
better to put this sort of effort into Repository.fetch() itself.
We could even have a "SameRepoFetcher()" that just sees that source and target
are the same repository, and does nothing. I'm not going to say you have to do
it, just that this makes it obvious to me that our current Repository.fetch()
is unnecessarily slow.
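
To make the idea concrete, here is a minimal sketch of how such a fetcher could
short-circuit. None of this is existing bzrlib API: SameRepoFetcher, FakeRepo,
fetch_revision and the real_fetch callback are hypothetical, and the `base`
attribute stands in for repository.bzrdir.root_transport.base from the patch.

```python
class FakeRepo(object):
    """Hypothetical repository; `base` stands in for the transport base URL."""

    def __init__(self, base):
        self.base = base


class SameRepoFetcher(object):
    """Hypothetical fetcher used when source and target are one repository."""

    @staticmethod
    def applies_to(source, target):
        # Same comparison as _maybe_fetch in the patch, on stand-in objects.
        return source.base == target.base

    def fetch(self, revision_id=None):
        # Nothing to copy: the revisions are already present.
        return 0


def fetch_revision(source, target, revision_id, real_fetch):
    """Dispatch to the no-op fetcher when possible, else to real_fetch."""
    if SameRepoFetcher.applies_to(source, target):
        return SameRepoFetcher().fetch(revision_id)
    return real_fetch(source, target, revision_id)
```

With the check living in the fetch path, callers such as merge would not need
their own _maybe_fetch guard.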

=== modified file bzrlib/workingtree_4.py // last-changed:abentley at panoramicfee
... dback.com-20070712170702-kefnhuo926w6cvhz
--- bzrlib/workingtree_4.py
+++ bzrlib/workingtree_4.py
@@ -1488,6 +1488,10 @@
             return parent_details[1]
         return None

+    def get_weave(self, file_id):
+        return self._repository.weave_store.get_weave(file_id,
+                self._repository.get_transaction())
+

^- Why is this necessary?

Anyway, good job overall. Merge has certainly been a command that I "fire and
forget" until later; it will be nice when it is sub-heartbeat.


John
=:->