Rev 3443: (jam) refactor find_unique_ancestors for clarity, in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Wed May 21 06:46:41 BST 2008


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 3443
revision-id:pqm at pqm.ubuntu.com-20080521054623-5ayhndccwr6o45uq
parent: pqm at pqm.ubuntu.com-20080521031911-acjem4ky3hjs8gjh
parent: john at arbash-meinel.com-20080521021300-2pqe6eso7w3jrd0u
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Wed 2008-05-21 06:46:23 +0100
message:
  (jam) refactor find_unique_ancestors for clarity,
  	and collapse searchers for performance
modified:
  bzrlib/graph.py                graph_walker.py-20070525030359-y852guab65d4wtn0-1
  bzrlib/tests/test_graph.py     test_graph_walker.py-20070525030405-enq4r60hhi9xrujc-1
    ------------------------------------------------------------
    revno: 3377.3.42.1.9
    revision-id:john at arbash-meinel.com-20080521021300-2pqe6eso7w3jrd0u
    parent: john at arbash-meinel.com-20080521015055-gs6pzl7zpbcyj2p5
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: find_unique_ancestors
    timestamp: Tue 2008-05-20 21:13:00 -0500
    message:
      STEP every 5
    modified:
      bzrlib/graph.py                graph_walker.py-20070525030359-y852guab65d4wtn0-1
    ------------------------------------------------------------
    revno: 3377.3.42.1.8
    revision-id:john at arbash-meinel.com-20080521015055-gs6pzl7zpbcyj2p5
    parent: john at arbash-meinel.com-20080520232241-hodoh6db5wslse1o
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: find_unique_ancestors
    timestamp: Tue 2008-05-20 20:50:55 -0500
    message:
      Final tweaks from Ian
    modified:
      bzrlib/graph.py                graph_walker.py-20070525030359-y852guab65d4wtn0-1
    ------------------------------------------------------------
    revno: 3377.3.42.1.7
    revision-id:john at arbash-meinel.com-20080520232241-hodoh6db5wslse1o
    parent: john at arbash-meinel.com-20080520231020-wh8g3pq0whu2p78i
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: find_unique_ancestors
    timestamp: Tue 2008-05-20 18:22:41 -0500
    message:
      Small documentation and code wrapping cleanup
    modified:
      bzrlib/graph.py                graph_walker.py-20070525030359-y852guab65d4wtn0-1
    ------------------------------------------------------------
    revno: 3377.3.42.1.6
    revision-id:john at arbash-meinel.com-20080520231020-wh8g3pq0whu2p78i
    parent: john at arbash-meinel.com-20080513221726-vqpvwthr1tncpb29
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: find_unique_ancestors
    timestamp: Tue 2008-05-20 18:10:20 -0500
    message:
      Lots of refactoring for find_unique_ancestors.
      
      Split up the function into a bunch of helper functions.
    modified:
      bzrlib/graph.py                graph_walker.py-20070525030359-y852guab65d4wtn0-1
    ------------------------------------------------------------
    revno: 3377.3.42.1.5
    revision-id:john at arbash-meinel.com-20080513221726-vqpvwthr1tncpb29
    parent: john at arbash-meinel.com-20080507220932-1ofutrrd6eu1dtdy
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: find_unique_ancestors
    timestamp: Tue 2008-05-13 17:17:26 -0500
    message:
      Several updates. A bit more debug logging, only step the all_unique searcher 1/10th of the time.
    modified:
      bzrlib/graph.py                graph_walker.py-20070525030359-y852guab65d4wtn0-1
    ------------------------------------------------------------
    revno: 3377.3.42.1.4
    revision-id:john at arbash-meinel.com-20080507220932-1ofutrrd6eu1dtdy
    parent: john at arbash-meinel.com-20080506220718-3nqdforqiwifqvga
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: find_unique_ancestors
    timestamp: Wed 2008-05-07 17:09:32 -0500
    message:
      Restore the previous code, but bring in a couple changes. Including an update to have lsprof show where the time is spent.
    modified:
      bzrlib/graph.py                graph_walker.py-20070525030359-y852guab65d4wtn0-1
    ------------------------------------------------------------
    revno: 3377.3.42.1.3
    revision-id:john at arbash-meinel.com-20080506220718-3nqdforqiwifqvga
    parent: john at arbash-meinel.com-20080506205431-sr3y667ytc6t1pdu
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: find_unique_ancestors
    timestamp: Tue 2008-05-06 17:07:18 -0500
    message:
      try a different method which collects searchers together. However, it ends up performing worse than the existing form
    modified:
      bzrlib/graph.py                graph_walker.py-20070525030359-y852guab65d4wtn0-1
    ------------------------------------------------------------
    revno: 3377.3.42.1.2
    revision-id:john at arbash-meinel.com-20080506205431-sr3y667ytc6t1pdu
    parent: john at arbash-meinel.com-20080501224934-3i8kxtuyr9r21711
    parent: pqm at pqm.ubuntu.com-20080505231558-7w3zaehbvtcjk7jv
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: find_unique_ancestors
    timestamp: Tue 2008-05-06 15:54:31 -0500
    message:
      Merge in the bzr.dev changes
    added:
      bzrlib/tests/branch_implementations/test_check.py test_check.py-20080429151303-1sbfclxhddpz0tnj-1
      bzrlib/tests/branch_implementations/test_reconcile.py test_reconcile.py-20080429161555-qlmccuyeyt6pvho7-1
      bzrlib/tests/interrepository_implementations/test_fetch.py test_fetch.py-20080425213627-j60cjh782ufm83ry-1
    modified:
      Makefile                       Makefile-20050805140406-d96e3498bb61c5bb
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzr                            bzr.py-20050313053754-5485f144c7006fa6
      bzrlib/__init__.py             __init__.py-20050309040759-33e65acf91bbcd5d
      bzrlib/branch.py               branch.py-20050309040759-e4baf4e0d046576e
      bzrlib/builtins.py             builtins.py-20050830033751-fc01482b9ca23183
      bzrlib/bundle/bundle_data.py   read_changeset.py-20050619171944-c0d95aa685537640
      bzrlib/bundle/serializer/v4.py v10.py-20070611062757-5ggj7k18s9dej0fr-1
      bzrlib/config.py               config.py-20051011043216-070c74f4e9e338e8
      bzrlib/diff.py                 diff.py-20050309040759-26944fbbf2ebbf36
      bzrlib/errors.py               errors.py-20050309040759-20512168c4e14fbd
      bzrlib/fetch.py                fetch.py-20050818234941-26fea6105696365d
      bzrlib/help_topics/__init__.py help_topics.py-20060920210027-rnim90q9e0bwxvy4-1
      bzrlib/hooks.py                hooks.py-20070325015548-ix4np2q0kd8452au-1
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/lockable_files.py       control_files.py-20051111201905-bb88546e799d669f
      bzrlib/log.py                  log.py-20050505065812-c40ce11702fe5fb1
      bzrlib/mutabletree.py          mutabletree.py-20060906023413-4wlkalbdpsxi2r4y-2
      bzrlib/reconcile.py            reweave_inventory.py-20051108164726-1e5e0934febac06e
      bzrlib/reconfigure.py          reconfigure.py-20070908040425-6ykgo7escxhyrg9p-1
      bzrlib/remote.py               remote.py-20060720103555-yeeg2x51vn0rbtdp-1
      bzrlib/repofmt/knitrepo.py     knitrepo.py-20070206081537-pyy4a00xdas0j4pf-1
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/revisionspec.py         revisionspec.py-20050907152633-17567659fd5c0ddb
      bzrlib/revisiontree.py         revisiontree.py-20060724012533-bg8xyryhxd0o0i0h-1
      bzrlib/smart/repository.py     repository.py-20061128022038-vr5wy5bubyb8xttk-1
      bzrlib/smart/server.py         server.py-20061110062051-chzu10y32vx8gvur-1
      bzrlib/status.py               status.py-20050505062338-431bfa63ec9b19e6
      bzrlib/store/revision/knit.py  knit.py-20060303020652-de5fa299e941a3c7
      bzrlib/store/revision/text.py  text.py-20060303020652-e49155f0da4d14ab
      bzrlib/symbol_versioning.py    symbol_versioning.py-20060105104851-9ecf8af605d15a80
      bzrlib/tests/__init__.py       selftest.py-20050531073622-8d0e3c8845c97a64
      bzrlib/tests/blackbox/test_branch.py test_branch.py-20060524161337-noms9gmcwqqrfi8y-1
      bzrlib/tests/blackbox/test_hooks.py test_hooks.py-20080308163236-xljgf9j41hik1x21-1
      bzrlib/tests/blackbox/test_reconcile.py test_fix.py-20060223013051-9a188e15a5ee9451
      bzrlib/tests/blackbox/test_version.py test_version.py-20070312060045-ol7th9z035r3im3d-1
      bzrlib/tests/branch_implementations/__init__.py __init__.py-20060123013057-b12a52c3f361daf4
      bzrlib/tests/branch_implementations/test_branch.py testbranch.py-20050711070244-121d632bc37d7253
      bzrlib/tests/branch_implementations/test_commit.py test_commit.py-20070206022134-117z1i5b644p63r0-1
      bzrlib/tests/branch_implementations/test_hooks.py test_hooks.py-20070129154855-blhpwxmvjs07waei-1
      bzrlib/tests/branch_implementations/test_pull.py test_pull.py-20060410103942-83c35b26657414fc
      bzrlib/tests/branch_implementations/test_push.py test_push.py-20070130153159-fhfap8uoifevg30j-1
      bzrlib/tests/branch_implementations/test_uncommit.py test_uncommit.py-20070205180410-ge7058d9138mvq3x-1
      bzrlib/tests/interrepository_implementations/__init__.py __init__.py-20060220054744-baf49a1f88f17b1a
      bzrlib/tests/interrepository_implementations/test_interrepository.py test_interrepository.py-20060220061411-1ec13fa99e5e3eee
      bzrlib/tests/repository_implementations/test_fetch.py test_fetch.py-20070814052151-5cxha9slx4c93uog-1
      bzrlib/tests/repository_implementations/test_repository.py test_repository.py-20060131092128-ad07f494f5c9d26c
      bzrlib/tests/test_bundle.py    test.py-20050630184834-092aa401ab9f039c
      bzrlib/tests/test_config.py    testconfig.py-20051011041908-742d0c15d8d8c8eb
      bzrlib/tests/test_fetch.py     testfetch.py-20050825090644-f73e07e7dfb1765a
      bzrlib/tests/test_graph.py     test_graph_walker.py-20070525030405-enq4r60hhi9xrujc-1
      bzrlib/tests/test_hooks.py     test_hooks.py-20070628030849-89rtsbe5dmer5npz-1
      bzrlib/tests/test_knit.py      test_knit.py-20051212171302-95d4c00dd5f11f2b
      bzrlib/tests/test_lockable_files.py test_lockable_files.py-20051225183927-365c7fd99591caf1
      bzrlib/tests/test_osutils.py   test_osutils.py-20051201224856-e48ee24c12182989
      bzrlib/tests/test_reconcile.py test_reconcile.py-20060225054842-50aa618584a86f26
      bzrlib/tests/test_reconfigure.py test_reconfigure.py-20070908040425-6ykgo7escxhyrg9p-2
      bzrlib/tests/test_remote.py    test_remote.py-20060720103555-yeeg2x51vn0rbtdp-2
      bzrlib/tests/test_repository.py test_repository.py-20060131075918-65c555b881612f4d
      bzrlib/tests/test_selftest.py  test_selftest.py-20051202044319-c110a115d8c0456a
      bzrlib/tests/test_smart.py     test_smart.py-20061122024551-ol0l0o0oofsu9b3t-2
      bzrlib/tests/test_smart_transport.py test_ssh_transport.py-20060608202016-c25gvf1ob7ypbus6-2
      bzrlib/tests/test_store.py     teststore.py-20050826022702-f6caadb647395769
      bzrlib/tests/transport_util.py transportutil.py-20070525113600-5v2igk89s8fensom-1
      bzrlib/tests/tree_implementations/__init__.py __init__.py-20060717075546-420s7b0bj9hzeowi-2
      bzrlib/tests/tree_implementations/test_inv.py test_inv.py-20070312023226-0cdvk5uwhutis9vg-1
      bzrlib/tests/tree_implementations/test_test_trees.py test_tree_trees.py-20060720091921-3nwi5h21lf06vf5p-1
      bzrlib/tests/workingtree_implementations/test_basis_inventory.py test_basis_inventory.py-20051218151655-3650468941091309
      bzrlib/tests/workingtree_implementations/test_commit.py test_commit.py-20060421013633-1610ec2331c8190f
      bzrlib/transform.py            transform.py-20060105172343-dd99e54394d91687
      bzrlib/transport/ftp.py        ftp.py-20051116161804-58dc9506548c2a53
      bzrlib/tree.py                 tree.py-20050309040759-9d5f2496be663e77
      bzrlib/workingtree.py          workingtree.py-20050511021032-29b6ec0a681e02e3
      bzrlib/workingtree_4.py        workingtree_4.py-20070208044105-5fgpc5j3ljlh5q6c-1
      doc/default.css                default.css-20060622101119-tgwtdci8z769bjb9-1
      doc/developers/HACKING.txt     HACKING-20050805200004-2a5dc975d870f78c
      doc/developers/network-protocol.txt networkprotocol.txt-20070903044232-woustorrjbmg5zol-1
      doc/en/user-guide/controlling_registration.txt controlling_registra-20071121073725-0corxykv5irjal00-3
      doc/en/user-guide/hooks.txt    hooks.txt-20070829200551-7nr6e5a1io6x78uf-1
      doc/en/user-guide/sending_changes.txt sending_changes.txt-20071123154453-dk2mjhrg1vpjm5w2-4
    ------------------------------------------------------------
    revno: 3377.3.42.1.1
    revision-id:john at arbash-meinel.com-20080501224934-3i8kxtuyr9r21711
    parent: john at arbash-meinel.com-20080501204100-r9pw4d1znr8g728x
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: find_unique_ancestors
    timestamp: Thu 2008-05-01 17:49:34 -0500
    message:
      Work on a way to collapse the searchers earlier.
    modified:
      bzrlib/graph.py                graph_walker.py-20070525030359-y852guab65d4wtn0-1
      bzrlib/tests/test_graph.py     test_graph_walker.py-20070525030405-enq4r60hhi9xrujc-1
=== modified file 'bzrlib/graph.py'
--- a/bzrlib/graph.py	2008-05-08 04:41:46 +0000
+++ b/bzrlib/graph.py	2008-05-21 05:46:23 +0000
@@ -14,6 +14,8 @@
 # along with this program; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 
+import time
+
 from bzrlib import (
     debug,
     errors,
@@ -24,6 +26,8 @@
     )
 from bzrlib.deprecated_graph import (node_distances, select_farthest)
 
+STEP_UNIQUE_SEARCHER_EVERY = 5
+
 # DIAGRAM of terminology
 #       A
 #       /\
@@ -236,19 +240,49 @@
         #    information you have so far.
         # 5) Continue searching, stopping the common searches when the search
         #    tip is an ancestor of all unique nodes.
-        # 6) Search is done when all common searchers have completed.
-
+        # 6) Aggregate together unique searchers when they are searching the
+        #    same tips. When all unique searchers are searching the same node,
+        #    stop them and collapse into a single 'all_unique_searcher'.
+        # 7) The 'all_unique_searcher' represents the very 'tip' of searching.
+        #    Most of the time this produces very little important information.
+        #    So don't step it as quickly as the other searchers.
+        # 8) Search is done when all common searchers have completed.
+
+        unique_searcher, common_searcher = self._find_initial_unique_nodes(
+            [unique_revision], common_revisions)
+
+        unique_nodes = unique_searcher.seen.difference(common_searcher.seen)
+        if not unique_nodes:
+            return unique_nodes
+
+        (all_unique_searcher,
+         unique_tip_searchers) = self._make_unique_searchers(unique_nodes,
+                                    unique_searcher, common_searcher)
+
+        self._refine_unique_nodes(unique_searcher, all_unique_searcher,
+                                  unique_tip_searchers, common_searcher)
+        true_unique_nodes = unique_nodes.difference(common_searcher.seen)
         if 'graph' in debug.debug_flags:
-            _mutter = trace.mutter
-        else:
-            def _mutter(*args, **kwargs):
-                pass
-
-        unique_searcher = self._make_breadth_first_searcher([unique_revision])
-        # we know that unique_revision isn't in common_revisions
+            trace.mutter('Found %d truly unique nodes out of %d',
+                         len(true_unique_nodes), len(unique_nodes))
+        return true_unique_nodes
+
+    def _find_initial_unique_nodes(self, unique_revisions, common_revisions):
+        """Steps 1-3 of find_unique_ancestors.
+
+        Find the maximal set of unique nodes. Some of these might actually
+        still be common, but we are sure that there are no other unique nodes.
+
+        :return: (unique_searcher, common_searcher)
+        """
+
+        unique_searcher = self._make_breadth_first_searcher(unique_revisions)
+        # we know that unique_revisions aren't in common_revisions, so skip
+        # past them.
         unique_searcher.next()
         common_searcher = self._make_breadth_first_searcher(common_revisions)
 
+        # As long as we are still finding unique nodes, keep searching
         while unique_searcher._next_query:
             next_unique_nodes = set(unique_searcher.step())
             next_common_nodes = set(common_searcher.step())
@@ -262,120 +296,226 @@
             if unique_are_common_nodes:
                 ancestors = unique_searcher.find_seen_ancestors(
                                 unique_are_common_nodes)
+                # TODO: This is a bit overboard, we only really care about
+                #       the ancestors of the tips because the rest we
+                #       already know. This is *correct* but causes us to
+                #       search too much ancestry.
                 ancestors.update(common_searcher.find_seen_ancestors(ancestors))
                 unique_searcher.stop_searching_any(ancestors)
                 common_searcher.start_searching(ancestors)
 
-        unique_nodes = unique_searcher.seen.difference(common_searcher.seen)
-        if not unique_nodes:
-            return unique_nodes
+        return unique_searcher, common_searcher
+
+    def _make_unique_searchers(self, unique_nodes, unique_searcher,
+                               common_searcher):
+        """Create a searcher for all the unique search tips (step 4).
+
+        As a side effect, the common_searcher will stop searching any nodes
+        that are ancestors of the unique searcher tips.
+
+        :return: (all_unique_searcher, unique_tip_searchers)
+        """
         unique_tips = self._remove_simple_descendants(unique_nodes,
                         self.get_parent_map(unique_nodes))
 
         if len(unique_tips) == 1:
-            unique_searchers = []
+            unique_tip_searchers = []
             ancestor_all_unique = unique_searcher.find_seen_ancestors(unique_tips)
         else:
-            unique_searchers = []
+            unique_tip_searchers = []
             for tip in unique_tips:
                 revs_to_search = unique_searcher.find_seen_ancestors([tip])
+                revs_to_search.update(
+                    common_searcher.find_seen_ancestors(revs_to_search))
                 searcher = self._make_breadth_first_searcher(revs_to_search)
                 # We don't care about the starting nodes.
                 searcher._label = tip
                 searcher.step()
-                unique_searchers.append(searcher)
+                unique_tip_searchers.append(searcher)
 
             ancestor_all_unique = None
-            for searcher in unique_searchers:
+            for searcher in unique_tip_searchers:
                 if ancestor_all_unique is None:
                     ancestor_all_unique = set(searcher.seen)
                 else:
                     ancestor_all_unique = ancestor_all_unique.intersection(
                                                 searcher.seen)
         # Collapse all the common nodes into a single searcher
-        all_unique_searcher = self._make_breadth_first_searcher(ancestor_all_unique)
+        all_unique_searcher = self._make_breadth_first_searcher(
+                                ancestor_all_unique)
         if ancestor_all_unique:
+            # We've already seen these nodes in all the searchers, so just
+            # step past them.
             all_unique_searcher.step()
 
             # Stop any search tips that are already known as ancestors of the
             # unique nodes
-            common_searcher.stop_searching_any(
+            stopped_common = common_searcher.stop_searching_any(
                 common_searcher.find_seen_ancestors(ancestor_all_unique))
 
             total_stopped = 0
-            for searcher in unique_searchers:
+            for searcher in unique_tip_searchers:
                 total_stopped += len(searcher.stop_searching_any(
                     searcher.find_seen_ancestors(ancestor_all_unique)))
-            _mutter('For %s unique nodes, created %s + 1 unique searchers'
-                    ' (%s stopped search tips, %s common ancestors)',
-                    len(unique_nodes), len(unique_searchers), total_stopped,
-                    len(ancestor_all_unique))
-            del ancestor_all_unique
-
+        if 'graph' in debug.debug_flags:
+            trace.mutter('For %d unique nodes, created %d + 1 unique searchers'
+                         ' (%d stopped search tips, %d common ancestors,'
+                         ' %d stopped common)',
+                         len(unique_nodes), len(unique_tip_searchers),
+                         total_stopped, len(ancestor_all_unique),
+                         len(stopped_common))
+        return all_unique_searcher, unique_tip_searchers
+
+    def _step_unique_and_common_searchers(self, common_searcher,
+                                          unique_tip_searchers,
+                                          unique_searcher):
+        """Step all the searchers"""
+        newly_seen_common = set(common_searcher.step())
+        newly_seen_unique = set()
+        for searcher in unique_tip_searchers:
+            next = set(searcher.step())
+            next.update(unique_searcher.find_seen_ancestors(next))
+            next.update(common_searcher.find_seen_ancestors(next))
+            for alt_searcher in unique_tip_searchers:
+                if alt_searcher is searcher:
+                    continue
+                next.update(alt_searcher.find_seen_ancestors(next))
+            searcher.start_searching(next)
+            newly_seen_unique.update(next)
+        return newly_seen_common, newly_seen_unique
+
+    def _find_nodes_common_to_all_unique(self, unique_tip_searchers,
+                                         all_unique_searcher,
+                                         newly_seen_unique, step_all_unique):
+        """Find nodes that are common to all unique_tip_searchers.
+
+        If it is time, step the all_unique_searcher, and add its nodes to the
+        result.
+        """
+        common_to_all_unique_nodes = newly_seen_unique.copy()
+        for searcher in unique_tip_searchers:
+            common_to_all_unique_nodes.intersection_update(searcher.seen)
+        common_to_all_unique_nodes.intersection_update(
+                                    all_unique_searcher.seen)
+        # Step all-unique less frequently than the other searchers.
+        # In the common case, we don't need to spider out far here, so
+        # avoid doing extra work.
+        if step_all_unique:
+            tstart = time.clock()
+            nodes = all_unique_searcher.step()
+            common_to_all_unique_nodes.update(nodes)
+            if 'graph' in debug.debug_flags:
+                tdelta = time.clock() - tstart
+                trace.mutter('all_unique_searcher step() took %.3fs'
+                             ' for %d nodes (%d total), iteration: %s',
+                             tdelta, len(nodes), len(all_unique_searcher.seen),
+                             all_unique_searcher._iterations)
+        return common_to_all_unique_nodes
+
+    def _collapse_unique_searchers(self, unique_tip_searchers,
+                                   common_to_all_unique_nodes):
+        """Combine searchers that are searching the same tips.
+
+        When two searchers are searching the same tips, we can stop one of the
+        searchers. We also know that the maximal set of common ancestors is the
+        intersection of the two original searchers.
+
+        :return: A list of searchers that are searching unique nodes.
+        """
+        # Filter out searchers that don't actually search different
+        # nodes. We already have the ancestry intersection for them
+        unique_search_tips = {}
+        for searcher in unique_tip_searchers:
+            stopped = searcher.stop_searching_any(common_to_all_unique_nodes)
+            will_search_set = frozenset(searcher._next_query)
+            if not will_search_set:
+                if 'graph' in debug.debug_flags:
+                    trace.mutter('Unique searcher %s was stopped.'
+                                 ' (%s iterations) %d nodes stopped',
+                                 searcher._label,
+                                 searcher._iterations,
+                                 len(stopped))
+            elif will_search_set not in unique_search_tips:
+                # This searcher is searching a unique set of nodes, let it
+                unique_search_tips[will_search_set] = [searcher]
+            else:
+                unique_search_tips[will_search_set].append(searcher)
+        # TODO: it might be possible to collapse searchers faster when they
+        #       only have *some* search tips in common.
+        next_unique_searchers = []
+        for searchers in unique_search_tips.itervalues():
+            if len(searchers) == 1:
+                # Searching unique tips, go for it
+                next_unique_searchers.append(searchers[0])
+            else:
+                # These searchers have started searching the same tips, we
+                # don't need them to cover the same ground. The
+                # intersection of their ancestry won't change, so create a
+                # new searcher, combining their histories.
+                next_searcher = searchers[0]
+                for searcher in searchers[1:]:
+                    next_searcher.seen.intersection_update(searcher.seen)
+                if 'graph' in debug.debug_flags:
+                    trace.mutter('Combining %d searchers into a single'
+                                 ' searcher searching %d nodes with'
+                                 ' %d ancestry',
+                                 len(searchers),
+                                 len(next_searcher._next_query),
+                                 len(next_searcher.seen))
+                next_unique_searchers.append(next_searcher)
+        return next_unique_searchers
+
+    def _refine_unique_nodes(self, unique_searcher, all_unique_searcher,
+                             unique_tip_searchers, common_searcher):
+        """Steps 5-8 of find_unique_ancestors.
+        
+        This function returns when common_searcher has stopped searching for
+        more nodes.
+        """
+        # We step the ancestor_all_unique searcher only every
+        # STEP_UNIQUE_SEARCHER_EVERY steps.
+        step_all_unique_counter = 0
         # While we still have common nodes to search
         while common_searcher._next_query:
-            newly_seen_common = set(common_searcher.step())
-            newly_seen_unique = set()
-            for searcher in unique_searchers:
-                newly_seen_unique.update(searcher.step())
+            (newly_seen_common,
+             newly_seen_unique) = self._step_unique_and_common_searchers(
+                common_searcher, unique_tip_searchers, unique_searcher)
             # These nodes are common ancestors of all unique nodes
-            unique_are_common_nodes = newly_seen_unique.copy()
-            for searcher in unique_searchers:
-                unique_are_common_nodes = unique_are_common_nodes.intersection(
-                                            searcher.seen)
-            unique_are_common_nodes = unique_are_common_nodes.intersection(
-                                        all_unique_searcher.seen)
-            unique_are_common_nodes.update(all_unique_searcher.step())
+            common_to_all_unique_nodes = self._find_nodes_common_to_all_unique(
+                unique_tip_searchers, all_unique_searcher, newly_seen_unique,
+                step_all_unique_counter==0)
+            step_all_unique_counter = ((step_all_unique_counter + 1)
+                                       % STEP_UNIQUE_SEARCHER_EVERY)
+
             if newly_seen_common:
                 # If a 'common' node is an ancestor of all unique searchers, we
                 # can stop searching it.
                 common_searcher.stop_searching_any(
                     all_unique_searcher.seen.intersection(newly_seen_common))
-            if unique_are_common_nodes:
-                # We have new common-to-all-unique-searchers nodes
-                for searcher in unique_searchers:
-                    unique_are_common_nodes.update(
-                        searcher.find_seen_ancestors(unique_are_common_nodes))
-                unique_are_common_nodes.update(
-                    all_unique_searcher.find_seen_ancestors(unique_are_common_nodes))
-                # Since these are common, we can grab another set of ancestors
-                # that we have seen
-                unique_are_common_nodes.update(
-                    common_searcher.find_seen_ancestors(unique_are_common_nodes))
-
+            if common_to_all_unique_nodes:
+                common_to_all_unique_nodes.update(
+                    common_searcher.find_seen_ancestors(
+                        common_to_all_unique_nodes))
                 # The all_unique searcher can start searching the common nodes
                 # but everyone else can stop.
-                all_unique_searcher.start_searching(unique_are_common_nodes)
-                common_searcher.stop_searching_any(unique_are_common_nodes)
+                # This is the sort of thing where we would like to not have it
+                # start_searching all of the nodes, but only mark all of them
+                # as seen, and have it search only the actual tips. Otherwise
+                # it is another get_parent_map() traversal for it to figure out
+                # what we already should know.
+                all_unique_searcher.start_searching(common_to_all_unique_nodes)
+                common_searcher.stop_searching_any(common_to_all_unique_nodes)
 
-                # Filter out searchers that don't actually search different
-                # nodes. We already have the ancestry intersection for them
-                next_unique_searchers = []
-                unique_search_sets = set()
-                for searcher in unique_searchers:
-                    stopped = searcher.stop_searching_any(unique_are_common_nodes)
-                    will_search_set = frozenset(searcher._next_query)
-                    if not will_search_set:
-                        _mutter('Unique searcher %s was stopped.'
-                                ' (%s iterations) %d nodes stopped',
-                                searcher._label,
-                                searcher._iterations,
-                                len(stopped))
-                    elif will_search_set not in unique_search_sets:
-                        # This searcher is searching a unique set of nodes, let it
-                        unique_search_sets.add(will_search_set)
-                        next_unique_searchers.append(searcher)
-                    else:
-                        _mutter('Unique searcher %s stopped for repeated'
-                                ' search of %s nodes', 
-                                searcher._label, len(will_search_set))
-                if len(unique_searchers) != len(next_unique_searchers):
-                    _mutter('Collapsed %s unique searchers => %s'
-                            ' at %s iterations',
-                            len(unique_searchers), len(next_unique_searchers),
-                            all_unique_searcher._iterations)
-                unique_searchers = next_unique_searchers
-        return unique_nodes.difference(common_searcher.seen)
+            next_unique_searchers = self._collapse_unique_searchers(
+                unique_tip_searchers, common_to_all_unique_nodes)
+            if len(unique_tip_searchers) != len(next_unique_searchers):
+                if 'graph' in debug.debug_flags:
+                    trace.mutter('Collapsed %d unique searchers => %d'
+                                 ' at %s iterations',
+                                 len(unique_tip_searchers),
+                                 len(next_unique_searchers),
+                                 all_unique_searcher._iterations)
+            unique_tip_searchers = next_unique_searchers
 
     @symbol_versioning.deprecated_method(symbol_versioning.one_one)
     def get_parents(self, revisions):
@@ -1047,7 +1187,16 @@
         return self
 
     def find_seen_ancestors(self, revisions):
-        """Find ancestors of these revisions that have already been seen."""
+        """Find ancestors of these revisions that have already been seen.
+        
+        This function generally makes the assumption that querying for the
+        parents of a node that has already been queried is reasonably cheap.
+        (eg, not a round trip to a remote host).
+        """
+        # TODO: Often we might ask one searcher for its seen ancestors, and
+        #       then ask another searcher the same question. This can result in
+        #       searching the same revisions repeatedly if the two searchers
+        #       have a lot of overlap.
         all_seen = self.seen
         pending = set(revisions).intersection(all_seen)
         seen_ancestors = set(pending)
@@ -1082,6 +1231,9 @@
         None of the specified revisions are required to be present in the
         search list.  In this case, the call is a no-op.
         """
+        # TODO: does this help performance?
+        # if not revisions:
+        #     return set()
         revisions = frozenset(revisions)
         if self._returning == 'next':
             stopped = self._next_query.intersection(revisions)

=== modified file 'bzrlib/tests/test_graph.py'
--- a/bzrlib/tests/test_graph.py	2008-05-05 20:29:06 +0000
+++ b/bzrlib/tests/test_graph.py	2008-05-06 20:54:31 +0000
@@ -283,11 +283,11 @@
 #     |/
 #     j
 #
-# y is found to be common right away, but is the start of a long series of
+# x is found to be common right away, but is the start of a long series of
 # common commits.
 # o is actually common, but the i-j shortcut makes it look like it is actually
-# unique to j at first, you have to traverse all of y->o to find it.
-# q,n give the walker from j a common point to stop searching, as does p,f.
+# unique to j at first, you have to traverse all of x->o to find it.
+# q,m gives the walker from j a common point to stop searching, as does p,f.
 # k-n exists so that the second pass still has nodes that are worth searching,
 # rather than instantly cancelling the extra walker.
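
The collapsing half of the change keys each unique-tip searcher by the
frozenset of tips it will search next; searchers sharing an identical tip
set would walk the same ground, so one survivor continues with its seen
set shrunk to the group's intersection (the maximal common ancestry, per
the new _collapse_unique_searchers docstring). A self-contained sketch of
that grouping idea, where ToySearcher, next_query and seen are illustrative
stand-ins for the real searcher API:

    # Standalone sketch, not bzrlib code: shows only the grouping idea.
    class ToySearcher(object):
        def __init__(self, next_query, seen):
            self.next_query = set(next_query)
            self.seen = set(seen)

    def collapse_searchers(searchers):
        """Keep one searcher per distinct set of pending tips."""
        by_tips = {}
        for searcher in searchers:
            tips = frozenset(searcher.next_query)
            if not tips:
                continue              # nothing left to search; drop it
            by_tips.setdefault(tips, []).append(searcher)
        collapsed = []
        for group in by_tips.values():
            survivor = group[0]
            for other in group[1:]:
                # Identical tips: the ancestry common to the group is the
                # intersection of what each searcher has seen.
                survivor.seen.intersection_update(other.seen)
            collapsed.append(survivor)
        return collapsed

    a = ToySearcher(['rev-9'], ['rev-1', 'rev-2', 'rev-9'])
    b = ToySearcher(['rev-9'], ['rev-2', 'rev-3', 'rev-9'])
    c = ToySearcher(['rev-7'], ['rev-4', 'rev-7'])
    remaining = collapse_searchers([a, b, c])
    print("%d searchers remain" % len(remaining))   # a and b merge -> 2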
 