Rev 104: Much better results. Even faster than the previous algo. in http://bzr.arbash-meinel.com/branches/bzr/history_db/tip_numbering
John Arbash Meinel
john at arbash-meinel.com
Fri Apr 16 21:17:34 BST 2010
At http://bzr.arbash-meinel.com/branches/bzr/history_db/tip_numbering
------------------------------------------------------------
revno: 104
revision-id: john at arbash-meinel.com-20100416201711-3hntscd7dvobd2fe
parent: john at arbash-meinel.com-20100416195133-sbxs1wvmv2afojc4
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: tip_numbering
timestamp: Fri 2010-04-16 15:17:11 -0500
message:
Much better results. Even faster than the previous algo.
Stats:
{'_insert_node_calls': 30621,
'gdfo hit': 3869,
'gdfo miss': 811978,
'is interesting': 804941,
'nodes_expanded': 4734,
'not interesting imported': 726,
'not interesting is mainline': 5456,
'not interesting known imported': 260871,
'pushed': 835562,
'ranges_inserted': 52,
'revs_in_ranges': 5159,
'step mainline': 830845,
'step mainline added': 5002192,
'step mainline cache missed': 280,
'step mainline cached': 830565,
'step mainline unknown': 830845,
'total_nodes_inserted': 835562}
real 1m44.036s
We eliminate the 'always step at least once' behavior, and we will also
usually step as far as we need to before we number the first entry (since
that is going to be the oldest mainline child; this is not guaranteed, though).
There is never overhead for grabbing already-merged dotted revnos.
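For context, the numbering rule the second hunk below implements can be
sketched on its own. This is a minimal, hypothetical standalone version,
using plain tuples instead of StaticTuple, and a dict that maps a left
parent id straight to its dotted revno (the real _imported_dotted_revno
stores a tuple whose first element is the revno, hence the [0] in the
diff):

def assign_dotted_revno(base_revno, branch_num, left_parent,
                        imported_dotted_revno):
    # Start a fresh dot count unless the left parent is known and on
    # the same derived branch. A left parent missing from the dict is
    # outside the 'interesting' ancestry, so we never number from it.
    last_dot = 1
    if (left_parent is not None
            and left_parent in imported_dotted_revno):
        parent_revno = imported_dotted_revno[left_parent]
        if (parent_revno[0] == base_revno
                and parent_revno[1] == branch_num):
            last_dot = parent_revno[-1] + 1
    return (base_revno, branch_num, last_dot)

So assign_dotted_revno(1000, 2, 'p', {'p': (1000, 2, 3)}) yields
(1000, 2, 4), while an unknown left parent yields (1000, 2, 1).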
As expected, caching the gdfo is probably not a win (the stats above show
only 3,869 hits against 811,978 misses), since we try hard to never hit
the same node twice. (When we do, it should already have been
imported by then.)
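Those hit/miss counters come from the third hunk below. As a standalone
illustration of the pattern (the wrapper class and setup here are
assumptions; the SQL and attribute names are taken from the diff), it is
a simple memoized database lookup:

import sqlite3
from collections import Counter

class GdfoLookup(object):
    # Illustrative wrapper; the real code lives inside history_db's
    # importer class.
    def __init__(self, conn):
        self._cursor = conn.cursor()
        self._known_gdfo = {}     # db_id -> gdfo
        self._stats = Counter()

    def get_gdfo(self, db_id):
        if db_id in self._known_gdfo:
            self._stats['gdfo hit'] += 1
            return self._known_gdfo[db_id]
        self._stats['gdfo miss'] += 1
        row = self._cursor.execute(
            "SELECT gdfo FROM revision WHERE db_id = ?",
            [db_id]).fetchone()
        gdfo = row[0]
        self._known_gdfo[db_id] = gdfo
        return gdfo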
-------------- next part --------------
=== modified file 'history_db.py'
--- a/history_db.py 2010-04-16 19:51:33 +0000
+++ b/history_db.py 2010-04-16 20:17:11 +0000
@@ -199,6 +199,7 @@
merge_sorted = self._import_tip(self._branch_tip_rev_id)
if not expand_all:
return
+ import pdb; pdb.set_trace()
# We know all the other imports are going to be incremental
self._incremental = True
self._stats['nodes_expanded'] += 0 # create an entry
@@ -956,15 +957,16 @@
"""
node = self._depth_first_stack.pop()
if node.revno is None:
- if node._left_parent is not None:
+ last_dot = 1
+ if (node._left_parent is not None
+ and node._left_parent in self._imported_dotted_revno):
+ # If we haven't loaded the left parent, then we know we
+ # won't be numbering from it, it is outside the
+ # 'interesting' ancestry
parent_revno = self._imported_dotted_revno[node._left_parent][0]
if (parent_revno[0] == node._base_revno
and parent_revno[1] == node._branch_num):
last_dot = parent_revno[-1] + 1
- else:
- last_dot = 1
- else:
- last_dot = 1
node.revno = static_tuple.StaticTuple(node._base_revno,
node._branch_num, last_dot)
if not self._scheduled_stack:
@@ -993,7 +995,9 @@
def _get_gdfo(self, db_id):
if db_id in self._known_gdfo:
+ self._stats['gdfo hit'] += 1
return self._known_gdfo[db_id]
+ self._stats['gdfo miss'] += 1
res = self._cursor.execute("SELECT gdfo"
" FROM revision WHERE db_id = ?",
[db_id]).fetchone()
@@ -1022,9 +1026,10 @@
if db_id in self._imported_dotted_revno:
self._stats['not interesting imported'] +=1
return False
- if (gdfo == self._imported_dotted_revno
+ if (gdfo == self._imported_gdfo
and db_id == self._imported_mainline_id):
self._stats['not interesting is mainline'] += 1
+ return False
self._stats['is interesting'] += 1
return True
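The final hunk fixes two bugs in the 'is interesting' filter: the
mainline check compared gdfo against _imported_dotted_revno instead of
_imported_gdfo, and it fell through without returning False when it
matched. A sketch of the corrected logic (the enclosing method name and
signature are not shown in the hunk, so _is_interesting here is an
assumption):

def _is_interesting(self, db_id, gdfo):
    # Already numbered: nothing left to do for this node.
    if db_id in self._imported_dotted_revno:
        self._stats['not interesting imported'] += 1
        return False
    # The current mainline tip matches on both gdfo and db_id; the
    # hunk adds the missing 'return False' for this case.
    if (gdfo == self._imported_gdfo
            and db_id == self._imported_mainline_id):
        self._stats['not interesting is mainline'] += 1
        return False
    self._stats['is interesting'] += 1
    return True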