Rev 103: The basics seem to be working. in http://bzr.arbash-meinel.com/branches/bzr/history_db/tip_numbering

John Arbash Meinel john at arbash-meinel.com
Fri Apr 16 20:51:56 BST 2010


At http://bzr.arbash-meinel.com/branches/bzr/history_db/tip_numbering

------------------------------------------------------------
revno: 103
revision-id: john at arbash-meinel.com-20100416195133-sbxs1wvmv2afojc4
parent: john at arbash-meinel.com-20100416192741-ahq70lbhz72cxtnt
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: tip_numbering
timestamp: Fri 2010-04-16 14:51:33 -0500
message:
  The basics seem to be working.
  
  We don't need to track the count-per-branch because we are only ever
  working on a single branch tip. So we can keep a single counter that
  we reset.
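
The single-counter idea above can be sketched roughly as follows. This is only an illustration, not the code in history_db.py: the event list and both function names are hypothetical, and it assumes that all merges under one mainline revision are visited before the walk moves on to the next mainline revision.

```python
# Hypothetical sketch; 'events' and these function names are NOT from history_db.py.
# Each event is (kind, base_revno), where kind is 'mainline' or 'merge'.

def number_with_dict(events):
    """Track a branch count per mainline revno (the approach being removed)."""
    revno_to_branch_count = {}
    numbered = []
    for kind, base_revno in events:
        if kind == 'merge':
            branch_num = revno_to_branch_count.get(base_revno, 0) + 1
            revno_to_branch_count[base_revno] = branch_num
            numbered.append((base_revno, branch_num))
    return numbered


def number_with_single_counter(events):
    """Keep one counter and reset it each time a mainline node is reached."""
    branch_count = 0
    numbered = []
    for kind, base_revno in events:
        if kind == 'mainline':
            branch_count = 0
        else:
            branch_count += 1
            numbered.append((base_revno, branch_count))
    return numbered


# Assumed walk order: every merge for a base revno directly follows its mainline node.
events = [('mainline', 10), ('merge', 10), ('merge', 10),
          ('mainline', 9), ('merge', 9)]
assert number_with_dict(events) == number_with_single_counter(events)
```

The two functions only agree while the assumption about walk order holds. If merges for different mainline revisions were interleaved, the single counter would hand out different branch numbers than the dict.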
  
  It is currently quite a bit slower (10min for bzr.dev, vs 2-3min).
  I don't know if it is the 'interesting children' check which I'm missing,
  or if I have some bit of logic subtly wrong.
  
  Certainly we are stepping the mainline far more often now. We also do this
  one tip at a time rather than for all search tips at once, which might play a role.
  
  Stats:
  {'_insert_node_calls': 30623,
   'is interesting': 6807687,
   'nodes_expanded': 4734,
   'not interesting imported': 399,
   'not interesting known imported': 3619397,
   'pushed': 6838308,
   'ranges_inserted': 52,
   'revs_in_ranges': 5159,
   'step mainline': 6204334,
   'step mainline added': 33087243,
   'step mainline cache missed': 280,
   'step mainline cached': 6204054,
   'step mainline unknown': 6204334,
   'total_nodes_inserted': 6834448}
  
  Note that we have no 'not interesting is mainline' count at all, which doesn't seem right.
  We also have a lot of 'not interesting known imported' vs the 'not interesting imported'.
  Perhaps when we walk the first merged rev, we (on average) walk back far enough that
  we can determine the branched-from point, which is now 'known imported'.
  
  I take that back, something is definitely wrong. This says there are 6.8M now-dotted
  revisions, and the db is 478MB in size. The number of merged revs should be the same
  (880k), so I need to find the 'leak'.
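
The size of the discrepancy can be checked with a little arithmetic. The sketch below uses only figures quoted in this message; the variable names are hypothetical, and the 880k figure is the approximate expected count given above, so the ratio is approximate too.

```python
# Figures copied from the stats and text above; variable names are hypothetical.
total_nodes_inserted = 6834448   # 'total_nodes_inserted' from the stats
expected_merged_revs = 880000    # approximate expected count ("880k")

# How many times more nodes were inserted than expected.
ratio = total_nodes_inserted / float(expected_merged_revs)
print("inserted/expected ratio: %.1f" % ratio)
```

So nearly eight times as many dotted revisions are being inserted as expected.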
-------------- next part --------------
=== modified file 'history_db.py'
--- a/history_db.py	2010-04-16 19:27:41 +0000
+++ b/history_db.py	2010-04-16 19:51:33 +0000
@@ -588,9 +588,7 @@
 
         # Revisions that we are walking, to see if they are interesting, or
         # already imported.
-        self._search_tips = None
-        # mainline revno => number of child branches
-        self._revno_to_branch_count = {}
+        ## self._search_tips = None
 
         self._depth_first_stack = None
         self._scheduled_stack = None
@@ -672,7 +670,7 @@
             (tip_db_id,)).fetchone()
         if res is None:
             return None
-        return res[0]
+        return int(res[0])
 
     def DONT_split_search_tips_by_gdfo(self, unknown):
         """For these search tips, mark ones 'interesting' based on gdfo.
@@ -1036,6 +1034,7 @@
             return
 
         expected_base_revno = None
+        branch_count = 0
         expected_branch_count = 0
         while self._depth_first_stack:
             last = self._depth_first_stack[-1]
@@ -1049,7 +1048,7 @@
                     continue
                 if last.merge_depth == 0:
                     expected_base_revno = last._base_revno
-                    expected_branch_count = 0
+                    branch_count = 0
                 base_revno = last._base_revno
                 assert expected_base_revno == base_revno
                 if next_db_id == last._left_parent: #Is the left-parent?
@@ -1057,11 +1056,11 @@
                     next_branch_num = last._branch_num
                 else:
                     next_merge_depth = last.merge_depth + 1
-                    next_branch_num = self._revno_to_branch_count.get(base_revno, 0) + 1
-                    self._revno_to_branch_count[base_revno] = next_branch_num
-                    expected_branch_count += 1
-                    assert next_branch_num == expected_branch_count
-                self._push_node(next_db_id, base_revno, next_branch_num,
+                    #next_branch_num = self._revno_to_branch_count.get(base_revno, 0) + 1
+                    #self._revno_to_branch_count[base_revno] = next_branch_num
+                    branch_count += 1
+                    #assert next_branch_num == expected_branch_count
+                self._push_node(next_db_id, base_revno, branch_count,
                                 next_merge_depth)
                 # And switch to the outer loop
                 break


