Rev 104: Much better results. Even faster than the previous algo. in http://bzr.arbash-meinel.com/branches/bzr/history_db/tip_numbering
John Arbash Meinel
john at arbash-meinel.com
Fri Apr 16 21:17:34 BST 2010
At http://bzr.arbash-meinel.com/branches/bzr/history_db/tip_numbering
------------------------------------------------------------
revno: 104
revision-id: john at arbash-meinel.com-20100416201711-3hntscd7dvobd2fe
parent: john at arbash-meinel.com-20100416195133-sbxs1wvmv2afojc4
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: tip_numbering
timestamp: Fri 2010-04-16 15:17:11 -0500
message:
Much better results. Even faster than the previous algo.
Stats:
{'_insert_node_calls': 30621,
'gdfo hit': 3869,
'gdfo miss': 811978,
'is interesting': 804941,
'nodes_expanded': 4734,
'not interesting imported': 726,
'not interesting is mainline': 5456,
'not interesting known imported': 260871,
'pushed': 835562,
'ranges_inserted': 52,
'revs_in_ranges': 5159,
'step mainline': 830845,
'step mainline added': 5002192,
'step mainline cache missed': 280,
'step mainline cached': 830565,
'step mainline unknown': 830845,
'total_nodes_inserted': 835562}
real 1m44.036s
We eliminate the 'always step at least once' behavior, and we will also
usually step as far as we need to before we number the first entry (since
that is going to be the oldest mainline child; this is not guaranteed, though).
There is never overhead for grabbing already-merged dotted revnos.
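For context, the numbering rule the second hunk below implements can be
sketched on its own. This is a minimal, hypothetical standalone version,
using plain tuples instead of StaticTuple, and a dict that maps a left
parent id straight to its dotted revno (the real _imported_dotted_revno
stores a tuple whose first element is the revno, hence the [0] in the
diff):

def assign_dotted_revno(base_revno, branch_num, left_parent,
                        imported_dotted_revno):
    # Start a fresh dot count unless the left parent is known and on
    # the same derived branch. A left parent missing from the dict is
    # outside the 'interesting' ancestry, so we never number from it.
    last_dot = 1
    if (left_parent is not None
            and left_parent in imported_dotted_revno):
        parent_revno = imported_dotted_revno[left_parent]
        if (parent_revno[0] == base_revno
                and parent_revno[1] == branch_num):
            last_dot = parent_revno[-1] + 1
    return (base_revno, branch_num, last_dot)

So assign_dotted_revno(1000, 2, 'p', {'p': (1000, 2, 3)}) yields
(1000, 2, 4), while an unknown left parent yields (1000, 2, 1).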
As expected, caching the gdfo is probably not a win (the stats above show
only 3,869 hits against 811,978 misses), since we try hard to never hit
the same node twice. (When we do, it should already have been
imported by then.)
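Those hit/miss counters come from the third hunk below. As a standalone
illustration of the pattern (the wrapper class and setup here are
assumptions; the SQL and attribute names are taken from the diff), it is
a simple memoized database lookup:

import sqlite3
from collections import Counter

class GdfoLookup(object):
    # Illustrative wrapper; the real code lives inside history_db's
    # importer class.
    def __init__(self, conn):
        self._cursor = conn.cursor()
        self._known_gdfo = {}     # db_id -> gdfo
        self._stats = Counter()

    def get_gdfo(self, db_id):
        if db_id in self._known_gdfo:
            self._stats['gdfo hit'] += 1
            return self._known_gdfo[db_id]
        self._stats['gdfo miss'] += 1
        row = self._cursor.execute(
            "SELECT gdfo FROM revision WHERE db_id = ?",
            [db_id]).fetchone()
        gdfo = row[0]
        self._known_gdfo[db_id] = gdfo
        return gdfo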
-------------- next part --------------
=== modified file 'history_db.py'
--- a/history_db.py 2010-04-16 19:51:33 +0000
+++ b/history_db.py 2010-04-16 20:17:11 +0000
@@ -199,6 +199,7 @@
merge_sorted = self._import_tip(self._branch_tip_rev_id)
if not expand_all:
return
+ import pdb; pdb.set_trace()
# We know all the other imports are going to be incremental
self._incremental = True
self._stats['nodes_expanded'] += 0 # create an entry
@@ -956,15 +957,16 @@
"""
node = self._depth_first_stack.pop()
if node.revno is None:
- if node._left_parent is not None:
+ last_dot = 1
+ if (node._left_parent is not None
+ and node._left_parent in self._imported_dotted_revno):
+ # If we haven't loaded the left parent, then we know we
+ # won't be numbering from it, it is outside the
+ # 'interesting' ancestry
parent_revno = self._imported_dotted_revno[node._left_parent][0]
if (parent_revno[0] == node._base_revno
and parent_revno[1] == node._branch_num):
last_dot = parent_revno[-1] + 1
- else:
- last_dot = 1
- else:
- last_dot = 1
node.revno = static_tuple.StaticTuple(node._base_revno,
node._branch_num, last_dot)
if not self._scheduled_stack:
@@ -993,7 +995,9 @@
def _get_gdfo(self, db_id):
if db_id in self._known_gdfo:
+ self._stats['gdfo hit'] += 1
return self._known_gdfo[db_id]
+ self._stats['gdfo miss'] += 1
res = self._cursor.execute("SELECT gdfo"
" FROM revision WHERE db_id = ?",
[db_id]).fetchone()
@@ -1022,9 +1026,10 @@
if db_id in self._imported_dotted_revno:
self._stats['not interesting imported'] +=1
return False
- if (gdfo == self._imported_dotted_revno
+ if (gdfo == self._imported_gdfo
and db_id == self._imported_mainline_id):
self._stats['not interesting is mainline'] += 1
+ return False
self._stats['is interesting'] += 1
return True
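The final hunk fixes two bugs in the 'is interesting' filter: the
mainline check compared gdfo against _imported_dotted_revno instead of
_imported_gdfo, and it fell through without returning False when it
matched. A sketch of the corrected logic (the enclosing method name and
signature are not shown in the hunk, so _is_interesting here is an
assumption):

def _is_interesting(self, db_id, gdfo):
    # Already numbered: nothing left to do for this node.
    if db_id in self._imported_dotted_revno:
        self._stats['not interesting imported'] += 1
        return False
    # The current mainline tip matches on both gdfo and db_id; the
    # hunk adds the missing 'return False' for this case.
    if (gdfo == self._imported_gdfo
            and db_id == self._imported_mainline_id):
        self._stats['not interesting is mainline'] += 1
        return False
    self._stats['is interesting'] += 1
    return True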