history-db, now speeding up 'bzr log'
Vincent Ladeuil
v.ladeuil+lp at free.fr
Tue Apr 13 08:10:14 BST 2010
>>>>> "jam" == John Arbash Meinel <john at arbash-meinel.com> writes:
jam> So I've finally gotten to the point with bzr-history-db that I
jam> hook into bzrlib internals, and can see an actual effect on
jam> commands. lp:~jameinel/+junk/bzr-history-db
Great !
<snip/>
jam> At this point, I now hook into:
jam> Branch._do_dotted_revno_to_revision_id
jam> Branch._do_revision_id_to_dotted_revno
jam> Branch.iter_merge_sorted_revisions
jam> Branch.hooks['post_change_branch_tip']
It would be nice to use real hooks here for all the features the plugin
provide. That would ensure we provide the right hooks too :)
jam> The plugin is currently quite verbose, though we can easily
jam> decrease this by commenting out 'trace.note()' calls, or
jam> turning them into mutter().
Or a config variable with verbosity level ?
jam> Results:
jam> 1) We could probably get rid of a bunch of the special casing logic in
jam> 'log.py' that was trying to avoid loading the merge_sorted graph.
Amen to that.
Yet, I consider the experiment to still be in progress and I'd like some
clarifications first:
- how big is the database so far (and how easy is it to set it up so
that it's shared between branches, I know you thought about it, I'd
like a summary as of today),
- can you backport some or all of your work to core already (you
mentioned an order of magnitude optimisation in a previous mail, is
the plugin strictly required for that and if not what is missing in
core),
jam> 2) 'time bzr log -n0 -r -10..-1 bzr.dev' is
jam> 1.210s => 0.771s
jam> 3) 'time bzr log -n0 -r -10..-1 mysql-6.0' is
jam> 2.312s => 0.959s
jam> 4) 'time bzr log -n0 -r -1000..-990 mysql-6.0' is
jam> 2.552s => 1.243s
jam> In looking at the verbose commentary, you can also see this:
jam> history_db rev=>dotted took 0.032s, 0.026s to init, 0.006s to query
jam> history_db iter_merge took 0.050s (0.050s query, 0.000s filter)
jam> Which means that iter_merge + dotted_revno lookups are really only
jam> about 80ms out of that 1200ms overall time.
jam> 5) 'time bzr log -n0 -r -100..-1 emacs'
jam> 3.890s => 0.784s
jam> Note, however that:
jam> 'time bzr log -n0 -r -10..-1 emacs'
jam> 0.614s => 0.769s
jam> However, this seems to be an artifact of using --no-plugins. 'bzr
jam> rocks' on my machine is 150ms faster with --no-plugins.
All these results sound very promising. I'd like to correlate them with
the space costs though.
jam> From what I can tell, the big time lost is now that 'cmd_log'
jam> uses 'in_history(b)' rather than 'as_revision_id(b)'. Which
jam> means that Branch.revision_history() always gets called.
Chapter and verse ? Is this fixable ?
jam> I can also say that the impact on other operations for updating
jam> the history-db cache is pretty minimal. Usually fully accounted
jam> for by other operations that we are doing.
Not a big surprise but it's good to confirm.
jam> As an example, merging bzr.dev into an old branch, and then
jam> doing 'commit' takes 15s but only 0.400ms to import. So about
jam> 2.6% overhead. Now, doing the same a heavyweight checkout
jam> seems to take >3s. My guess is that the import is being
jam> triggered on the remote branch, rather than the local one
jam> (because its branch tip is changing first). I don't have a
jam> great answer for that yet.
jam> 'bzr push' is pretty fast. It adds 1 round-trip to check the remote
jam> branch.conf to figure out if the db is enabled, and then sees that the
jam> revisions are already imported.
jam> Anyway, with the current results, I'm pretty sure I've proven
jam> 'proof-of-concept'.
More information about the bazaar
mailing list