history-db, now speeding up 'bzr log'

John Arbash Meinel john at arbash-meinel.com
Mon Apr 12 23:05:22 BST 2010



So I've finally gotten to the point with bzr-history-db that I hook into
bzrlib internals, and can see an actual effect on commands.
  lp:~jameinel/+junk/bzr-history-db

To enable this, you need to set:
  history_db_path = ?

in branch.conf, locations.conf, bazaar.conf, etc.

Once you've done that, you can seed the database with:

  bzr create-history-db

(Interestingly enough, 'bzr up' also *always* fires the
post_change_branch_tip hook, even if nothing has changed. So it would
also populate the db, albeit with 'incremental=True', which is slightly
slower.)

At this point, I now hook into:

  Branch._do_dotted_revno_to_revision_id
  Branch._do_revision_id_to_dotted_revno
  Branch.iter_merge_sorted_revisions
  Branch.hooks['post_change_branch_tip']
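
As a rough illustration of what the two revno hooks above need from a
cache, here is a stand-alone sketch of a two-way lookup between revision
ids and dotted revnos. It does not use bzrlib and is not the plugin's
real schema: the use of SQLite, the table name 'dotted_revno', its
columns, and every function name are assumptions made for the example.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the hypothetical cache database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dotted_revno ("
        " revision_id TEXT PRIMARY KEY,"
        " revno TEXT NOT NULL UNIQUE)"
    )
    return conn


def import_revisions(conn, pairs):
    """Store (revision_id, dotted-revno tuple) pairs, skipping known ones."""
    rows = [(rev_id, ".".join(str(p) for p in revno)) for rev_id, revno in pairs]
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT OR IGNORE INTO dotted_revno VALUES (?, ?)", rows)


def revision_id_to_dotted_revno(conn, rev_id):
    """Return the dotted revno as a tuple of ints, or None if not cached."""
    row = conn.execute(
        "SELECT revno FROM dotted_revno WHERE revision_id = ?", (rev_id,)
    ).fetchone()
    return None if row is None else tuple(int(p) for p in row[0].split("."))


def dotted_revno_to_revision_id(conn, revno):
    """Return the revision id for a dotted revno tuple, or None if not cached."""
    key = ".".join(str(p) for p in revno)
    row = conn.execute(
        "SELECT revision_id FROM dotted_revno WHERE revno = ?", (key,)
    ).fetchone()
    return None if row is None else row[0]
```

A miss returns None, so a caller could fall back to the normal bzrlib
code path whenever the cache has not been populated yet.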

The plugin is currently quite verbose, though we can easily decrease
this by commenting out 'trace.note()' calls, or turning them into mutter().

Results:

1) We could probably get rid of a bunch of the special casing logic in
   'log.py' that was trying to avoid loading the merge_sorted graph.

2) 'time bzr log -n0 -r -10..-1 bzr.dev' is
   1.210s => 0.771s

3) 'time bzr log -n0 -r -10..-1 mysql-6.0' is
   2.312s => 0.959s

4) 'time bzr log -n0 -r -1000..-990 mysql-6.0' is
   2.552s => 1.243s

   Looking at the verbose output for this run, you can also see:
     history_db rev=>dotted took 0.032s, 0.026s to init, 0.006s to query
     history_db iter_merge took 0.050s (0.050s query, 0.000s filter)

   which means that the iter_merge + dotted_revno lookups are really only
   about 80ms out of that 1200ms overall time.

5) 'time bzr log -n0 -r -100..-1 emacs'
   3.890s => 0.784s

   Note, however, that the shorter range gets slower:
   'time bzr log -n0 -r -10..-1 emacs'
   0.614s => 0.769s

   This seems to be an artifact of using --no-plugins, though: 'bzr
   rocks' on my machine is 150ms faster with --no-plugins.
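
To put the numbers above side by side, here is a small sketch that
computes the speedup for each run and the share of run 4 spent in the
history_db lookups. The timings are copied from the results; the
variable and function names are just for the example.

```python
# (command, seconds before, seconds after), copied from the results above.
timings = [
    ("log -n0 -r -10..-1 bzr.dev", 1.210, 0.771),
    ("log -n0 -r -10..-1 mysql-6.0", 2.312, 0.959),
    ("log -n0 -r -1000..-990 mysql-6.0", 2.552, 1.243),
    ("log -n0 -r -100..-1 emacs", 3.890, 0.784),
    ("log -n0 -r -10..-1 emacs", 0.614, 0.769),
]


def speedup(before, after):
    """Ratio of old time to new time; values below 1.0 mean a slowdown."""
    return before / after


for name, before, after in timings:
    print("%-34s %.3fs => %.3fs  (%.2fx)" % (name, before, after, speedup(before, after)))

# Run 4: rev=>dotted (0.032s) plus iter_merge (0.050s), against 1.243s total.
query_time = 0.032 + 0.050
share = query_time / 1.243
print("history_db lookups: %.3fs, %.1f%% of the run" % (query_time, share * 100))
```

The last emacs run is the only one with a ratio below 1.0, which matches
the --no-plugins caveat above.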


From what I can tell, the big time loss now is that 'cmd_log' uses
'in_history(b)' rather than 'as_revision_id(b)', which means that
Branch.revision_history() always gets called.

I can also say that the impact on other operations of updating the
history-db cache is pretty minimal, and usually lost in the cost of the
other operations that we are doing.

As an example, merging bzr.dev into an old branch, and then doing
'commit' takes 15s but only 0.400s to import. So about 2.6% overhead.
Now, doing the same in a heavyweight checkout seems to take >3s. My guess
is that the import is being triggered on the remote branch, rather than
the local one (because its branch tip is changing first). I don't have a
great answer for that yet.

'bzr push' is pretty fast. It adds 1 round-trip to check the remote
branch.conf to figure out if the db is enabled, and then sees that the
revisions are already imported.

Anyway, with the current results, I'm pretty sure I've proven
'proof-of-concept'.

There is still some polishing and edge-case work to do. However, I think
it has at least shown what I hoped to show.

John
=:->


