history-db, now speeding up 'bzr log'

Vincent Ladeuil v.ladeuil+lp at free.fr
Tue Apr 13 08:10:14 BST 2010


>>>>> "jam" == John Arbash Meinel <john at arbash-meinel.com> writes:

    jam> So I've finally gotten to the point with bzr-history-db that I
    jam> hook into bzrlib internals, and can see an actual effect on
    jam> commands.  lp:~jameinel/+junk/bzr-history-db

Great !

<snip/>

    jam> At this point, I now hook into:

    jam>   Branch._do_dotted_revno_to_revision_id
    jam>   Branch._do_revision_id_to_dotted_revno
    jam>   Branch.iter_merge_sorted_revisions
    jam>   Branch.hooks['post_change_branch_tip']

It would be nice to use real hooks here for all the features the plugin
provide. That would ensure we provide the right hooks too :)

    jam> The plugin is currently quite verbose, though we can easily
    jam> decrease this by commenting out 'trace.note()' calls, or
    jam> turning them into mutter().

Or a config variable with verbosity level ?

    jam> Results:

    jam> 1) We could probably get rid of a bunch of the special casing logic in
    jam>    'log.py' that was trying to avoid loading the merge_sorted graph.

Amen to that.

Yet, I consider the experiment to still be in progress and I'd like some
clarifications first:

- how big is the database so far (and how easy is it to set it up so
  that it's shared between branches, I know you thought about it, I'd
  like a summary as of today),

- can you backport some or all of your work to core already (you
  mentioned an order of magnitude optimisation in a previous mail, is
  the plugin strictly required for that and if not what is missing in
  core),

    jam> 2) 'time bzr log -n0 -r -10..-1 bzr.dev' is
    jam>    1.210s => 0.771s

    jam> 3) 'time bzr log -n0 -r -10..-1 mysql-6.0' is
    jam>    2.312s => 0.959s

    jam> 4) 'time bzr log -n0 -r -1000..-990 mysql-6.0' is
    jam>    2.552s => 1.243s

    jam>   In looking at the verbose commentary, you can also see this:
    jam>     history_db rev=>dotted took 0.032s, 0.026s to init, 0.006s to query
    jam>     history_db iter_merge took 0.050s (0.050s query, 0.000s filter)

    jam>   Which means that iter_merge + dotted_revno lookups are really only
    jam>   about 80ms out of that 1200ms overall time.

    jam> 5) 'time bzr log -n0 -r -100..-1 emacs'
    jam>    3.890s => 0.784s

    jam>    Note, however that:
    jam>    'time bzr log -n0 -r -10..-1 emacs'
    jam>    0.614s => 0.769s

    jam>    However, this seems to be an artifact of using --no-plugins. 'bzr
    jam>    rocks' on my machine is 150ms faster with --no-plugins.

All these results sound very promising. I'd like to correlate them with
the space costs though.

    jam> From what I can tell, the big time lost is now that 'cmd_log'
    jam> uses 'in_history(b)' rather than 'as_revision_id(b)'. Which
    jam> means that Branch.revision_history() always gets called.

Chapter and verse ? Is this fixable ?

    jam> I can also say that the impact on other operations for updating
    jam> the history-db cache is pretty minimal. Usually fully accounted
    jam> for by other operations that we are doing.

Not a big surprise but it's good to confirm.

    jam> As an example, merging bzr.dev into an old branch, and then
    jam> doing 'commit' takes 15s but only 0.400ms to import. So about
    jam> 2.6% overhead.  Now, doing the same a heavyweight checkout
    jam> seems to take >3s. My guess is that the import is being
    jam> triggered on the remote branch, rather than the local one
    jam> (because its branch tip is changing first). I don't have a
    jam> great answer for that yet.

    jam> 'bzr push' is pretty fast. It adds 1 round-trip to check the remote
    jam> branch.conf to figure out if the db is enabled, and then sees that the
    jam> revisions are already imported.

    jam> Anyway, with the current results, I'm pretty sure I've proven
    jam> 'proof-of-concept'.



More information about the bazaar mailing list