Speedup with history-db (was: Performance improvements for bzr-2.4 on large trees)

Eli Zaretskii eliz at gnu.org
Fri May 27 09:23:45 UTC 2011


> Date: Fri, 27 May 2011 09:56:12 +0200
> From: John Arbash Meinel <john at arbash-meinel.com>
> CC: bazaar at lists.canonical.com
> 
> >> You can use "bzr config --scope=???" to change where things get set. I
> >> don't know bzr config very well. I just edit bazaar.conf if I want it to
> >> apply to everything, and locations.conf if I want it to apply only in a
> >> directory.
> > 
> > I have a shared repo under which I have all the Emacs branches I care
> > about.  Can I use locations.conf to define a single history_db_path
> > for all of these branches, or do I need to modify branch.conf for each
> > such branch?  IOW, can locations.conf be used to make the DB affect
> > _less_ than a whole branch, or also to affect _more_ than just one
> > branch?
> 
> I don't know what you mean by "less than a whole branch". In
> locations.conf if you set a policy like:
> 
> [C:/path/to/my/repo]
> history_db_path = C:/path/to/my/repo/history.db
> 
> Then all branches at  C:/path/to/my/repo/branch, etc will use the same
> historydb file.
> 
> I'm pretty sure the answer is "yes", but I'm a bit confused by the question.

It's probably the result of my own confusion over what you originally
wrote (above): "edit bazaar.conf if I want it to apply to everything,
and locations.conf if I want it to apply only in a directory".  The
"only in a directory" part made me think that you meant applying it
only to a single subdirectory of a branch.

Anyway, I used locations.conf to define a DB for the entire Emacs
repo, then I used "bzr history-db-create" in every branch under the
repo.  The first one took about 45 seconds, the rest about 18 sec.
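
For anyone repeating this, the per-branch step could be scripted.  What
follows is only a rough Python sketch, not something I ran for the
timings above.  It assumes that every branch under the shared repo is a
directory containing .bzr/branch, and that a "bzr" executable is on
PATH; find_branches and create_history_db are hypothetical helper names.

```python
import os
import subprocess


def find_branches(repo_root):
    """Return directories under repo_root that look like bzr branches.

    Assumption: a branch is any directory that contains .bzr/branch.
    """
    branches = []
    for dirpath, dirnames, _filenames in os.walk(repo_root):
        if os.path.isdir(os.path.join(dirpath, ".bzr", "branch")):
            branches.append(dirpath)
            dirnames[:] = []  # do not descend into a branch's working tree
        elif ".bzr" in dirnames:
            dirnames.remove(".bzr")  # skip the shared repo's control dir
    return sorted(branches)


def create_history_db(repo_root):
    """Run "bzr history-db-create" with each branch as the working dir."""
    for branch in find_branches(repo_root):
        subprocess.check_call(["bzr", "history-db-create"], cwd=branch)


# Usage (the path is illustrative; substitute the real shared repo):
#   create_history_db(r"D:\gnu\bzr\emacs")
```

The command is run from inside each branch because that is how I
invoked it by hand; the sketch does not rely on any location argument
or option of history-db-create.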

> I certainly recommend sharing history_db for any branch that shares
> history. Updating just the newest revisions for a branch usually takes
> on the order of 100ms or so, while importing the whole history takes
> 15-30s.

On this machine (Windows XP with a single hyper-threaded core at 3GHz)
updating the 20 or so newest revisions took 400ms, as this excerpt from
.bzr.log shows:

  history_db post-change-hook took 0.403s (0.002s to get_config, 0.063s to init, 0.337s to import)

> Not to mention the size of the history.db file itself.

It's 37MB for now, which isn't too much for such a large project with
such a long history.

A few comments:

  . "bzr history-db-create" displays some strangely formatted
    (probably simply unformatted ;-) statistics when it finishes:

    {'_insert_node_calls': 104366,
     'ranges_inserted': 1044,
     'revs_in_ranges': 104366,
     'total_nodes_inserted': 115077}

    This should be formatted for better human consumption, and
    probably shown only under an optional --verbose option.  The same
    goes for what bzr writes to .bzr.log, including from the hooks
    installed by bzr-history-db, e.g.:

      22.359  Stats:
      {'_insert_node_calls': 18,
       'num_search_tips': 6,
       'pushed': 21,
       'ranges_inserted': 1,
       'revs_in_ranges': 84,
       'split already imported': 1,
       'split child imported': 2,
       'split children interesting': 3,
       'split parent imported': 2,
       'step mainline': 27,
       'step mainline added': 38,
       'step mainline cache missed': 27,
       'step mainline initial': 1,
       'step mainline unknown': 26,
       'step search tips': 4,
       'total_nodes_inserted': 21}

  . Most of the commands I originally posted are indeed greatly sped
    up now.  E.g.:

    D:\gnu\bzr\emacs\trunk>timep bzr st -c99634.12.18 >nul

    real    00h00m00.828s
    user    00h00m00.515s
    sys     00h00m00.218s

    (it took 10s without history-db).

    D:\gnu\bzr\emacs\trunk>timep bzr log -l100 -n0 >nul

    real    00h00m03.765s
    user    00h00m03.203s
    sys     00h00m00.546s

    (takes 9.6s without history-db).

    D:\gnu\bzr\emacs\trunk>timep bzr log -r101290.1.25 >nul

    real    00h00m00.843s
    user    00h00m00.453s
    sys     00h00m00.312s

    (takes 10.4s without history-db).

  . However, this command was a disappointment:

    D:\gnu\bzr\emacs\trunk>timep bzr log --include-merges -c104363 >nul

    real    00h00m47.453s
    user    00h00m45.125s
    sys     00h00m02.125s

    It takes 10.3s without history-db, so that's roughly a 4.6-fold
    slowdown with the DB.  Why is that?

  . The output of "bzr log -n0 -r101290.1.25..101290.1.32" is
    different with and without the plugin.  I can send you the two
    outputs, but you can easily create them yourself (with the current
    Emacs trunk).  I don't actually understand the semantics of this
    command well enough, as it displays many more revisions than I'd
    expect (or maybe there's an unrelated bug in bzr), but the
    differences introduced by the plugin are disturbing anyway.

  . Finally, could you please tell me what the --expand-all option to
    history-db-create does?

Thanks.


