Speedup with history-db

Fri May 27 11:30:25 UTC 2011

...

>   . However, this command was a disappointment:
> 
>     D:\gnu\bzr\emacs\trunk>timep bzr log --include-merges -c104363 >nul
> 
>     real    00h00m47.453s
>     user    00h00m45.125s
>     sys     00h00m02.125s
> 
>     It takes 10.3s without history-db, so there's a 4.5 times
>     slowdown with the DB.  Why is that?

I don't have a specific idea here, though we can investigate. Profiling
with "bzr --lsprof-file foo.txt" can be enlightening.

If 104363 is really old, I could see bzr-history-db getting slower. For
example, I would expect history-db to be slower for "bzr log -r 10".
bzr core tends to load the whole graph very often (any time you use
-n0). history-db is arranged so that we can load the graph incrementally
and get the same results, but sometimes loading everything is faster
than loading incrementally. (In SQL terms, think index scans vs.
full-table scans. The heuristic there is that a full-table scan is
usually faster if you access more than 10% of the table.)
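To make that rule of thumb concrete, here is a tiny sketch of the
decision. choose_graph_load is a hypothetical helper for illustration
only (neither bzr nor history-db has it), and the 10% threshold is just
the SQL heuristic quoted above, not a measured value for bzr.

```python
def choose_graph_load(revisions_needed: int, total_revisions: int,
                      threshold: float = 0.10) -> str:
    """Return 'full' when a query touches more than `threshold` of the graph.

    Hypothetical helper mirroring the index-scan vs. full-table-scan rule
    of thumb; it is not part of bzr or history-db.
    """
    if total_revisions <= 0:
        raise ValueError("total_revisions must be positive")
    fraction = revisions_needed / total_revisions
    return "full" if fraction > threshold else "incremental"


# A query that touches only a small slice of the history:
print(choose_graph_load(50, 100_000))      # incremental
# A query that ends up touching most of the history:
print(choose_graph_load(99_990, 100_000))  # full
```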

> 
>   . The output of "bzr log -n0 -r101290.1.25..101290.1.32" is
>     different with and without the plugin.  I can send you the two
>     outputs, but you can easily create them yourself (with the current
>     Emacs trunk).  I actually don't understand well enough the
>     semantics of this command, as it displays much more revisions than
>     I'd expect (or maybe there's an unrelated bug in bzr), but the
>     differences introduced by the plugin are disturbing anyway.
> 

If the output itself is different, that is a bit worrisome. I don't know
yet which one is to blame. history-db is meant only to cache the same
results and arrange them differently, but bugs exist.

As for the range you are giving, it seems really weird to me, since
those revisions aren't clearly derived from each other. (Perhaps one
merged the other?) Remember that "bzr log -r X..Y" does not mean "show me
all the changes in Y that aren't in X", though you can use the
--exclude-? parameter to change that.
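For reference, "all the changes in Y that aren't in X" is a set
difference of the two ancestries. A minimal sketch on a small made-up
history (the revision ids and the helpers ancestry and only_in are
hypothetical and are not bzrlib's API):

```python
from typing import Dict, List, Set

# Toy history: revision -> parents, left-hand parent first.
PARENTS: Dict[str, List[str]] = {
    "A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"], "F": ["D", "E"],
}


def ancestry(graph: Dict[str, List[str]], tip: str) -> Set[str]:
    """All revisions reachable from `tip`, including `tip` itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(graph.get(rev, []))
    return seen


def only_in(graph: Dict[str, List[str]], y: str, x: str) -> Set[str]:
    """Revisions in Y's ancestry that are not in X's ancestry."""
    return ancestry(graph, y) - ancestry(graph, x)


print(sorted(only_in(PARENTS, "F", "E")))  # ['B', 'D', 'F']
print(sorted(only_in(PARENTS, "E", "B")))  # ['C', 'E']
```

When neither revision is an ancestor of the other (as with E and B
here), both sides contribute revisions the other lacks, which is one
reason a plain X..Y range can look surprising.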

>   . Finally, could you please tell what does the --expand-all option
>     to history-db-create do?
> 
> Thanks.

History-db caches the sorted graph for a given branch tip and numbers the
revisions as seen from that tip. There tend to be many more branches in
history than you have active right now, so by default it only caches the
graph for tips that you actually access. --expand-all treats every
revision as a tip and adds its graph to the global cache. For example:

 A
 |\
 B C
 |/
 D

From D's point of view, the graph is:

 1
 |\
 2 1.1.1
 |/
 3

But from C's point of view the graph is:

 1
  \
   2
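The per-tip numbering above can be reproduced with a short sketch. This
is a simplified dotted-revno scheme written for illustration, not bzr's
actual merge_sort (it ignores ghosts and many edge cases, and recurses,
so it only suits toy graphs); number_from_tip and GRAPH are hypothetical
names. Mainline revisions get 1..n, and each merged revision either
continues its left parent's branch or starts a new branch numbered from
its base mainline revno.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy graph from the example above: revision -> parents, left-hand parent first.
GRAPH: Dict[str, List[str]] = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}


def number_from_tip(graph: Dict[str, List[str]], tip: str) -> Dict[str, str]:
    """Simplified dotted-revno numbering of `tip`'s ancestry (not bzr's merge_sort)."""
    # Mainline: follow left-hand parents back from the tip, then number 1..n.
    mainline: List[str] = []
    rev: Optional[str] = tip
    while rev is not None:
        mainline.append(rev)
        parents = graph.get(rev, [])
        rev = parents[0] if parents else None
    mainline.reverse()
    revno: Dict[str, Tuple[int, ...]] = {r: (i + 1,) for i, r in enumerate(mainline)}

    branch_count: Dict[int, int] = {}            # base revno -> branches started from it
    branch_tip: Dict[Tuple[int, int], int] = {}  # (base, branch) -> last sequence number

    for main_rev in mainline:
        # Collect what this mainline revision merged, parents before children.
        merged: List[str] = []
        done: Set[str] = set()

        def visit(r: str) -> None:
            if r in revno or r in done:
                return
            done.add(r)
            for p in graph.get(r, []):
                visit(p)
            merged.append(r)

        for p in graph.get(main_rev, [])[1:]:
            visit(p)

        for r in merged:
            parents = graph.get(r, [])
            left = revno.get(parents[0]) if parents else None
            if left is not None and len(left) == 3 and branch_tip.get(left[:2]) == left[2]:
                new = (left[0], left[1], left[2] + 1)  # continue the same branch
            else:
                base = left[0] if left is not None else 0
                branch_count[base] = branch_count.get(base, 0) + 1
                new = (base, branch_count[base], 1)    # start a new branch
            revno[r] = new
            branch_tip[(new[0], new[1])] = new[2]

    return {r: ".".join(str(n) for n in t) for r, t in revno.items()}


print(number_from_tip(GRAPH, "D"))  # {'A': '1', 'B': '2', 'D': '3', 'C': '1.1.1'}
print(number_from_tip(GRAPH, "C"))  # {'A': '1', 'C': '2'}
```

The same revision gets a different number depending on the tip (C is
1.1.1 from D but 2 from C), which is why the cache is keyed by tip.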

The key thing history-db does is notice that, for any given revision,
the graph is static, and any children of that revision (that keep this
revision in their left-hand ancestry) will share the same subset of the
graph. So consider:

 A
 |\
 B C
 |/|
 D E
 |/
 F

E's graph looks like:

 1
  \
   2
   |
   3

While F's graph looks like:

 1
 |\
 2 1.1.1
 |/|
 3 1.1.2
 |/
 4

So if we have already stored C's graph, when we go to store E's graph we
can say "the same as C, plus E 3", and when we store F's graph we can say
"the same as D, plus E 1.1.2, F 4".
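A minimal sketch of that delta idea, assuming each tip is stored as its
left-hand parent's tip plus only the entries that differ. The numberings
are copied from the diagrams above; delta, rebuild, FULL and STORED are
hypothetical names, not history-db's schema or API.

```python
from typing import Dict

Numbering = Dict[str, str]  # revision -> dotted revno, as seen from one tip

# Numberings copied from the diagrams above, keyed by tip.
FULL: Dict[str, Numbering] = {
    "C": {"A": "1", "C": "2"},
    "D": {"A": "1", "B": "2", "D": "3", "C": "1.1.1"},
    "E": {"A": "1", "C": "2", "E": "3"},
    "F": {"A": "1", "B": "2", "D": "3", "C": "1.1.1", "E": "1.1.2", "F": "4"},
}


def delta(base: Numbering, full: Numbering) -> Numbering:
    """Entries of `full` that are missing from `base` or numbered differently."""
    return {rev: no for rev, no in full.items() if base.get(rev) != no}


def rebuild(base: Numbering, extra: Numbering) -> Numbering:
    """Recover a tip's full numbering from its parent's numbering plus a delta."""
    result = dict(base)
    result.update(extra)
    return result


# Store each tip as (left-hand parent's tip, delta against that tip).
STORED = {
    "E": ("C", delta(FULL["C"], FULL["E"])),
    "F": ("D", delta(FULL["D"], FULL["F"])),
}
print(STORED["E"])  # ('C', {'E': '3'})
print(STORED["F"])  # ('D', {'E': '1.1.2', 'F': '4'})
```

Because a child that keeps its parent in the left-hand ancestry never
renumbers the parent's entries, the delta only ever adds rows here.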

There are also other tricks in history-db that let you walk the ancestry
faster than one revision at a time (for example, "what are all the
ancestors merged in the last 100 mainline revisions from tip F?").
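As a toy illustration of the general idea (not history-db's actual
schema): if a per-tip cache recorded, for every revision, the mainline
revno of the revision that brought it in, then "what was merged in the
last N mainline revisions" becomes a range lookup on a sorted column
instead of a revision-by-revision walk. ROWS_FOR_F and merged_in_last
are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical cache for tip F: (mainline revno that introduced it, revision),
# sorted.  A mainline revision introduces itself; D merged C and F merged E.
ROWS_FOR_F: List[Tuple[int, str]] = [
    (1, "A"), (2, "B"), (3, "C"), (3, "D"), (4, "E"), (4, "F"),
]


def merged_in_last(rows: List[Tuple[int, str]], tip_revno: int, n: int) -> List[str]:
    """Revisions introduced by the last `n` mainline revisions up to `tip_revno`."""
    lo = tip_revno - n + 1
    start = bisect_left(rows, (lo, ""))  # "" sorts before any revision id
    return [rev for revno, rev in rows[start:] if revno <= tip_revno]


print(merged_in_last(ROWS_FOR_F, 4, 2))  # ['C', 'D', 'E', 'F']
print(merged_in_last(ROWS_FOR_F, 4, 1))  # ['E', 'F']
```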

Anyway, if you have F's branch, history-db will store only its graph and
won't store the graphs that treat C and E as tips. If you pass
--expand-all, it stores all possible graphs.

It isn't something I would recommend doing. I was prototyping it as a
way to change some of the PostgreSQL tables in Launchpad, which currently
store the full ancestry for every branch ever seen in Launchpad. (So if
there are 100 branches of Emacs, each with about 100,000 revisions, that
is 100 * 100,000 = 10M rows in the BranchRevision table.)

I know that history-db is fairly small if you only have a couple of
branches with a lot of shared history, but I wanted to test what would
happen if you stored every possible branch. It turns out that real-world
data still tends to be O(branches * history), but with much smaller
constants. (Consider long-lived branches that continually merge trunk.
If trunk lands 50 feature branches with 2 commits each, that is 100
revisions added to the db table. If you then merge trunk back into your
feature branch, those 100 revisions get renumbered in your branch's
ancestry, although merging that back into trunk will only add the small
handful of revisions that were not already present.)

Commands like "history-db-create" are not really meant to be run often.
Their output is mostly just an analysis of the graph logic, etc. If we
wanted to pretty it up a bit, we could probably dump everything to
.bzr.log and just print a really simple "imported XX revisions" message.

John
=:->