Caching 'dotted_revnos' and merge_sort
John Arbash Meinel
john at arbash-meinel.com
Sun Apr 4 17:48:38 BST 2010
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
A few more updates. Mostly, I added an 'expand-all' flag, which treats
every revision as its own branch, and generates the dotted revno data.
It is fairly smart about it, walking from recent to older, and knowing
that a revision will share the mainline with older revs, so skips, etc.
For bzr.dev, it turns out there are ~4,706 unique 'mainlines' (aka
branches) in the history (out of ~30,496 revisions).
It ends up taking ~3.5minutes to import all of them (vs the 5s to just
import bzr.dev). However that averages out to just 40ms per branch. My
guess is that the bulk of that is 'merge_sort'. Which 40ms is pretty
fast, but it is still an O(history) operation.
Running this expands the db from having 30k entries in 'dotted_revnos'
to having 821k entries. Which is an average of around 168 revs per
branch. Which is more than I originally guessed, but still fits the "far
less than O(history)". My guess is that the bulk of that 168 is times
when we had to merge bzr.dev => feature_branch to resolve conflicts
(like NEWS).
It also expands the size on disk a fair amount, from ~8MB => 52MB. I
haven't really worked out its effect on speed, as I think it will
originally be a bit biased. Specifically, the original bzr.dev history
is all imported first, so it should all be nicely clustered in the
table. And it is going to be feature branch 2000 that is scattered.
I also ran the same data against my mysql-6.0 branch. I think Vincent
will be a bit shocked at the result (I was):
1) Just mysql-6.0:
68.8k revisions
2.8k mainline
11.2s import time
20MB on disk
2) mysql-6.0 --expand-all
661k dotted_revno entries
17k branches
32m import time
54m on disk
The really interesting thing here, is that while there are *far* more
branches and total revisions, the average number of new revisions per
branch is actually quite a bit smaller, an average of just 34 revs per
branch.
Note also that for the queries I've generated, mysql also sees similar
orders-of-magnitude speedups.
get_dotted_revno -r -1
5ms across the board
get_dotted_revno -r -110
29ms bzr
18ms db-db-id
6ms db-range
get_dotted_revno -r 2497.329.15 (in the area of -r -100)
2.434s bzr
0.020s db-db-id
0.007s db-range
get_dotted_revno -r 1700.338.1 (in the range of -r -1000)
2.421s bzr
0.108s db-db-id
0.013s db-range
walk_ancestry
2.043s bzr-Graph().iter-ancestry
1.354s bzr-get_known_graph_ancestry()
4.260s db-rev-id (intersects with the 'revision' table, walks-by-one)
0.353s db-db-id (walks by 100, only uses db-ids)
0.345s db-range (finds mainline, walks by 100 others)
0.087s db-dotted (intersects mainline_range with
dotted_revno.merged_revision)
John
=:->
John Arbash Meinel wrote:
> I've been toying around with some ideas about how to make history
> management a bit better in bzr. Specifically focusing on dotted_revnos
> and some other aspects that are currently 'whole ancestry' operations.
> I've started experimenting with it by defining some SQL schemas that we
> could use, and prototyping the ideas in an sqlite database. I think most
> of this could be extended into bzr-specific btree indices if we want to
> avoid using sqlite for it.
>
> I've gotten far enough to determine that some of the tricks I had
> thought about really do seem to work. The plugin is currently available
> here:
>
> lp:~jameinel/+junk/bzr-history-db
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAku4wuYACgkQJdeBCYSNAANStQCdEgv78zp0RWy/2903DxHlbu6a
ZowAoLvHhSCxFuzXuFKgKs3RLNksVxCl
=1w7M
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list