Caching 'dotted_revnos' and merge_sort

John Arbash Meinel john at arbash-meinel.com
Sun Apr 4 17:48:38 BST 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

A few more updates. Mostly, I added an 'expand-all' flag, which treats
every revision as its own branch, and generates the dotted revno data.
It is fairly smart about it, walking from recent to older, and knowing
that a revision will share the mainline with older revs, so skips, etc.

For bzr.dev, it turns out there are ~4,706 unique 'mainlines' (aka
branches) in the history (out of ~30,496 revisions).

It ends up taking ~3.5minutes to import all of them (vs the 5s to just
import bzr.dev). However that averages out to just 40ms per branch. My
guess is that the bulk of that is 'merge_sort'. Which 40ms is pretty
fast, but it is still an O(history) operation.

Running this expands the db from having 30k entries in 'dotted_revnos'
to having 821k entries. Which is an average of around 168 revs per
branch. Which is more than I originally guessed, but still fits the "far
less than O(history)". My guess is that the bulk of that 168 is times
when we had to merge bzr.dev => feature_branch to resolve conflicts
(like NEWS).

It also expands the size on disk a fair amount, from ~8MB => 52MB. I
haven't really worked out its effect on speed, as I think it will
originally be a bit biased. Specifically, the original bzr.dev history
is all imported first, so it should all be nicely clustered in the
table. And it is going to be feature branch 2000 that is scattered.


I also ran the same data against my mysql-6.0 branch. I think Vincent
will be a bit shocked at the result (I was):

1) Just mysql-6.0:
	68.8k revisions
	2.8k mainline
	11.2s import time
        20MB on disk
2) mysql-6.0 --expand-all
	661k dotted_revno entries
	 17k branches
	 32m import time
 	 54m on disk

The really interesting thing here, is that while there are *far* more
branches and total revisions, the average number of new revisions per
branch is actually quite a bit smaller, an average of just 34 revs per
branch.

Note also that for the queries I've generated, mysql also sees similar
orders-of-magnitude speedups.

get_dotted_revno -r -1
  5ms across the board
get_dotted_revno -r -110
  29ms	bzr
  18ms	db-db-id
   6ms  db-range
get_dotted_revno -r 2497.329.15 (in the area of -r -100)
  2.434s  bzr
  0.020s  db-db-id
  0.007s  db-range
get_dotted_revno -r 1700.338.1 (in the range of -r -1000)
  2.421s  bzr
  0.108s  db-db-id
  0.013s  db-range
walk_ancestry
  2.043s  bzr-Graph().iter-ancestry
  1.354s  bzr-get_known_graph_ancestry()
  4.260s  db-rev-id (intersects with the 'revision' table, walks-by-one)
  0.353s  db-db-id (walks by 100, only uses db-ids)
  0.345s  db-range (finds mainline, walks by 100 others)
  0.087s  db-dotted (intersects mainline_range with
                     dotted_revno.merged_revision)


John
=:->


John Arbash Meinel wrote:
> I've been toying around with some ideas about how to make history
> management a bit better in bzr. Specifically focusing on dotted_revnos
> and some other aspects that are currently 'whole ancestry' operations.
> I've started experimenting with it by defining some SQL schemas that we
> could use, and prototyping the ideas in an sqlite database. I think most
> of this could be extended into bzr-specific btree indices if we want to
> avoid using sqlite for it.
> 
> I've gotten far enough to determine that some of the tricks I had
> thought about really do seem to work. The plugin is currently available
> here:
> 
>  lp:~jameinel/+junk/bzr-history-db

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAku4wuYACgkQJdeBCYSNAANStQCdEgv78zp0RWy/2903DxHlbu6a
ZowAoLvHhSCxFuzXuFKgKs3RLNksVxCl
=1w7M
-----END PGP SIGNATURE-----



More information about the bazaar mailing list