[MERGE][1.0] Graph.heads() optimization... 1500 times faster

Thu Dec 6 22:09:50 GMT 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In trying to work on Graph.find_difference() I found some sub-optimal code for
Graph.heads(). The attached patch speeds up a heads() call for me from 11.4s to
7.48 milliseconds with a pack repository.

There are a few pieces:

a) The loop in Graph.heads() was checking each searcher for each revision found
in common, and having it search the ancestry and stop searching the ancestry.
This changes it to build up a set of all revisions which are in common, and
then only have one search for revisions to stop on each loop.

b) The search was being done using another _BreadthFirstSearcher, rather than
just looping and building up a set.

I benchmarked grabbing heads() from the parents of bzr.dev revno 3084. Those 2
changes dropped the heads() time from 922ms down to 27ms. On a Pack repository
it changed the heads() time [*faster* machine] from 11,400ms to 376ms.

c) Implement a CachingParentsProvider, and preload_parents functionality.
Basically, on each loop, each searcher knows what nodes it wants to access
next. We can make a single call for all parents, rather than doing it for each
searcher one at a time.  For Knits, that actually slows it down to 30ms (3ms
slower), but I'll talk more later about that.

For packs, it changes it from 11.4s => 376ms => 7.48ms

So, the way it is written, Provider.preload_parents() is supposed to return a
set of revisions that it could not preload. The idea is for
StackedParentsProvider which will ask all its providers to preload revisions
that it wasn't able to get from the first one. Which seemed to make sense when
you are asking questions between 2 pack repositories. 3ms is ~10% slowdown for
knits, and is probably slightly slower for packs. But I have the feeling that
3ms on some operations is greatly offset by the speedup on others.

With these changes, we might want to go back to not using the per-file graph
for heads(). At the very least, I think it is something we should re-evaluate.
Having a heads() search drop 30x on Knit repositories, and 1524x on Packs is
something worth triggering a re-evaluation of paying for IO versus in-memory
processing.

I've done some playing with Graph.find_difference() but I ran out of time, and
felt like this was more than worth merging.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHWHMuJdeBCYSNAAMRAqBsAKCc76UZXkT1jeihl2AceCDVNyJQGACfZ5VT
kuUFvW/G4T+kuY9AqlOo/Wk=
=xzHW
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: graph_update.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20071206/c0a4e83a/attachment-0001.diff