[MERGE] Improvements to find_unique_ancestors()

Tue May 20 23:12:41 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
| John Arbash Meinel wrote:
|
|> For a performance comparison...
|>
|> If I run a graph search on the last 100 merge nodes in bzr.dev, it takes an
|> average of 320ms to find the unique nodes. With this code, it takes an
|> average
|> of 115ms. So it is approx 2x as fast.
|
| Really well done on these improvements! I'm rather keen to see this
| land so I've spent much of this afternoon/evening reviewing it. I can't
| see anything wrong code wise but I'm not comfortable approving it yet.
|
| I think it needs 2 changes made:
|
| 1. find_unique_ancestors() is simply too long - it covers 4+ pages
|    when printed - making it unmaintainable by people like me with
|    smaller brains. I'm aware of the performance emphasis here but
|    still feel it could be broken up into a master method that calls
|    two or three helper methods. Those helper methods can contain the
|    loops, rather than being called inside the loops, so performance
|    shouldn't be impacted by a noticeable amount.

Heavy algorithmic code is always going to be a pain to maintain. I'll see what I
can do. But it took me a week of probing at it to really get up to speed. I
wouldn't say you have a small brain, just it takes a while to understand the
graph searching logic, and the implications of race conditions, etc.

I certainly don't fully understand it, and mostly go off of what seems right,
and then check its performance against real world data. (For example, it may
seem like you want to move the front edge fast, so that it can converge, but in
reality it often diverges because of all shortcuts.)

I think the reason I didn't break it up more is that there is a decent amount of
state that needs to be transfered between the different sections, which leads to
a lot of parameters, and long function calls. But I'll give it a shot, I might
really like what I end up with.

|
| 2. The algorithm outlined in the opening large comment needs updating
|    to cover the information contained in your email.
|
| As a minor note, several of the _mutter calls probably ought to be
| using %d instead of %s. It's pretty minor compared to the issues above
| though.

I %s works just fine for any time, int, float, Class, etc. I specifically chose
%s because it is less likely to break. (If you pass a class to %d you get an
exception, floats are ok.)

|
| bb:resubmit.
|
| I realise that this patch doesn't necessarily make things that much
| less readable than the existing code. FWIW, I was happy to approve the
| earlier patch because I knew you had a follow-on one coming.
| Let's hope we don't need to come back to this method again for a while
| now - let's leave it both faster and easier to read please.
|
| Ian C.
|

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkgzTNkACgkQJdeBCYSNAAMbUACfcfjN8+Iix0NgxdleKngDESmI
9PwAn2qx1Jyvi2Uka3krKijpSBuAMISH
=9r5e
-----END PGP SIGNATURE-----