[MERGE] Improvements to find_unique_ancestors()
John Arbash Meinel
john at arbash-meinel.com
Wed May 21 00:28:50 BST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ian Clatworthy wrote:
| John Arbash Meinel wrote:
|
|> For a performance comparison...
|>
|> If I run a graph search on the last 100 merge nodes in bzr.dev, it takes an
|> average of 320ms to find the unique nodes. With this code, it takes an
|> average
|> of 115ms. So it is approx 2x as fast.
|
| Really well done on these improvements! I'm rather keen to see this
| land so I've spent much of this afternoon/evening reviewing it. I can't
| see anything wrong code wise but I'm not comfortable approving it yet.
|
| I think it needs 2 changes made:
|
| 1. find_unique_ancestors() is simply too long - it covers 4+ pages
| when printed - making it unmaintainable by people like me with
| smaller brains. I'm aware of the performance emphasis here but
| still feel it could be broken up into a master method that calls
| two or three helper methods. Those helper methods can contain the
| loops, rather than being called inside the loops, so performance
| shouldn't be impacted by a noticeable amount.
I did this. It *might* be slightly faster with the current refactoring. I don't
see any specific reason why it would be, but in my testing I see 128ms => 118ms.
The only thing I can think of might be memory pressure. Or maybe an "if()" check
is cheaper than a no-op _mutter() call.
The diff looks a little messy because of the rearranging. Some code stayed
linked, some of it moved. You may just want to look at the raw functions.
themselves. Certainly "find_unique_ancestors" is nice and short now. If you take
out the comments and debug stuff:
~ def find_unique_ancestors(self, unique_revision, common_revisions):
~ if unique_revision in common_revisions:
~ return set()
~ unique_searcher, common_searcher = self._find_initial_unique_nodes(
~ [unique_revision], common_revisions)
~ unique_nodes = unique_searcher.seen.difference(common_searcher.seen)
~ if not unique_nodes:
~ return unique_nodes
~ (all_unique_searcher,
~ unique_tip_searchers) = self._make_unique_searchers(unique_nodes,
~ unique_searcher, common_searcher)
~ self._refine_unique_nodes(unique_searcher, all_unique_searcher,
~ unique_tip_searchers, common_searcher)
~ true_unique_nodes = unique_nodes.difference(common_searcher.seen)
~ return true_unique_nodes
|
| 2. The algorithm outlined in the opening large comment needs updating
| to cover the information contained in your email.
It actually didn't have much that was changed. I expanded a bit, but the basic
structure is the same.
|
| As a minor note, several of the _mutter calls probably ought to be
| using %d instead of %s. It's pretty minor compared to the issues above
| though.
I didn't do this, since "%d" % (int) == "%s" % (int). The reasons for %d are to
handle "%2.2d" or to round from a float. %s "always works" which means I won't
accidentally get an error if I change an int into an object. I'll do it if you
think it adds clarity, though.
|
| bb:resubmit.
|
| I realise that this patch doesn't necessarily make things that much
| less readable than the existing code. FWIW, I was happy to approve the
| earlier patch because I knew you had a follow-on one coming.
| Let's hope we don't need to come back to this method again for a while
| now - let's leave it both faster and easier to read please.
|
| Ian C.
|
As long as you review it, it wasn't very hard to break it into multiple functions.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkgzXrIACgkQJdeBCYSNAAN/LgCcCOIWXoEfiBr7DCIUhL9pw5bn
qkMAni+DDrGmTFIwSIRgw79+FBMi9zq3
=5fPN
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: find_unique_refactored.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20080520/7c4f4d2d/attachment-0001.diff
More information about the bazaar
mailing list