[MERGE] Improvements to find_unique_ancestors()

John Arbash Meinel john at arbash-meinel.com
Wed May 21 00:28:50 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
| John Arbash Meinel wrote:
|
|> For a performance comparison...
|>
|> If I run a graph search on the last 100 merge nodes in bzr.dev, it takes an
|> average of 320ms to find the unique nodes. With this code, it takes an
|> average
|> of 115ms. So it is approx 2x as fast.
|
| Really well done on these improvements! I'm rather keen to see this
| land so I've spent much of this afternoon/evening reviewing it. I can't
| see anything wrong code wise but I'm not comfortable approving it yet.
|
| I think it needs 2 changes made:
|
| 1. find_unique_ancestors() is simply too long - it covers 4+ pages
|    when printed - making it unmaintainable by people like me with
|    smaller brains. I'm aware of the performance emphasis here but
|    still feel it could be broken up into a master method that calls
|    two or three helper methods. Those helper methods can contain the
|    loops, rather than being called inside the loops, so performance
|    shouldn't be impacted by a noticeable amount.

I did this. It *might* be slightly faster with the current refactoring. I don't
see any specific reason why it would be, but in my testing I see 128ms => 118ms.
The only thing I can think of might be memory pressure. Or maybe an "if()" check
is cheaper than a no-op _mutter() call.

The diff looks a little messy because of the rearranging. Some code stayed
linked, some of it moved. You may just want to look at the raw functions.
themselves. Certainly "find_unique_ancestors" is nice and short now. If you take
out the comments and debug stuff:

~    def find_unique_ancestors(self, unique_revision, common_revisions):
~        if unique_revision in common_revisions:
~            return set()

~        unique_searcher, common_searcher = self._find_initial_unique_nodes(
~            [unique_revision], common_revisions)

~        unique_nodes = unique_searcher.seen.difference(common_searcher.seen)
~        if not unique_nodes:
~            return unique_nodes

~        (all_unique_searcher,
~         unique_tip_searchers) = self._make_unique_searchers(unique_nodes,
~                                    unique_searcher, common_searcher)

~        self._refine_unique_nodes(unique_searcher, all_unique_searcher,
~                                  unique_tip_searchers, common_searcher)
~        true_unique_nodes = unique_nodes.difference(common_searcher.seen)
~        return true_unique_nodes

|
| 2. The algorithm outlined in the opening large comment needs updating
|    to cover the information contained in your email.

It actually didn't have much that was changed. I expanded a bit, but the basic
structure is the same.

|
| As a minor note, several of the _mutter calls probably ought to be
| using %d instead of %s. It's pretty minor compared to the issues above
| though.

I didn't do this, since "%d" % (int) == "%s" % (int). The reasons for %d are to
handle "%2.2d" or to round from a float. %s "always works" which means I won't
accidentally get an error if I change an int into an object. I'll do it if you
think it adds clarity, though.

|
| bb:resubmit.
|
| I realise that this patch doesn't necessarily make things that much
| less readable than the existing code. FWIW, I was happy to approve the
| earlier patch because I knew you had a follow-on one coming.
| Let's hope we don't need to come back to this method again for a while
| now - let's leave it both faster and easier to read please.
|
| Ian C.
|

As long as you review it, it wasn't very hard to break it into multiple functions.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkgzXrIACgkQJdeBCYSNAAN/LgCcCOIWXoEfiBr7DCIUhL9pw5bn
qkMAni+DDrGmTFIwSIRgw79+FBMi9zq3
=5fPN
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: find_unique_refactored.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20080520/7c4f4d2d/attachment-0001.diff 


More information about the bazaar mailing list