[RFC] tsort.MergeSorter uses slots

Thu Apr 19 01:28:35 BST 2007

John Arbash Meinel wrote:
> I was working nearby and I had cause to profile merge_sort.
> 
> It seemed that _push_node() and _pop_node() were taking a lot longer
> than they should (though I was accidentally reading TopoSorter instead).
> 
> So I added __slots__, and it turns out to be quite a bit faster.
> 
> On my machine doing list(merge_sort()) for all of bzr.dev changes from
> taking ~705ms to taking 600ms. (~15% improvement).
> 
> This is a rather simple change for such a big improvement, so it seems
> reasonable to merge it.
> 
> John
> =:->
> 

Since I'm a sucker for optimizing, I kept going. The attached bundle
provides 2 more optimizations. Each one doing more "ugly" stuff in
exchange for making things faster.

 2426 John Arbash Meinel        2007-04-18
      Use __slots__ for MergeSorter

 2427 John Arbash Meinel        2007-04-18
      Change valid self._foo variables into local variables.
      Using __slots__ changed sort time from ~700ms => ~600ms.
      Using local variables drops it down to 530 - 550ms.

 2428 John Arbash Meinel        2007-04-18
      Inline self._pop_node and self._push_node
      These are still separate functions, but rather than using
self._a_stack.append
      we assign a local variable a_stack_append, and call it directly.
      This drops the merge_sort() time down to approx 385ms-400ms
      With that large of a speed-up it seems worth the loss
      in readability. (This is almost 50% of the original time)

We can merge any level of this, but considering the last optimization
helps a lot, it seems worthy.

Basically it changes lots of "self._foo.func()" calls into local
variables with "foo_func=self._foo.func". By pre-compiling them into the
function we don't have a global lookup, or a lookup of the attribute for
all the lists and sets.

It would be good to test these on a faster machine. But there are a lot
of functions which need to do a merge_sort() of the whole ancestry graph.

This has a measurable effect on "bzr log --short -r-10..-1 --forward" of
bzr.dev. Without this patch it takes around 1.82s, and with it 1.66s.
This is a slow machine, but it is a 10% speed-up for 'bzr log'.

The real fix would be to find a way to make 'merge_sort' not require the
complete graph. I think it would work if we just graphed the nodes of
set(tip) - set(base), but we sort of need the ancestry to do the set
difference so that seems to enter a cyclical definition. The best I
could come up with is to have a function like "get_revision_graph"
called "get_revision_graph_subset()".
And when building up the "reachable" subset, it would remove any nodes
reachable by another subset.

John
=:->
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: merge_sort_inlined.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20070418/565ecbd3/attachment-0001.diff 

[RFC] tsort.MergeSorter uses __slots__

[RFC] tsort.MergeSorter uses slots