[RFC] tsort.MergeSorter uses __slots__
John Arbash Meinel
john at arbash-meinel.com
Thu Apr 19 01:28:35 BST 2007
John Arbash Meinel wrote:
> I was working nearby and I had cause to profile merge_sort.
>
> It seemed that _push_node() and _pop_node() were taking a lot longer
> than they should (though I was accidentally reading TopoSorter instead).
>
> So I added __slots__, and it turns out to be quite a bit faster.
>
> On my machine doing list(merge_sort()) for all of bzr.dev changes from
> taking ~705ms to taking 600ms. (~15% improvement).
>
> This is a rather simple change for such a big improvement, so it seems
> reasonable to merge it.
>
> John
> =:->
>
Since I'm a sucker for optimizing, I kept going. The attached bundle
provides 2 more optimizations. Each one doing more "ugly" stuff in
exchange for making things faster.
2426 John Arbash Meinel 2007-04-18
Use __slots__ for MergeSorter
2427 John Arbash Meinel 2007-04-18
Change valid self._foo variables into local variables.
Using __slots__ changed sort time from ~700ms => ~600ms.
Using local variables drops it down to 530 - 550ms.
2428 John Arbash Meinel 2007-04-18
Inline self._pop_node and self._push_node
These are still separate functions, but rather than using
self._a_stack.append
we assign a local variable a_stack_append, and call it directly.
This drops the merge_sort() time down to approx 385ms-400ms
With that large of a speed-up it seems worth the loss
in readability. (This is almost 50% of the original time)
We can merge any level of this, but considering the last optimization
helps a lot, it seems worthy.
Basically it changes lots of "self._foo.func()" calls into local
variables with "foo_func=self._foo.func". By pre-compiling them into the
function we don't have a global lookup, or a lookup of the attribute for
all the lists and sets.
It would be good to test these on a faster machine. But there are a lot
of functions which need to do a merge_sort() of the whole ancestry graph.
This has a measurable effect on "bzr log --short -r-10..-1 --forward" of
bzr.dev. Without this patch it takes around 1.82s, and with it 1.66s.
This is a slow machine, but it is a 10% speed-up for 'bzr log'.
The real fix would be to find a way to make 'merge_sort' not require the
complete graph. I think it would work if we just graphed the nodes of
set(tip) - set(base), but we sort of need the ancestry to do the set
difference so that seems to enter a cyclical definition. The best I
could come up with is to have a function like "get_revision_graph"
called "get_revision_graph_subset()".
And when building up the "reachable" subset, it would remove any nodes
reachable by another subset.
John
=:->
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: merge_sort_inlined.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20070418/565ecbd3/attachment-0001.diff
More information about the bazaar
mailing list