[PATCH] sort with key and reverse keywors in Python2.4

Andrew Bennetts andrew at canonical.com
Thu Jun 2 16:42:57 BST 2005


(This turned into a mini-rant; sorry about the length)

On Thu, Jun 02, 2005 at 04:03:13PM +0200, Mario Pernici wrote:
> 
> On Thu, 2 Jun 2005, Aaron Bentley wrote:
> > Is there a compelling reason to use Python 2.4's sort facilities?  I
> > recognize that they're nicer, but in general, I think it's better to
> > have one codepath, instead of two.  The code's cleaner that way, and
> > it's easier to test.
> > 
> > Of course, there are always exceptions if there's a good enough reason.
> 
> In a simple case such as
> def key1(x):
>   return x%10
> 
> a = range(1000)
> my_sort(a, key1)
> 
> Python2.4 is about 9 times faster than Python2.3 on my machine.

Here's what I see on my laptop:

andrew at trogdor $ python /usr/lib/python2.4/timeit.py -s "def key1(x): return x%10" "range(1000).sort(key=key1)"
1000 loops, best of 3: 1.42 msec per loop
andrew at trogdor $ python /usr/lib/python2.4/timeit.py -s "def key1(x): return x%10" -s "def cmpr(x, y): return cmp(key1(x), key1(y))" "range(1000).sort(cmpr)"
100 loops, best of 3: 12.6 msec per loop

I.e. for your example of sorting sequence of a thousand integers by a key of
"x % 10", this is the difference between 0.00012.6 seconds and 0.0000014
seconds.

That's 88 times faster, but still completely insignificant.  That's right,
eighty-eight times faster is insignificant: my laptop can do this the slow
way a hundred times and my screen won't even have refreshed once! :)

If you can demonstrate an improvement at a scale that humans can actually
notice, preferably with actual bzr code and real data rather than contrived
examples, then you'll get me interested.  Python's sort does all sorts of
clever tricks to take advantage of partially-sorted sequences, so using
contrived data might give misleading results -- and comparing integers is
special-cased inside the Python VM.  It's really hard to have any confidence
that your numbers or mine have any bearing at all to bzr.

Anyway, I doubt that with your change you'll see even a 1% increase in
performance for any given bzr operation, because I've seen nothing to
suggest that bzr spends a significant portion of its time sorting changesets
(although I haven't profiled).  I'd love to be proved wrong :)

> It is an optimization at no cost; the proposed patch
> works with both Python versions, so it gives no problems
> in testing.

There's a significant cost in terms of readability and difficulty of
acheiving test coverage for both code paths, both of which negatively impact
maintainability of the code.  This is far from no cost!

If anything, maintainance costs are more important than runtime costs --
that's one reason why people choose Python in the first place.  (To me the
big advantage of the 2.4 sorting features is how much more readable the code
is; the extra speed is just icing)

just-say-no-to-premature-optimisations-ly yrs,

-Andrew.





More information about the bazaar mailing list