[Merge] performance increases for 'bzr status'

Andrew Bennetts andrew at canonical.com
Tue May 30 16:26:27 BST 2006


On Tue, May 30, 2006 at 08:28:59AM -0500, John Arbash Meinel wrote:
> Andrew Bennetts wrote:
[...]
> > 
> > The simplest way is "self.assertEqual(sorted(some_list), some_list)" I think.
> > 
> > -Andrew.
> 
> Actually, I though 'sorted()' sorts the list and just returns it. ie the
> definition is:
> 
> def sorted(lst):
>   lst.sort()
>   return lst

No, it's more like:

    def sorted(sequence):
        lst = list(sequence)
        lst.sort()
        return lst

For example:

    >>> l = [3, 2, 1]
    >>> sorted(l) == l
    False
    >>> l == sorted(l)
    False

(playing with stuff like this interactively is very easy, I highly recommend
getting into the habit of doing it.)

See also "What's New in Python 2.4", which says:

    There is a new built-in function sorted(iterable) that works like the
    in-place list.sort() method but can be used in expressions. The differences
    are:
    
        * the input may be any iterable;
        * a newly formed copy is sorted, leaving the original intact; and
        * the expression returns the new sorted copy

(file:///usr/share/doc/python2.4/html/whatsnew/node12.html on my system)

> Which means that the above would always return true.

As you can see, that's not right :)

> I was hoping to have something inexpensive that wouldn't require copying
> the list, but the best I could think of was:
> 
> assert sorted(some_list[:]) == some_list, "lists are not sorted".
> 
> (This was runtime code, not test code)

If you want the theoretically fast way to check that a sequence is sorted, i.e.
without copying the whole sequence and sorting it (which is potentially O(n log
n), although I believe CPython's implementation is O(n) in the case the list is
already sorted), check that the invariant "seq[x] < seq[x+1]" holds for the
whole sequence.  i.e.:

    iterator = iter(seq)
    previous_element = iterator.next()
    for element in iterator:
        assert previous_element < element
        previous_element = element

(note that for simplicity this abuses the fact that iter(iter(l)) is iter(l)
when l is a list object -- this isn't necessarily true for all iterators.)

I think you'll find that in many cases that "sorted(l) == l" will be much faster
than a for loop that has to execute python bytecodes every iteration.  Even
though it does more work, it does it all in C -- and it's more readable anyway.

-Andrew.





More information about the bazaar mailing list