behaviour of sort command

Dan Christensen jdc at uwo.ca
Fri Jan 20 21:57:59 UTC 2012


Executive summary:  sort thinks that " " < "_" but that "_1" < " 2".
Is this a bug?

Longer version:

I've noticed that the sort command behaves in a way that is surprising
to me.  If you feed it the following input:

 1
_1
 2
_2

consisting of four lines each with two characters, it returns those
lines in the same order.  I had assumed that it sorted lines by
comparing the first bytes, and looked at the next byte only if
the first bytes agree, but it seems that the second byte can affect
the sort order.  Put another way, no matter whether sort considers
" " < "_" or vice versa, I would expect the lines starting with a
space to be grouped together, and those starting with an underscore
to also be grouped together.
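
To make that expectation concrete, here is a minimal Python sketch of
plain byte-wise comparison.  It only illustrates the rule assumed above
and says nothing about how sort itself is implemented:

```python
# The four input lines, as raw bytes.
lines = [b" 1", b"_1", b" 2", b"_2"]

# Python compares bytes objects lexicographically by byte value:
# the first bytes decide, and later bytes matter only on a tie.
bytewise = sorted(lines)

for line in bytewise:
    print(line.decode("ascii"))
```

Since space is 0x20 and underscore is 0x5F in ASCII, this rule puts both
space lines ahead of both underscore lines, which is the grouping
expected above and not what sort returned.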

If I feed in two lines, with one containing just a space and
one just an underscore, it sorts the space first.  But if the
two lines are

_1
 2

then it puts the line with the underscore first.
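
One rule that would be consistent with all three observations is a
two-pass comparison: compare only the letters and digits first, and use
the raw characters just to break ties.  The Python sketch below is a
guess for illustration only.  The helper two_pass_key is hypothetical
and is not taken from sort's source or documentation:

```python
def two_pass_key(line):
    # Hypothetical key: the first pass keeps only letters and digits,
    # the second pass is the unmodified line, used to break ties.
    significant = "".join(ch for ch in line if ch.isalnum())
    return (significant, line)

four_lines = sorted([" 1", "_1", " 2", "_2"], key=two_pass_key)
single_chars = sorted(["_", " "], key=two_pass_key)
two_lines = sorted([" 2", "_1"], key=two_pass_key)

print(four_lines)    # the four-line input
print(single_chars)  # a lone space against a lone underscore
print(two_lines)     # the two-line case
```

Under this assumed rule the four lines keep their original order, the
lone space sorts before the lone underscore, and "_1" sorts before " 2",
matching what sort printed in each case.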

Is this a bug in sort?  It's not explained in the man page or the info
page, and I think most people would expect that adding text to the end
of unequal lines shouldn't change their sort order.

I'm using sort 8.5 in coreutils 8.5-1ubuntu3 under maverick,
without any locale environment variables set.

Dan




