[Bug 846628] Re: gnu sort extremely slow in non C locale
Dave Gilbert
ubuntu at treblig.org
Sat Jan 14 16:01:15 UTC 2012
Hi Bijan,
I think it would probably be worth asking on the coreutils mailing list about this - make it clear that it's
the speed rather than the behaviour you are concerned about.
I tried a similar test; I created a file with 2M copies of the lines:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaabaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aa1aaaaaabaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Now, as far as I'm aware, the sorting rules for those lines should be
the same in both en_GB and C.
dg at major:~$ time sort te > /dev/null
real 0m9.045s
user 0m42.455s
sys 0m0.952s
dg at major:~$ time LANG=c sort te > /dev/null
real 0m3.508s
user 0m9.577s
sys 0m0.964s
So the C locale is still significantly faster with that file - so the time difference doesn't seem
to be related to any change in the actual sorted output.
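For anyone wanting to repeat the comparison, here is a minimal sketch in Python that builds the same kind of test file and times sort under each locale. The function names (write_test_file, time_sort, compare) are made up for illustration, and it sets LC_ALL rather than LANG so that no other LC_* variable interferes; that is an assumption about the environment, not what was run above. If the named locale is not installed, sort will typically behave as in the C locale and the timings will look alike.

```python
import os
import subprocess
import time

# The four test lines quoted in the message.
LINES = [
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    "aaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    "aaaaaaaaabaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    "aa1aaaaaabaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
]


def write_test_file(path, copies):
    """Write `copies` repetitions of the four lines to `path`."""
    with open(path, "w", encoding="ascii") as f:
        for _ in range(copies):
            for line in LINES:
                f.write(line + "\n")


def time_sort(path, locale_name):
    """Run `sort path` with LC_ALL=locale_name; return wall-clock seconds."""
    # LC_ALL is an assumption here; the original run set LANG instead.
    env = dict(os.environ, LC_ALL=locale_name)
    start = time.perf_counter()
    subprocess.run(["sort", path], env=env,
                   stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def compare(path, locales=("en_GB.UTF-8", "C")):
    """Return a {locale name: seconds} mapping for the given file."""
    return {name: time_sort(path, name) for name in locales}
```

Calling write_test_file("te", 2_000_000) and then compare("te") should approximate the run above; note that a file of that size takes several hundred MB of disk, so a smaller number of copies is a sensible first try.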
Yes, I agree it should be possible to convert it to the same type of sorting order - however
what I don't know is what the costs of doing that are (or if it already tries to do that).
(I'm quite impressed that sort is now multi-threaded!).
Dave
** Changed in: coreutils (Ubuntu)
Status: Invalid => New
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to coreutils in Ubuntu.
https://bugs.launchpad.net/bugs/846628
Title:
gnu sort extremely slow in non C locale
Status in “coreutils” package in Ubuntu:
New
Bug description:
I tried sorting an ascii file of about 300 Megs and 8 million lines
with gnu sort and it was taking forever.
After 10 minutes I stopped it. I tried another sort program and it
finished in about 40 seconds.
I then took the output of that second sort and I checked it in gnu
sort, which reported that some lines were out of order.
The following lines:
....bbbbbbbbbwbbwwbwwwwwww.ww...1
....bbbbbbbbbwbbwwbwwwwwwwww....0
....bbbbbbbbbwbwwbwbwwwww.ww..w.1
But they are not out of order, as far as I can tell. Then I thought the problem was the locale. Indeed, my locale was set to:
LANG=en_CA.UTF-8
setting it to:
LANG=C
both made gnu sort finish the sort in 40 seconds, and confirm the
proper order.
Since the file is 100% ASCII (it only has the 6 characters ".01bw\n"), I
think it is a bug that the locale makes any difference.
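The two claims in this description (the file is pure ASCII, and it is in order when compared byte by byte) can be checked independently of sort. The following is a rough sketch in Python; is_pure_ascii and first_out_of_order are hypothetical helper names, and the byte-wise comparison is only meant to approximate what a check in the C locale does, not to reproduce sort's implementation.

```python
def is_pure_ascii(path, chunk_size=1 << 20):
    """Return True if every byte in the file is in the ASCII range."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True
            if not chunk.isascii():
                return False


def first_out_of_order(path):
    """Return the 1-based number of the first line that compares lower
    than its predecessor in plain byte order, or None if there is none."""
    prev = None
    with open(path, "rb") as f:
        for number, line in enumerate(f, start=1):
            line = line.rstrip(b"\n")
            if prev is not None and line < prev:
                return number
            prev = line
    return None
```

A result of None from first_out_of_order means the file is sorted in byte order, which is the ordering a C-locale sort is expected to use; it says nothing about the order under en_CA.UTF-8, where the collation rules differ.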
Best regards,
Bijan
ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: coreutils 8.5-1ubuntu6
ProcVersionSignature: Ubuntu 2.6.38-11.48-generic 2.6.38.8
Uname: Linux 2.6.38-11-generic i686
Architecture: i386
Date: Sat Sep 10 15:59:07 2011
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Release i386 (20110427.1)
ProcEnviron:
LANGUAGE=en_CA:en
LANG=en_CA.UTF-8
SHELL=/bin/bash
SourcePackage: coreutils
UpgradeStatus: No upgrade log present (probably fresh install)
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/846628/+subscriptions