External Sort Code in C/C++

Karl Auer kauer at biplane.com.au
Wed Feb 24 06:11:19 UTC 2010


On Tue, 2010-02-23 at 22:17 -0600, MirJafar Ali wrote:
> External sorting is well defined by Prof. Knuth as sorting process in
> the files and not in memory.  For example, I want to sort 100GB on my
> machine with 1GB RAM. I don't think that STL and standard binary sort
> will do that they are all in-memory sorting codes.
> 
Oh, OK, I see what you mean.

The general principle is to reduce the external data to a set of keys
that DOES fit in memory, or to index the external data on disk, then
traverse the index. Actually sorting the file *on disk* is rarely
useful.

As far as the sorting step itself goes, almost all sorts do indeed rely
on having some form of randomly addressable array, even if that "array"
is randomly addressable records in a file. If your data is accessible
only sequentially (as it might be off tape) or if the records are of
different sizes within the file (as with a text file) then you do need
to use different algorithms. Or at least use a preparation step.

Rather than telling us what solution you think you need, why not tell us
the problem, and ask for suggested solutions? There may be other ways
around it than the one you are thinking of.

Regards, K.

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Karl Auer (kauer at biplane.com.au)                   +61-2-64957160 (h)
http://www.biplane.com.au/~kauer/                  +61-428-957160 (mob)

GPG fingerprint: B386 7819 B227 2961 8301 C5A9 2EBC 754B CD97 0156
Old fingerprint: 07F3 1DF9 9D45 8BCD 7DD5 00CE 4A44 6A03 F43A 7DEF
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
URL: <https://lists.ubuntu.com/archives/ubuntu-users/attachments/20100224/a86b629b/attachment.sig>


More information about the ubuntu-users mailing list