External Sort Code in C/C++

MirJafar Ali mirjafarali at gmail.com
Wed Feb 24 06:22:45 UTC 2010


On Wed, Feb 24, 2010 at 12:11 AM, Karl Auer <kauer at biplane.com.au> wrote:

> On Tue, 2010-02-23 at 22:17 -0600, MirJafar Ali wrote:
> > External sorting is well defined by Prof. Knuth as sorting process in
> > the files and not in memory.  For example, I want to sort 100GB on my
> > machine with 1GB RAM. I don't think that STL and standard binary sort
> > will do that they are all in-memory sorting codes.
> >
> Oh, OK, I see what you mean.
>
> The general principle is to reduce the external data to a set of keys
> that DOES fit in memory, or to index the external data on disk, then
> traverse the index. Actually sorting the file *on disk* is rarely
> useful.
>
> As far as the sorting step itself goes, almost all sorts do indeed rely
> on having some form of randomly addressable array, even if that "array"
> is randomly addressable records in a file. If your data is accessible
> only sequentially (as it might be off tape) or if the records are of
> different sizes within the file (as with a text file) then you do need
> to use different algorithms. Or at least use a preparation step.
>
> Rather than telling us what solution you think you need, why not tell us
> the problem, and ask for suggested solutions? There may be other ways
> around it than the one you are thinking of.
>
> Regards, K.
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Karl Auer (kauer at biplane.com.au)                   +61-2-64957160 (h)
> http://www.biplane.com.au/~kauer/ <http://www.biplane.com.au/%7Ekauer/>
>                +61-428-957160 (mob)
>
> GPG fingerprint: B386 7819 B227 2961 8301 C5A9 2EBC 754B CD97 0156
> Old fingerprint: 07F3 1DF9 9D45 8BCD 7DD5 00CE 4A44 6A03 F43A 7DEF
>
> --
> ubuntu-users mailing list
> ubuntu-users at lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/ubuntu-users
>


Hello,

Actually I am trying to perform some studies on file system dynamic data
clustering.  Many papers
from Remzi group ( At Wisconsin, Madison and Berkeley groups ) cite external
sorting as a good
application case to  study file system performance at low level i.e. how
fragmentation affects the overall
performance. Therefore, in a way, I am just trying to reproduce their work
to start my own work.


Mir
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.ubuntu.com/archives/ubuntu-users/attachments/20100224/14c154ce/attachment.html>


More information about the ubuntu-users mailing list