<br><br><div class="gmail_quote">On Wed, Feb 24, 2010 at 12:11 AM, Karl Auer <span dir="ltr"><<a href="mailto:kauer@biplane.com.au">kauer@biplane.com.au</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="im">On Tue, 2010-02-23 at 22:17 -0600, MirJafar Ali wrote:<br>
> External sorting is well defined by Prof. Knuth as sorting process in<br>
> the files and not in memory. For example, I want to sort 100GB on my<br>
> machine with 1GB RAM. I don't think that STL and standard binary sort<br>
> will do that they are all in-memory sorting codes.<br>
><br>
</div>Oh, OK, I see what you mean.<br>
<br>
The general principle is to reduce the external data to a set of keys<br>
that DOES fit in memory, or to index the external data on disk, then<br>
traverse the index. Actually sorting the file *on disk* is rarely<br>
useful.<br>
<br>
As far as the sorting step itself goes, almost all sorts do indeed rely<br>
on having some form of randomly addressable array, even if that "array"<br>
is randomly addressable records in a file. If your data is accessible<br>
only sequentially (as it might be off tape) or if the records are of<br>
different sizes within the file (as with a text file) then you do need<br>
to use different algorithms. Or at least use a preparation step.<br>
<br>
Rather than telling us what solution you think you need, why not tell us<br>
the problem, and ask for suggested solutions? There may be other ways<br>
around it than the one you are thinking of.<br>
<div><div></div><div class="h5"><br>
Regards, K.<br>
<br>
--<br>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~<br>
Karl Auer (<a href="mailto:kauer@biplane.com.au">kauer@biplane.com.au</a>) +61-2-64957160 (h)<br>
<a href="http://www.biplane.com.au/%7Ekauer/" target="_blank">http://www.biplane.com.au/~kauer/</a> +61-428-957160 (mob)<br>
<br>
GPG fingerprint: B386 7819 B227 2961 8301 C5A9 2EBC 754B CD97 0156<br>
Old fingerprint: 07F3 1DF9 9D45 8BCD 7DD5 00CE 4A44 6A03 F43A 7DEF<br>
</div></div><br>--<br>
ubuntu-users mailing list<br>
<a href="mailto:ubuntu-users@lists.ubuntu.com">ubuntu-users@lists.ubuntu.com</a><br>
Modify settings or unsubscribe at: <a href="https://lists.ubuntu.com/mailman/listinfo/ubuntu-users" target="_blank">https://lists.ubuntu.com/mailman/listinfo/ubuntu-users</a><br></blockquote><div><br><br>Hello,<br><br>Actually I am trying to perform some studies on file system dynamic data clustering. Many papers<br>
from Remzi group ( At Wisconsin, Madison and Berkeley groups ) cite external sorting as a good <br>application case to study file system performance at low level i.e. how fragmentation affects the overall <br>performance. Therefore, in a way, I am just trying to reproduce their work to start my own work.<br>
<br><br>Mir<br><br></div></div><br>