[ubuntu-uk] High Performance Computing

Tony Travis ajt at rri.sari.ac.uk
Sun Aug 5 22:41:17 BST 2007


Ian Pascoe wrote:
> G'day all
> 
> Anyone out there involved with, or has theoretical  experience with HPC
> using clusters?  Before I approach the various projects and make myself look
> a complete twonk, I'd appreciate some views and thoughts.  Please?

Hello, Ian.

I've built a 92-node Ubuntu 6.06.1 LTS + openMosix Beowulf cluster that 
I use for bioinformatics:

	http://bioinformatics.rri.sari.ac.uk

> Been looking at Rocks Clusters (http://www.rocksclusters.org) which provides
> the HPC platform based on REL 4, and the Linux Terminal Server project.  The
> reason is a theoretical one of utilising redundant computing power, ie old
> out of spec machines that are to be chucked, in an environmentally nice way.

Rocks is about distributed computing, and LTSP is about using 'thin' 
clients to run applications on a powerful central server. Using 'old' 
PCs as 'thin' clients makes sense, but the overhead of network latency 
makes it less attractive to use old PCs for distributed HPC computing.

The reliability of old PCs may not be good enough for them to be used 
in an HPC cluster. I know this sounds disappointing, but I have a lot of 
experience of using COTS (Commodity Off The Shelf) PCs as cluster nodes, 
and even new COTS PCs can be unreliable. This matters if you want to do 
'serious' work, because an undetected hardware error can silently 
corrupt your results.

In particular, COTS PCs don't have ECC (Error Correcting Code) memory, 
and recent PCs don't even have parity-checking memory. In desktop use, 
COTS PC memory is reliable enough to run for a few hours without error, 
but not when these PCs run for months without rebooting, as HPC compute 
nodes do. I posted a message about this on the openMosix Wiki:

http://howto.x-tend.be/openMosixWiki/index.php/Additions_to_the_FAQ
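As a practical aside, you can at least check whether a node's DIMMs report ECC support, and soak-test the memory, with standard tools. This is a hedged sketch, not part of my setup: memtester is packaged for Ubuntu, and the size and pass count are illustrative.

```shell
# Ask the DMI tables whether the installed memory does ECC (needs root).
# "Error Correction Type: None" means the DIMMs cannot correct errors.
sudo dmidecode --type memory | grep -i "error correction"

# Soak-test 256 MB of RAM for 3 passes (sudo apt-get install memtester).
# Long runs are what catch the intermittent errors that only show up
# after hours of sustained load.
sudo memtester 256M 3
```

On a node without ECC, a clean multi-pass memtester run is about the best assurance you can get before trusting it with long jobs.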

> So the solution I came up with was to run the computing nodes as diskless
> work stations, getting their kernel / apps from the LTSP server, and dealing
> with the cluster server for the work queue.  This sounds pretty straight
> forward to me.

I think you've misunderstood what LTSP is for: the apps run on the LTSP 
server and are just displayed on the 'thin' clients. What you are 
describing is different, and is what I do on our Beowulf cluster. The 
compute nodes (COTS PCs) PXE boot from one of the cluster servers, and 
they run an NFSROOT kernel.
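For reference, the server side of a PXE + NFSROOT setup like that only needs a few lines of configuration. This is a hedged sketch with illustrative addresses, paths and filenames, not the exact files from my cluster:

```shell
# /etc/dhcp3/dhcpd.conf -- tell PXE nodes where to fetch the bootloader
# (addresses and filenames are illustrative):
#
#   subnet 192.168.1.0 netmask 255.255.255.0 {
#       range 192.168.1.100 192.168.1.191;
#       next-server 192.168.1.1;       # TFTP server
#       filename "pxelinux.0";         # PXELINUX bootloader
#   }
#
# /etc/exports -- export the shared root filesystem read-only over NFS:
#
#   /nfsroot   192.168.1.0/255.255.255.0(ro,no_root_squash,async)
#
# The node kernel needs NFSROOT support built in (not as modules):
#
#   CONFIG_NFS_FS=y  CONFIG_ROOT_NFS=y  CONFIG_IP_PNP_DHCP=y
```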

> However, all references I find to cluster computing shows that the computing
> nodes each are headless systems; which is fine but I wanted to look at
> reducing the green footprint by taking the heavy power requirements out of
> the equation, ie the HDDs etc.

In fact it's the CPU that consumes most of the power, especially when 
it's working hard. My compute nodes are actually 'dataless' rather than 
diskless and, again, the reason is network latency: each node keeps a 
local disk just for swap and /tmp.
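In /etc/fstab terms, a 'dataless' node looks something like this. A sketch only: the device names and the server path are illustrative, not copied from my nodes.

```shell
# Root comes read-only over NFS; swap and /tmp stay on the local disk
# so paging and scratch I/O never touch the network:
#
#   server:/nfsroot   /      nfs    ro,hard,intr   0 0
#   /dev/hda1         none   swap   sw             0 0
#   /dev/hda2         /tmp   ext3   rw,noatime     0 2
```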

> I have already identified some technical aspects that knock this on the
> head - the main one being I envisage two separate ethernet networks, one for
> LTSP and the other for the cluster, but neither software supports more than
> one NIC on a terminal / computing node.

My Beowulf is based on the design of EPCC's (Edinburgh Parallel 
Computing Centre) BOBCAT (Budget-Optimised Beowulf Cluster using 
Affordable Technology). A particular feature of this architecture is the 
use of two separate network fabrics: one is the 'system' network, the 
other is the 'application' network. The original BOBCAT web site no 
longer exists, but these links might be of interest:

http://www.hoise.com/primeur/00/articles/monthly/AE-PR-10-00-1.html
http://www.dl.ac.uk/TCSC/DisCo/TechPapers/Beowulf/node7.html

There is also a more detailed report about this type of Beowulf cluster 
(PostScript format) at:

	http://www.dl.ac.uk/TCSC/DisCo/TechPapers/Beowulf/beowulf.ps
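On Debian/Ubuntu the two fabrics simply mean two NICs per node, configured separately. A hedged sketch of /etc/network/interfaces; the interface names and addresses are illustrative:

```shell
# /etc/network/interfaces -- one interface per fabric:
#
#   auto eth0
#   iface eth0 inet dhcp            # 'system' network: PXE, NFS, admin
#
#   auto eth1
#   iface eth1 inet static          # 'application' network: job traffic
#       address 10.0.0.101
#       netmask 255.255.255.0
```

Keeping job traffic off the system network is what stops NFS and admin traffic competing with the application for bandwidth.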

> The next problem is that of storage space on the diskless terminal.  By
> utilising the LTSP server as the processor rather spoils the whole thing, so
> I've looked at the terminal running the kernel and any apps locally, using
> the LTSP server to host the files required by the kernel / apps to run.
> This will reduce the load on the LTSP network.  However, the terminal will
> still require to store stuff temporarily, like the swap partition, so I
> thought about using either flash drives, too expensive, or USB pen drives,
> preferred.

I think you're confusing two things: LTSP runs applications centrally 
but displays the output on distributed clients, while Rocks runs 
distributed applications, although you could, if you wanted, display the 
output from HPC jobs on a 'thin' client. However, running distributed 
applications on 'thin' clients is not a good idea. The openMosix 
software I run can be used to do what you want, but it works by CPU 
'cycle-stealing': distributing jobs between powerful workstations that 
may sometimes be idle:

	http://openmosix.sourceforge.net/
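To give a flavour of the 'cycle-stealing' side, these are the sort of commands the openMosix user tools provide. A hedged sketch, since the exact options vary between openMosix releases, and 'myjob' is a placeholder:

```shell
# Watch the load on every node in the cluster (curses display):
mosmon

# Stop this workstation accepting migrated processes while you need it,
# then hand its idle cycles back to the cluster afterwards:
mosctl block
mosctl noblock

# Start a job that the openMosix kernel is allowed to migrate to
# whichever node has spare capacity:
mosrun ./myjob
```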

> I chose Rocks over other projects mainly due to it's pedigree and support
> infrastructure, and LTSP as it seems to work with practically everything.

I think you are more impressed with Rocks' pedigree than you should be!

> "But why?" I hopefully hear you groan.  As I say it's all theoretical, but
> doing some research there is certainly need for this type of setup.  Maybe
> not for a top level production system, but one that just plods along and
> does the job.

If you want to learn about HPC, your approach is fine, but if you want 
to get a job done then buy a new quad-core AMD64 motherboard, which will 
outperform a network of many 'old' PCs...

> Sorry, I know it's not exactly Ubuntu orientated  .... but this area really
> interests me.

Well, I'd better put my flame-proof underpants on too, as I've gone on a 
bit about it, but the Beowulf cluster I've built does run Ubuntu 6.06.1 
LTS with a linux-2.4.26-om1 openMosix kernel, recompiled with NFSROOT 
support for the PXE 'dataless' compute nodes. I've made .debs of this if 
you're interested:

	http://bioinformatics.rri.sari.ac.uk/~ajt/openmosix

Best wishes,

	Tony.
-- 
Dr. A.J.Travis,                     |  mailto:ajt at rri.sari.ac.uk
Rowett Research Institute,          |    http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn,          |   phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK.    |     fax:+44 (0)1224 716687
