[ubuntu-uk] High Performance Computing
Tony Travis
ajt at rri.sari.ac.uk
Sun Aug 5 22:41:17 BST 2007
Ian Pascoe wrote:
> G'day all
>
> Anyone out there involved with, or has theoretical experience with HPC
> using clusters? Before I approach the various projects and make myself look
> a complete twonk, I'd appreciate some views and thoughts. Please?
Hello, Ian.
I've built a 92-node Ubuntu 6.06.1 LTS + openMosix Beowulf cluster that
I use for bioinformatics:
http://bioinformatics.rri.sari.ac.uk
> Been looking at Rocks Clusters (http://www.rockclusters.org) which provides
> the HPC platform based on REL 4, and the Linux Terminal Server project. The
> reason is a theoretical one of utilising redundant computing power, ie old
> out of spec machines that are to be chucked, in an environmentally nice way.
Rocks is about distributed computing, and LTSP is about using 'thin'
clients to run applications on a powerful central server. Using 'old'
PCs as 'thin' clients makes sense, but the overhead of network latency
makes it less attractive to use old PCs for distributed HPC.
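To make the latency point concrete, here is a toy model, written as a minimal Python sketch. The assumptions are mine, not measurements: the compute work splits evenly across nodes, and every extra node adds a fixed network cost (`comm_s_per_node`) that cannot be overlapped with computation. The function names are hypothetical.

```python
def speedup(n_nodes: int, compute_s: float, comm_s_per_node: float) -> float:
    """Speedup of n_nodes over a single node under a toy model.

    Assumptions (illustrative only): the compute work divides evenly
    across nodes, and every extra node adds a fixed communication cost
    (comm_s_per_node seconds) paid over the network.
    """
    parallel_time = compute_s / n_nodes + comm_s_per_node * (n_nodes - 1)
    return compute_s / parallel_time


def best_node_count(compute_s: float, comm_s_per_node: float, max_nodes: int) -> int:
    """Node count, from 1 to max_nodes, that gives the largest speedup."""
    return max(range(1, max_nodes + 1),
               key=lambda n: speedup(n, compute_s, comm_s_per_node))


if __name__ == "__main__":
    # Hypothetical job: 100 s of computation, 1 s of network cost per extra node.
    for n in (1, 5, 10, 20, 50, 92):
        print(f"{n:3d} nodes: speedup {speedup(n, 100.0, 1.0):5.2f}")
    print("best node count:", best_node_count(100.0, 1.0, 92))
```

Under these assumptions the speedup peaks near sqrt(compute_s / comm_s_per_node) nodes and then falls, so adding more old PCs eventually makes a job slower rather than faster.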
The reliability of old PCs may not be good enough for them to be used
in an HPC cluster. I know this sounds disappointing, but I have a lot of
experience of using COTS (Commodity Off The Shelf) PCs as cluster nodes,
and even new COTS PCs can be unreliable. This matters if you want to do
'serious' work, because an unreliable node can give inaccurate results.
In particular, COTS PCs don't have ECC (Error Correcting Code) memory,
and recent PCs don't even have parity-checking memory. In desktop use,
COTS PC memory is reliable enough to run for a few hours without error,
but not when the same PCs run for months without rebooting as HPC
compute nodes. I posted a message about this on the openMosix Wiki:
http://howto.x-tend.be/openMosixWiki/index.php/Additions_to_the_FAQ
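Why run time matters so much can be sketched with a simple probability model. This minimal Python example assumes memory errors arrive independently at a constant rate (a Poisson process). The rate of 1e-4 errors per node-hour is purely hypothetical, chosen only to show the shape of the argument; it is not a figure from this post.

```python
import math


def p_any_error(errors_per_hour: float, hours: float, nodes: int = 1) -> float:
    """Probability of at least one memory error, assuming errors arrive
    independently at a constant rate on every node (a Poisson model).
    The rate is a hypothetical input, not a measured figure."""
    expected_errors = errors_per_hour * hours * nodes
    return 1.0 - math.exp(-expected_errors)


if __name__ == "__main__":
    rate = 1e-4  # hypothetical error rate per node per hour
    print(f"desktop, 4 hours:   {p_any_error(rate, 4):.4f}")
    print(f"one node, 6 months: {p_any_error(rate, 24 * 182):.4f}")
    print(f"92 nodes, 6 months: {p_any_error(rate, 24 * 182, nodes=92):.4f}")
```

The same per-hour rate that is negligible over a desktop session becomes a near-certainty across a whole cluster running for months, and without ECC nothing reports the error.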
> So the solution I came up with was to run the computing nodes as diskless
> work stations, getting their kernel / apps from the LTSP server, and dealing
> with the cluster server for the work queue. This sounds pretty
> straightforward to me.
I think you've misunderstood what LTSP is for: the apps run on the LTSP
server and are just displayed on the 'thin' clients. What you are
describing is different, and is what I do on our Beowulf cluster: the
compute nodes (COTS PCs) PXE boot from one of the cluster servers and
run an NFSROOT kernel.
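As a rough illustration of PXE booting an NFS-root kernel, this Python sketch writes a minimal PXELINUX entry (the kind of text that would go in a file such as `pxelinux.cfg/default` on the TFTP server). It assumes the kernel was built with NFS-root and kernel-level DHCP autoconfiguration support; the server address, export path and kernel image name are hypothetical placeholders, not the settings used on this cluster.

```python
def pxelinux_config(kernel: str, nfs_server: str, nfs_root: str) -> str:
    """Build a minimal PXELINUX config that boots an NFS-root kernel.

    Assumes the kernel was built with NFS-root and kernel-level IP
    autoconfiguration (DHCP) support. All arguments are placeholders.
    """
    # Standard Linux kernel parameters for a root filesystem on NFS:
    #   root=/dev/nfs        mount / over NFS rather than from a local disk
    #   nfsroot=server:path  which NFS export to mount as /
    #   ip=dhcp              let the kernel configure its NIC via DHCP
    append = f"root=/dev/nfs nfsroot={nfs_server}:{nfs_root} ip=dhcp"
    lines = [
        "DEFAULT nfsroot",
        "LABEL nfsroot",
        f"  KERNEL {kernel}",
        f"  APPEND {append}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical server address, export path and kernel image name.
    print(pxelinux_config("vmlinuz-nfsroot", "192.168.0.1", "/export/nfsroot"), end="")
```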
> However, all references I find to cluster computing shows that the computing
> nodes each are headless systems; which is fine but I wanted to look at
> reducing the green footprint by taking the heavy power requirements out of
> the equation, ie the HDDs etc.
In fact, it's the CPU that consumes most of the power, especially when
it's working hard. I've actually got 'dataless' compute nodes, not
diskless ones. The reason is, again, network latency: the idea is to use
a local disk on each node for swap and /tmp.
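A 'dataless' node of this kind might describe its local disk with just two `/etc/fstab` lines, since the root filesystem is already mounted over NFS from the kernel command line. The Python sketch below generates those lines; the device names and the ext3 filesystem type are hypothetical placeholders.

```python
def dataless_fstab(swap_dev: str, tmp_dev: str, tmp_fs: str = "ext3") -> str:
    """Return /etc/fstab lines for the local disk of a 'dataless' node.

    Only swap and /tmp live on the local disk; the root filesystem comes
    over NFS (from the kernel command line), so it is not listed here.
    Device names and the filesystem type are hypothetical placeholders.
    """
    entries = [
        # (device, mount point, type, options, dump, pass)
        (swap_dev, "none", "swap", "sw", "0", "0"),
        (tmp_dev, "/tmp", tmp_fs, "defaults", "0", "2"),
    ]
    return "".join("\t".join(fields) + "\n" for fields in entries)


if __name__ == "__main__":
    print(dataless_fstab("/dev/hda1", "/dev/hda2"), end="")
```

Keeping swap and scratch files local means that paging and temporary I/O never cross the network, which is the latency argument made above.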
> I have already identified some technical aspects that knock this on the
> head - the main one being I envisage two seperate ethernet networks, one for
> LTSP and the other for the cluster, but neither software supports more than
> one NIC on a terminal / computing node.
My Beowulf is based on the design of EPCC's (Edinburgh Parallel Computing
Centre) BOBCAT (Budget-Optimised Beowulf Cluster using Affordable
Technology). A particular feature of this architecture is the use of two
separate network fabrics: one is the 'system' network, the other is the
'application' network. The original BOBCAT web site no longer exists,
but these links might be of interest:
http://www.hoise.com/primeur/00/articles/monthly/AE-PR-10-00-1.html
http://www.dl.ac.uk/TCSC/DisCo/TechPapers/Beowulf/node7.html
There is also a more detailed report about this type of Beowulf cluster
(PostScript format) at:
http://www.dl.ac.uk/TCSC/DisCo/TechPapers/Beowulf/beowulf.ps
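The two-fabric idea amounts to giving every compute node one NIC on each network. This Python sketch plans the addresses with the standard `ipaddress` module. The subnets are hypothetical, and it is an assumption on my part that the 'system' network carries booting, NFS and management traffic while the 'application' network carries traffic between parallel jobs.

```python
import ipaddress


def dual_network_plan(n_nodes: int,
                      system_net: str = "10.0.0.0/24",
                      app_net: str = "10.1.0.0/24") -> dict:
    """Give every compute node one address on each of two separate networks.

    The subnets are hypothetical. The first usable address on each network
    is reserved for the server; nodes take the following addresses in order.
    """
    sys_hosts = list(ipaddress.ip_network(system_net).hosts())
    app_hosts = list(ipaddress.ip_network(app_net).hosts())
    if n_nodes > min(len(sys_hosts), len(app_hosts)) - 1:
        raise ValueError("subnet too small for that many nodes")
    # node01 gets the second usable address on each network, and so on.
    return {
        f"node{i:02d}": (str(sys_hosts[i]), str(app_hosts[i]))
        for i in range(1, n_nodes + 1)
    }


if __name__ == "__main__":
    plan = dual_network_plan(92)
    for name in ("node01", "node02", "node92"):
        system_ip, app_ip = plan[name]
        print(f"{name}: system {system_ip}, application {app_ip}")
```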
> The next problem is that of storage space on the diskless terminal.
> Utilising the LTSP server as the processor rather spoils the whole thing, so
> I've looked at the terminal running the kernel and any apps locally, using
> the LTSP server to host the files required by the kernel / apps to run.
> This will reduce the load on the LTSP network. However, the terminal will
> still require to store stuff temporarily, like the swap partition, so I
> thought about using either flash drives, too expensive, or USB pen drives,
> preferred.
I think you're confusing two things: LTSP runs applications centrally,
but displays output on distributed clients. Rocks runs distributed
applications, but you could, if you wanted, display output from HPC on a
'thin' client. However, running distributed applications on 'thin'
clients is not a good idea. The openMosix software I run can do what you
want, but by CPU 'cycle stealing': distributing jobs between powerful
workstations that may sometimes be idle:
http://openmosix.sourceforge.net/
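The basic idea of cycle stealing can be sketched in a few lines of Python: only workstations whose owners are not using them accept work, and each job goes to the least-loaded of those. This is a deliberately simple greedy rule for illustration, not openMosix's actual algorithm, which migrates running processes transparently between nodes. The class, function and job names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Workstation:
    name: str
    owner_active: bool                 # someone is using this PC interactively
    jobs: List[str] = field(default_factory=list)


def place_job(job: str, stations: List[Workstation]) -> Optional[str]:
    """Toy 'cycle stealing': send a job to the least-loaded idle workstation.

    Not openMosix's real load balancer; just a greedy rule for illustration.
    """
    idle = [s for s in stations if not s.owner_active]
    if not idle:
        return None                    # no spare cycles: leave the job queued
    target = min(idle, key=lambda s: len(s.jobs))
    target.jobs.append(job)
    return target.name


if __name__ == "__main__":
    lab = [Workstation("ws1", owner_active=True),
           Workstation("ws2", owner_active=False),
           Workstation("ws3", owner_active=False)]
    for job in ("blast-1", "blast-2", "blast-3"):
        print(job, "->", place_job(job, lab))
```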
> I chose Rocks over other projects mainly due to its pedigree and support
> infrastructure, and LTSP as it seems to work with practically everything.
I think you are more impressed with Rocks' pedigree than you should be!
> "But why?" I hopefully hear you groan. As I say it's all theoretical, but
> doing some research there is certainly need for this type of setup. Maybe
> not for a top level production system, but one that just plods along and
> does the job.
If you want to learn about HPC, your approach is fine, but if you want to
get a job done, then buy a new quad-core AMD64 motherboard, which will
outperform a network of many 'old' PCs...
> Sorry, I know it's not exactly Ubuntu orientated .... but this area really
> interests me.
Well, I'd better put my flame-proof underpants on too, as I've gone on a
bit about it, but the Beowulf cluster I've built does run Ubuntu 6.06.1
LTS with a linux-2.4.26-om1 openMosix kernel, recompiled with NFSROOT
support for the PXE-booted 'dataless' compute nodes. I've made .deb
packages of this if you're interested:
http://bioinformatics.rri.sari.ac.uk/~ajt/openmosix
Best wishes,
Tony.
--
Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk
Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687