Red Hat Cluster Suite / Handling of shutdown and mounting of GFS

Chris Joelly chris-m-lists at joelly.net
Fri Sep 5 14:52:57 UTC 2008


On Fri, Sep 05, 2008, Ante Karamatic wrote:
> On Fri, 5 Sep 2008 12:51:42 +0200
> Chris Joelly <chris-m-lists at joelly.net> wrote:
> 
> Moving services isn't an issue here (you could remove all services from
> the node with /etc/init.d/rgmanager stop). This problem is related to
> cluster membership. I don't know exactly where the problem is (I'm
> just a user, not a developer :).
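
(Just for my own notes: draining a node that way would presumably look
roughly like the following, assuming the stock init script and tools --
untested on my setup:)

    # on the node to be drained: stop rgmanager so its services get
    # relocated to the remaining node
    /etc/init.d/rgmanager stop

    # check from the remaining node that the services have moved
    clustat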

Unfortunately there seem to be no developers out there reading our posts :-)
I posted to the linux-cluster list too, but no recommendations yet. I'm quite
enthusiastic about tracking down problems, but I'm mainly used to tracking
down Java-related problems, as that is my main occupation ;-)

> I'll repeat once more: having only two nodes in a cluster is the worst
> possible scenario for RHCS.

But then you have to use some other shared storage, since DRBD won't work
with more than two nodes, and that's too expensive for the current project ...

> I wouldn't use it on a two-node cluster if I didn't really have to (but I
> do in one case), but it's far from useless. It's great :) The same
> problem exists on all distributions (FWIW, my crappy two-node cluster is
> on Red Hat and all the others are on Ubuntu).

Does this mean that it's better to switch to Heartbeat-managed services in an
active/passive manner, at least for a two-node setup?

> Since RHCS isn't aware of DRBD, you can't really rely on it to handle the
> GFS mount. This is why I don't manage GFS mounts with RHCS. Instead, I
> mount GFS on both machines and then let the services read it when they
> need to. For example:
> 
> If I have two Apache nodes, then I mount /var/www as GFS on both
> (underneath this GFS is a DRBD device with both nodes in
> primary-primary). As soon as the first node dies, the service is started
> on the other node. RHCS doesn't manage my /var/www mount.

OK, so you define the services as "not auto start" in cluster.conf, so that
you are able to bring up the underlying drbd-clvm-gfs stack first.
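
Something like the following is what I have in mind for cluster.conf --
just a rough sketch; the service name, IP address and init script are made
up, and the GFS mount is deliberately not listed as a resource:

    <rm>
      <!-- autostart="0": don't start the service at cluster boot, so the
           drbd/clvmd/gfs stack can be brought up by hand first -->
      <service name="www" autostart="0" recovery="relocate">
        <ip address="10.0.0.10" monitor_link="1"/>
        <script name="apache" file="/etc/init.d/apache2"/>
      </service>
    </rm>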

So my conclusion is:
if the node a service runs on fails, the failover node (in a 2-node
scenario) fences the failed node and takes over the service of the failed
node. Then, when the failed node recovers, either from a reboot or from
manual intervention, you first bring up the GFS mounts and then move the
service back to the rejoined node?
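
Spelled out as commands, I imagine the rejoin would look roughly like this
(untested; init script names, the DRBD resource and the device path are
just placeholders and may differ per distribution):

    # on the recovered node, bottom-up:
    /etc/init.d/drbd start        # bring DRBD back up
    drbdadm primary r0            # only if drbd.conf doesn't do this itself
    /etc/init.d/cman start        # rejoin the cluster (membership, fencing, DLM)
    /etc/init.d/clvmd start       # activate the clustered LVM volumes
    mount -t gfs /dev/vg_data/lv_www /var/www   # remount the shared GFS filesystem
    /etc/init.d/rgmanager start   # only now let rgmanager handle services again

    # then, from either node, move the service back to the rejoined node:
    clusvcadm -r www -m node1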

Sounds reasonable... ;) And it avoids the need for rgmanager to check
whether a 'shared' resource (GFS in this case) is already activated by
another service on the same node ...
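
For the mount itself I would then expect nothing more than an ordinary
fstab entry on both nodes, mounted by hand once the stack below it is up
(device name is made up; noauto so nothing tries to mount it at boot):

    # /etc/fstab on both nodes
    /dev/vg_data/lv_www  /var/www  gfs  noauto,defaults  0  0

    # mounted manually after drbd/cman/clvmd are running:
    mount /var/www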

But I still have one question left open, at least in my head...
How do I safely remove one node from a running cluster, so that the
services on the remaining node keep running?

Chris




