[Linux-HA] asymmetrical 4-node cluster

Andrew Beekhof beekhof at gmail.com
Thu Mar 1 07:29:39 MST 2007


On 3/1/07, Yan Fitterer <yfitterer at novell.com> wrote:
>
> >> > Feb 27 19:23:04 rbxw02 db2[22847]: [22866]: ERROR: DB2 instance
> >> > [pdot] not available
> >> >
> >> > The reason for the error is simple: DB2 not installed.
> >> >
> >> > Is there a way to tell heartbeat not to check a resource on a node
> >> > which is anyway not eligible to run it? If not, how difficult
> >> > would it be to implement such a feature?
> >>
> >> No. This is not possible.
> >>
> >> The real fix is in the DB2 resource agent: It should detect that the
> >> binaries are absent and thus the resource very clearly is not active.
> >> (It could do a quick ps xa | grep check to be 100% sure.)
> >
> > or just remove the RA.  more recent versions of heartbeat handle this
> > correctly.
> > (you'll know pretty soon if the version you have doesn't :-)
>
> When I tried this (resource with missing RA on some nodes) inadvertently
> a couple of days ago on 2.0.7-1.2, it brought the whole cluster down in
> a spin. I could get no more sense out of it, and it stonithed 3/4 of the
> nodes. Since it was simplest to ensure the RA was on all nodes, that's
> what I did...

nod - older versions will do that, because a missing RA was treated as
a failure, which meant:
* probes made the resource look "active but failed"
* the stops that followed failed too

and the only way to clean up after a failed stop (the cluster can't be
sure the resource is really down) is to then shoot the machine :(
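
For anyone stuck on an older version, the RA-side fix suggested earlier in
the thread (treat absent binaries as "not running" rather than as a failure)
might look roughly like the sketch below. This is not the actual db2 agent:
APP_BIN and the app_* function names are made up for illustration, and the
"is it running" check is only a placeholder. The exit codes are the standard
OCF ones.

```shell
#!/bin/sh
# Sketch only: NOT the real db2 resource agent.

# Standard OCF exit codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

# Hypothetical path to the managed binary (an assumption for this sketch).
: "${APP_BIN:=/opt/example/bin/exampled}"

app_installed() {
    [ -x "$APP_BIN" ]
}

app_running() {
    # Placeholder check; a real agent would query the software itself.
    ps xa | grep -v grep | grep -q -- "$APP_BIN"
}

app_monitor() {
    # Binaries absent: the resource cannot be active here, so report
    # "not running" instead of a failure. This keeps the probe clean.
    app_installed || return $OCF_NOT_RUNNING
    app_running || return $OCF_NOT_RUNNING
    return $OCF_SUCCESS
}

app_stop() {
    # Nothing installed or nothing running: the stop has trivially
    # succeeded, so there is no failed stop and no reason to fence.
    app_installed || return $OCF_SUCCESS
    app_running || return $OCF_SUCCESS
    # Real stop logic is omitted from this sketch.
    return $OCF_ERR_GENERIC
}

app_start() {
    # Starting is the one action where a missing install is an error.
    app_installed || return $OCF_ERR_INSTALLED
    # Real start logic is omitted from this sketch.
    return $OCF_ERR_GENERIC
}
```

A real agent would dispatch on its first argument (monitor, stop, start)
to these functions. The point is only that on a node without the software,
the probe reports "not running" and stop reports success, which avoids
the two failures listed above.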

