[Linux-HA] Resources failing to start takes down all HB resources

Ragnar Kjørstad linux-ha at ragnark.vestdata.no
Wed Mar 21 19:33:16 MDT 2007


Let's say we have a group of resources A, B and C, in that order.
We have two nodes online, node1 and node2.
The resource group is active on node1.

Then resource C fails, and fails to restart.
The whole group is migrated to node2 by stopping C, B and A, and then
starting A, B and C on node2.
However, C fails to start on node2 for the same reason as on node1.

Now heartbeat stops C, B and A, but has nowhere else to move them.

The end result is that neither A nor B runs, even though they
could run perfectly fine on either node1 or node2. Ironically A and B
are not available _because_ we're using an HA configuration.
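
To make the sequence above concrete, here is a minimal Python sketch of
the behaviour as we observe it. It is a toy model, not heartbeat code:
the function place_group and the can_start callback are names made up
for illustration, and it only assumes that a group is started in order
and fully stopped in reverse order when any member fails to start.

```python
def place_group(group, nodes, can_start):
    """Toy model of all-or-nothing group placement (hypothetical helper).

    Tries each node in turn. Resources start in group order; if one fails,
    the failed resource and everything started before it are stopped in
    reverse order and the next node is tried.
    Returns (node, running_resources, log).
    """
    log = []
    for node in nodes:
        started = []
        for res in group:
            if can_start(res, node):
                started.append(res)
                log.append(f"start {res} on {node}")
            else:
                log.append(f"start {res} on {node} FAILED")
                # Roll back the whole group, including the failed resource.
                for victim in reversed(started + [res]):
                    log.append(f"stop {victim} on {node}")
                break
        else:
            # Every resource in the group started on this node.
            return node, started, log
    # No node could run the complete group, so nothing is left running.
    return None, [], log


if __name__ == "__main__":
    # C cannot start anywhere; A and B could run on either node.
    node, running, log = place_group(
        ["A", "B", "C"], ["node1", "node2"], lambda res, n: res != "C"
    )
    print("\n".join(log))
    print("running:", running, "on", node)
```

In this model A and B end up stopped on both nodes, which is exactly
the outcome we would like to avoid.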


We've looked for ways to use explicit rules instead of groups to see if
there is a way to avoid A and B being stopped just because C can't run,
but have not found any way to specify this.

Did we miss something, or is this a design flaw or a bug? I'm sure I'm not
alone in thinking it's important that resources are as available as
possible in an HA cluster. :-)




-- 
Ragnar Kjørstad
Software Engineer
Scali - http://www.scali.com
Scaling the Linux Datacenter

