[Linux-HA] Automatically failing over resources that depend on a master

Jeremy N Thornhill jeremy.thornhill at duke.edu
Mon Nov 5 07:13:00 MST 2007


First, thanks for the help on my earlier master/slave issue.  I'm now 
trying to get automatic failover working, and have run into a problem 
with services that must run on the "master" of a master/slave set.

In my setup, a resource group (resgroup0, composed of httpd and tomcat) 
must run on the DRBD master.  A filesystem (fs0) is colocated with the 
master instance of a DRBD master/slave set (ms-drbd0), and resgroup0 is 
colocated with fs0.  Order constraints make sure everything starts in the 
right sequence.  My setup is largely based on the following docs:

http://linux-ha.org/v2/faq/forced_failover
http://www.linux-ha.org/DRBD/HowTov2

In testing, this looks like:

Resource Group: resgroup0
    httpd   (heartbeat::ocf:apache):        Started NodeA
    tomcat     (heartbeat::ocf:tomcat):        Started NodeA
Master/Slave Set: ms-drbd0
    drbd0:0     (heartbeat::ocf:drbd):  Started NodeB
    drbd0:1     (heartbeat::ocf:drbd):  Master NodeA
fs0     (heartbeat::ocf:Filesystem):    Started NodeA

If NodeA "goes away" (shut down, heartbeat killed, cable unplugged, etc) 
everything happily moves over to NodeB as I'd expect.

However, this does not happen in response to service failures in 
resgroup0.  The monitor operations I've defined successfully restart 
services as they fail.  But if I break a service badly enough that it 
should fail over (as defined by "resource stickiness" and "resource 
failure stickiness"), resgroup0 and fs0 BOTH stop, logging the following:

pengine[14827]: 2007/11/02_17:35:20 info: unpack_find_resource: Internally renamed drbd0:0 on NodeA to drbd0:1
pengine[14827]: 2007/11/02_17:35:20 info: group_print: Resource Group: resgroup0
pengine[14827]: 2007/11/02_17:35:20 info: native_print:     httpd (heartbeat::ocf:apache):        Started NodeA FAILED
pengine[14827]: 2007/11/02_17:35:20 info: native_print:     tomcat (heartbeat::ocf:tomcat):        Started NodeA
pengine[14827]: 2007/11/02_17:35:20 info: clone_print: Master/Slave Set: ms-drbd0
pengine[14827]: 2007/11/02_17:35:20 info: native_print:     drbd0:0 (heartbeat::ocf:drbd):  Slave NodeB
pengine[14827]: 2007/11/02_17:35:20 info: native_print:     drbd0:1 (heartbeat::ocf:drbd):  Master NodeA
pengine[14827]: 2007/11/02_17:35:20 info: native_print: fs0 (heartbeat::ocf:Filesystem):    Started NodeA
pengine[14827]: 2007/11/02_17:35:20 info: master_color: Promoting drbd0:1
pengine[14827]: 2007/11/02_17:35:20 info: master_color: Promoted 1 instances of a possible 1 to master
pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource fs0 cannot run anywhere
pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource httpd cannot run anywhere
pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource tomcat cannot run anywhere
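
To pull the relevant facts out of a longer pengine log, something like the following may help.  This is only a minimal Python sketch: the function names are made up for illustration and are not part of Heartbeat, and the patterns are assumed from the handful of log lines quoted above, so they may not match other versions.

```python
import re

# Excerpt copied from the pengine output quoted above. Mail clients may wrap
# these lines, so the patterns treat any run of whitespace (including a
# newline) as a single separator.
LOG = """\
pengine[14827]: 2007/11/02_17:35:20 info: master_color: Promoting drbd0:1
pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource fs0
cannot run anywhere
pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource httpd
cannot run anywhere
pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource tomcat
cannot run anywhere
"""

# Patterns assumed from the quoted lines only (hypothetical helpers).
UNPLACED = re.compile(
    r"WARN:\s+native_color:\s+Resource\s+(\S+)\s+cannot\s+run\s+anywhere"
)
PROMOTING = re.compile(r"master_color:\s+Promoting\s+(\S+)")


def unplaced_resources(log_text):
    """Return the resources the policy engine could not place, in log order."""
    return UNPLACED.findall(log_text)


def promoted_instances(log_text):
    """Return the master/slave instances the policy engine chose to promote."""
    return PROMOTING.findall(log_text)


if __name__ == "__main__":
    print("promoting:", promoted_instances(LOG))
    print("unplaced: ", unplaced_resources(LOG))
```

Run against the excerpt, this reports drbd0:1 as the promoted instance and fs0, httpd and tomcat as the resources that could not be placed.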

This is confusing to me, as it claims to be "promoting" drbd0:1, which 
is already the "Master" instance.  In this case I would have expected it 
to promote drbd0:0, which would have allowed all of my resources to run 
on NodeB.

I believe this means I either have a constraint incorrectly defined, or 
I'm missing a needed constraint.  If anybody can point me in the right 
direction, I'd appreciate it - the constraints in my cib.xml file are:

     <constraints>
       <rsc_order id="drbd0_before_fs0" from="fs0" action="start" to="ms-drbd0" to_action="promote"/>
       <rsc_order id="fs0_before_resgroup0" from="resgroup0" action="start" to="fs0" type="after"/>
       <rsc_colocation id="fs0_on_drbd0" to="ms-drbd0" to_role="master" from="fs0" score="infinity"/>
       <rsc_colocation score="infinity" id="resgroup0_on_fs0" to="fs0" from="resgroup0"/>
     </constraints>
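
For anyone who wants to see the dependency chain these constraints describe, here is a minimal Python sketch that reads the fragment with the standard library's xml.etree.ElementTree and follows the rsc_colocation "from"/"to" attributes.  The helper name colocation_chain is hypothetical, and the sketch only restates what the attributes say; it does not model how the policy engine actually scores placement.

```python
import xml.etree.ElementTree as ET

# The constraints section from the cib.xml quoted above, with the
# mail-wrapped lines rejoined.
CONSTRAINTS = """
<constraints>
  <rsc_order id="drbd0_before_fs0" from="fs0" action="start" to="ms-drbd0" to_action="promote"/>
  <rsc_order id="fs0_before_resgroup0" from="resgroup0" action="start" to="fs0" type="after"/>
  <rsc_colocation id="fs0_on_drbd0" to="ms-drbd0" to_role="master" from="fs0" score="infinity"/>
  <rsc_colocation score="infinity" id="resgroup0_on_fs0" to="fs0" from="resgroup0"/>
</constraints>
"""


def colocation_chain(xml_text, start):
    """Follow rsc_colocation from -> to links beginning at `start`.

    Hypothetical helper: it assumes each resource is the "from" side of at
    most one colocation rule, which holds for the fragment above.
    """
    root = ET.fromstring(xml_text)
    depends_on = {c.get("from"): c.get("to") for c in root.iter("rsc_colocation")}
    chain = [start]
    # Stop at a resource with no outgoing rule, or if a cycle would form.
    while chain[-1] in depends_on and depends_on[chain[-1]] not in chain:
        chain.append(depends_on[chain[-1]])
    return chain


if __name__ == "__main__":
    print(" -> ".join(colocation_chain(CONSTRAINTS, "resgroup0")))
    # prints: resgroup0 -> fs0 -> ms-drbd0
```

The chain shows that resgroup0 can only be placed where fs0 is, and fs0 only where the ms-drbd0 master is, which matches the three "cannot run anywhere" warnings appearing together.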

Thanks,
Jeremy Thornhill
jeremy.thornhill at duke.edu

