[Linux-HA] Automatically failing over resources that depend on a master
Jeremy N Thornhill
jeremy.thornhill at duke.edu
Mon Nov 5 07:13:00 MST 2007
First, thanks for the help on my earlier master/slave issue. I'm now
trying to get automatic failover working, and have run into an issue
with services that must run on the "master" of a master/slave set.
In my setup, I've got a resource group (resgroup0 - composed of httpd and
tomcat) that must run on a DRBD master. I've got a filesystem (fs0)
constrained to the master instance of a DRBD master/slave set (ms-drbd0),
and I've got resgroup0 constrained to fs0. I've also got order
constraints so that things start up properly. My setup is largely based on
the following docs:
http://linux-ha.org/v2/faq/forced_failover
http://www.linux-ha.org/DRBD/HowTov2
In testing, this looks like:
Resource Group: resgroup0
    httpd (heartbeat::ocf:apache): Started NodeA
    tomcat (heartbeat::ocf:tomcat): Started NodeA
Master/Slave Set: ms-drbd0
    drbd0:0 (heartbeat::ocf:drbd): Started NodeB
    drbd0:1 (heartbeat::ocf:drbd): Master NodeA
fs0 (heartbeat::ocf:Filesystem): Started NodeA
If NodeA "goes away" (shut down, heartbeat killed, cable unplugged, etc.)
everything happily moves over to NodeB as I'd expect.
However, this does not happen in response to service failures in
resgroup0. My monitor operations successfully restart services as they
fail. But if I break a service badly enough that it should fail over
(according to "resource stickiness" and "resource failure stickiness"),
resgroup0 and fs0 BOTH stop, and the following is logged:
pengine[14827]: 2007/11/02_17:35:20 info: unpack_find_resource: Internally renamed drbd0:0 on NodeA to drbd0:1
pengine[14827]: 2007/11/02_17:35:20 info: group_print: Resource Group: resgroup0
pengine[14827]: 2007/11/02_17:35:20 info: native_print: httpd (heartbeat::ocf:apache): Started NodeA FAILED
pengine[14827]: 2007/11/02_17:35:20 info: native_print: tomcat (heartbeat::ocf:tomcat): Started NodeA
pengine[14827]: 2007/11/02_17:35:20 info: clone_print: Master/Slave Set: ms-drbd0
pengine[14827]: 2007/11/02_17:35:20 info: native_print: drbd0:0 (heartbeat::ocf:drbd): Slave NodeB
pengine[14827]: 2007/11/02_17:35:20 info: native_print: drbd0:1 (heartbeat::ocf:drbd): Master NodeA
pengine[14827]: 2007/11/02_17:35:20 info: native_print: fs0 (heartbeat::ocf:Filesystem): Started NodeA
pengine[14827]: 2007/11/02_17:35:20 info: master_color: Promoting drbd0:1
pengine[14827]: 2007/11/02_17:35:20 info: master_color: Promoted 1 instances of a possible 1 to master
pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource fs0 cannot run anywhere
pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource httpd cannot run anywhere
pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource tomcat cannot run anywhere
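For anyone sifting through a longer log for the same symptom, here is a rough Python sketch that pulls out which instance the policy engine chose to promote and which resources it could not place. The function names (`promoted`, `unplaceable`) are made up for illustration, and the patterns only match the exact message wording shown in the excerpt above; other heartbeat versions may word these messages differently.

```python
import re

# A few lines copied from the pengine output above, including the
# mail-client line wrapping, to show that wrapping does not matter here.
LOG = """\
pengine[14827]: 2007/11/02_17:35:20 info: master_color: Promoting drbd0:1
pengine[14827]: 2007/11/02_17:35:20 info: master_color: Promoted 1
instances of a possible 1 to master
pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource fs0
cannot run anywhere
pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource httpd
cannot run anywhere
pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource tomcat
cannot run anywhere
"""


def _flatten(log_text):
    # Collapse all whitespace (including wrapped lines) to single spaces.
    return " ".join(log_text.split())


def promoted(log_text):
    """Return the instances named in 'master_color: Promoting ...' messages."""
    return re.findall(r"master_color: Promoting (\S+)", _flatten(log_text))


def unplaceable(log_text):
    """Return resources reported as 'cannot run anywhere'."""
    return re.findall(
        r"native_color: Resource (\S+) cannot run anywhere", _flatten(log_text)
    )


if __name__ == "__main__":
    print("promoted:   ", promoted(LOG))
    print("unplaceable:", unplaceable(LOG))
```

Run against the excerpt, this lists drbd0:1 as the promoted instance and fs0, httpd and tomcat as unplaceable, which is the combination described in this message.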
This is confusing to me, as it claims to be "promoting" drbd0:1, which is
already the "Master" instance. In this situation I would have expected it
to promote drbd0:0 instead, which would have allowed all of my resources
to run on NodeB.
I believe this means I either have a constraint incorrectly defined, or
I'm missing a needed constraint. If anybody can point me in the right
direction, I'd appreciate it - the constraints in my cib.xml file are:
<constraints>
  <rsc_order id="drbd0_before_fs0" from="fs0" action="start"
             to="ms-drbd0" to_action="promote"/>
  <rsc_order id="fs0_before_resgroup0" from="resgroup0"
             action="start" to="fs0" type="after"/>
  <rsc_colocation id="fs0_on_drbd0" to="ms-drbd0" to_role="master"
                  from="fs0" score="infinity"/>
  <rsc_colocation score="infinity" id="resgroup0_on_fs0" to="fs0"
                  from="resgroup0"/>
</constraints>
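To make the dependency chain in these constraints easier to see at a glance, here is a small Python sketch that reads the fragment above and reports how the colocation constraints link together. It is purely descriptive: the helper names (`colocation_chains`, `orders_without_type`) are invented for this example, and it makes no assumption about how the policy engine interprets any attribute or what its defaults are.

```python
import xml.etree.ElementTree as ET

# The <constraints> fragment from this message, unchanged apart from indentation.
CONSTRAINTS_XML = """
<constraints>
  <rsc_order id="drbd0_before_fs0" from="fs0" action="start"
             to="ms-drbd0" to_action="promote"/>
  <rsc_order id="fs0_before_resgroup0" from="resgroup0"
             action="start" to="fs0" type="after"/>
  <rsc_colocation id="fs0_on_drbd0" to="ms-drbd0" to_role="master"
                  from="fs0" score="infinity"/>
  <rsc_colocation score="infinity" id="resgroup0_on_fs0" to="fs0"
                  from="resgroup0"/>
</constraints>
"""


def colocation_chains(xml_text):
    """Follow rsc_colocation 'from' -> 'to' links and return each chain."""
    root = ET.fromstring(xml_text)
    follows = {c.get("from"): c.get("to") for c in root.iter("rsc_colocation")}
    # A chain starts at a resource that no other resource is colocated with.
    starts = [r for r in follows if r not in follows.values()]
    chains = []
    for start in starts:
        chain = [start]
        # Stop at the end of the chain, or if a link would loop back.
        while chain[-1] in follows and follows[chain[-1]] not in chain:
            chain.append(follows[chain[-1]])
        chains.append(chain)
    return chains


def orders_without_type(xml_text):
    """Return ids of rsc_order constraints that do not set 'type' explicitly."""
    root = ET.fromstring(xml_text)
    return [c.get("id") for c in root.iter("rsc_order") if c.get("type") is None]


if __name__ == "__main__":
    for chain in colocation_chains(CONSTRAINTS_XML):
        print("colocation chain:", " -> ".join(chain))
    print("rsc_order without explicit type:", orders_without_type(CONSTRAINTS_XML))
```

For this fragment the colocation chain is resgroup0 -> fs0 -> ms-drbd0, and drbd0_before_fs0 is the only order constraint that leaves "type" unset. Whether either observation is related to the failure is not something this sketch can tell.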
Thanks,
Jeremy Thornhill
jeremy.thornhill at duke.edu