[Linux-HA] Automatically failing over resources that depend on a master
Andrew Beekhof
beekhof at gmail.com
Mon Nov 5 10:42:08 MST 2007
I believe you'll find this works as expected if you use the latest Interim build:
http://software.opensuse.org/download/server:/ha-clustering
(no, they're not just for openSUSE users)
On 11/5/07, Jeremy N Thornhill <jeremy.thornhill at duke.edu> wrote:
> First, thanks for the help on my earlier master/slave issue. I'm now
> trying to get automatic failover working, and have encountered an issue
> with services that must run on the "master" of a master/slave set.
>
> In my setup, I've got a resource group (resgroup0 - composed of httpd and
> tomcat) that must run on a DRBD master. I've got a filesystem (fs0)
> constrained to the master instance of a DRBD master/slave set (ms-drbd0),
> and I've got resgroup0 constrained to fs0. I've also got order
> constraints so that things start up properly. My setup is largely based on
> the following docs:
>
> http://linux-ha.org/v2/faq/forced_failover
> http://www.linux-ha.org/DRBD/HowTov2
>
> In testing, this looks like:
>
> Resource Group: resgroup0
> httpd (heartbeat::ocf:apache): Started NodeA
> tomcat (heartbeat::ocf:tomcat): Started NodeA
> Master/Slave Set: ms-drbd0
> drbd0:0 (heartbeat::ocf:drbd): Started NodeB
> drbd0:1 (heartbeat::ocf:drbd): Master NodeA
> fs0 (heartbeat::ocf:Filesystem): Started NodeA
>
> If NodeA "goes away" (shut down, heartbeat killed, cable unplugged, etc.),
> everything happily moves over to NodeB as I'd expect.
>
> However, this does not happen in response to service failures in
> resgroup0. I've got monitoring events defined that are successfully
> restarting services as they fail - however, if I break a service such that
> it should fail over (as defined by "resource stickiness" and "resource
> failure stickiness"), resgroup0 and fs0 BOTH stop, logging the following:
>
> pengine[14827]: 2007/11/02_17:35:20 info: unpack_find_resource: Internally renamed drbd0:0 on NodeA to drbd0:1
> pengine[14827]: 2007/11/02_17:35:20 info: group_print: Resource Group: resgroup0
> pengine[14827]: 2007/11/02_17:35:20 info: native_print: httpd (heartbeat::ocf:apache): Started NodeA FAILED
> pengine[14827]: 2007/11/02_17:35:20 info: native_print: tomcat (heartbeat::ocf:tomcat): Started NodeA
> pengine[14827]: 2007/11/02_17:35:20 info: clone_print: Master/Slave Set: ms-drbd0
> pengine[14827]: 2007/11/02_17:35:20 info: native_print: drbd0:0 (heartbeat::ocf:drbd): Slave NodeB
> pengine[14827]: 2007/11/02_17:35:20 info: native_print: drbd0:1 (heartbeat::ocf:drbd): Master NodeA
> pengine[14827]: 2007/11/02_17:35:20 info: native_print: fs0 (heartbeat::ocf:Filesystem): Started NodeA
> pengine[14827]: 2007/11/02_17:35:20 info: master_color: Promoting drbd0:1
> pengine[14827]: 2007/11/02_17:35:20 info: master_color: Promoted 1 instances of a possible 1 to master
> pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource fs0 cannot run anywhere
> pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource httpd cannot run anywhere
> pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource tomcat cannot run anywhere
>
> This is confusing to me, as it claims to be "promoting" drbd0:1 which is
> already running the "Master" resource. In this instance I would have
> expected it to promote drbd0:0, which would have allowed all of my
> resources to run on NodeB.
>
> I believe this means I either have a constraint incorrectly defined, or
> I'm missing a needed constraint. If anybody can point me in the right
> direction, I'd appreciate it - the constraints in my cib.xml file are:
>
> <constraints>
>   <rsc_order id="drbd0_before_fs0" from="fs0" action="start" to="ms-drbd0" to_action="promote"/>
>   <rsc_order id="fs0_before_resgroup0" from="resgroup0" action="start" to="fs0" type="after"/>
>   <rsc_colocation id="fs0_on_drbd0" to="ms-drbd0" to_role="master" from="fs0" score="infinity"/>
>   <rsc_colocation score="infinity" id="resgroup0_on_fs0" to="fs0" from="resgroup0"/>
> </constraints>
>
> Thanks,
> Jeremy Thornhill
> jeremy.thornhill at duke.edu
>
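As a reading aid for the constraints quoted above, here is a minimal Python sketch that parses the quoted <constraints> block and follows the score="infinity" colocation rules from resgroup0 downwards. It is not part of Heartbeat: the helper name colocation_chain and the CONSTRAINTS_XML constant are hypothetical, and the script only reads the XML as written. It does not model how the policy engine actually decides placement or promotion.

```python
import xml.etree.ElementTree as ET

# The constraints section exactly as quoted in the message above.
CONSTRAINTS_XML = """
<constraints>
  <rsc_order id="drbd0_before_fs0" from="fs0" action="start"
             to="ms-drbd0" to_action="promote"/>
  <rsc_order id="fs0_before_resgroup0" from="resgroup0"
             action="start" to="fs0" type="after"/>
  <rsc_colocation id="fs0_on_drbd0" to="ms-drbd0" to_role="master"
                  from="fs0" score="infinity"/>
  <rsc_colocation score="infinity" id="resgroup0_on_fs0" to="fs0"
                  from="resgroup0"/>
</constraints>
"""


def colocation_chain(xml_text, start):
    """Follow mandatory colocation constraints from `start`.

    Hypothetical helper for illustration: returns the list of resources
    that `start` is (transitively) required to be placed with.
    """
    root = ET.fromstring(xml_text)
    # Map each dependent resource ("from") to (target, required role or None).
    depends_on = {
        c.get("from"): (c.get("to"), c.get("to_role"))
        for c in root.findall("rsc_colocation")
        if (c.get("score") or "").lower() == "infinity"
    }
    chain = [start]
    seen = {start}
    current = start
    while current in depends_on:
        target, role = depends_on[current]
        chain.append(f"{target} ({role})" if role else target)
        if target in seen:  # guard against a circular set of constraints
            break
        seen.add(target)
        current = target
    return chain


if __name__ == "__main__":
    # prints: resgroup0 -> fs0 -> ms-drbd0 (master)
    print(" -> ".join(colocation_chain(CONSTRAINTS_XML, "resgroup0")))
```

Read this way, the two colocation rules tie resgroup0 to fs0 and fs0 to whichever node holds the master role of ms-drbd0, so the group can only move if the master role moves with it.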