[Linux-HA] Automatically failing over resources that depend on a master

Andrew Beekhof beekhof at gmail.com
Mon Nov 5 10:42:08 MST 2007


I believe you'll find this works as expected if you use the latest Interim build
   http://software.opensuse.org/download/server:/ha-clustering

(no, they're not just for openSUSE users)

On 11/5/07, Jeremy N Thornhill <jeremy.thornhill at duke.edu> wrote:
> First, thanks for the help on my earlier master/slave issue.  I'm now
> trying to get automatic failover working, and have encountered an issue
> with services that must run on a "master" of a master/slave set.
>
> In my setup, I've got a resource group (resgroup0 - composed of httpd and
> tomcat) that must run on a DRBD master.  I've got a filesystem (fs0)
> constrained to the master instance of a DRBD master/slave set (ms-drbd0),
> and I've got resgroup0 constrained to fs0.  I've also got order
> constraints so that things start up properly. My setup is largely based on
> the following docs:
>
> http://linux-ha.org/v2/faq/forced_failover
> http://www.linux-ha.org/DRBD/HowTov2
>
> In testing, this looks like:
>
> Resource Group: resgroup0
>     httpd   (heartbeat::ocf:apache):        Started NodeA
>     tomcat     (heartbeat::ocf:tomcat):        Started NodeA
> Master/Slave Set: ms-drbd0
>     drbd0:0     (heartbeat::ocf:drbd):  Started NodeB
>     drbd0:1     (heartbeat::ocf:drbd):  Master NodeA
> fs0     (heartbeat::ocf:Filesystem):    Started NodeA
>
> If NodeA "goes away" (shut down, heartbeat killed, cable unplugged, etc.)
> everything happily moves over to NodeB as I'd expect.
>
> However, this does not happen in response to service failures in
> resgroup0.  I've got monitor operations defined that successfully
> restart services as they fail; however, if I break a service such that
> it should fail over (as defined by "resource stickiness" and "resource
> failure stickiness"), resgroup0 and fs0 BOTH stop, logging the following:
>
> pengine[14827]: 2007/11/02_17:35:20 info: unpack_find_resource: Internally renamed drbd0:0 on NodeA to drbd0:1
> pengine[14827]: 2007/11/02_17:35:20 info: group_print: Resource Group: resgroup0
> pengine[14827]: 2007/11/02_17:35:20 info: native_print:     httpd (heartbeat::ocf:apache):        Started NodeA FAILED
> pengine[14827]: 2007/11/02_17:35:20 info: native_print:     tomcat (heartbeat::ocf:tomcat):        Started NodeA
> pengine[14827]: 2007/11/02_17:35:20 info: clone_print: Master/Slave Set: ms-drbd0
> pengine[14827]: 2007/11/02_17:35:20 info: native_print:     drbd0:0 (heartbeat::ocf:drbd):  Slave NodeB
> pengine[14827]: 2007/11/02_17:35:20 info: native_print:     drbd0:1 (heartbeat::ocf:drbd):  Master NodeA
> pengine[14827]: 2007/11/02_17:35:20 info: native_print: fs0 (heartbeat::ocf:Filesystem):    Started NodeA
> pengine[14827]: 2007/11/02_17:35:20 info: master_color: Promoting drbd0:1
> pengine[14827]: 2007/11/02_17:35:20 info: master_color: Promoted 1 instances of a possible 1 to master
> pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource fs0 cannot run anywhere
> pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource httpd cannot run anywhere
> pengine[14827]: 2007/11/02_17:35:20 WARN: native_color: Resource tomcat cannot run anywhere
>
> This is confusing to me, as it claims to be "promoting" drbd0:1, which is
> already in the Master role.  In this instance I would have expected it to
> promote drbd0:0, which would have allowed all of my resources to run on
> NodeB.
>
> I believe this means I either have a constraint incorrectly defined, or
> I'm missing a needed constraint.  If anybody can point me in the right
> direction, I'd appreciate it.  The constraints in my cib.xml file are:
>
>      <constraints>
>        <rsc_order id="drbd0_before_fs0" from="fs0" action="start"
>                   to="ms-drbd0" to_action="promote"/>
>        <rsc_order id="fs0_before_resgroup0" from="resgroup0"
>                   action="start" to="fs0" type="after"/>
>        <rsc_colocation id="fs0_on_drbd0" to="ms-drbd0" to_role="master"
>                        from="fs0" score="infinity"/>
>        <rsc_colocation score="infinity" id="resgroup0_on_fs0" to="fs0"
>                        from="resgroup0"/>
>      </constraints>
>
> Thanks,
> Jeremy Thornhill
> jeremy.thornhill at duke.edu
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
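For readers untangling the same setup: the four constraints quoted above form a chain, resgroup0 -> fs0 -> ms-drbd0 (master role). The Python sketch here is only a reading aid, not a fix for the behavior Jeremy describes. It parses the constraints exactly as quoted and prints each one as a plain sentence. The helper names `describe` and `colocation_chain` are hypothetical, and the sketch assumes heartbeat 2.x `rsc_order` semantics in which `from` performs `action` after `to` performs `to_action`, with a missing `to_action` taken to equal `action` and a missing `type` taken to mean "after".

```python
import xml.etree.ElementTree as ET

# Constraints copied verbatim from the cib.xml fragment quoted above.
CONSTRAINTS = """
<constraints>
  <rsc_order id="drbd0_before_fs0" from="fs0" action="start"
             to="ms-drbd0" to_action="promote"/>
  <rsc_order id="fs0_before_resgroup0" from="resgroup0"
             action="start" to="fs0" type="after"/>
  <rsc_colocation id="fs0_on_drbd0" to="ms-drbd0" to_role="master"
                  from="fs0" score="infinity"/>
  <rsc_colocation score="infinity" id="resgroup0_on_fs0" to="fs0"
                  from="resgroup0"/>
</constraints>
"""


def describe(xml_text):
    """Hypothetical helper: return one readable sentence per constraint."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for order in root.iter("rsc_order"):
        action = order.get("action", "start")
        # Assumption: a missing to_action means "same as action",
        # and a missing type means "after".
        to_action = order.get("to_action", action)
        kind = order.get("type", "after")
        lines.append(f"{order.get('from')} {action} {kind} "
                     f"{order.get('to')} {to_action}")
    for coloc in root.iter("rsc_colocation"):
        role = coloc.get("to_role")
        target = f"{coloc.get('to')} ({role})" if role else coloc.get("to")
        lines.append(f"{coloc.get('from')} with {target} "
                     f"score={coloc.get('score')}")
    return lines


def colocation_chain(xml_text, start):
    """Hypothetical helper: follow rsc_colocation links from `start`."""
    root = ET.fromstring(xml_text.strip())
    depends_on = {c.get("from"): c.get("to")
                  for c in root.iter("rsc_colocation")}
    chain = [start]
    # Stop at a resource with no colocation of its own (or on a cycle).
    while chain[-1] in depends_on and depends_on[chain[-1]] not in chain:
        chain.append(depends_on[chain[-1]])
    return chain


for line in describe(CONSTRAINTS):
    print(line)
```

Running it prints one line per constraint, starting with `fs0 start after ms-drbd0 promote`, and `colocation_chain(CONSTRAINTS, "resgroup0")` returns `['resgroup0', 'fs0', 'ms-drbd0']`. That makes explicit that, with infinite colocation scores, resgroup0 can only run wherever the ms-drbd0 master is.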

