[Linux-HA] Resource Agent script passing info back to Heartbeat question.
Andrew Beekhof
beekhof at gmail.com
Thu Oct 12 23:57:42 MDT 2006
On 10/13/06, Peter Wong <peter.wong at mobidia.com> wrote:
> Greetings:
>
> Thanks for the explanation.
>
> I have the following scenario:
> ---
> If the resource/service fails a number of consecutive
> times (say, N times) on the active node, then fail the
> resource/service over from the local node to the
> standby node.
> ---
>
> The question is then:
> Which "*stickiness" attributes do I need to look at, and
> what values should I set them to, to make this scenario
> happen?
failure
>
> Regards!
>
> Peter.
>
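Peter's "fail over after N failures" scenario is mostly arithmetic on scores. The minimal Python sketch below assumes a scoring model that the thread does not spell out: the score of the node the resource runs on is location_pref + resource_stickiness + fail_count * resource_failure_stickiness, and the resource moves once that score drops strictly below the standby node's score. The function names and the strict "<" tie rule are hypothetical; check the model against your Heartbeat version before relying on it.

```python
# Assumed scoring model (not confirmed by the thread or the Heartbeat source):
#   score(current node) = location_pref + resource_stickiness
#                         + fail_count * resource_failure_stickiness
# The resource is assumed to move once this drops strictly below the
# standby node's score.


def current_node_score(location_pref: int, resource_stickiness: int,
                       resource_failure_stickiness: int, fail_count: int) -> int:
    """Score of the node the resource currently runs on, under the model."""
    return (location_pref + resource_stickiness
            + fail_count * resource_failure_stickiness)


def failure_stickiness_for(n_failures: int, location_pref: int,
                           resource_stickiness: int, standby_pref: int) -> int:
    """Negative resource_failure_stickiness that keeps the resource in place
    for failures 1..N-1 and moves it on failure N (under the model)."""
    if n_failures < 1:
        raise ValueError("n_failures must be at least 1")
    margin = location_pref + resource_stickiness - standby_pref
    if margin < 0:
        raise ValueError("the standby node is already preferred")
    # Smallest integer penalty strictly greater than margin / N.
    penalty = margin // n_failures + 1
    rfs = -penalty

    def moves(fail_count: int) -> bool:
        score = current_node_score(location_pref, resource_stickiness,
                                   rfs, fail_count)
        return score < standby_pref

    # If even the smallest such penalty already moves the resource after
    # N-1 failures, no integer value works for this N.
    if moves(n_failures - 1) or not moves(n_failures):
        raise ValueError("no integer value gives exactly N; "
                         "increase resource_stickiness")
    return rfs
```

For example, with resource_stickiness=100 and no location preference on either node, failing over on the third failure gives -34: after two failures the score is 32 (stay), after the third it is -2 (move). Under this model a resource pinned with INFINITY location rules never moves, so that kind of pinning does not fit the failover scenario.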
> > -----Original Message-----
> > From: linux-ha-bounces at lists.linux-ha.org
> > [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of
> > Andrew Beekhof
> > Sent: Wednesday, October 11, 2006 12:41 AM
> > To: General Linux-HA mailing list
> > Subject: Re: Re: [Linux-HA] Resource Agent script passing
> > info back to Heartbeat question.
> >
> > On 10/11/06, Max Hofer <max.hofer at apus.co.at> wrote:
> > > I'm assuming you use Heartbeat V2 with the CRM (if not,
> > > you are out of luck doing the things you want).
> > >
> > > The RAs return their well-defined return codes to the
> > > monitoring daemon (CRM). It is the configuration of the
> > > CRM, i.e. the CIB file, that has to define what the CRM
> > > should do when something special happens (e.g. a resource
> > > failure, some attributes changed, etc.).
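To illustrate Max's point that the agent only reports state and the CRM decides between restart and failover, here is a minimal Python sketch of an OCF "monitor" step (OCF agents use "monitor" where LSB scripts use "status"; they are usually shell scripts, but any executable works). The exit-code values are those of the OCF Resource Agent API; the monitor function and its two boolean inputs are hypothetical stand-ins for the checks a real agent would perform.

```python
from enum import IntEnum


class OCF(IntEnum):
    """Exit codes defined by the OCF Resource Agent API."""
    SUCCESS = 0
    ERR_GENERIC = 1
    ERR_ARGS = 2
    ERR_UNIMPLEMENTED = 3
    ERR_PERM = 4
    ERR_INSTALLED = 5
    ERR_CONFIGURED = 6
    NOT_RUNNING = 7


def monitor(process_running: bool, service_healthy: bool) -> OCF:
    """Report the resource state; the CRM, not the agent, decides what to do.

    process_running and service_healthy are hypothetical checks that a real
    agent would perform itself (pid file, TCP probe, ...).
    """
    if not process_running:
        return OCF.NOT_RUNNING   # resource is stopped
    if not service_healthy:
        return OCF.ERR_GENERIC   # running but broken: a monitor failure
    return OCF.SUCCESS
```

A real agent would pass the returned value to sys.exit(); whether a failure then leads to a local restart or a move is governed by on_fail and the stickiness settings in the CIB, not by the exit code alone.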
> > >
> > > This is the solution I found for the problem you described
> > > (maybe there is a better way; please correct me).
> >
> > A very good explanation - thank you :-)
> >
> > A couple of minor points...
> > If resource_failure_stickiness == 0, then the value of
> > fail_count is ignored.
> > In such cases, if resource_stickiness is > 0, then the most likely
> > action after a failed monitor action is a restart on the same node.
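Andrew's two points can be checked with a minimal Python sketch. It assumes (the thread does not state this) that the current node's score is pref + resource_stickiness + fail_count * resource_failure_stickiness and is compared with the other node's score; the function name and the outcome strings are hypothetical.

```python
def action_after_failure(fail_count: int, resource_stickiness: int,
                         resource_failure_stickiness: int,
                         current_pref: int = 0, other_pref: int = 0) -> str:
    """Likely placement after a failed monitor, under an assumed score model."""
    # When resource_failure_stickiness == 0 the fail_count term is zero,
    # so the number of failures has no influence on the outcome.
    current = (current_pref + resource_stickiness
               + fail_count * resource_failure_stickiness)
    if current > other_pref:
        return "restart on same node"
    if current < other_pref:
        return "move to other node"
    # Equal scores: placement is not determined by stickiness at all.
    return "tie (placement undefined)"
```

With resource_failure_stickiness=0 and resource_stickiness=100, the result is "restart on same node" whatever the fail_count; with resource_stickiness=0 as well, the scores tie, which is why Andrew only calls a restart "most likely" when resource_stickiness > 0.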
> >
> > > When you want the resource to restart after a failure,
> > > configure the CIB in the following way:
> > > * make sure the resource runs only on this node and never
> > >   on another node
> > > * make sure the resource does not get a negative stickiness
> > >   when it fails (resource_failure_stickiness)
> > > * make sure the resource is restarted after a monitoring
> > >   failure (on_fail="restart" for the monitoring operation).
> > >
> > > Example: a resource called "dummy" should run on node
> > > "paul" (excerpt from CIB.xml)
> > >
> > > <resources>
> > >   <primitive class="ocf" id="dummy_resource" provider="heartbeat"
> > >              type="Dummy" resource_failure_stickiness="0">
> > >     <instance_attributes>
> > >       <!-- this attribute is set because the Dummy resource would
> > >            otherwise use a default value of 10 seconds, which is
> > >            annoying for tests -->
> > >       <attributes>
> > >         <nvpair id="startup_time" name="start_delay" value="2"/>
> > >       </attributes>
> > >     </instance_attributes>
> > >     <operations>
> > >       <!-- Attention: a restart would usually start the resource on
> > >            the node where its failcount is closest to 0. Thus make
> > >            sure the resource runs only on this node. See constraints.
> > >       -->
> > >       <op id="dummy_monitor" interval="8s" name="monitor"
> > >           timeout="15s" on_fail="restart"/>
> > >     </operations>
> > >   </primitive>
> > > </resources>
> > >
> > > <constraints>
> > >   <!-- first make sure dummy runs only on paul, so a restart does
> > >        not move the resource somewhere else -->
> > >   <rsc_location id="dummy_only_on_paul" rsc="dummy_resource">
> > >     <rule score="-INFINITY">
> > >       <expression attribute="#uname" operation="ne" value="paul"/>
> > >     </rule>
> > >     <rule score="INFINITY">
> > >       <expression attribute="#uname" operation="eq" value="paul"/>
> > >     </rule>
> > >   </rsc_location>
> > > </constraints>
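The two location rules can be evaluated by hand per node. The minimal Python sketch below does this with the standard library's ElementTree; it uses the node name "paul" from the prose and the primitive id "dummy_resource". The numeric value of INFINITY, the clamping, the "all expressions must match" rule and the eq/ne-only support are simplifying assumptions, not a description of the CRM's actual implementation.

```python
import xml.etree.ElementTree as ET

# Assumption: the CRM represents INFINITY as 1000000 and clamps sums to it.
INFINITY = 1_000_000

# The rsc_location from the example, using the node name "paul".
CONSTRAINT = """
<rsc_location id="dummy_only_on_paul" rsc="dummy_resource">
  <rule score="-INFINITY">
    <expression attribute="#uname" operation="ne" value="paul"/>
  </rule>
  <rule score="INFINITY">
    <expression attribute="#uname" operation="eq" value="paul"/>
  </rule>
</rsc_location>
"""


def parse_score(text: str) -> int:
    """Convert a CIB score string to an integer."""
    if text in ("INFINITY", "+INFINITY"):
        return INFINITY
    if text == "-INFINITY":
        return -INFINITY
    return int(text)


def expression_matches(expr: ET.Element, node_attrs: dict) -> bool:
    """Evaluate one <expression>; only eq/ne are handled in this sketch."""
    actual = node_attrs.get(expr.get("attribute"))
    op, value = expr.get("operation"), expr.get("value")
    if op == "eq":
        return actual == value
    if op == "ne":
        return actual != value
    raise ValueError(f"operation {op!r} is not handled in this sketch")


def location_score(constraint_xml: str, node_name: str) -> int:
    """Combine the scores of all matching rules for one node."""
    root = ET.fromstring(constraint_xml.strip())
    attrs = {"#uname": node_name}
    total = 0
    for rule in root.findall("rule"):
        # Simplification: a rule matches when all its expressions match.
        if all(expression_matches(e, attrs)
               for e in rule.findall("expression")):
            score = parse_score(rule.get("score"))
            if score == -INFINITY:
                return -INFINITY  # -INFINITY overrides everything else
            total = max(-INFINITY, min(INFINITY, total + score))
    return total
```

Under these assumptions "paul" scores INFINITY and every other node scores -INFINITY, which is what keeps a restart from moving the resource.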
> > >
> > > A second way I could think of to restart a resource is:
> > > * run a cron job that periodically runs a script which
> > >   checks whether the resource is running and, if it is not,
> > >   resets the failcount to 0 (which should trigger a resource
> > >   start).
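One pass of the cron job Max describes can be sketched in Python as below. The is_running and reset_failcount hooks are hypothetical: a real script would wrap the cluster's own command-line tools, whose names and flags depend on the Heartbeat version, so they are injected here rather than guessed.

```python
from typing import Callable


def cron_check(resource: str,
               is_running: Callable[[str], bool],
               reset_failcount: Callable[[str], None]) -> bool:
    """Reset the failcount of a stopped resource; True if a reset was issued.

    is_running and reset_failcount are hypothetical hooks supplied by the
    caller (e.g. wrappers around the cluster's CLI tools).
    """
    if is_running(resource):
        return False
    # Clearing the failcount should let the CRM start the resource again.
    reset_failcount(resource)
    return True
```

Scheduled from crontab every few minutes, this retries a resource that the CRM gave up on, at the cost of reacting only at the cron interval.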
> > >
> > > kind regards
> > > Max
> > >
> > >
> > > On Tuesday 10 October 2006 21:20, Peter Wong wrote:
> > > > Greetings:
> > > >
> > > > Is there a way for the Resource Agent to return some
> > > > parameters/exit-code back to the Heartbeat monitoring
> > > > daemon during the "status" subcommand to tell Heartbeat
> > > > to either restart the Resource on the local node or
> > > > do a failover from the local node to a standby node?
> > > >
> > > > The reason I need to do this is that in some cases I
> > > > just want to restart the resource on the local node
> > > > because the situation is not severe enough to go to
> > > > a standby node.
> > > >
> > > > Thanks in advance for any help!
> > > >
> > > > Peter.
> > > >
> > > > _______________________________________________
> > > > Linux-HA mailing list
> > > > Linux-HA at lists.linux-ha.org
> > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > See also: http://linux-ha.org/ReportingProblems
> > >
> > > --
> > > Max Hofer
> > > APUS Software G.m.b.H.
> > > A-8074 Raaba, Bahnhofstraße 1/1
> > > T| +43 316 401629 11
> > > F| +43 316 401629 9
> > > W| www.apus.co.at
> > > E| max.hofer at apus.co.at
> > >
>
>