[Linux-HA] Ping node disconnection triggers stopping of resources
Andrew Beekhof
beekhof at gmail.com
Wed Feb 14 08:56:23 MST 2007
On 2/14/07, Pavol Gono <palo.gono at gmail.com> wrote:
> Revision 10161 from http://hg.linux-ha.org/dev/
damn.
i'll look into this
>
> On 2/14/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> > what version are you running? there was a well known bug in .7 that did
> > this.
> >
> > On 2/14/07, Pavol Gono <palo.gono at gmail.com> wrote:
> > > Hi
> > >
> > > When using
> > > respawn root /usr/local/lib/heartbeat/pingd -m 10 -d 5s
> > > in my ha.cf, and this constraint:
> > > <rsc_location id="x_location_pingd" rsc="x_Dummy">
> > >   <rule id="x_rule_pingd" score_attribute="pingd">
> > >     <expression id="x_expr_pingd" attribute="pingd" operation="defined"/>
> > >   </rule>
> > > </rsc_location>
> > >
> > > I expected the following to happen:
> > > - the network connection between a ping node and some of the cluster
> > > nodes is destroyed
> > > - after 5 seconds, each cluster node knows exactly whether it has a
> > > connection to the ping node
> > > - cluster nodes with no connection decrease the score of x_Dummy by 10
> > > - only now is the decision about stopping / starting resources made
> > >
> > > But I observed that the decision about stopping resources comes sooner
> > > than the score changes for all cluster nodes.
> > > The result: after disconnecting a ping node from a two-node
> > > cluster, resources are (sometimes) stopped temporarily.
> > >
> > > Is this known and expected behaviour?
> > >
> > >
> > > Test environment:
> > > HB revision 10161 with a small patch to Dummy (see attachment)
> > > cluster nodeA - sk16251c (DC)
> > > cluster nodeB - linux-sles1
> > > pingnodes - 10.0.0.8, 10.0.0.9
> > >
> > > In the test script I emulate disconnection of a ping node from both
> > > cluster nodes. Since I use iptables on the cluster nodes, there
> > > are two possible orders:
> > > - disconnect nodeA, then nodeB
> > > - disconnect nodeB, then nodeA
> > >
> > > The test script writes a group of characters (such as MMMM, NNNN, OOOO,
> > > PPPP) to the log after each step, so that I can identify the relevant log
> > > messages. I run the test script on the DC machine (cluster nodeA,
> > > sk16251c). I use a parsing script to filter the relevant messages from the
> > > DC's log.
> > >
> > > From the filtered output I recognized the following behaviour:
> > >
> > > In case the first disconnect possibility was performed:
> > > disconnect nodeA - sk16251c (DC)
> > > disconnect nodeB - linux-sles1
> > >
> > > After this operation, the first scoring evaluation gave nodeA a score
> > > 10 lower than nodeB, even though iptables disconnected both nodes at
> > > almost the same moment. This results in a STOP request on nodeA. But the
> > > very next scoring evaluation has the updated info from nodeB and
> > > therefore compares equal scores (resources stopped, one ping node
> > > connected) and chooses the DC node to run the resources, so they are
> > > restarted.
> > >
> > > In case the second disconnect possibility was performed:
> > > disconnect nodeB - linux-sles1
> > > disconnect nodeA - sk16251c (DC)
> > >
> > > Now two results are observed, depending on the moment at which
> > > the message from nodeB is processed. If it is received before the
> > > first score evaluation, both nodes are scored as having only one
> > > ping node connection alive, and nothing happens; resources remain on
> > > the DC node.
> > > But if the message from the peer node is processed after the first
> > > scoring evaluation, this leads to the same behaviour (STOP, and
> > > then START on the same node after the next evaluation) as described
> > > above.
> > >
> > > Palo
> > >
> > > _______________________________________________
> > > Linux-HA mailing list
> > > Linux-HA at lists.linux-ha.org
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
>
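The ordering effect described in the quoted report can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the actual policy engine logic: each node's score is taken to be the pingd multiplier (10) times the number of reachable ping nodes (2 initially), the DC re-evaluates placement after every attribute update it processes, and ties go to nodeA (the DC), as the report observed. The function names `choose` and `simulate` are hypothetical. In the model the placement briefly moves away from nodeA, which corresponds to the STOP followed by START seen in the logs.

```python
MULTIPLIER = 10               # from "pingd -m 10"
PING_NODES = 2                # 10.0.0.8 and 10.0.0.9
NODES = ["nodeA", "nodeB"]    # nodeA is the DC and initially runs x_Dummy


def choose(scores):
    """Pick the highest-scoring node; ties go to the first node (nodeA).

    The tie-break is an assumption taken from the observed behaviour.
    """
    best = max(scores.values())
    return next(n for n in NODES if scores[n] == best)


def simulate(update_order, evaluate_after_each=True):
    """Return the chosen node after each placement evaluation.

    update_order lists the nodes in the order in which the DC processes
    their "lost one ping node" attribute update.
    """
    scores = {n: MULTIPLIER * PING_NODES for n in NODES}
    history = [choose(scores)]          # placement before any disconnect
    for i, node in enumerate(update_order):
        scores[node] -= MULTIPLIER      # one ping node became unreachable
        is_last = i == len(update_order) - 1
        if evaluate_after_each or is_last:
            history.append(choose(scores))
    return history


if __name__ == "__main__":
    # nodeA's update is processed first: resource leaves nodeA, then returns.
    print(simulate(["nodeA", "nodeB"]))
    # nodeB's update is processed first: nothing moves.
    print(simulate(["nodeB", "nodeA"]))
    # Evaluating only once both updates are in: nothing moves.
    print(simulate(["nodeA", "nodeB"], evaluate_after_each=False))
```

In this model the temporary stop only appears when the DC evaluates placement between the two updates, which matches the expectation in the report that the decision should wait until every node's score has changed.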