[Linux-HA] pingd in V2
Andrew Beekhof
beekhof at gmail.com
Wed Jun 6 02:04:21 MDT 2007
On 6/5/07, Serge Dubrouski <sergeyfd at gmail.com> wrote:
> Ok. rsc_order from a resource to pingd and timeout in the pingd RA
> fixed the problem. Here is a patch for pingd.in if you want to apply
> it:
it assumes "dampen" is always in seconds
>
> --- resources/OCF/pingd.in.distr 2007-06-05 09:38:31.000000000 -0600
> +++ resources/OCF/pingd.in 2007-06-05 09:39:16.000000000 -0600
> @@ -161,6 +161,8 @@
>
> rc=$?
> if [ $rc = 0 ]; then
> + #Give it some time to populate scores.
> + sleep `expr ${OCF_RESKEY_dampen%%s} + 5`
> exit $OCF_SUCCESS
> fi
>
>
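The ${OCF_RESKEY_dampen%%s} expansion in the patch above only strips a
trailing "s", so a value such as "500ms" or "2m" would make expr fail.
A minimal sketch of a more tolerant conversion follows. The helper name
dampen_to_seconds is hypothetical, and the set of suffixes it accepts
(none, s, ms, m, h) is an assumption rather than a statement of what pingd
itself accepts; longer spellings such as "sec" are rejected.

```shell
# Hypothetical helper, not part of the pingd RA: convert a dampen value
# to whole seconds. Assumed suffixes: none or "s" (seconds), "ms"
# (milliseconds, rounded up), "m" (minutes), "h" (hours).
dampen_to_seconds() {
    value=$1
    # "*ms" must be tested before "*s" and "*m" so that 500ms is not
    # mistaken for seconds or minutes.
    case "$value" in
        *ms) n=${value%ms}; unit=ms ;;
        *s)  n=${value%s};  unit=s ;;
        *m)  n=${value%m};  unit=m ;;
        *h)  n=${value%h};  unit=h ;;
        *)   n=$value;      unit=s ;;
    esac
    # Reject empty or non-numeric values instead of guessing.
    case "$n" in
        ''|*[!0-9]*) return 1 ;;
    esac
    case "$unit" in
        ms) echo $(( (n + 999) / 1000 )) ;;
        s)  echo "$n" ;;
        m)  echo $(( n * 60 )) ;;
        h)  echo $(( n * 3600 )) ;;
    esac
}
```

With such a helper the delay in the start action could be computed as
sleep $(( $(dampen_to_seconds "$OCF_RESKEY_dampen") + 5 )), after checking
the helper's exit status and falling back to a fixed delay when the value
cannot be parsed.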
> On 6/5/07, Serge Dubrouski <sergeyfd at gmail.com> wrote:
> > On 6/5/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> > > On 6/5/07, Serge Dubrouski <sergeyfd at gmail.com> wrote:
> > > > Hello -
> > > >
> > > > I played with pingd in v2 heartbeat and found some problems (or
> > > > inconveniences) there:
> > > >
> > > > My configuration includes a group of resources and a rsc_location rule
> > > > for a primary node. If I configure pingd in the ha.cf and add
> > > > rsc_location rule with score -INFINITY for pingd attribute not defined
> > > > or less than or equal to 0, everything works as it should. My group
> > > > starts on the primary node and fails over to the backup node if the
> > > > primary loses its network connection.
> > > >
> > > > Problems start when I move pingd from ha.cf to cib.xml and configure a
> > > > clone for it there. It looks like (I'm not absolutely sure of this)
> > > > pingd doesn't have enough time at startup to update the CIB
> > > > before Heartbeat starts other resources.
> > >
> > > do you have ordering constraints between the pingd resource and the
> > > other resources?
> >
> > Adding ordering constraints didn't help. Constraints together with a
> > timeout in the RA probably would. I'm going to test it.
> >
> > >
> > > > Because of that, Heartbeat
> > > > complains that there are no nodes available for resources or that
> > > > resources can't run on any node in the cluster.
> > >
> > > presumably because there are no pingd scores yet - that's perfectly normal so far
> >
> > Absolutely true.
> >
> > >
> > > > With the second check
> > > > heartbeat sees nodes available but at this time there is no guarantee
> > > > that resources will be started on a desired primary node.
> > >
> > > this bit i'm not sure i understand
> >
> > Ok, here I tried to explain that after the pingd scores have been
> > populated, the other rsc_location rule (the one that defines the primary
> > box) gets ignored. That's probably because the pingd score for the
> > secondary box gets populated a bit earlier than for the primary.
> >
> > >
> > > do you mean the pingd scores haven't stabilized?
> > > or that they're equal and you can't make the resource start on a
> > > particular node?
> >
> > They stabilized, but as I said I can't guarantee that a resource
> > starts on the primary node. Sometimes it starts on the primary,
> > sometimes on the backup node.
> >
> > >
> > > >
> > > > I hope that I explained the problem correctly. A possible fix could
> > > > be to implement a short timeout (OCF_RESKEY_dampen + 3s for example)
> > > > in the start function of the pingd RA.
> > > >
> > > > There were also some mistakes in the v2/faq/pingd document that I
> > > > corrected in wiki.linux-ha.org
> > >
> > > thanks!
> > > _______________________________________________
> > > Linux-HA mailing list
> > > Linux-HA at lists.linux-ha.org
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
> > >
> >