[Linux-HA] pingd in V2
Serge Dubrouski
sergeyfd at gmail.com
Wed Jun 6 09:00:32 MDT 2007
Logs and cib.xml are attached. As you can see, according to the
configuration TestGroup should start on the node called goodman (and it
does if I configure pingd in ha.cf), but it starts on miller.
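
For readers without the attachment, the two rsc_location rules described
in the quoted text below would look roughly like the following. This is
only a sketch: the ids and the score of 100 are illustrative assumptions,
not copied from the attached cib.xml.

```xml
<constraints>
  <!-- Hypothetical ids and score: prefer goodman as the primary node -->
  <rsc_location id="TestGroup_primary" rsc="TestGroup">
    <rule id="TestGroup_primary_rule" score="100">
      <expression id="TestGroup_primary_expr" attribute="#uname"
                  operation="eq" value="goodman"/>
    </rule>
  </rsc_location>
  <!-- Forbid any node whose pingd attribute is missing or not positive -->
  <rsc_location id="TestGroup_connected" rsc="TestGroup">
    <rule id="TestGroup_connected_rule" score="-INFINITY" boolean_op="or">
      <expression id="TestGroup_connected_undef" attribute="pingd"
                  operation="not_defined"/>
      <expression id="TestGroup_connected_zero" attribute="pingd"
                  operation="lte" value="0"/>
    </rule>
  </rsc_location>
</constraints>
```

Until pingd has written its attribute for a node, the second rule matches
that node via not_defined, which would explain the initial "no nodes
available" messages.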
On 6/6/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> On 6/5/07, Serge Dubrouski <sergeyfd at gmail.com> wrote:
> > On 6/5/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> > > On 6/5/07, Serge Dubrouski <sergeyfd at gmail.com> wrote:
> > > > Hello -
> > > >
> > > > I played with pingd in v2 heartbeat and found some problems (or
> > > > inconveniences) there:
> > > >
> > > > My configuration includes a group of resources and a rsc_location rule
> > > > for a primary node. If I configure pingd in ha.cf and add a
> > > > rsc_location rule with score -INFINITY for the pingd attribute being
> > > > not defined or less than or equal to 0, everything works as it should.
> > > > My group starts on the primary node and fails over to the backup node
> > > > if the primary loses its network connection.
> > > >
> > > > Problems start when I move pingd from ha.cf to cib.xml and configure a
> > > > clone for it there. It looks like (I'm not absolutely sure about this)
> > > > pingd doesn't have enough time after it starts up to update the CIB
> > > > before Heartbeat starts the other resources.
> > >
> > > do you have ordering constraints between the pingd resource and the
> > > other resources?
> >
> > Putting ordering constraints didn't help. Probably constraints and
> > timeout in RA would help. I'm going to test it.
> >
> > >
> > > > Because of that Heartbeat
> > > > complains that there are no nodes available for resources or that
> > > > resources can't run on any node in the cluster.
> > >
> > > presumably because there are no pingd scores yet - that's perfectly normal so far
> >
> > Absolutely true.
> >
> > >
> > > > With the second check
> > > > heartbeat sees nodes available but at this time there is no guarantee
> > > > that resources will be started on a desired primary node.
> > >
> > > this bit i'm not sure i understand
> >
> > Ok, here I tried to explain that after the pingd scores have been
> > populated, the other rsc_location rule (the one that defines the primary
> > box) gets ignored.
>
> ignored? no way.
>
> > That is probably because the pingd score for the secondary box gets
> > populated a bit earlier than for the primary.
>
> do you have logs showing this? any delay should be extremely negligible.
>
> > >
> > > do you mean the pingd scores haven't stabilized?
> > > or that they're equal and you can't make the resource start on a
> > > particular node?
> >
> > They stabilized, but as I said I can't guarantee that a resource starts
> > on the primary node. Sometimes it starts on the primary, sometimes
> > on the backup node.
> >
> > >
> > > >
> > > > I hope that I explained the problem correctly. A possible fix could
> > > > be to implement a short timeout (OCF_RESKEY_dampen + 3s for example)
> > > > in the start function of the pingd RA.
> > > >
> > > > There were also some mistakes in the v2/faq/pingd document that I
> > > > corrected in wiki.linux-ha.org
> > >
> > > thanks!
> > > _______________________________________________
> > > Linux-HA mailing list
> > > Linux-HA at lists.linux-ha.org
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
> > >