[Linux-HA] Re: pingd in V2
Serge Dubrouski
sergeyfd at gmail.com
Mon Jun 4 16:20:03 MDT 2007
Implementing a timeout in the pingd RA actually didn't help.
On 6/4/07, Serge Dubrouski <sergeyfd at gmail.com> wrote:
> Hello -
>
> I played with pingd in v2 heartbeat and found some problems (or
> inconveniences) there:
>
> My configuration includes a group of resources and a rsc_location rule
> for a primary node. If I configure pingd in ha.cf and add an
> rsc_location rule with score -INFINITY for when the pingd attribute is
> not defined or is less than or equal to 0, everything works as it
> should. My group starts on the primary node and fails over to the
> backup node if the primary loses its network connection.
>
> Problems start when I move pingd from ha.cf to cib.xml and configure a
> clone for it there. It looks like (I'm not absolutely sure of that)
> pingd doesn't have enough time after it starts up to update the CIB
> before Heartbeat starts other resources. Because of that Heartbeat
> complains that there are no nodes available for resources or that
> resources can't run on any node in the cluster. On the second check
> Heartbeat sees nodes available, but by then there is no guarantee
> that resources will be started on the desired primary node.
>
> I hope that I explained the problem correctly. A possible fix could
> be implementing a short timeout (OCF_RESKEY_dampen + 3s for example)
> in the start function of the pingd RA.
>
> There were also some mistakes in the v2/faq/pingd document that I
> corrected on wiki.linux-ha.org
>
> Serge.
>
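
For anyone who wants to try the same thing, the delay proposed in the quoted message (OCF_RESKEY_dampen + 3s in the start function) could be sketched roughly as below. This is only an illustration, not the code from the shipped pingd RA: the function names pingd_start_delay and pingd_start_with_delay are hypothetical, the 1s fallback is an assumption made for this sketch, and the dampen value is assumed to be whole seconds with an optional "s" suffix. As noted at the top of this message, a delay of this kind did not fix the problem in my testing.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical and are not part of the
# real pingd resource agent.

# Compute the number of seconds to wait after starting pingd:
# the dampen interval plus a 3-second margin.
# Assumption: OCF_RESKEY_dampen is whole seconds, optionally with a
# trailing "s" (e.g. "5s"); the 1s fallback is chosen for this sketch.
pingd_start_delay() {
    dampen="${OCF_RESKEY_dampen:-1s}"
    dampen="${dampen%s}"          # strip a trailing "s" if present
    echo $((dampen + 3))
}

# Hypothetical wrapper: the RA's existing start logic would go where
# the placeholder comment is, followed by the delay.
pingd_start_with_delay() {
    # ... existing start code that launches pingd goes here ...
    sleep "$(pingd_start_delay)"  # give pingd time to update the CIB
    return 0                      # 0 is OCF_SUCCESS
}
```

With dampen set to 5s the start action would block for 8 seconds before returning, so the start timeout configured for the clone has to be larger than that.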