[Linux-HA] Ipfail support for heartbeat 2.0.x

Alan Robertson alanr at unix.sh
Tue Oct 18 13:41:57 MDT 2005

Guochun Shi wrote:
> At 09:14 PM 10/18/2005 +0200, you wrote:
>>On 2005-10-18T20:50:12, Tim Verhoeven <tim.verhoeven.be at gmail.com> wrote:
>>>Will ipfail ever get support in 2.0.x / CRM-style resource management?
>>Yes, we'll eventually implement similar functionality in 2.0.x.
>>A cheap-skate version would be quite easily implemented: just feed the
>>number of nodes any node can ping into the CIB as a node attribute and
>>put dependencies on that.
>>Or use a resource agent which just checks the ping status on "monitor"
>>and have it fail if not; then the resources would migrate too.
>>However, this isn't good enough, and the reason is subtle and seems to
>>escape most people most of the time ;-)
>>The problem here is that this will cause pointless resource bouncing if
>>the ping node is actually the one having the problem, or if the error affects
>>all nodes (or even just a subset). In that case, bouncing the resource
>>around is totally pointless and causes actual harm, because we might
>>bounce it to a node which just hasn't yet noticed it has the same problem.
> I think the cheap version is good.
> If ping nodes are unstable, then we cannot do anything about it but move resources
> accordingly. As for the normal case where a ping node dies, we can avoid moving
> a resource if ipfail delays reporting the node by one heartbeat interval; by that time all nodes should
> have noticed that the ping node is dead.
>>So, the node attribute needs to be coordinated and dampened, with hysteresis
>>etc. implemented. Preferably this would be a small external daemon or something
>>which provides this feature generically, so we can also use it for, say,
>>connectivity to storage too...
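To make the "pointless bouncing" argument above concrete, here is a minimal sketch, not heartbeat code, of a placement rule driven by a per-node "reachable ping nodes" count. The function names `count_reachable` and `choose_node` are hypothetical. The rule keeps the resource where it is unless another node is strictly better connected, so a failure that hits every node equally causes no migration. It assumes the scores are compared only after they have settled; producing settled scores is exactly what the coordination and damping discussed here are for.

```python
from typing import Dict


def count_reachable(ping_results: Dict[str, bool]) -> int:
    """Hypothetical node attribute: how many ping nodes this cluster node can reach."""
    return sum(1 for ok in ping_results.values() if ok)


def choose_node(current: str, scores: Dict[str, int]) -> str:
    """Hypothetical placement rule using each node's reachable-ping count.

    Stay on the current node unless some other node is strictly better
    connected; ties (e.g. every node lost the same ping node) never move it.
    """
    best = max(scores.values())
    if scores.get(current, -1) >= best:
        return current
    # Deterministic tie-break among the strictly better-connected nodes.
    return sorted(n for n, s in scores.items() if s == best)[0]
```

For example, if both nodes lose the only ping node (`{"a": 0, "b": 0}`) the resource stays on `a`; only when `a` alone loses it (`{"a": 0, "b": 1}`) does it move to `b`.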

I had a slightly different thought...

If we did this based on variables, then it wouldn't be _too_ hard to 
make variables set by external programs hysteresis-damped...

In other words, wait until time 't' has elapsed before calling the PE 
for a change in a variable.  't' could presumably be somehow specified 
in the CIB.
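A minimal sketch of that damping, assuming a single place (such as the DC) sees every variable update. The class `DampedVariables`, its `set`/`tick` methods and the `on_change` callback (standing in for "call the PE") are all hypothetical, and `delay` plays the role of 't'. A change is forwarded only after the new value has stayed unchanged for `delay` seconds. A value that flaps back to what the PE already saw is dropped without a PE run.

```python
from typing import Callable, Dict, Tuple


class DampedVariables:
    """Hypothetical sketch: hold back changes to externally set variables
    until they have been stable for `delay` seconds, then notify once."""

    def __init__(self, delay: float, on_change: Callable[[str, object], None]):
        self.delay = delay                    # 't', e.g. read from the CIB
        self.on_change = on_change            # stands in for invoking the PE
        self.committed: Dict[str, object] = {}                # values the PE has seen
        self.pending: Dict[str, Tuple[object, float]] = {}    # name -> (value, since)

    def set(self, name: str, value: object, now: float) -> None:
        if name in self.committed and self.committed[name] == value:
            # Change reverted before 't' expired: cancel it, no PE run.
            self.pending.pop(name, None)
            return
        prev = self.pending.get(name)
        if prev is None or prev[0] != value:
            # New value: (re)start the timer for this variable.
            self.pending[name] = (value, now)

    def tick(self, now: float) -> None:
        # Called periodically; commit values that have been stable long enough.
        for name, (value, since) in list(self.pending.items()):
            if now - since >= self.delay:
                del self.pending[name]
                self.committed[name] = value
                self.on_change(name, value)
```

Because all updates pass through one object, the "has it been stable for 't'?" decision is made in exactly one place, which is the simplification gained by doing the damping where only one instance runs.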

This would make it universally available, and since the DC only runs on 
one machine, the hysteresis damping would be easier.

     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
