[Linux-HA] Ipfail support for heartbeat 2.0.x

Alan Robertson alanr at unix.sh
Tue Oct 18 13:41:57 MDT 2005


Guochun Shi wrote:
> At 09:14 PM 10/18/2005 +0200, you wrote:
>>On 2005-10-18T20:50:12, Tim Verhoeven <tim.verhoeven.be at gmail.com> wrote:
>>
>>>Will ipfail ever get support in 2.0.x / CRM style resource management ?
>>Yes, we'll eventually implement similar functionality in 2.0.x.
>>
>>A cheap-skate version would be quite easily implemented: just feed the
>>number of nodes any node can ping into the CIB as a node attribute and
>>put dependencies on that.
>>
>>Or use a resource agent which just checks the ping status on "monitor"
>>and have it fail if not; then the resources would migrate too.
>>
>>However, this isn't good enough, and the reason is subtle and seems to
>>escape most people most of the time ;-)
>>
>>The problem here is that this will cause pointless resource bouncing if
>>the ping node is actually having the problem, or if the error affects
>>all nodes (or even just a subset); in that case, bouncing the resource
>>around is totally pointless and causes actual harm - because we might
>>bounce it to a node which just hasn't yet noticed it has the same
>>error.
> 
> I think the cheap version is good.
> If ping nodes are unstable, then we cannot do anything about it except move resources
> accordingly. As for the normal case where a ping node dies, we can avoid moving
> a resource if ipfail delays reporting the node by one heartbeat interval -- by that
> time all nodes should have noticed that the ping node is dead.
> 
> 
>>So, the node attribute needs to be coordinated and dampened, with hysteresis
>>etc. implemented. Preferably by a small external daemon or something
>>which provides this feature generically, so we can also use it for, say,
>>connectivity to storage too...

I had a slightly different thought...

If we did this based on variables, then it wouldn't be _too_ hard to 
make variable settings from external programs hysteresis-damped...

In other words, wait until time 't' has elapsed before calling the PE 
for a change in a variable.  't' could presumably be somehow specified 
in the CIB.

This would make it universally available, and since the DC only runs on 
one machine, the hysteresis damping would be easier.
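
A rough sketch of what I mean, in Python rather than anything resembling 
the real code.  Every name here (DampedAttributes, on_commit, the 
attribute name in the usage note) is made up for illustration, and 
on_commit merely stands in for "call the PE"; none of this is an 
existing heartbeat interface.  A change is only passed on once the new 
value has been stable for 't' seconds, and a value that flaps back to 
what was last passed on is dropped silently.

```python
import time
from typing import Callable, Dict, Tuple


class DampedAttributes:
    """Hold back variable changes until they have been stable for `delay` seconds.

    Hypothetical illustration only: `on_commit` stands in for "call the PE".
    """

    def __init__(self, delay: float,
                 on_commit: Callable[[str, str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.delay = delay                    # the 't' from the text, in seconds
        self.on_commit = on_commit            # invoked once per damped change
        self.clock = clock                    # injectable so it can be tested
        self.committed: Dict[str, str] = {}   # values already passed on
        self.pending: Dict[str, Tuple[str, float]] = {}  # name -> (value, since)

    def set(self, name: str, value: str) -> None:
        """Record a value reported by an external program."""
        if self.committed.get(name) == value:
            # Flapped back to the value already passed on: forget the change.
            self.pending.pop(name, None)
            return
        current = self.pending.get(name)
        if current is None or current[0] != value:
            # New candidate value: (re)start its timer.
            self.pending[name] = (value, self.clock())

    def poll(self) -> None:
        """Pass on every pending change that has been stable long enough."""
        now = self.clock()
        for name, (value, since) in list(self.pending.items()):
            if now - since >= self.delay:
                self.committed[name] = value
                del self.pending[name]
                self.on_commit(name, value)
```

An external program would call set("ping_nodes_reachable", "1") whenever 
its count changes, and something on the DC would call poll() 
periodically.  Only changes that outlive 't' ever reach on_commit.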

-- 
     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
Wilberforce


