[Linux-HA] Ipfail support for heartbeat 2.0.x

Guochun Shi gshi at ncsa.uiuc.edu
Tue Oct 18 14:44:02 MDT 2005


At 09:58 PM 10/18/2005 +0200, you wrote:
>On 2005-10-18T14:33:06, Guochun Shi <gshi at ncsa.uiuc.edu> wrote:
>
>> I think the cheap version is good.
>> If ping nodes are unstable, then we cannot do anything about it but
>> move resources accordingly.
>
>Note that this would be a feature regression; ipfail does not do this,
>it does "vote" on which side can see the most nodes.
>
>Customers will hate you if you bounce resources around like that and
>cause more observed downtime (due to the unneeded migrations) than
>necessary.
>
>> As for the normal case where a ping node dies, we can avoid moving a
>> resource if ipfail delays reporting the dead node by one heartbeat interval;
>> by then all nodes should have noticed that the ping node is dead.
>
>This doesn't work.
>0,      heartbeat interval
>1,0     heartbeat interval
>1,9     node1 notices that ping node is dead, delays by one interval
>2,0     heartbeat interval
>2,1     node2 notices, delays by one interval
>2,9     node1 reports
>...
>
>Remember they are not running in lockstep.
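To make the timing argument concrete, here is a small simulation. It is a sketch under stated assumptions, not heartbeat code: time is counted in tenths of a heartbeat interval (so "1,9" above is 19), each node ticks at the same interval but with its own phase offset, and a node declares the ping node dead at the first of its own ticks at or after a fixed deadtime. The phases (9 and 1) and the deadtime (15) are hypothetical values chosen to reproduce the timeline above.

```python
INTERVAL = 10  # one heartbeat interval, measured in tenths


def detection_time(death: int, deadtime: int, phase: int) -> int:
    """First tick of this node's own timer at or after death + deadtime.

    The node ticks at phase, phase + INTERVAL, phase + 2*INTERVAL, ...
    """
    earliest = death + deadtime
    # Integer ceiling division: number of intervals from phase to earliest.
    k = -(-(earliest - phase) // INTERVAL)
    return phase + k * INTERVAL


def report_time(death: int, deadtime: int, phase: int, delay: int = INTERVAL) -> int:
    """When the node reports, if it waits `delay` after its own detection."""
    return detection_time(death, deadtime, phase) + delay


if __name__ == "__main__":
    phases = {"node1": 9, "node2": 1}  # hypothetical, non-lockstep offsets
    for name, phase in phases.items():
        print(name, detection_time(0, 15, phase), report_time(0, 15, phase))
```

Run as a script, it prints node1 detecting at 19 and reporting at 29, and node2 detecting at 21 and reporting at 31. Because each node delays relative to its own detection, the fixed delay shifts both reports equally and the skew between them survives unchanged.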

OK, I meant to say that whoever makes the decision to move resources around (the PE?)
should wait a certain time before it asks the other cluster nodes for their alive ping nodes.

Actually, we may not need the ipfail program at all. In v2, ipfail does not make any
decisions as it does in v1; it only reports the number of alive ping nodes.

1. crm gets the number of alive ping nodes from heartbeat (heartbeat needs to provide
   such an API).
2. crm gets the number of alive ping nodes seen by the other nodes by sending out a
   request message (crm's task).

Then crm can make the decision.
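A rough sketch of how the crm side of this could combine a settle delay with the ipfail-style "vote" might look like the following. Everything here is hypothetical: heartbeat provides no such API today, and `PingVote`, `report` and `decide` are illustrative names, not crm code. The assumptions are that each node's alive-ping-node count reaches the decision maker via `report`, that no decision is made until `settle` time has passed since the first changed count, and that the resource stays put unless another node sees strictly more ping nodes.

```python
from typing import Dict, Optional


class PingVote:
    """Hypothetical vote on which node sees the most alive ping nodes."""

    def __init__(self, settle: float, initial: Dict[str, int]) -> None:
        self.settle = settle              # how long to wait after a change
        self.counts = dict(initial)       # node name -> alive ping nodes
        self.first_change: Optional[float] = None

    def report(self, node: str, count: int, now: float) -> None:
        # Start the settle window at the first count that actually changes.
        if self.counts.get(node) != count and self.first_change is None:
            self.first_change = now
        self.counts[node] = count

    def decide(self, current: str, now: float) -> Optional[str]:
        """Return the node that should run the resource, or None to keep waiting."""
        if self.first_change is not None and now - self.first_change < self.settle:
            return None  # other nodes may not have noticed the change yet
        self.first_change = None
        best = max(self.counts.values())
        # A tie never causes a migration: stay unless someone sees strictly more.
        if self.counts.get(current, -1) >= best:
            return current
        # Deterministic choice among the best candidates.
        return min(n for n, c in self.counts.items() if c == best)
```

With the timeline from the quoted message (node1 notices the dead ping node at 1.9, node2 at 2.1), a settle time of three intervals keeps the resource on node1 once both counts have dropped, whereas deciding immediately at 2.0 would see node2 still reporting one more ping node and migrate needlessly:

```python
v = PingVote(settle=3.0, initial={"node1": 3, "node2": 3})
v.report("node1", 2, now=1.9)
v.decide("node1", now=2.0)   # None: still inside the settle window
v.report("node2", 2, now=2.1)
v.decide("node1", now=5.0)   # "node1": tie, so no migration
```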

-Guochun
