[Linux-HA] Ipfail support for heartbeat 2.0.x

Lars Marowsky-Bree lmb at suse.de
Tue Oct 18 13:14:28 MDT 2005

On 2005-10-18T20:50:12, Tim Verhoeven <tim.verhoeven.be at gmail.com> wrote:

> Will ipfail ever get support in 2.0.x / CRM style resource management ?

Yes, we'll eventually implement similar functionality in 2.0.x.

A cheapskate version would be quite easy to implement: just feed the
number of ping nodes each node can reach into the CIB as a node
attribute, and put dependencies on that.
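To make the idea concrete, here is a minimal Python sketch of the two halves: the value each node would publish, and which nodes a "ping count >= N" dependency would accept. The function names, the node and gateway names, and the tie-breaking rule are hypothetical. This models only the logic and does not talk to a real CIB.

```python
from typing import Dict, List, Set


def ping_count(reachable: Set[str]) -> int:
    """Attribute value a node would publish: how many ping nodes it can reach."""
    return len(reachable)


def eligible_nodes(attrs: Dict[str, int], minimum: int) -> List[str]:
    """Nodes satisfying a hypothetical 'ping count >= minimum' dependency.

    Best-connected nodes come first; ties are broken by node name so the
    result is deterministic.
    """
    ok = [node for node, count in attrs.items() if count >= minimum]
    return sorted(ok, key=lambda node: (-attrs[node], node))


if __name__ == "__main__":
    attrs = {
        "node-a": ping_count({"gw1", "gw2"}),
        "node-b": ping_count({"gw1"}),
        "node-c": ping_count(set()),
    }
    print(eligible_nodes(attrs, minimum=1))  # ['node-a', 'node-b']
```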

Or use a resource agent which just checks the ping status on "monitor"
and fails if the ping fails; then the resources would migrate too.
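The logic of such a "monitor" action fits in a few lines. The sketch below is in Python rather than a shell resource agent, and the ping check is passed in as a callable so no particular ping command or flags are assumed. The "at least one target answers" policy is also an assumption. The exit codes are the standard OCF values for success and generic error.

```python
from typing import Callable, Iterable

# Standard OCF resource agent exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def monitor(targets: Iterable[str], can_ping: Callable[[str], bool]) -> int:
    """Hypothetical 'monitor' action for a ping-checking resource.

    Succeeds if at least one target answers; otherwise reports a generic
    error so the cluster treats the resource as failed and moves it.
    """
    if any(can_ping(target) for target in targets):
        return OCF_SUCCESS
    return OCF_ERR_GENERIC
```

For example, `monitor(["gw1"], lambda t: False)` returns 1, which the cluster would treat as a failed resource.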

However, this isn't good enough, and the reason is subtle and seems to
escape most people most of the time ;-)

The problem here is that this will cause pointless resource bouncing if
the ping node itself is having the problem, or if the error affects all
nodes (or even just a subset). In that case, moving the resource around
does actual harm, because we might bounce it to a node which just hasn't
yet noticed that it has the same problem.
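One way to read "coordinated" is to compare the nodes' reports before acting. The Python sketch below moves the resource only if another node is strictly better connected than the current one. If the ping node is down, or every node lost it, the counts are equal and nothing moves. The function name and policy are hypothetical. This comparison is only as good as the reports are current: it does not help with a node that "just hasn't yet noticed". Handling that case is what dampening is for.

```python
from typing import Dict, Optional


def migration_target(current: str, attrs: Dict[str, int]) -> Optional[str]:
    """Return a node to move the resource to, or None to stay put.

    attrs maps node name -> number of ping nodes it can reach. Only a node
    with a strictly higher count than the current node qualifies.
    """
    here = attrs.get(current, 0)
    better = [n for n, c in attrs.items() if n != current and c > here]
    if not better:
        return None
    # Prefer the best-connected candidate; break ties by name.
    return min(better, key=lambda n: (-attrs[n], n))
```

With `{"a": 0, "b": 0}` and the resource on `a`, this returns `None` rather than bouncing the resource to `b`.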

So, the node attribute needs to be coordinated and dampened, with
hysteresis etc. implemented. Preferably this would be a small external
daemon or something similar which provides the feature generically, so
we can also use it for, say, connectivity to storage too...
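Here is a minimal Python sketch of the dampening and hysteresis part for a single node. The published flag flips only after the raw count has argued for a flip for `hold` consecutive samples. It drops below `low` but only recovers at `high`, so a count hovering between the two does not flap. The class name and parameters are hypothetical, and cross-node coordination is left out. Because the input is just a count, the same class could in principle track, say, reachable storage paths instead of ping nodes.

```python
class DampenedConnectivity:
    """Connectivity flag with dampening (hold) and hysteresis (low/high)."""

    def __init__(self, low: int, high: int, hold: int) -> None:
        if low > high or hold < 1:
            raise ValueError("need low <= high and hold >= 1")
        self.low, self.high, self.hold = low, high, hold
        self.connected = True  # published value; assume healthy at start
        self._streak = 0       # consecutive samples arguing for a flip

    def sample(self, count: int) -> bool:
        """Feed one raw ping count; return the (possibly unchanged) flag."""
        if self.connected:
            wants_flip = count < self.low    # drop below the low mark
        else:
            wants_flip = count >= self.high  # recover only at the high mark
        self._streak = self._streak + 1 if wants_flip else 0
        if self._streak >= self.hold:
            self.connected = not self.connected
            self._streak = 0
        return self.connected
```

With `low=1, high=2, hold=3`, a single lost sample is ignored. The flag drops only after three straight zero counts, and a count of 1 is not enough to bring it back.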

Any takers? ;-)

    Lars Marowsky-Brée <lmb at suse.de>

High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge"
    -- Charles Darwin

