[Linux-HA] Ipfail support for heartbeat 2.0.x
lmb at suse.de
Tue Oct 18 13:14:28 MDT 2005
On 2005-10-18T20:50:12, Tim Verhoeven <tim.verhoeven.be at gmail.com> wrote:
> Will ipfail ever get support in 2.0.x / CRM style resource management ?
Yes, we'll eventually implement similar functionality in 2.0.x.
A cheap-skate version would be quite easily implemented: just feed the
number of nodes any node can ping into the CIB as a node attribute and
put dependencies on that.
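
A minimal sketch of that cheap version, for illustration only. The function names, the injected `can_ping` probe and the `minimum` threshold are all assumptions made here, not part of heartbeat; actually writing the count into the CIB is left out, since the post does not say which tool would do it.

```python
from typing import Callable, Dict, Iterable, List


def count_reachable(ping_nodes: Iterable[str],
                    can_ping: Callable[[str], bool]) -> int:
    """Return how many ping nodes this cluster node can currently reach.

    This number is what would be stored as the node attribute.
    """
    return sum(1 for host in ping_nodes if can_ping(host))


def eligible_nodes(connectivity: Dict[str, int], minimum: int) -> List[str]:
    """Return the cluster nodes whose attribute satisfies the dependency.

    `connectivity` maps cluster node name -> number of reachable ping nodes.
    """
    return sorted(node for node, count in connectivity.items()
                  if count >= minimum)


# Hypothetical example: three ping nodes, two of them reachable from here.
ping_nodes = ["10.0.0.1", "10.0.0.254", "10.0.1.1"]
reachable_from_here = {"10.0.0.1", "10.0.0.254"}
score = count_reachable(ping_nodes, lambda host: host in reachable_from_here)
```

With `minimum=1`, a node that reports 0 reachable ping nodes would simply drop out of the eligible list, which is the "dependency" in its crudest form.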
Or use a resource agent which just checks the ping status on "monitor"
and have it fail if not; then the resources would migrate too.
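
Roughly, the "monitor" logic of such an agent could look like the sketch below. The exit codes are the standard OCF ones (0 for success, 1 for a generic error); everything else is an assumption for illustration: the helper names are made up, the agent is treated as healthy if at least one ping node answers, and `ping_once` assumes a Linux `ping` that accepts `-c` and `-W`.

```python
import subprocess
from typing import Callable, Iterable

OCF_SUCCESS = 0      # standard OCF exit code: resource is fine
OCF_ERR_GENERIC = 1  # standard OCF exit code: generic failure


def ping_once(host: str) -> bool:
    """Send a single ICMP echo request; assumes Linux iputils `ping`."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # `ping` is missing or not executable: treat as unreachable.
        return False
    return result.returncode == 0


def monitor(ping_nodes: Iterable[str],
            can_ping: Callable[[str], bool] = ping_once) -> int:
    """Return an OCF exit code for the 'monitor' action.

    Succeeds if at least one ping node answers, fails otherwise.
    """
    if any(can_ping(host) for host in ping_nodes):
        return OCF_SUCCESS
    return OCF_ERR_GENERIC
```

A failing monitor makes the cluster treat the resource as failed on that node, so anything grouped with it would move away as well.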
However, this isn't good enough, and the reason is subtle and seems to
escape most people most of the time ;-)
The problem is that this causes needless resource bouncing if the ping
node itself is the one having the problem, or if the error affects all
nodes (or even just a subset). In that case, moving the resource around
gains nothing and causes actual harm, because we might bounce it to a
node which just hasn't yet noticed it has the same problem.
So, the node attribute needs to be coordinated and dampened, with
hysteresis etc. implemented. Preferably by a small external daemon or
something which provides this feature generically, so we can also use it
for, say, connectivity to storage too...
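
To make "dampened, with hysteresis, coordinated" a bit more concrete, here is one possible sketch. It is not an existing heartbeat component; the class, the function, and the `hold` and `margin` parameters are invented for illustration. Dampening is done by requiring the same raw reading several times in a row before publishing it, and coordination by comparing the published values of all nodes before deciding to move anything.

```python
from typing import Dict, Optional


class DampenedValue:
    """Publish a changed reading only after `hold` identical samples in a row."""

    def __init__(self, initial: int, hold: int) -> None:
        self.published = initial
        self.hold = hold
        self._candidate: Optional[int] = None
        self._seen = 0

    def update(self, raw: int) -> int:
        """Feed one raw sample; return the value that is currently published."""
        if raw == self.published:
            # Reading went back to the published value: forget the candidate.
            self._candidate, self._seen = None, 0
            return self.published
        if raw == self._candidate:
            self._seen += 1
        else:
            self._candidate, self._seen = raw, 1
        if self._seen >= self.hold:
            self.published = raw
            self._candidate, self._seen = None, 0
        return self.published


def should_move(current: str, published: Dict[str, int],
                margin: int = 1) -> Optional[str]:
    """Return a better node, or None if moving would not help.

    `published` maps node name -> dampened connectivity value and must
    contain `current`. A move is suggested only if another node beats the
    current one by at least `margin`, so a failure seen equally by all
    nodes (for example a dead ping node) moves nothing.
    """
    best = max(sorted(published), key=lambda node: published[node])
    if best != current and published[best] >= published[current] + margin:
        return best
    return None
```

Nothing here is specific to ping: the same two pieces would work for any per-node health number, such as the count of reachable storage paths.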
Any takers? ;-)
Lars Marowsky-Brée <lmb at suse.de>
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge"
 -- Charles Darwin