[Linux-HA] Heartbeat setup questions

Serge Dubrouski sergeyfd at gmail.com
Fri Oct 26 12:02:30 MDT 2007

Hi Artem -

It looks like you selected the wrong (or not exactly the right) product
for your project. The Heartbeat (Linux-HA) cluster was designed to provide
High Availability features for applications, not for distributed
computations. In an HA cluster, in most cases a node that loses its
connection to the cluster has to be killed (see the STONITH feature) to
prevent data corruption on shared devices in a split-brain situation.
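
That split-brain symmetry is also why each of your nodes reports the other one as OFFLINE: from inside the cluster the two views are indistinguishable. One common way to break the tie is to probe a third "witness" address (for example the default gateway) from each node. Below is a minimal Python sketch of that idea. It is not part of Heartbeat; the function names, the result labels, and the choice of a TCP probe are all assumptions made for illustration.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def classify(peer_ok, witness_ok):
    """Decide, from this node's point of view, who is cut off.

    peer_ok    -- can this node reach the other cluster node?
    witness_ok -- can this node reach the third-party witness address?
    The returned labels are hypothetical names used only in this sketch.
    """
    if peer_ok:
        return "cluster-ok"
    if witness_ok:
        # We are still on the network, so the peer is the one that is gone.
        return "peer-unreachable"
    # Neither the peer nor the witness answers: we are the isolated node.
    return "self-isolated"
```

A node that gets "self-isolated" would be the one that should stop its own processes; a node that gets "peer-unreachable" would run the script for a lost compute node. Which host and port to probe depends entirely on your network.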

Take a look at this product: http://oscar.openclustergroup.org
It may be a better fit for your needs.
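
On question 3 in the quoted message (checking node status without parsing cib.xml): Heartbeat ships a small command-line utility, cl_status, and a script can call it instead of reading the CIB. Here is a hedged Python sketch that wraps `cl_status nodestatus <node>`. It assumes cl_status is installed and on the PATH in your build, and that a live node is reported as "up" or "active"; please check both against your version. The `run` parameter is a hypothetical hook added only so the function can be exercised without a cluster.

```python
import subprocess

# Assumption: status strings that heartbeat prints for a live node.
ALIVE_STATES = {"up", "active"}


def node_status(node, run=subprocess.run):
    """Return the status string printed by `cl_status nodestatus <node>`."""
    result = run(
        ["cl_status", "nodestatus", node],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
    )
    return result.stdout.strip()


def is_alive(node, run=subprocess.run):
    """True if this node's heartbeat currently considers `node` alive."""
    return node_status(node, run=run) in ALIVE_STATES
```

Keep in mind that this only reports the local node's view. In a split-brain situation each side will still see the other as dead, so it does not by itself tell you which node lost its network connection.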

On 10/26/07, Artem Pervin <ArtemPervin at botik.ru> wrote:
> Hello, all,
> I'm new to the heartbeat project and I'm having some problems
> setting things up. I would be very grateful for any help.
> I want to use heartbeat to detect the disconnection of a node in a
> computational cluster. I also need to detect when the node is back on
> the network. On both events a custom script should be started, either
> on the head node alone or on both the head node and the computational
> node.
> I've managed to set up heartbeat on two nodes (A and B), and I can
> observe the node status with the help of crm_mon. When I simulate a
> loss of connectivity on node B (using ifdown), after some time I can
> see the status of this node change: on node A, crm_mon reports that
> B is "OFFLINE" and A is "online", and on node B, crm_mon reports that
> A is "OFFLINE" and B is "online". In this case I simply cannot
> determine from the crm_mon data (or cib.xml) which node is actually
> down and where I should shut down my processes.
> Moreover, when I bring the network interface on node B back up, the
> status of the node doesn't change at all. Only when I restart the
> heartbeat service on either node B or node A does the status of
> node B return to "online". That's pretty odd behavior, I think.
> My questions are the following:
> 1) Is heartbeat an appropriate solution for my task, or should I use
> something else?
> 2) If heartbeat is fine, then what am I doing wrong? Why doesn't the
> status of node B return to the "online" state when I turn the network
> interface back on?
> 3) Is there any API to check the status of a node? Parsing cib.xml
> is not very convenient.
> Here's my configuration:
> CentOS, kernel 2.6.18, x86
> heartbeat version is 2.0.1 installed as rpm package
> ha.cf:
> ----------------------------------------------------------------------------------
> use_logd yes
> bcast eth0
> node A B
> crm on
> auto_failback on
> ---------------------------------------------------------------------------------
> authkeys:
> ---------------------------------------------------------------------------------
> auth 1
> 1 sha1 helloworld
> ---------------------------------------------------------------------------------
> logd.cf:
> ---------------------------------------------------------------------------------
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility     daemon
> ---------------------------------------------------------------------------------
> Thank you for your help.
> --
> Artem Pervin
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems

Serge Dubrouski.
