[Linux-HA] dedicated heartbeat link or not?

Alt, Martin Martin.Alt at plath.de
Wed Nov 28 10:24:34 MST 2007


Hello all,

I am planning to use heartbeat (version 2) to increase the availability of some services running in a cluster, and I am currently stuck on two questions (one of which is rather general and not directly related to heartbeat, but I hope someone can help me anyway ;-)).

The setup is as follows: the cluster contains 10 nodes, each running its own set of services, which communicate with each other using sockets. When one node fails, its services should be restarted on another node. The system is connected to a few client machines (3 or 4) but NOT connected to any public networks. I have already installed and configured heartbeat on a test system, and everything is running fine so far.

However, we are now thinking about what hardware to buy for the production system, and my basic question is whether or not it makes sense to use dedicated heartbeat links, over which the heartbeat processes running on the cluster nodes can communicate.

Let's assume I have two networks that each cluster node is connected to. I can either combine the two networks and use both redundantly for "normal" data and heartbeats, or I could use one network for normal data and one for heartbeats.

If I use both networks for ALL traffic (i.e. normal and heartbeat traffic) and one network (or one NIC in a node) fails, the traffic is routed through the other one. This has the advantage that application data and heartbeats reach a node as long as it is connected to at least one of the networks. The only problem I see with this configuration is that heartbeat packets could get lost if the network is overloaded. However, the system is not connected to any public networks, and the network load is quite predictable. Therefore, I do not think packets will get lost due to network saturation. (Besides, I have tried to saturate the network of my test cluster by copying large files between all nodes using scp, and no heartbeats were lost.)

Are there any other problems with this configuration? Does anyone have experience with using a single (redundant) network for both heartbeats and data?

The alternative would be to use one network for "normal" traffic and the other one for "heartbeat" traffic only. This has the advantage that heartbeat has its own dedicated network for communicating with the other nodes. However, if a node's heartbeat network connection fails, that node will be seen as dead even though its "normal" network connection is still working. The services running on that node are then moved to a different node although this is not necessary. Or would it be advisable (or even possible) to use one network only for heartbeats and the other one for both heartbeats and normal traffic?

On the other hand, if the "normal" network connection fails, heartbeat (the program ;-)) will not detect it, since it can still exchange heartbeat packets with the node over the heartbeat network. This leads to the next question: How can I configure heartbeat so that it detects that a node is not reachable via the "normal" network and initiates a failover of its services?
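
To make the comparison concrete, here is a small sketch of how I understand the three layouts to behave when a single node loses one of its links. This is only a toy model and not heartbeat code: the names (Layout, classify, net1, net2) are made up for illustration, and it assumes that a node counts as alive as long as heartbeats arrive over at least one network.

```python
# Toy model only; not heartbeat code. All names here are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    heartbeat_nets: frozenset  # networks that carry heartbeat packets
    data_nets: frozenset       # networks that carry "normal" traffic


ALL_NETS = frozenset({"net1", "net2"})

LAYOUTS = [
    # Both networks carry everything.
    Layout("shared", ALL_NETS, ALL_NETS),
    # net1 carries data only, net2 carries heartbeats only.
    Layout("dedicated", frozenset({"net2"}), frozenset({"net1"})),
    # Heartbeats on both networks, data on net1 only.
    Layout("mixed", ALL_NETS, frozenset({"net1"})),
]


def classify(layout, failed):
    """Describe what happens when one node loses the links in `failed`."""
    up = ALL_NETS - set(failed)
    seen_alive = bool(layout.heartbeat_nets & up)  # peers still get heartbeats
    serving = bool(layout.data_nets & up)          # clients can still reach it
    if seen_alive and serving:
        return "ok"
    if serving:
        return "unnecessary failover"  # declared dead, but still reachable
    if seen_alive:
        return "undetected outage"     # looks alive, but unreachable
    return "failover"


if __name__ == "__main__":
    for layout in LAYOUTS:
        for failed in ({"net1"}, {"net2"}):
            print(layout.name, sorted(failed), classify(layout, failed))
```

In this model the "mixed" layout avoids the unnecessary failover of the "dedicated" layout, but a failure of the data network still goes unnoticed, which is exactly what my second question is about.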


Thanks and best regards,
Martin



Dr. Martin Alt
System und Softwarearchitektur
Plath GmbH
Gotenstrasse   18
D - 20097 Hamburg
Tel: +49 40/237 34-361
Fax: +49 40/237 34-173 
Email: martin.alt at plath.de
http://www.plath.de

Hamburg HRB7401
Geschäftsführer: Dipl.-Kfm. Nico Scharfe
 


