[Linux-HA] dedicated heartbeat link or not?

Dejan Muhamedagic dejanmm at fastmail.fm
Wed Nov 28 11:55:53 MST 2007


Hi,

On Wed, Nov 28, 2007 at 06:24:34PM +0100, Alt, Martin wrote:
> Hello all,
> 
> I am planning to use heartbeat (version 2) to increase the
> availability of some services running in a cluster and I am
> currently stuck with two questions (one of which is rather
> general and not directly related to heartbeat - but I hope
> someone can help me anyway ;-)).
> 
> The setup is as follows: the cluster contains 10 nodes, each

How big is your configuration? If its size reflects the number of
nodes, which is impressive, then it could be that Heartbeat will
only manage it with a future release (there should be one in
December).

> running its own set of services, which communicate with each
> other using sockets. When one node fails, the services should
> be restarted on another node. The system is connected to a few
> client machines (3 or 4) but NOT connected to any public
> networks. I have already installed and configured heartbeat on
> a test system and everything is running quite fine so far. 
> 
> However, we are now thinking about what hardware to buy for the
> production system and my basic question is, whether it makes
> sense to use dedicated heartbeat links, over which the
> heartbeat processes running on the cluster nodes can
> communicate, or not. 
> 
> Let's assume I have two networks that each cluster node is
> connected to. I can either combine the two networks and use
> both redundantly for "normal" data and heartbeats, or I could
> use one network for normal data and one for heartbeats.
> 
> If I use both networks for ALL traffic (i.e. normal and
> heartbeat traffic) and one network (or one NIC in a node)
> fails, the traffic is routed through the other one. This has
> the advantage that application data and heartbeats get through
> to a node as long as it is connected to at least one of the
> networks. The only problem with this configuration that I see
> is that heartbeat packets could get lost if the network is
> overloaded. However, the system is not connected to any public
> networks, and the network load is quite foreseeable. Therefore,
> I do not think packets will get lost due to network saturation
> (besides, I have tried to saturate the network of my test cluster
> by copying large files between all nodes using scp, and there
> were no heartbeats lost). 
> 
> Are there any other problems with this configuration? Does
> someone have any experience with using a single (redundant)
> network for heartbeats and data?
> 
> The alternative would be to use one network for "normal"
> traffic, and the other one for "heartbeat" traffic only. This
> has the advantage that the heartbeat program has its own
> dedicated network, which it can use to communicate with the
> other nodes. However, if a node's heartbeat network connection
> fails, then that node will be seen as dead, even though its
> "normal" network connection is still working. Thus the services
> running on the node are moved to a different node although this
> is not necessary. Or would it be advisable/possible to use one
> network only for heartbeats, and the other one for both
> heartbeats and normal traffic?

It would be better to combine the two networks. In particular,
you should always have redundant communication paths for
heartbeats, i.e. don't consider splitting the user data and
heartbeats. One typical setup is to run user data on one network
and heartbeats on both. You should also have redundant switches.
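
To illustrate why heartbeats should travel over both networks, here is a
minimal sketch of the idea in Python. It is not Heartbeat's actual code:
the class PeerLinks, the link names eth0/eth1 and the 10 second deadtime
are all assumptions made up for the example. The point is only that a
peer is declared dead when every path has gone silent, not just one.

```python
from dataclasses import dataclass, field


@dataclass
class PeerLinks:
    """Tracks, per network link, when a heartbeat from one peer last arrived.

    Hypothetical model for illustration; not part of Heartbeat.
    """
    last_seen: dict = field(default_factory=dict)  # link name -> time in seconds

    def record(self, link: str, now: float) -> None:
        """Note that a heartbeat from the peer arrived on `link` at time `now`."""
        self.last_seen[link] = now

    def live_links(self, now: float, deadtime: float) -> list:
        """Links that delivered a heartbeat within the last `deadtime` seconds."""
        return sorted(l for l, t in self.last_seen.items() if now - t <= deadtime)

    def is_dead(self, now: float, deadtime: float) -> bool:
        """The peer counts as dead only when every link has gone silent."""
        return not self.live_links(now, deadtime)


peer = PeerLinks()
peer.record("eth0", now=0.0)    # last heartbeat on eth0, then that link fails
peer.record("eth1", now=0.0)
peer.record("eth1", now=25.0)   # eth1 keeps delivering heartbeats

print(peer.live_links(now=30.0, deadtime=10.0))  # only eth1 is still live
print(peer.is_dead(now=30.0, deadtime=10.0))     # False: one path is enough
```

With a single dedicated heartbeat network, the same failure of eth0 would
leave no live link at all and the node would be treated as dead.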

As far as the network load goes, it's hard to say: not many
people are running clusters with this many nodes.

> On the other hand, if the "normal" network connection
> fails, heartbeat (the program ;-)) will not detect it, since it
> can still exchange heartbeat packets with the node. This leads
> to the next question: How can I configure heartbeat so that it
> will detect that a node is not reachable via the "normal"
> network and initiate a failover of services?

See http://www.linux-ha.org/pingd
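
Roughly, the approach is that each node checks whether it can reach a
few well-known addresses on the "normal" network and publishes a score
based on how many answered; resources are then kept away from nodes
with no connectivity. The Python below is only a sketch of that scoring
idea, not pingd itself: the function names, the addresses and the
multiplier of 100 are assumptions for the example, and the real
behaviour is set up through the cluster configuration described on the
page above.

```python
def connectivity_score(reachable: dict, multiplier: int = 100) -> int:
    """Score a node as multiplier times the number of ping targets it reached.

    `reachable` maps a ping target address to True/False (hypothetical input).
    """
    return multiplier * sum(1 for ok in reachable.values() if ok)


def nodes_allowed_to_run(scores: dict) -> list:
    """Nodes that still see at least one ping target on the data network."""
    return sorted(node for node, score in scores.items() if score > 0)


# Made-up probe results: node2 has lost its "normal" network connection,
# although it may still be exchanging heartbeats over another link.
probes = {
    "node1": {"10.0.0.1": True, "10.0.0.2": True},
    "node2": {"10.0.0.1": False, "10.0.0.2": False},
}

scores = {node: connectivity_score(result) for node, result in probes.items()}
print(scores)
print(nodes_allowed_to_run(scores))  # node2 is excluded, so its services move
```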

Thanks,

Dejan

> 
> Thanks and best regards,
> Martin
> 
> 
> 
> Dr. Martin Alt
> System und Softwarearchitektur
> Plath GmbH
> Gotenstrasse   18
> D - 20097 Hamburg
> Tel: +49 40/237 34-361
> Fax: +49 40/237 34-173 
> Email: martin.alt at plath.de
> http://www.plath.de
> 
> Hamburg HRB7401
> Geschäftsführer: Dipl.-Kfm. Nico Scharfe
>  
> 

