[Linux-HA] Using heartbeat with multiple nodes, directors and switches

Michael Alger linux-ha at mm.quex.org
Fri Jul 25 07:50:39 MDT 2008


On Fri, Jul 25, 2008 at 01:24:37PM +0100, Matthew Macdonald-Wallace wrote:
> The system should be able to cope with the following:
> 
> a) A Director Outage
> b) A Web Node Outage
> c) A Network Outage
> 
> 
> 2) What I currently have (excuse the poor ascii art!):
> 
> 
> '----------'  heartbeat '----------'
> |Director 1|------------|Director 2|
> '----------'            '----------'
>      |                     |
> '--------'              '--------'
> |Switch 1|______        |Switch 2|
> '--------'  ____X______/'--------'
>      |     /     \           |
> '---------'       \     '---------'
> |Webnode 1|        \____|Webnode 2|
> '---------'             '---------'
> 
> 
> I hope that the following will give me complete redundancy on a
> network, director and node level however the issue I am
> encountering is with the way in which the IP Addresses are
> assigned to the interfaces.
> 
> Currently, the Directors have Three NICs in them - WAN (+virtual
> IP), LAN (+virtual IP) & Heartbeat
> 
> The web-nodes also have three NICs - WAN, LAN1, LAN2
> 
> I am trying to complete this task by applying Occam's Razor[0],
> although I'm not sure that you can in this kind of situation.

For our redundant network setups, we use the "load-balancing"
feature on the servers so that they treat their two network cards as
one. When the cards are connected to different switches, you'll want
some kind of "fail over with preference order" mode, i.e. no actual
load balancing: the server uses the primary card while it's
available, and falls back to the secondary only if the primary link
is down. You can usually also configure some kind of poll (e.g. ping
the switch) so that the link is treated as down if that test fails.

Under Linux you can use the bonding driver for this. For Windows,
your server vendor will hopefully be able to provide a driver that
can do this. I'm not sure if there's a general one.
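
To make the "fail over with preference order" idea concrete, here is
a minimal Python sketch of the selection rule only. It is not how the
bonding driver is implemented; the Link class, the interface names
eth1/eth2 and the poll_ok flag are hypothetical names made up for the
illustration. (In the Linux bonding driver this behaviour is the
active-backup mode.)

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str         # hypothetical interface name, e.g. "eth1"
    carrier_up: bool  # physical link state as reported by the card
    poll_ok: bool     # result of the extra test, e.g. pinging the switch


def usable(link):
    """A link counts as up only if it has carrier AND the poll succeeds."""
    return link.carrier_up and link.poll_ok


def choose_active(links):
    """Return the name of the first usable link in preference order.

    `links` is ordered from most to least preferred. Returns None when
    no link is usable.
    """
    for link in links:
        if usable(link):
            return link.name
    return None


# Primary healthy: the primary is used, the secondary stays idle.
print(choose_active([Link("eth1", True, True), Link("eth2", True, True)]))
# Primary has carrier but the switch poll fails: fall back to the secondary.
print(choose_active([Link("eth1", True, False), Link("eth2", True, True)]))
```

Because the list is re-evaluated in preference order each time,
traffic moves back to the primary as soon as it is usable again.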

In addition, I'd connect the switches to each other, and if possible
the directors should be connected to both switches, as well.  The
modified ASCII art network diagram would look like this:

 '----------'  heartbeat '----------'
 |Director 1|------------|Director 2|
 '----------'____ _______'----------'
      |    ______X_______   |
 '--------'______________'--------'
 |Switch 1|______        |Switch 2|
 '--------'  ____X______/'--------'
      |     /     \           |
 '---------'       \     '---------'
 |Webnode 1|        \____|Webnode 2|
 '---------'             '---------'
 
This gives you full redundancy in your connectivity if any single
director, switch or web node fails. In addition, both directors (D1
and D2) can see both web nodes (W1 and W2) through either switch, so
even if Switch 1 (S1) goes down, D1 can keep acting as the primary
since it still has a path to W1 and W2 through S2. D2 only needs to
take over if something goes wrong with D1 itself.

In theory, that makes managing failover scenarios a fair bit easier.
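
If you want to check the redundancy claim rather than take it on
trust, the two diagrams can be written down as link lists and tested
for every single failure. This is only a sketch: the link lists are my
reading of the ASCII art, the heartbeat link is left out because it
carries no web traffic, and it assumes that only the switches forward
traffic (directors and web nodes are endpoints).

```python
from collections import deque

DIRECTORS = {"D1", "D2"}
SWITCHES = {"S1", "S2"}
WEBNODES = {"W1", "W2"}

# Original layout: each director is attached to one switch only.
ORIGINAL = [
    ("D1", "S1"), ("D2", "S2"),
    ("S1", "W1"), ("S1", "W2"),
    ("S2", "W1"), ("S2", "W2"),
]

# Modified layout: directors on both switches, switches interconnected.
MODIFIED = ORIGINAL + [("D1", "S2"), ("D2", "S1"), ("S1", "S2")]


def reachable(links, start, failed):
    """Return the nodes reachable from `start` when `failed` nodes are down."""
    if start in failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node != start and node not in SWITCHES:
            continue  # hosts are endpoints; only switches pass traffic on
        for a, b in links:
            if node in (a, b):
                other = b if node == a else a
                if other not in failed and other not in seen:
                    seen.add(other)
                    queue.append(other)
    return seen


def survives(links, failed_node):
    """True if some surviving director still reaches every surviving web node."""
    failed = {failed_node}
    live_web = WEBNODES - failed
    return any(live_web <= reachable(links, d, failed)
               for d in DIRECTORS - failed)


for node in sorted(DIRECTORS | SWITCHES | WEBNODES):
    print(node, "fails -> service survives:", survives(MODIFIED, node))

# With S1 down, can D1 itself still reach both web nodes?
print("original:", WEBNODES <= reachable(ORIGINAL, "D1", {"S1"}))
print("modified:", WEBNODES <= reachable(MODIFIED, "D1", {"S1"}))
```

Both layouts survive the loss of S1, but in the original one that is
only because D2 takes over; in the modified one D1 keeps its path to
the web nodes and no director failover is needed.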

> My planned IP Addressing is as follows:
> 
> Director 1:
> WAN - what ever is assigned by the Data Centre (inc. virtual IP)
> LAN - 10.27.1.201 (x corresponds to the director number)
> LAN (Virtual) - 10.27.1.254
> 
> Director 2:
> WAN - what ever is assigned by the Data Centre (inc. virtual IP)
> LAN - 10.27.1.202 (x corresponds to the director number)
> LAN (Virtual) - 10.27.1.254
> 
> WebNode 1:
> WAN - what ever is assigned by the Data Centre
> LAN1 - 10.27.1.101
> LAN2 - 10.27.1.102

Rather than having two different LAN IP addresses, you'd have a
single one here:

LAN - 10.27.1.101

> WebNode 2:
> WAN - what ever is assigned by the Data Centre
> LAN1 - 10.27.1.103
> LAN2 - 10.27.1.104

And this would be:

LAN - 10.27.1.102

> The issue that I am encountering is that when a webnode has two
> interfaces that are on the same subnet, I can ssh to one, but if I
> try and ssh to the other it fails.

Either the packets aren't reaching it (because that IP and MAC
address are on a different physical network), or your server is
sending its replies out via NIC1, since that is probably where its
route for the subnet points.

This is why you want to bond them - so it doesn't matter which NIC
is actually in use.
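
If you do want to know which NIC a bond is using at a given moment,
the bonding driver reports it in /proc/net/bonding/<bond name> on a
"Currently Active Slave" line. A small Python sketch for pulling that
out follows; the bond name bond0, the interface names and the SAMPLE
text are assumptions (SAMPLE is a hand-written, abbreviated stand-in,
not output captured from a real machine).

```python
def active_slave(status_text):
    """Return the interface on the 'Currently Active Slave' line, or None."""
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Currently Active Slave":
            return value.strip() or None
    return None


# Hand-written, abbreviated sample in the style of the status file.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth1
Currently Active Slave: eth2
MII Status: up
"""

print(active_slave(SAMPLE))

# On a host with a bond called bond0 (assumed name) you would read the
# real file instead:
#     with open("/proc/net/bonding/bond0") as f:
#         print(active_slave(f.read()))
```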

> I think that I may end up requiring a subnet for each network that
> is connected to a director (i.e. 10.27.1.x for director1 and
> 10.27.2.x for director2) and a third "virtual" subnet (on eth0:0
> for example) for the HA stuff.  This seems unnecessarily complex
> to me, and I'm sure there's a better way of doing it!
>
> 3) Here it is folks - your excuse to tell me why I'm wrong and
> where I need to look/read/research/test in order to fix this
> issue.

Well, that's my shot at it. What do you think?

