MAC address takeover; Re: switches

Robert Minichino rmini at joni.pasture.net
Mon Oct 19 07:34:49 MDT 1998


Hello everyone.  I'm new to the mailing list, and in reading the list traffic I 
noticed what may be some confusion regarding MAC address takeover and ethernet 
switches.  I think I may be able to clarify things somewhat.

As you may or may not already know, Ethernet switches look at the traffic 
entering each port and update their bridge tables based on the source MAC 
address of the incoming frames.  Until a switch has learned which port a given 
destination MAC address is on, it acts much like an ordinary repeater for that 
destination, sending the frame out of every other port.  Here's an example of 
what a switch would do with MAC address failover:

System A -----> Switch <----- World (many machines behind this link)
                  ^
System B ---------+

System A and B are a server cluster, perhaps with a serial link for heartbeat, a 
private ethernet, and a shared SCSI bus.  A is the primary server, and B is the 
backup, a lower-powered machine that is still capable of handling all requests, 
but perhaps with less speed and agility, and with equal reliability.

Finally, our cluster is set up.  We power on the switch, and it sits with an 
empty bridging table.  System A starts and sends out a gratuitous ARP, which 
carries its MAC address in the source address field of the Ethernet frame.  The 
switch then knows that the server is on the left port of the switch, and all 
traffic from World to the server goes out through the left port.  However, if A 
fails to send any packets towards the switch before World talks to the server, 
the switch must send those packets to both A and B (and any other ports) to 
assure connectivity.  Then, upon the reply from A, the table would be updated.
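
To make the table-updating behaviour concrete, here is a minimal sketch of a 
learning switch in Python.  It is a toy model, not any vendor's implementation: 
the class name, the port names (taken from the diagram above) and the MAC 
addresses are all made up for illustration, and table aging is left out.

```python
class LearningSwitch:
    """Toy model of a switch: learn source MACs, flood unknown destinations."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Process one frame; return the list of ports it is sent out of."""
        # Learn (or re-learn) which port the sender is on.
        self.table[src_mac] = in_port
        out_port = self.table.get(dst_mac)
        if out_port is None:
            # Unknown or broadcast destination: flood to every other port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the segment the frame came from: drop it.
            return []
        return [out_port]


# Hypothetical addresses, for illustration only.
SERVER = "02:00:00:00:00:01"
CLIENT = "02:00:00:00:00:99"
BROADCAST = "ff:ff:ff:ff:ff:ff"

sw = LearningSwitch(["left", "bottom", "right"])

# World talks first: the server's port is unknown, so the frame is flooded.
flooded = sw.receive("right", CLIENT, SERVER)

# System A's gratuitous ARP arrives on the left port and is learned.
sw.receive("left", SERVER, BROADCAST)

# From now on, traffic for the server goes to the left port only.
directed = sw.receive("right", CLIENT, SERVER)
```

Here `flooded` covers both the left and bottom ports, while `directed` names 
the left port alone.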

Everything is peaceful in our server cluster until System A becomes the victim 
of a faulty power supply and dies a quiet death.  System B notices the missing 
heartbeat of System A, calls 911, and assumes A's MAC address.  System B then 
sends out a gratuitous ARP upon reconfiguration of the interface (it MUST do 
this, or the switch will continue sending packets destined for the server to 
the left port), and the switch updates its bridge table to reflect this 
change.  All subsequent communication from World to the server goes through the 
bottom switch port to System B.  The paramedics then arrive and System A gets a 
shiny new redundant power supply.  System A then, through a private 
communications channel with B, can decide to serve requests again: B 
reconfigures back to its own original MAC address, and A sends some packet 
towards the switch to notify it of the change.  Meanwhile, other than perhaps a 
timed-out request or two, World is unaware of what has happened.
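
As a rough illustration of why the gratuitous ARP does the job, the sketch 
below builds one as raw bytes and shows the only part the switch cares about: 
the source address in the Ethernet header.  The helper names, the MAC address 
and the IP address are assumptions of mine, and nothing is actually sent (that 
would need a raw socket and root privileges); the bridge table is just a dict.

```python
import socket
import struct


def mac_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to its 6-byte form."""
    return bytes(int(part, 16) for part in mac.split(":"))


def gratuitous_arp(mac, ip):
    """Build a broadcast ARP request announcing that `ip` is at `mac`."""
    hw = mac_bytes(mac)
    addr = socket.inet_aton(ip)
    # Ethernet header: destination (broadcast), source, EtherType 0x0806 (ARP).
    ether = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)
    # ARP body: Ethernet/IPv4, opcode 1 (request), sender IP == target IP.
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += hw + addr + b"\x00" * 6 + addr
    return ether + arp


def source_mac(frame):
    """The field a switch learns from: bytes 6..11 of the Ethernet header."""
    return frame[6:12]


# Hypothetical values, for illustration only.
SHARED_MAC = "02:00:00:00:00:01"
SERVICE_IP = "192.0.2.10"

bridge_table = {}
frame = gratuitous_arp(SHARED_MAC, SERVICE_IP)

# System A announces itself at startup; the frame arrives on the left port.
bridge_table[source_mac(frame)] = "left"
before = bridge_table[mac_bytes(SHARED_MAC)]

# After takeover, System B sends the same frame; it arrives on the bottom port.
bridge_table[source_mac(frame)] = "bottom"
after = bridge_table[mac_bytes(SHARED_MAC)]
```

The frame is identical in both cases; only the port it arrives on differs, and 
that is all the switch needs in order to move the entry from `left` to `bottom`.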

Simple as this seems to implement, I am sure that there are a handful of broken 
switch implementations.  In my experience, though, the sequence of events 
described above is the one I've seen most often.  Nor is this mode of operation 
at all uncommon: it's encountered every time the connection between a host and 
the switch is moved to a different port without power cycling the switch.  
These table-updating semantics are what keep switches mostly transparent.

However, due to the mission-critical nature of HA computing (HA features would 
not be necessary if the operation were non-critical), I suggest that a list of 
all compliant hardware be compiled, if that has not been done already.  Linux-HA 
is quite unlike Linux on the desktop or on low-end servers in that it can 
dictate (to a greater extent, at least) which hardware it runs on, instead of 
the hardware dictating to the software.  But then again, we knew that already ;)

Cheers,

Robert Minichino
Chief Engineer
Denarius Enterprises, Inc.
http://www.denarius.com/



