Any tools for switch over, changing IP addresses, etc?

David Lang dlang at diginsite.com
Tue Oct 6 17:48:33 MDT 1998


-----BEGIN PGP SIGNED MESSAGE-----

Is there an easy way to generate gratuitous ARP packets in Linux now?
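
For reference, a gratuitous ARP is just an ARP request broadcast on the local
segment in which the sender and target IP addresses are the same. The sketch
below builds such a frame by hand in Python. It is an illustration, not a
tested failover tool: the function names are made up for this example, the
sending half assumes Linux (AF_PACKET sockets) and root privileges, and the
interface name, MAC, and address in the usage note are placeholders.

```python
import socket
import struct

ETH_P_ARP = 0x0806          # EtherType for ARP
BROADCAST_MAC = b"\xff" * 6


def build_gratuitous_arp(mac: bytes, ip: str) -> bytes:
    """Return an Ethernet frame carrying a gratuitous ARP request for `ip`."""
    if len(mac) != 6:
        raise ValueError("MAC address must be exactly 6 bytes")
    ip_bytes = socket.inet_aton(ip)
    # Ethernet header: destination (broadcast), source, EtherType.
    eth_header = BROADCAST_MAC + mac + struct.pack("!H", ETH_P_ARP)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,              # hardware type: Ethernet
        0x0800,         # protocol type: IPv4
        6,              # hardware address length
        4,              # protocol address length
        1,              # opcode: request
        mac,            # sender hardware address
        ip_bytes,       # sender protocol address
        b"\x00" * 6,    # target hardware address (unknown)
        ip_bytes,       # target protocol address == sender address
    )
    return eth_header + arp_payload


def send_gratuitous_arp(ifname: str, mac: bytes, ip: str) -> None:
    """Send the frame on `ifname`. Linux only; needs root (CAP_NET_RAW)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        sock.send(build_gratuitous_arp(mac, ip))
```

Hypothetical usage after taking over an address, run as root:
send_gratuitous_arp("eth0", bytes.fromhex("020000000001"), "192.0.2.10").
Clients that ignore ARP traffic not directed at them will not be helped by
this.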

David Lang

On Tue, 6 Oct 1998, Michael Rowan wrote:

> Date: Tue, 06 Oct 1998 19:39:16 -0400
> From: Michael Rowan <mtr at cutaway.com>
> To: Per-Ola Mard <peo at hds.com>, linux-ha at muc.de
> Subject: Re: Any tools for switch over, changing IP addresses, etc?
> 
> Per-Ola Mard wrote:
> 
> > Hello Richard,
> >
> > Three follow-up questions,
> >
> > 1) What type of apps do you run on the two machines
> > (web/oracle/sybase/informix)?
> > 2) How quickly must the switch happen (no outage, one lost transaction,
> > or a 3 minute outage)?
> > 3) What type of meltdown do you plan to sustain?
> > (app/os/disk/controller/memory/cpu)?
> >
> > I'm just a bit curious about how we, people on this list, think and what the
> > targets are.
> > Where are the boundaries/limitations, targets and possibilities?
> >
> > Regards,
> >   /peo
> >
> > Richard Sharpe wrote:
> >
> > > Hi,
> > >
> > > I notice that this list is not a high volume list.
> > >
> > > I have a site where we have two identical machines (dual-processor
> > > 200MHz Pentium Pros), and we would like to switch the machines
> > > around quickly in the event of a failure on the primary machine.
> > >
> > > Are there any tools to help with this?  We need to move IP addresses
> > > around, and it seems like Linux 2.0.35 does not do gratuitous ARP when an
> > > interface is ifconfig'd.
> > >
> > > Regards
> > > -------
> > > Richard Sharpe, sharpe at ns.aus.com, NIC-Handle:RJS96
> > > NS Computer Software and Services P/L,
> > > Ph: +61-8-8281-0063, FAX: +61-8-8250-2080,
> > > Samba, Linux, Apache, Digital UNIX, AIX, Netscape, Stronghold, C, ...
> 
> It's not really the server machines that are the problem when it comes to the
> IP takeover/ARP issue; it's the client machines.  There are two general
> methods of forcing ARP updates in an enterprise where some OSes don't pick up
> ARP responses not directed at them.  Both are annoying.  In the first, you
> ping a list of client machines, and the ping forces an ARP update.  In the
> second, the client machines, upon seeing a change through any of a number of
> possible mechanisms, delete the appropriate entries in their ARP caches.
> There are other combinations or derivatives of the two (like machines in the
> local loop flushing the whole ARP cache every N seconds).  Luckily you only
> need to do that on your side of the router, so the list is finite, though that
> doesn't have to mean small.
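
The first method (pinging a list of clients) could be sketched roughly as
follows. This is only an illustration: the function names are invented here,
the "-c", "-W", and "-I" flags are those of the Linux iputils ping, and the
`run` parameter exists only so the loop can be exercised without touching the
network. It would be run on the machine that has just taken over the address,
with `source` set to that address.

```python
import subprocess


def ping_command(host, source=None, count=1, timeout_s=1):
    """Build the argv for one ping (iputils flags assumed)."""
    argv = ["ping", "-c", str(count), "-W", str(timeout_s)]
    if source is not None:
        # Ping from the taken-over address so clients see its new MAC.
        argv += ["-I", source]
    return argv + [host]


def refresh_clients(hosts, source=None, run=subprocess.call):
    """Ping every client once; return the hosts that did not answer."""
    unreachable = []
    for host in hosts:
        if run(ping_command(host, source=source)) != 0:
            unreachable.append(host)
    return unreachable
```

The returned list gives a rough idea of which clients may still hold a stale
entry, or are simply down.
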
> 
> Or you can opt for taking over addresses at the MAC address level -- the NIC's
> hardware address.  You avoid ARP issues in this case since ARP is a translation
> from logical IP addresses to hardware MAC addresses.  Swap both and the network
> continues running without any ARP fuss.
> 
> Back to IP takeover, the other issue comes from conflicting addresses on two
> machines -- if you have an address flipping back and forth, there need to be
> at least three addresses assigned -- one for each machine to come online with
> during the boot process, and a third, shared address to toss back and forth.
> Otherwise, you might do a takeover so that your second machine is using the
> primary IP address, and then the primary machine gets rebooted, or comes back
> online, shoves the old IP address back out on the NIC, and thoroughly
> confuses things from there on out.
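
A toy model of the three-address scheme, to make the point concrete. Nothing
here configures a real interface; the class, the function names, and the
addresses are invented for illustration. The only rule it encodes is that a
node boots with its own address and never claims the shared one by itself.

```python
class Node:
    """A cluster member with a private boot address."""

    def __init__(self, name, boot_ip):
        self.name = name
        self.boot_ip = boot_ip
        self.addresses = set()

    def boot(self):
        # At boot a node brings up only its own address, never the shared one.
        self.addresses = {self.boot_ip}


def take_over(service_ip, new_owner, nodes):
    """Move the shared service address to new_owner."""
    for node in nodes:
        node.addresses.discard(service_ip)
    new_owner.addresses.add(service_ip)


def owners(service_ip, nodes):
    """Names of all nodes currently holding the shared address."""
    return [node.name for node in nodes if service_ip in node.addresses]
```

With two nodes booted on 192.0.2.1 and 192.0.2.2 and a shared 192.0.2.10,
rebooting the old primary after a takeover leaves the shared address on
exactly one node, because boot() only restores the private address.
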
> 
> On timings, there are really two issues -- how fast you want to recognize the
> failure, and how fast you can have the cluster "black box" back up and
> operational.  The former is fairly fixed, depending on a few give/take
> issues; a certain amount of time has to elapse in order to be sure that you
> are dealing with a real machine/NIC failure rather than a simple short-term
> outage.  You can crank this number way down, but you end up with more false
> failures (where you think it failed and initiate an (expensive) failover, but
> in reality the network had an issue that delayed packets by a few seconds, or
> a machine was abnormally busy during a burst that would have cleared up on
> its own in a very short time).  Crank it up and you get a much better chance
> of only dealing with real failures, but your failover window grows with it.
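
The trade-off above can be seen in a minimal heartbeat-timeout sketch. The
class and parameter names are assumptions made for this example, not taken
from any existing tool; timestamps are passed in explicitly (in practice they
might come from time.monotonic()) so the behaviour is easy to check.

```python
class FailureDetector:
    """Declare the peer failed after `missed_limit` silent intervals."""

    def __init__(self, interval_s, missed_limit):
        self.interval_s = interval_s
        self.missed_limit = missed_limit
        self.last_seen = None

    def heartbeat(self, now):
        # Record the time of the most recent heartbeat from the peer.
        self.last_seen = now

    def is_failed(self, now):
        if self.last_seen is None:
            return False  # nothing heard yet, so nothing to time out
        return now - self.last_seen > self.interval_s * self.missed_limit
```

With a 1 second interval, a limit of 1 treats a 2.5 second network hiccup as
a failure, while a limit of 3 rides it out but needs more than 3 seconds of
silence before a real failure is noticed.
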
> 
> The latter (getting the cluster operational again) depends on many, many
> things: acquiring disks that are on a shared SCSI bus or the equivalent (and
> how many of these disks there are), re-mounting filesystems and checking them
> after a failure (or replaying the logs if you have a logged or journaled
> filesystem, and how many of those there are), and how long it takes to start
> your shared services (oracle, web servers, whatnot).  All of these things add
> up and are included in the failure recovery time you are targeting.
> 
> Timing these things at the transaction level is not feasible.  You can do some
> things at that level, but more at the middleware layer, or at the disk driver
> layer assuming you are doing some kind of network forwarding of disk data, but
> even then...  Transactional integrity is a higher-level concept.
> 
> mtr
> 

-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv

iQEVAwUBNhqsUz7msCGEppcbAQGOKAf9FUxAuAxEY9fKTCZdd9/KQr6LNWQOitPY
7HuAM1P9XjEYoNKSKGupf6cCm2Wn8Hi0UYyuDUp9Bz9nmt9h9Gr3YpbTr95afowA
DcCvkuG+b2rNtaneClkxUrZUhlZLvO2CqNqhr+jQbbrYtUgFVKCoitcWXOFdrQl1
EgoYLt8EZTtoH3OzbYLdv3VYrDeFbKNstpDUJGJDxlNETNopcJRSdKcdsGtdO1lF
m/cIRJ/EN0eKslF1tS2gr4Uy4rpN1RRN0pmkLmV8Ys8N/iiuRvgH2RdXPwW3YIUM
Xx0hy5dlyGDVyDUx0AeRarIjMXLayXOiX9VQb3fPt1mD6eTFwg3CSA==
=j+Bi
-----END PGP SIGNATURE-----



