[Linux-HA] methods of dealing with network failover
Andrew Beekhof
beekhof at gmail.com
Wed Jan 10 03:39:40 MST 2007
On 1/9/07, Marshal Newrock <marshal at idealso.com> wrote:
> On Tue, 9 Jan 2007 16:49:54 +0100
> "Andreas Kurz" <andreas.kurz at gmail.com> wrote:
>
> > On 1/9/07, Marshal Newrock <marshal at idealso.com> wrote:
> > > I am setting up a 2-machine cluster. Each machine has two
> > > interfaces, eth0 has the public IP, eth1 has just a crossover cable
> > > to the other machine for heartbeat and drbd.
> > >
> > > If eth0 fails, I want the node to fail itself, and all services
> > > migrate to the other machine. I think I understand the docs well
> > > enough to be able to set this up.
> > >
> > > If eth1 fails, then I don't want it to do anything. Even more, if
> > > eth0 fails while eth1 is down, or possibly if the two fail within X
> > > seconds of each other, then I still don't want it to do anything.
> > > It will be dead until the problem is fixed, but I consider that to
> > > be better than having the disks for the two nodes be inconsistent.
> >
> > If all heartbeat channels fail you have a split brain scenario. In the
> > default configuration heartbeat2 will react as you described above:
> >
> > stonith_enabled (boolean, default=FALSE)
> > no_quorum_policy (enum, default=stop)
> >
> > So, without the ability to STONITH each other, the nodes will shut
> > down their running services and wait for the heartbeat channel to come
> > up again and for quorum to be regained.
>
> Sounds good. So, if I were to set no_quorum_policy to freeze instead
> of stop, then if the heartbeat link fails, services will continue to
> run but it will not attempt to gain control of services on the other
> machine.
For that you would need no_quorum_policy=freeze.
But I still don't think that helps you, because 2-node clusters _always_
have quorum (so this particular policy will never take effect).
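A toy Python model of that point, for illustration only. The class and function names are hypothetical (not Heartbeat CRM code), and the quorum rule is a simplification: strict majority, except that a 2-node cluster is always counted as quorate, as described above. It shows that no_quorum_policy is only consulted once a node has lost quorum, and with two nodes that never happens.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model for illustration; not Heartbeat/CRM source code.
# Field names and defaults mirror the cluster options quoted in this thread.
@dataclass
class ClusterOptions:
    stonith_enabled: bool = False   # quoted default; not used by this toy rule
    no_quorum_policy: str = "stop"  # quoted default


def has_quorum(visible_nodes: int, total_nodes: int) -> bool:
    """Simplified quorum rule: a 2-node cluster is always quorate,
    otherwise a node needs to see a strict majority of the nodes."""
    if total_nodes == 2:
        return True
    return 2 * visible_nodes > total_nodes


def policy_in_effect(opts: ClusterOptions,
                     visible_nodes: int,
                     total_nodes: int) -> Optional[str]:
    """Return the no_quorum_policy if it would apply, else None."""
    if has_quorum(visible_nodes, total_nodes):
        return None  # still quorate: no_quorum_policy is never consulted
    return opts.no_quorum_policy


if __name__ == "__main__":
    freeze = ClusterOptions(no_quorum_policy="freeze")
    # 2-node cluster with the heartbeat link down: each node sees only itself.
    print(policy_in_effect(freeze, visible_nodes=1, total_nodes=2))  # None
    # 3-node cluster, one node isolated: it sees only itself.
    print(policy_in_effect(freeze, visible_nodes=1, total_nodes=3))  # freeze
```

In the 3-node case the isolated node holds 1 of 3 votes, loses quorum, and the configured policy (freeze here) would apply; in the 2-node case the policy setting makes no difference.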
>
> With the setup as described above, if eth1 fails and then eth0 fails,
> will the machine continue running the services, as they can't be
> migrated off? Since an eth0 failure would probably be the switch or
> cable, this seems like the best situation, as then when connectivity is
> re-established, services will still be running.
>
> --
> Marshal Newrock, Ideal Solution LLC
> http://www.idealso.com