[Linux-HA] methods of dealing with network failover
Andrew Beekhof
beekhof at gmail.com
Wed Jan 10 04:49:18 MST 2007
On 1/10/07, Andreas Kurz <andreas.kurz at gmail.com> wrote:
> On 1/10/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> > On 1/9/07, Marshal Newrock <marshal at idealso.com> wrote:
> > > On Tue, 9 Jan 2007 16:49:54 +0100
> > > "Andreas Kurz" <andreas.kurz at gmail.com> wrote:
> > >
> > > > On 1/9/07, Marshal Newrock <marshal at idealso.com> wrote:
> > > > > I am setting up a 2-machine cluster. Each machine has two
> > > > > interfaces, eth0 has the public IP, eth1 has just a crossover cable
> > > > > to the other machine for heartbeat and drbd.
> > > > >
> > > > > If eth0 fails, I want the node to fail itself, and all services
> > > > > migrate to the other machine. I think I understand the docs well
> > > > > enough to be able to set this up.
> > > > >
> > > > > If eth1 fails, then I don't want it to do anything. Even more, if
> > > > > eth0 fails while eth1 is down, or possibly if the two fail within X
> > > > > seconds of each other, then I still don't want it to do anything.
> > > > > It will be dead until the problem is fixed, but I consider that to
> > > > > be better than having the disks for the two nodes be inconsistent.
> > > >
> > > > If all heartbeat channels fail you have a split brain scenario. In the
> > > > default configuration heartbeat2 will react as you described above:
> > > >
> > > > stonith_enabled (boolean, default=FALSE)
> > > > no_quorum_policy (enum, default=stop)
> > > >
> > > > So without a possibility to stonith each other the nodes will shut
> > > > down their running services and wait for the heartbeat channel to come
> > > > up again and to regain a quorum.
> > >
> > > Sounds good. So, if I were to set no_quorum_policy to freeze instead
> > > of stop, then if the heartbeat link fails, services will continue to
> > > run but it will not attempt to gain control of services on the other
> > > machine.
> >
> > For that you would need no_quorum_policy=freeze.
> >
> > But I still don't think that helps you, because 2-node clusters _always_
> > have quorum (so this particular policy will never take effect).
>
> Can you explain this a bit further? Do you mean in the case of a
> heartbeat start one node is sufficient to have quorum?
Correct, this is how v1 worked.
Since it only supported 2 nodes, it had to work this way; otherwise a
surviving node would never have quorum and could never do resource takeover.
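
To make the point above concrete, here is a rough sketch in Python. It is
not taken from the heartbeat code base: the function names, the
strict-majority rule for clusters larger than two nodes, and the returned
strings are all assumptions made for illustration. Only the two-node
behaviour and the stop/freeze meanings come from this thread.

```python
def has_quorum(visible_nodes: int, total_nodes: int) -> bool:
    """Illustrative quorum rule (hypothetical, not heartbeat's implementation)."""
    if total_nodes <= 2:
        # Two-node case as described in this thread: a single node is enough,
        # otherwise the survivor could never take over resources.
        return visible_nodes >= 1
    # Assumption for larger clusters: a strict majority of configured nodes.
    return visible_nodes > total_nodes // 2


def action_for_partition(visible_nodes: int, total_nodes: int,
                         no_quorum_policy: str = "stop") -> str:
    """Describe what a partition would do; the policy only matters without quorum."""
    if has_quorum(visible_nodes, total_nodes):
        return "manage resources normally"
    if no_quorum_policy == "stop":
        return "stop resources"
    if no_quorum_policy == "freeze":
        return "keep running resources, start nothing new"
    raise ValueError(f"policy not covered by this sketch: {no_quorum_policy}")


if __name__ == "__main__":
    # A lone node in a 2-node cluster still has quorum, so freeze never applies.
    print(action_for_partition(1, 2, "freeze"))
    # In a 3-node cluster an isolated node loses quorum and the policy kicks in.
    print(action_for_partition(1, 3, "freeze"))
```

Under these assumptions, the no_quorum_policy branch is unreachable in a
2-node cluster, which is why changing it to freeze makes no difference there.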
> > > With the setup as described above, if eth1 fails and then eth0 fails,
> > > will the machine continue running the services, as they can't be
> > > migrated off? Since an eth0 failure would probably be the switch or
> > > cable, this seems like the best situation, as then when connectivity is
> > > re-established, services will still be running.
> > >
> > > --
> > > Marshal Newrock, Ideal Solution LLC
> > > http://www.idealso.com
> > > _______________________________________________
> > > Linux-HA mailing list
> > > Linux-HA at lists.linux-ha.org
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
> > >