[Linux-HA] STONITH without special hardware

Andreas Kurz andreas.kurz at gmail.com
Tue Sep 11 10:43:08 MDT 2007


On 9/11/07, Departamento Técnico de El Norte de Castilla
<tecnicos.norte at gmail.com> wrote:
> 2007/9/11, Dejan Muhamedagic <dejanmm at fastmail.fm>:
> >
> > Hi,
> >
> > On Tue, Sep 11, 2007 at 11:25:24AM +0200, Jose Jerez wrote:
> > > You can always use the ssh stonith agent (external/ssh), it's mainly
> > > for testing but I guess it's better than no stonith at all.
> >
> > I'd disagree. An unreliable stonith device gives a false sense of
> > security. It's a workaround which may or may not work. You do
> > want your stonith device to work in case you need it.
>
>
> Yes, but I heard about some magic key combinations that restart systems
> even in a kernel panic (something like Alt + SysRq + some key) and I
> thought that maybe some kernel module waiting for a serial COM signal could
> do the job. I know that the ssh and meatware modules are just for test
> scenarios and that, especially with my cluster, a Compaq DL380 with a CR3500,
> it is really important to Shoot The Other Node In The Head because of the
> shared disk between the nodes (I've run some tests without STONITH and all I
> got was filesystem corruption).
>
>
> > > On 9/10/07, Departamento Técnico de El Norte de Castilla
> > > <tecnicos.norte at gmail.com> wrote:
> > > > What kind of STONITH service, if there is one, could I use with
> > > > confidence without special STONITH hardware installed on the nodes?
> > > > Thanks in advance,
> >
> > If this is about your data: how much does it cost? How much time
> > would it take you to recover it? Is it then worth having a proper
> > STONITH setup?
> >
> > Thanks,
> >
> > Dejan
>
>
> You're certainly right but the Compaq Proliant DL380 doesn't have STONITH

You have a ProLiant DL380 without a RILOE management card? I've
never seen DL machines without one, and for the RILOE card there is
the external/riloe stonith agent available in Heartbeat, which works
well for our ProLiant cluster systems.

> hardware and the proprietary cluster software solutions from Compaq (Keepalive
> for Linux or the software solutions for Windows) don't need it and work
> fine, so why is it necessary in Linux with Heartbeat? Maybe with the

You can build a three-node cluster (or maybe use quorumd) and rely on
the quorum decision alone in case a node gets isolated, but the only
way to be really sure a node is down and not accessing shared storage
is to stonith it, or to use some form of resource fencing. Without
stonith you can minimize the probability of a split-brain situation
with redundant heartbeat paths.
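
To illustrate the quorum decision mentioned above, here is a minimal
sketch of a strict-majority rule. It assumes one vote per node and
nothing else; `has_quorum` is a hypothetical helper written for this
mail, not part of Heartbeat or quorumd, and the real quorum plugins may
weigh votes differently.

```python
def has_quorum(visible_nodes: int, total_nodes: int) -> bool:
    """Return True if a partition holds a strict majority of the cluster.

    visible_nodes: nodes this partition can still reach, itself included.
    total_nodes:   configured cluster size.
    Both names are illustrative assumptions, not Heartbeat parameters.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= visible_nodes <= total_nodes:
        raise ValueError("visible_nodes must be between 0 and total_nodes")
    # Strict majority: more than half of all configured nodes.
    return visible_nodes > total_nodes // 2


if __name__ == "__main__":
    # Two-node cluster, link lost: each side sees only itself.
    print("2 nodes, isolated node:", has_quorum(1, 2))
    # Three-node cluster, one node isolated from the other two.
    print("3 nodes, isolated node:", has_quorum(1, 3))
    print("3 nodes, remaining pair:", has_quorum(2, 3))
```

In the two-node case neither side can claim a majority after a split,
which is why the third node (or quorumd) is needed. Note that losing
quorum only tells the isolated node it should stop; it does not prove
that it actually did, which is the gap stonith closes.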

Regards,
Andreas

> correct file system could it work without STONITH? I'm reading documentation
> about OCFS2 and GFS, but could these filesystems help me solve the
> problem? Thanks again in advance,
>
> Israel Sanchez
>
> > > _______________________________________________
> > > > Linux-HA mailing list
> > > > Linux-HA at lists.linux-ha.org
> > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > See also: http://linux-ha.org/ReportingProblems
> > > >


