[Linux-HA] R2 Two-node apache cluster with STONITH
Dejan Muhamedagic
dejanmm at fastmail.fm
Thu Apr 19 12:01:20 MDT 2007
On Tue, Apr 17, 2007 at 03:53:41PM -0400, Bjorn Oglefjorn wrote:
> Here they are again.
It looks like this
Apr 4 11:28:20 test-2 stonithd: [13658]: info: Failed to STONITH the node test-1.domain: optype=1, op_result=2
means that the stonith operation timed out. I'll fix the code to
raise this to an error condition and include descriptions of the codes.
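A minimal Python sketch of pulling the node, optype and op_result out of such a stonithd line and attaching a description. It assumes op_result=2 means a timeout, which is only the reading given above. Other codes are reported as unknown rather than guessed, and `describe_failure` is a hypothetical helper.

```python
import re

# Matches stonithd lines such as:
#   ... stonithd: [13658]: info: Failed to STONITH the node test-1.domain: optype=1, op_result=2
FAILED_RE = re.compile(
    r"Failed to STONITH the node (?P<node>\S+): "
    r"optype=(?P<optype>\d+), op_result=(?P<op_result>\d+)"
)

# Assumption: only op_result=2 is described, following the reading in this
# thread that it means the stonith operation timed out.
OP_RESULT_DESCRIPTIONS = {2: "timeout"}


def describe_failure(line):
    """Return (node, optype, description), or None if the line doesn't match."""
    m = FAILED_RE.search(line)
    if m is None:
        return None
    op_result = int(m.group("op_result"))
    desc = OP_RESULT_DESCRIPTIONS.get(op_result, f"unknown result {op_result}")
    return m.group("node"), int(m.group("optype")), desc


line = ("Apr 4 11:28:20 test-2 stonithd: [13658]: info: Failed to STONITH "
        "the node test-1.domain: optype=1, op_result=2")
print(describe_failure(line))  # ('test-1.domain', 1, 'timeout')
```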
Earlier in the log we see:
Apr 4 11:27:50 test-2 tengine: [13668]: info: te_fence_node:actions.c Executing reboot fencing operation (16) on test-1.domain (timeout=30000)
Note the timeout: 30 seconds. After some digging I found that it comes
from transition_timeout. Is 30 seconds enough time for your stonith
agent to perform the reset?
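To put numbers on that question, a small sketch with hypothetical helpers that reads the 30000 ms budget out of the te_fence_node line and compares it with the time spent in the manual `-T reset` run quoted further down (first plugin log entry at 09:57:46, success at 09:57:58). It only measures the plugin's own log during a manual run. It says nothing about how long the agent takes when driven by stonithd.

```python
import re
from datetime import datetime

TIMEOUT_RE = re.compile(r"timeout=(\d+)\)")


def fencing_timeout_seconds(te_line):
    """Extract timeout=NNN (milliseconds, matching the 30000 -> 30 s reading) as seconds."""
    m = TIMEOUT_RE.search(te_line)
    if m is None:
        raise ValueError("no timeout=... field found")
    return int(m.group(1)) / 1000.0


def elapsed_seconds(first_stamp, last_stamp):
    """Seconds between two plugin log timestamps like 'Tue Apr 17 09:57:46 2007'."""
    fmt = "%a %b %d %H:%M:%S %Y"
    start = datetime.strptime(first_stamp, fmt)
    end = datetime.strptime(last_stamp, fmt)
    return (end - start).total_seconds()


te_line = ("Apr 4 11:27:50 test-2 tengine: [13668]: info: te_fence_node:actions.c "
           "Executing reboot fencing operation (16) on test-1.domain (timeout=30000)")
budget = fencing_timeout_seconds(te_line)
# First and last plugin log timestamps from the manual "-T reset" run in this thread.
spent = elapsed_seconds("Tue Apr 17 09:57:46 2007", "Tue Apr 17 09:57:58 2007")
print(budget, spent, spent < budget)  # 30.0 12.0 True
```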
Anyway, in the CIB (crm_verify doesn't complain about it) I find
only these two timeouts:
<nvpair id="cib-bootstrap-options-transition_idle_timeout" name="transition_idle_timeout" value="5min"/>
...
<op id="test-1_DRAC_reset" name="reset" timeout="3min" prereq="nothing"/>
1. transition_timeout is not in the annotated CIB.
2. Should the user specify this timeout in the crm_config section,
calculating it as the maximum of all resource operations' timeouts?
3. What's the difference between the transition_timeout and the
transition_idle_timeout?
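Question 2 can be explored with a short sketch that extracts both timeouts from a CIB fragment and takes the maximum over the operation timeouts. The fragment wraps the two quoted elements in a hypothetical `<cib>` root. The duration parser understands only `ms`, `s` and `min` suffixes; that is an assumption for this sketch, not the CRM's own parsing rule.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment built from the two elements quoted above; a real CIB
# nests them inside the crm_config and resources sections.
CIB_FRAGMENT = """
<cib>
  <nvpair id="cib-bootstrap-options-transition_idle_timeout"
          name="transition_idle_timeout" value="5min"/>
  <op id="test-1_DRAC_reset" name="reset" timeout="3min" prereq="nothing"/>
</cib>
"""

# Assumption: only explicit ms/s/min suffixes are handled; unit-less values
# are rejected rather than guessing how the CRM would read them.
UNITS_MS = {"ms": 1, "s": 1000, "min": 60_000}


def to_ms(value):
    m = re.fullmatch(r"(\d+)(ms|s|min)", value.strip())
    if m is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(m.group(1)) * UNITS_MS[m.group(2)]


def timeouts(cib_xml):
    """Return (transition_idle_timeout in ms, largest op timeout in ms)."""
    root = ET.fromstring(cib_xml)
    idle = None
    for nv in root.iter("nvpair"):
        if nv.get("name") == "transition_idle_timeout":
            idle = to_ms(nv.get("value"))
    op_max = max((to_ms(op.get("timeout")) for op in root.iter("op")
                  if op.get("timeout")), default=0)
    return idle, op_max


idle_ms, op_max_ms = timeouts(CIB_FRAGMENT)
print(idle_ms, op_max_ms)  # 300000 180000
```

Neither value is the 30000 ms seen in the te_fence_node line, which is why the question of where that number comes from matters.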
Andrew, can you please take a look.
Thanks.
>
> On 4/17/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> >
> >On 4/17/07, Bjorn Oglefjorn <sys.mailing at gmail.com> wrote:
> >> I know that my plugin is getting called because of the logging that the
> >> plugin does.
> >
> >do we get to see that logging at all? preferably in the context of
> >the other log messages
> >
> >> That said, I also know my plugin is not receiving any 'reset'
> >> operation request from heartbeat. As you can see below, request actions are
> >> logged. The only actions logged when node failure is simulated are:
> >> getconfignames, status, and gethosts, in that order. We should also see
> >> getinfo-devid and reset operations logged, but they are never present.
> >> --BO
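A quick way to make that comparison mechanical: collect the actions from the plugin's debug log and list the ones from a known-good run that never showed up. The expected sequence is taken from the manual `-T reset` run quoted below. The cluster-side list is the one reported in this message, since no raw log for it was posted. `requested_actions` and `missing_actions` are hypothetical helpers.

```python
import re

# Matches plugin debug lines like:
#   [Tue Apr 17 09:57:50 2007] Requested Action for test-2.drac.domain : reset
ACTION_RE = re.compile(r"Requested Action for .*?:\s*(\S+)\s*$")


def requested_actions(log_lines):
    """Return the action names from 'Requested Action' lines, in order."""
    return [m.group(1) for line in log_lines if (m := ACTION_RE.search(line))]


def missing_actions(observed, expected):
    """Actions in the expected sequence that never appear in the observed one."""
    seen = set(observed)
    return [a for a in expected if a not in seen]


manual_reset_log = [
    "[Tue Apr 17 09:57:46 2007] Requested Action for : getconfignames",
    "[Tue Apr 17 09:57:48 2007] Requested Action for test-2.drac.domain : status",
    "[Tue Apr 17 09:57:48 2007] Success: test-2.drac.domain is reachable",
    "[Tue Apr 17 09:57:49 2007] Requested Action for : getinfo-devid",
    "[Tue Apr 17 09:57:50 2007] Requested Action for test-2.drac.domain : reset",
]
expected = requested_actions(manual_reset_log)
observed_in_cluster = ["getconfignames", "status", "gethosts"]  # as reported above
print(missing_actions(observed_in_cluster, expected))  # ['getinfo-devid', 'reset']
```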
> >>
> >> On 4/17/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> >> >
> >> > On 4/17/07, Bjorn Oglefjorn <sys.mailing at gmail.com> wrote:
> >> > > Yes, I most certainly have. The stonith command-line tool has no
> >> > > problem at all with the plugin. The following was run from
> >> > > test-1.domain. The indented log entries are from the debug log of
> >> > > the stonith plugin:
> >> >
> >> > I'm no stonith expert, but the outputs certainly look plausible
> >> > enough.
> >> > You kept the same CIB?
> >> > Are you sure your plugin is getting called?
> >> >
> >> > > root:~ # stonith -t external/drac4 DRAC_ADDR=test-2.drac.domain DRAC_LOGIN=root DRAC_PASSWD=******** -lS
> >> > > stonith: external/drac4 device OK.
> >> > > test-2.drac.domain
> >> > >
> >> > >     [Tue Apr 17 09:57:20 2007] Requested Action for : getconfignames
> >> > >     [Tue Apr 17 09:57:22 2007] Requested Action for test-2.drac.domain : status
> >> > >     [Tue Apr 17 09:57:22 2007] Success: test-2.drac.domain is reachable
> >> > >     [Tue Apr 17 09:57:23 2007] Requested Action for : getinfo-devid
> >> > >     [Tue Apr 17 09:57:24 2007] Requested Action for test-2.drac.domain : gethosts
> >> > >
> >> > > root:~ # stonith -t external/drac4 DRAC_ADDR=test-2.drac.domain DRAC_LOGIN=root DRAC_PASSWD=******** -T on test-2.domain
> >> > >
> >> > >     [Tue Apr 17 09:57:28 2007] Requested Action for : getconfignames
> >> > >     [Tue Apr 17 09:57:30 2007] Requested Action for test-2.drac.domain : status
> >> > >     [Tue Apr 17 09:57:30 2007] Success: test-2.drac.domain is reachable
> >> > >     [Tue Apr 17 09:57:31 2007] Requested Action for : getinfo-devid
> >> > >     [Tue Apr 17 09:57:33 2007] Requested Action for test-2.drac.domain : on
> >> > >     [Tue Apr 17 09:57:33 2007] test-2.drac.domain Initial Power Status = ON
> >> > >     [Tue Apr 17 09:57:33 2007] Success: test-2.drac.domain Power Status = ON
> >> > >
> >> > > root:~ # stonith -t external/drac4 DRAC_ADDR=test-2.drac.domain DRAC_LOGIN=root DRAC_PASSWD=******** -T reset test-2.domain
> >> > >
> >> > >     [Tue Apr 17 09:57:46 2007] Requested Action for : getconfignames
> >> > >     [Tue Apr 17 09:57:48 2007] Requested Action for test-2.drac.domain : status
> >> > >     [Tue Apr 17 09:57:48 2007] Success: test-2.drac.domain is reachable
> >> > >     [Tue Apr 17 09:57:49 2007] Requested Action for : getinfo-devid
> >> > >     [Tue Apr 17 09:57:50 2007] Requested Action for test-2.drac.domain : reset
> >> > >     [Tue Apr 17 09:57:50 2007] test-2.drac.domain Initial Power Status = ON
> >> > >     [Tue Apr 17 09:57:58 2007] Success: test-2.drac.domain Power Status = RESET
> >> > >
> >> > > --BO
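To time the agent the same way outside the cluster, a sketch that builds the argument list used in the runs above and executes it under a deadline. The argv order mirrors the command lines shown. The helper names and the `secret` placeholder password are hypothetical, and actually running the command will reset the target node.

```python
import subprocess
import time


def stonith_reset_argv(plugin, params, target):
    """Build: stonith -t PLUGIN NAME=VALUE ... -T reset TARGET (order as in the runs above)."""
    argv = ["stonith", "-t", plugin]
    argv += [f"{name}={value}" for name, value in params.items()]
    argv += ["-T", "reset", target]
    return argv


def timed_run(argv, timeout_s):
    """Run argv and return (returncode, elapsed seconds); returncode is None on timeout."""
    start = time.monotonic()
    try:
        rc = subprocess.run(argv, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        rc = None  # subprocess.run kills the child before re-raising
    return rc, time.monotonic() - start


argv = stonith_reset_argv(
    "external/drac4",
    {"DRAC_ADDR": "test-2.drac.domain", "DRAC_LOGIN": "root", "DRAC_PASSWD": "secret"},
    "test-2.domain",
)
# Uncommenting this really resets test-2; 30 s matches the fencing timeout in the log.
# rc, seconds = timed_run(argv, timeout_s=30)
```

Using a deadline equal to the cluster's fencing timeout shows directly whether a manual reset would have fit inside it.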
> >> > >
> >> > > On 4/17/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> >> > > >
> >> > > > On 4/16/07, Bjorn Oglefjorn <sys.mailing at gmail.com> wrote:
> >> > > > > No ideas?
> >> > > >
> >> > > > none at all - have you tried calling it manually using the stonith
> >> > > > command-line tool to make sure it works?
> >> > > >
> >> > > > > On 4/9/07, Bjorn Oglefjorn <sys.mailing at gmail.com> wrote:
> >> > > > > >
> >> > > > > > I quickly put together a STONITH plugin for testing this. It conforms
> >> > > > > > to the heartbeat spec and always lies to heartbeat, returning success
> >> > > > > > no matter what. With this plugin in place I'm still getting this error:
> >> > > > > >
> >> > > > > > Apr 9 15:40:47 test-2 stonithd: [8791]: info: Failed to STONITH the node test-1.domain: optype=1, op_result=2
> >> > > > > > Apr 9 15:40:47 test-2 tengine: [8803]: info: tengine_stonith_callback: callbacks.c call=-4, optype=1, node_name= test-1.domain, result=2, node_list=, action=13;5:6eaeba12-87c3-465e-98f1-78585e71e495
> >> > > > > > Apr 9 15:40:47 test-2 tengine: [8803]: ERROR: tengine_stonith_callback: callbacks.c Stonith of test-1.domain failed (2)... aborting transition.
> >> > > > > >
> >> > > > > > --BO
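For reference, a rough Python equivalent of such an always-succeed test plugin. It assumes the external plugin convention used by the shell plugins shipped with heartbeat: the action arrives as the first argument (with the host as the second for on/off/reset), parameters arrive as environment variables, and exit status 0 means success. The `hostlist` parameter and the `handle` function are hypothetical choices for this stub.

```python
#!/usr/bin/env python3
"""Hypothetical always-succeed external STONITH plugin, for testing only.

Assumption: invoked as `plugin ACTION [HOST]`, parameters come from
environment variables, and exit status 0 signals success.
"""
import os
import sys

CONFIG_NAMES = ["hostlist"]  # hypothetical parameter for this stub


def handle(action, args, env):
    """Return (exit_status, stdout_text) for one plugin invocation."""
    if action == "getconfignames":
        return 0, "\n".join(CONFIG_NAMES) + "\n"
    if action == "gethosts":
        # List the node names this device claims it can fence, one per line.
        return 0, "".join(h + "\n" for h in env.get("hostlist", "").split())
    if action.startswith("getinfo-"):
        return 0, "always-succeed test plugin\n"
    # status, on, off, reset: report success no matter what.
    return 0, ""


def main(argv):
    action = argv[1] if len(argv) > 1 else ""
    status, out = handle(action, argv[2:], os.environ)
    sys.stdout.write(out)
    return status


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```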
> >> > > > _______________________________________________
> >> > > > Linux-HA mailing list
> >> > > > Linux-HA at lists.linux-ha.org
> >> > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >> > > > See also: http://linux-ha.org/ReportingProblems
> >> > > >
--
Dejan