[Linux-HA] help regarding constraints

Andrew Beekhof beekhof at gmail.com
Wed Feb 21 07:03:03 MST 2007


On 2/17/07, Terry L. Inzauro <tinzauro at ha-solutions.net> wrote:
> Lars Marowsky-Bree wrote:
> > On 2007-02-16T18:46:30, "Terry L. Inzauro" <tinzauro at ha-solutions.net> wrote:
> >
> >> roxetta linux # cat /etc/ha.d/ha.cf
> >> baud 38400
> >> serial /dev/ttyS0
> >
> > This basically doesn't work with crm yes. That's too slow a link for the
> > CIB updates and heartbeat packets, and heartbeat doesn't handle it
> > well.
> >
> > Drop the serial link and see whether it helps. Try increasing the speed
> > to 115200, or replacing it with a second ethernet link.
> >
> >> bcast eth2
> >
> > As this is called eth2, I guess you're likely to have several other
> > ethernet interfaces which you can use as well.
> >
> >> coredumps true
> >> deadping 20
> >> initdead 35
> >> udpport 6940
> >> keepalive 1
> >> deadtime 10
> >> initdead 80
> >> warntime 10
> >
> > So, which of the two initdeadtimes do you want heartbeat to use? ;-)
> >
> > And, making the warntime == deadtime is kind of pointless. You should
> > try setting this to warntime 5, for example.
> >
> >> auto_failback on
> >
> > auto_failback is meaningless in a crm-style cluster.
> >
> >> apiauth ipfail uid=hacluster,root
> >> apiauth ccm uid=hacluster,root
> >> apiauth cms uid=hacluster,root
> >> apiauth ping gid=haclient uid=toor,root
> >
> > All of these are implied by the "crm yes" directive and can be dropped.
> >
> >> roxetta linux # /etc/init.d/openvpn_170-2000 stop; echo $?
> >>  * WARNING:  openvpn_170-2000 has not yet been started.
> >> 0
> >
> > OK, it warns, but returns a successful exit code, that's alright.
> >
> >> roxetta linux # /etc/init.d/openvpn_170-2000 start; echo $?
> >>  * Starting openvpn_170-2000 ...
> >>                                                                       [ ok ]
> >> 0
> >
> > OK. What does it return when you invoke "start" again while already
> > started? That must also succeed.
> >
> >> roxetta linux # /etc/init.d/openvpn_170-2000 status; echo $?
> >>  * status:  started
> >> 0
> >
> > Ok as well.
> >
> >> roxetta linux # /etc/init.d/openvpn_170-2000 stop; echo $?
> >>  * Stopping openvpn_170-2000 ...
> >>                                                                             [ ok ]
> >> 0
> >
> > Again, what happens when you invoke "stop" again? That, as well, must
> > succeed.
> >
> > What is the status result if stopped?
> >
> >> i did read in the LSB script requirements that the exit code of
> >> status is to be "3" when invoked on a resource that is in a stopped
> >> state.  will this cause issues for me?
> >
> > If it doesn't do that, _YES!_ it will cause issues for you.
> >
> > "status" must return 0 when everything is alright, 3 when completely
> > stopped, and anything else will be treated as "active but failed".
> >
> >
> >> be safe, but no dice.  i have just re-compiled without grsecurity and
> >> pax controls for giggles.  perhaps heartbeat tries to do something
> >> that my kernel does not allow.
> >
> > Unlikely, your kernel should have warned in that case.
> >
> >
> > Sincerely,
> >     Lars
> >
>
> yeah, the long stint of sub zero temps might have my brain on deep freeze.  your insightful
> recommendations were correct.  since upping the baud rate to 230400, i am not seeing any of those
> errors.  i also added another "bcast" directive to the mix so now i've got two ethernet and one
> serial connection. testing all of the links proved well.
>
> now.  the issue i am seeing now is actually migrating resources to/from nodes.  the good news is i
> believe we already know why they are not migrating.  the LSB type scripts i am using do not exit
> with the correct code of "3" when stopped and status is queried.

then by definition they are not LSB scripts.
"3" isn't a number we made up, it's what the LSB standard mandates.

http://www.linux-foundation.org/spec/refspecs/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
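
To make the rule concrete, here is a minimal sketch of how the exit code of
"status" gets read, following what Lars wrote earlier in this thread (0 is
running, 3 is stopped, anything else is treated as active but failed). The
function names classify_status and check_status are made up for this example
and are not part of heartbeat.

```shell
#!/bin/sh
# Map the exit code of "<script> status" to how the cluster would read it:
#   0 = running, 3 = stopped, anything else = active but failed.
# (classify_status / check_status are illustrative names, not heartbeat tools.)
classify_status() {
    case "$1" in
        0) echo "running" ;;
        3) echo "stopped" ;;
        *) echo "failed" ;;
    esac
}

# Run the "status" action of the init script given as $1 and report the result.
check_status() {
    "$1" status >/dev/null 2>&1
    classify_status "$?"
}
```

Running something like "check_status /etc/init.d/openvpn_170-2000" once with
the service started and once with it stopped should show quickly whether the
script follows the convention.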

So you have two possible solutions...
1) Beat up on the Gentoo guys until their "LSB" scripts really do
conform to the LSB
2) Write an OCF agent based on the init script (and potentially pass
it back to us to include)
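
A rough sketch of a third, stop-gap route: a thin front end that leaves the
init script alone and only translates its "status" result. This is not a real
OCF agent (no monitor or meta-data actions), and it rests on an assumption:
that the script prints "started" or "stopped" in its status output. Only
"status:  started" appears in this thread, so the "stopped" string is a guess
to verify first. lsb_status and lsb_wrap are hypothetical names.

```shell
#!/bin/sh
# Sketch of an LSB-style front end for an init script whose "status" action
# does not return 3 when the service is stopped.
# ASSUMPTION: the wrapped script prints "started" or "stopped" in its status
# output; the "stopped" wording is not confirmed anywhere in this thread.

# lsb_status SCRIPT: return 0 if running, 3 if stopped, 4 if undetermined.
lsb_status() {
    out=$("$1" status 2>&1)
    case "$out" in
        *stopped*) return 3 ;;   # checked first so it cannot be shadowed
        *started*) return 0 ;;
        *)         return 4 ;;   # neither 0 nor 3, so it counts as failed
    esac
}

# lsb_wrap SCRIPT ACTION: pass start/stop through unchanged, translate status.
lsb_wrap() {
    case "$2" in
        status)     lsb_status "$1" ;;
        start|stop) "$1" "$2" ;;
        *)          echo "usage: lsb_wrap SCRIPT {start|stop|status}" >&2
                    return 2 ;;
    esac
}
```

Matching on output text is fragile, which is why fixing the scripts or writing
a proper OCF agent remain the better options.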

> the bad news is i don't know how
> to resolve this.  gentoo uses a wrapper "/sbin/runscript" (which is a 32-bit LSB executable) to
> handle the startup and shutdown of its daemons from /etc/init.d.  i am currently trying to figure
> out how to "persuade" my scripts to exit with the code heartbeat requires and at the same time, keep
> my systems in line with the "gentoo way".
>
> at first i had a difficult time understanding why "3" and not "0" like most other scripts, then
> after an evening of drinking and a good headache, i realized what the logic was behind it. heartbeat
> needs to be able to differentiate between hard/soft states when transitioning resources between nodes.
>
> /me the light comes on
>
>
> i may end up re-writing my init scripts to be "/sbin/runscript" independent to resolve my issues
> unless someone has already tackled this beastie.
>
> ... snip from debug log
>
> Feb 17 11:57:52 destiny lrmd: [5280]: debug: on_msg_perform_op: add an operation operation stop[17]
> on lsb::openvpn_69-2000::rsc_openvpn_69-2000 for client 18717, its parameters: C.
> Feb 17 11:57:52 destiny lrmd: [5280]: info: RA output: (rsc_openvpn_69-2000:stop:stdout)  * WARNING:
>  openvpn_69-2000 has not yet been started.
> Feb 17 11:57:52 destiny rc-scripts: WARNING:  openvpn_69-2000 has not yet been started.
>
> ... end snip
>
>
> many thanks for the assist.  i may have some more questions shortly.
>
>
>
> _Terry
>
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>

