[Linux-HA] Emergency reboot by stonith-enabled="false"

Dejan Muhamedagic dejanmm at fastmail.fm
Fri Oct 8 10:38:26 MDT 2010


Hi,

On Fri, Oct 08, 2010 at 03:35:13PM +0200, Nikita Michalko wrote:
> Hi all!
> 
> My very simple 2-node test cluster with Pacemaker & Heartbeat is giving me some headaches. Here are my versions:
> 
> cluster-glue: 1.0.6
> resource-agents: 1.0.3
> Heartbeat STABLE: 3.0.3
> pacemaker: 1.1.3 (all from wiki sources)
> OS: SLES11/SP1
> After successfully starting Heartbeat on the first node "opter"
> (the other node was down for the test) with stonith
> disabled (see my configuration below), the first node
> rebooted. Why did my own node reboot? Do I need stonith on a
> symmetric cluster?

Yes.

> HB_Report attached ...

stonith-ng failed to connect to the cluster:

Oct 08 13:03:27 opteron heartbeat: [10872]: WARN: Client [stonith-ng] pid 10898 failed authorization [no default client auth]
Oct 08 13:03:27 opteron heartbeat: [10872]: ERROR: api_process_registration_msg: cannot add client(stonith-ng)
...
Oct 08 13:03:27 opteron stonith-ng: [10898]: CRIT: main: Cannot sign in to the cluster... terminating

which made heartbeat reboot the node. I guess that you can add
something like this to ha.cf:

apiauth stonith-ng  uid=root

If you want to prevent reboots, use "crm respawn" in ha.cf.
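
For reference, the two ha.cf suggestions above might be combined roughly
as follows. This is only a sketch: the apiauth line is the guess given in
this message, and the assumption that "crm respawn" takes the place of an
existing "crm yes" or "crm on" line is untested here, since the poster's
ha.cf was not shown in the thread.

```
# ha.cf fragment (sketch; merge into the existing file, do not replace it)

# Respawn a failed cluster child process instead of rebooting the node.
# Assumption: this replaces any existing "crm yes" / "crm on" line.
crm respawn

# Let stonith-ng, running as root, register with the heartbeat API.
# Taken from the suggestion in this message; not verified on this setup.
apiauth stonith-ng  uid=root
```

Heartbeat would most likely need a restart to pick up the change.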

Thanks,

Dejan

> 
> My configuration:
> -- crm(live)# configure show
> node $id="5ac2b85d-802f-40a6-ad0f-38660c4a6fb0" opter
> node $id="caca825d-2fd9-426d-9ed7-8ff9845bc08f" aipsles11
> primitive IPaddr_192_168_150_54 ocf:heartbeat:IPaddr \
>         op monitor interval="60s" timeout="60s" \
>         params ip="192.168.150.54" cidr_netmask="24" broadcast="192.168.150.63"
> primitive IPaddr_19X_XX_XX_54 ocf:heartbeat:IPaddr \
>         op monitor interval="60s" timeout="60s" \
>         params ip="19X.XX.XX.54" cidr_netmask="26" broadcast="19X.XX.XX.63"
> primitive ubis_udbmain_3 lsb:ubis_udbmain \
>         op monitor interval="120s" timeout="110s"
> group group_1 IPaddr_19X_XX_XX_54 IPaddr_192_168_150_54 ubis_udbmain_3
> location rsc_location_group_1 group_1 \
>         rule $id="prefered_location_group_1" 1: #uname eq opter
> property $id="cib-bootstrap-options" \
>         symmetric-cluster="true" \
>         no-quorum-policy="ignore" \
>         migration-threshold="3" \
>         stonith-enabled="false" \
>         stonith-action="reboot" \
>         startup-fencing="false" \
>         stop-orphan-resources="true" \
>         stop-orphan-actions="true" \
>         remove-after-stop="false" \
>         short-resource-names="true" \
>         transition-idle-timeout="3min" \
>         default-action-timeout="110s" \
>         is-managed-default="true" \
>         cluster-delay="60s" \
>         pe-error-series-max="-1" \
>         pe-warn-series-max="-1" \
>         pe-input-series-max="-1" \
>         dc-version="1.1.3-7e4c0424e331aa2a51cb1efb69e80b5c8e1f8701" \
>         cluster-infrastructure="Heartbeat" \
>         last-lrm-refresh="1284125385"
> 
> Any ideas/comments?
> 
> TIA!
> 
> Nikita Michalko 





