[Linux-HA] Newbie on STONITH
Fajar Priyanto
fajarpri at cbn.net.id
Wed Apr 2 07:56:03 MDT 2008
Hello all,
I've just got an opportunity to play with a fence device: WTI NPS.
I managed to test the power cycle using this command:
stonith -v -t wti_nps ipaddr=192.168.0.100 password=123456 -l -T reset station8
However, I'm not clear on how to apply STONITH in Linux-HA v2. I've been
digging around and found this page:
http://www.linux-ha.org/ConfiguringStonithPlugins
If I understand it correctly, in a 2-node cluster we need to set up 2 STONITH
resources, with each one's job being to shoot the other node in the head?
What confuses me more is: in which parameter/attribute can I define "station8"?
For the wti_nps native resource, the only parameters mentioned are "ipaddr"
and "password".
Here's my related CIB:
<clone id="DoFencing">
  <meta_attributes id="DoFencing_meta_attrs">
    <attributes>
      <nvpair id="DoFencing_metaattr_target_role" name="target_role" value="stopped"/>
      <nvpair id="DoFencing_metaattr_clone_max" name="clone_max" value="2"/>
      <nvpair id="DoFencing_metaattr_clone_node_max" name="clone_node_max" value="1"/>
    </attributes>
  </meta_attributes>
  <primitive id="resource_" class="stonith" type="wti_nps" provider="heartbeat">
    <instance_attributes id="resource__instance_attrs">
      <attributes>
        <nvpair id="babe7348-ace7-4802-b960-78b68175f00c" name="ipaddr" value="192.168.0.100"/>
        <nvpair id="9329362b-2213-41c3-83d8-b7aaa65c8816" name="password" value="bajau123"/>
      </attributes>
    </instance_attributes>
    <operations>
      <op id="fafbbfdc-b1c3-4d31-a87a-33c79001ccd3" name="monitor"
          description="fence8" interval="15" timeout="15" start_delay="15"
          prereq="nothing" disabled="false" role="Started" on_fail="fence"/>
      <op id="bbbc7128-adcf-42b8-b3d0-6180d6428207" name="start"
          description="fence8" timeout="15" prereq="nothing" start_delay="0"
          disabled="false" role="Started"/>
    </operations>
    <meta_attributes id="resource_:0_meta_attrs">
      <attributes>
        <nvpair id="resource_:0_metaattr_target_role" name="target_role" value="started"/>
      </attributes>
    </meta_attributes>
  </primitive>
</clone>
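To double-check which parameters the stonith primitive actually carries, here is a minimal Python sketch that reads a trimmed copy of the CIB above and lists the instance attributes of every class="stonith" primitive. The helper name stonith_params is only illustrative, and the snippet assumes the old <attributes>/<nvpair> nesting shown in my CIB. It only reports what is configured; it does not tell which parameters the wti_nps plugin accepts.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the stonith primitive from the CIB above (operations and
# meta attributes omitted for brevity).
CIB_SNIPPET = """
<clone id="DoFencing">
  <primitive id="resource_" class="stonith" type="wti_nps" provider="heartbeat">
    <instance_attributes id="resource__instance_attrs">
      <attributes>
        <nvpair id="babe7348-ace7-4802-b960-78b68175f00c" name="ipaddr" value="192.168.0.100"/>
        <nvpair id="9329362b-2213-41c3-83d8-b7aaa65c8816" name="password" value="bajau123"/>
      </attributes>
    </instance_attributes>
  </primitive>
</clone>
"""


def stonith_params(cib_xml):
    """Return {primitive id: {parameter name: value}} for stonith primitives."""
    root = ET.fromstring(cib_xml)
    result = {}
    for prim in root.iter("primitive"):
        if prim.get("class") != "stonith":
            continue  # skip ordinary (non-fencing) resources
        params = {}
        for nv in prim.findall("./instance_attributes/attributes/nvpair"):
            params[nv.get("name")] = nv.get("value")
        result[prim.get("id")] = params
    return result


if __name__ == "__main__":
    for rsc_id, params in stonith_params(CIB_SNIPPET).items():
        # Print the resource id followed by the sorted parameter names.
        print(rsc_id, sorted(params))
```

For the snippet above this reports only "ipaddr" and "password" for resource_, which is exactly why I can't see where a host name such as "station8" would go.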
I tried adding an operation with "On Fail: fence" to a resource. This is what
happens when I make the httpd resource fail by emptying httpd.conf:
Apr 2 22:31:36 station4 pengine: [5007]: notice: StopRsc: station5.enterprise.com Stop r_iphttp_1
Apr 2 22:31:36 station4 pengine: [5007]: notice: StartRsc: station4.enterprise.com Start r_iphttp_1
Apr 2 22:31:36 station4 pengine: [5007]: notice: RecurringOp: station4.enterprise.com r_iphttp_1_monitor_10000
Apr 2 22:31:36 station4 pengine: [5007]: notice: NoRoleChange: Move resource r_fsmount_1 (station5.enterprise.com -> station4.enterprise.com)
Apr 2 22:31:36 station4 pengine: [5007]: notice: StopRsc: station5.enterprise.com Stop r_fsmount_1
Apr 2 22:31:36 station4 pengine: [5007]: notice: StartRsc: station4.enterprise.com Start r_fsmount_1
Apr 2 22:31:36 station4 pengine: [5007]: notice: NoRoleChange: Recover resource r_serviceweb_1 (station4.enterprise.com)
Apr 2 22:31:36 station4 pengine: [5007]: notice: StopRsc: station5.enterprise.com Stop r_serviceweb_1
Apr 2 22:31:36 station4 pengine: [5007]: notice: StartRsc: station4.enterprise.com Start r_serviceweb_1
Apr 2 22:31:36 station4 pengine: [5007]: notice: RecurringOp: station4.enterprise.com r_serviceweb_1_monitor_15000
Apr 2 22:31:36 station4 pengine: [5007]: info: native_stop_constraints: resource_:1_stop_0 is implicit after station5.enterprise.com is fenced
Apr 2 22:31:36 station4 pengine: [5007]: info: native_stop_constraints: Re-creating actions for DoFencing
Apr 2 22:31:36 station4 pengine: [5007]: notice: NoRoleChange: Leave resource resource_:0 (station4.enterprise.com)
Apr 2 22:31:36 station4 pengine: [5007]: notice: StopRsc: station5.enterprise.com Stop resource_:1
Apr 2 22:31:36 station4 crmd: [2800]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
Apr 2 22:31:36 station4 tengine: [5006]: info: unpack_graph: Unpacked transition 40: 18 actions in 18 synapses
Apr 2 22:31:36 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
Apr 2 22:31:36 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
Apr 2 22:31:36 station4 pengine: [5007]: WARN: process_pe_message: Transition 40: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-warn-31.bz2
Apr 2 22:31:36 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 27 fired and confirmed
Apr 2 22:31:36 station4 pengine: [5007]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
Apr 2 22:31:36 station4 tengine: [5006]: info: te_fence_node: Executing reboot fencing operation (28) on station5.enterprise.com (timeout=30000)
Apr 2 22:31:36 station4 stonithd: [2798]: info: client tengine [pid: 5006] want a STONITH operation RESET to node station5.enterprise.com.
Apr 2 22:31:36 station4 stonithd: [2798]: info: Broadcasting the message succeeded: require others to stonith node station5.enterprise.com.
Apr 2 22:31:36 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 22 fired and confirmed
Apr 2 22:31:36 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 26 fired and confirmed
Apr 2 22:31:41 station4 stonithd: [8041]: info: Successful login to WTI Network Power Switch.
Apr 2 22:33:06 station4 pengine: [5007]: WARN: process_pe_message: Transition 43: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-warn-34.bz2
Apr 2 22:33:06 station4 pengine: [5007]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
Apr 2 22:33:06 station4 tengine: [5006]: info: unpack_graph: Unpacked transition 43: 18 actions in 18 synapses
Apr 2 22:33:06 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
Apr 2 22:33:06 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
Apr 2 22:33:06 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 27 fired and confirmed
Apr 2 22:33:06 station4 tengine: [5006]: info: te_fence_node: Executing reboot fencing operation (28) on station5.enterprise.com (timeout=30000)
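To make the repeated fencing requests easier to spot in the log above, here is a rough Python sketch that pulls out only the te_fence_node lines. The regular expression and the function name fence_attempts are my own assumptions based on the message format seen in this log, not an official log format, and the sample lines are copied from above with the wrapped lines joined.

```python
import re

# A few lines copied from the log above, with wrapped lines joined.
LOG = """\
Apr 2 22:31:36 station4 tengine: [5006]: info: te_fence_node: Executing reboot fencing operation (28) on station5.enterprise.com (timeout=30000)
Apr 2 22:31:36 station4 stonithd: [2798]: info: client tengine [pid: 5006] want a STONITH operation RESET to node station5.enterprise.com.
Apr 2 22:31:41 station4 stonithd: [8041]: info: Successful login to WTI Network Power Switch.
Apr 2 22:33:06 station4 tengine: [5006]: info: te_fence_node: Executing reboot fencing operation (28) on station5.enterprise.com (timeout=30000)
"""

# Assumed pattern for the tengine "te_fence_node" message as it appears above.
FENCE_RE = re.compile(
    r"^(?P<when>\w+ +\d+ \d\d:\d\d:\d\d) \S+ tengine: \[\d+\]: info: te_fence_node: "
    r"Executing (?P<op>\w+) fencing operation \(\d+\) on (?P<node>\S+) "
    r"\(timeout=(?P<timeout>\d+)\)"
)


def fence_attempts(log_text):
    """Return (timestamp, operation, node, timeout_ms) for each fencing request."""
    attempts = []
    for line in log_text.splitlines():
        m = FENCE_RE.match(line)
        if m:
            attempts.append((m["when"], m["op"], m["node"], int(m["timeout"])))
    return attempts


if __name__ == "__main__":
    for attempt in fence_attempts(LOG):
        print(attempt)
```

Run against the sample lines, this finds two reboot requests for station5.enterprise.com, one at 22:31:36 and another at 22:33:06, with no line in between reporting that the first one completed.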
Any insights and comments are welcome.
Thank you in advance.
--
Fajar Priyanto | Reg'd Linux User #327841 | Linux tutorial
http://linux2.arinet.org
20:55:08 up 2:03, 2.6.22-14-generic GNU/Linux
Let's use OpenOffice. http://www.openoffice.org
The real challenge of teaching is getting your students motivated to learn.