[Linux-HA] how to restart a failed resource

dxj_600 dxj_600 at 126.com
Fri Sep 28 07:23:30 MDT 2007


Thanks to "Andrew Beekhof" and "Joseph Lamoree", I can now trigger heartbeat to start the resource,
but I have run into a new problem, as follows:
 1. I configured a cloned resource, which should run on two nodes:
 node host_192.168.2.220 host_192.168.2.221
 2. At first my resource was running on only one node:
 # crm_resource -W -r rsc_lcm_0:0
 resource rsc_lcm_0:0 is running on: host_192.168.2.220
  # crm_resource -W -r rsc_lcm_0:1
 resource rsc_lcm_0:1 is NOT running
 3. Then I triggered heartbeat to start rsc_lcm_0:1:
 # crm_resource -C -r rsc_lcm_0:1
 4. Heartbeat then did the following.
 On node host_192.168.2.221:
 [2007-09-28 18:20]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra monitor
 probe
 [2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra start
 [2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra notify
 [2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra stop <---- because my RA timed out
 On node host_192.168.2.220:
 [2007-09-28 18:20]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra notify
 probe
 [2007-09-28 18:20]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra monitor
 [2007-09-28 18:20]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra notify
 [2007-09-28 18:20]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra notify
 [2007-09-28 18:20]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra stop <---- why is it stopped here?
 [2007-09-28 18:20]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra monitor
 [2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra notify
 [2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra stop  <---- in fact it's a restart
 [2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra start <---- (stop followed by start)
 [2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra notify
 [2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra notify
 [2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra monitor
 [2007-09-28 18:22]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra monitor
 5. The final result is:
 # crm_resource -W -r rsc_lcm_0:0
 resource rsc_lcm_0:0 is running on: host_192.168.2.220
  # crm_resource -W -r rsc_lcm_0:1
 resource rsc_lcm_0:1 is NOT running
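 For anyone reproducing this, the two status checks can be wrapped in a small loop. This is only a minimal sketch: the function name check_clone_instances and the CRM_RESOURCE override variable are hypothetical helpers, not part of heartbeat. The only heartbeat command it runs is the same "crm_resource -W -r" call shown in the steps above.

```shell
#!/bin/sh
# Sketch: print where each clone instance is running.
# CRM_RESOURCE is a hypothetical override (handy for a dry run);
# by default the real crm_resource tool is used.
CRM_RESOURCE="${CRM_RESOURCE:-crm_resource}"

check_clone_instances() {
    base="$1"   # primitive id, e.g. rsc_lcm_0
    max="$2"    # number of clone instances, e.g. the clone_max value (2)
    i=0
    while [ "$i" -lt "$max" ]; do
        # Same query as in steps 2 and 5: locate instance <base>:<i>
        "$CRM_RESOURCE" -W -r "${base}:${i}"
        i=$((i + 1))
    done
}
```

 With the configuration in this post it would be called as "check_clone_instances rsc_lcm_0 2", before and after the "crm_resource -C" step, to compare the two results.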
 My question is:
 Since the result in step 5 is the same as in step 2 (rsc_lcm_0:0 is still running on node host_192.168.2.220), why does heartbeat stop the resource and restart it? Why doesn't heartbeat leave rsc_lcm_0:0 running untouched, and how can I avoid this scenario?
 My resource configuration is:
 <clone id="rsc_lcm" ordered="false" interleave="false" notify="true">
   <instance_attributes id="ow_lcm_instance_attributes">
     <attributes>
       <nvpair id="ow_lcm_clone_max" name="clone_max" value="2"/>
       <nvpair id="ow_lcm_clone_node_max" name="clone_node_max" value="1"/>
     </attributes>
   </instance_attributes>
   <primitive class="ocf" type="owlcm-ra" provider="heartbeat" id="rsc_lcm_0" is_managed="true">
     <operations>
       <op id="rsc_lcm_op0" name="start" timeout="50s" disabled="false"/>
       <op id="rsc_lcm_op2" name="monitor" interval="5s" timeout="4s" disabled="false" role="Started"/>
     </operations>
     <instance_attributes id="ow_lcm_ra_instance_attributes">
       <attributes>
         <nvpair id="rsc_lcm_vip" name="vip" value="192.168.2.231 192.168.2.232"/>
       </attributes>
     </instance_attributes>
   </primitive>
 </clone>
 Thanks a lot.
On 2007-09-26, "Joseph Lamoree" <jlamoree at gmail.com> wrote:
Take a look at the crm_resource page:
http://www.linux-ha.org/v2/AdminTools/crm_resource
There is an example of stopping/starting resources there. Also, most
of the executables in heartbeat support a --help option to display a
lot of very useful information.
--
Joseph Lamoree