[Linux-HA] how to restart a failed resource
dxj_600
dxj_600 at 126.com
Fri Sep 28 07:23:30 MDT 2007
Thanks to Andrew Beekhof and Joseph Lamoree, I can now trigger heartbeat
to start the resource.
However, I have run into a new problem:
1. I configured a clone resource, which should run on two nodes:
node host_192.168.2.220 host_192.168.2.221
2. At first my resource ran on only one node:
# crm_resource -W -r rsc_lcm_0:0
resource rsc_lcm_0:0 is running on: host_192.168.2.220
# crm_resource -W -r rsc_lcm_0:1
resource rsc_lcm_0:1 is NOT running
3. Then I triggered heartbeat to start rsc_lcm_0:1:
# crm_resource -C -r rsc_lcm_0:1
4. Heartbeat then did the following.
On node host_192.168.2.221:
[2007-09-28 18:20]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra monitor probe
[2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra start
[2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra notify
[2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra stop <---- because my RA timed out
On node host_192.168.2.220:
[2007-09-28 18:20]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra notify probe
[2007-09-28 18:20]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra monitor
[2007-09-28 18:20]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra notify
[2007-09-28 18:20]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra notify
[2007-09-28 18:20]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra stop <---- why is it stopped here?
[2007-09-28 18:20]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra monitor
[2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra notify
[2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra stop <---- in fact it's a restart (with the start below)
[2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra start
[2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra notify
[2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra notify
[2007-09-28 18:21]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra monitor
[2007-09-28 18:22]: /usr/lib/ocf/resource.d//heartbeat/owlcm-ra monitor
5. The final result is:
# crm_resource -W -r rsc_lcm_0:0
resource rsc_lcm_0:0 is running on: host_192.168.2.220
# crm_resource -W -r rsc_lcm_0:1
resource rsc_lcm_0:1 is NOT running
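The check in steps 2 and 5 can be scripted so that both clone instances are queried the same way before and after the cleanup. The following is a minimal sketch, not part of the original setup: the function names locate_node and check_clones are hypothetical, the instance IDs are the ones from this post, and the parsing assumes that crm_resource -W prints exactly the two message forms shown above.

```shell
#!/bin/sh
# Hypothetical helper: extract the node name from one line of
# `crm_resource -W -r <id>` output, assuming the format shown in this post.
# Prints the node name and returns 0 if the resource is running,
# prints nothing and returns 1 otherwise.
locate_node() {
    case "$1" in
        *"is running on: "*) printf '%s\n' "${1##*is running on: }" ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: report the state of each clone instance.
# The instance IDs are taken from this post; adjust them for other setups.
check_clones() {
    for rsc in rsc_lcm_0:0 rsc_lcm_0:1; do
        out=$(crm_resource -W -r "$rsc" 2>&1)
        if node=$(locate_node "$out"); then
            echo "$rsc: running on $node"
        else
            echo "$rsc: NOT running"
        fi
    done
}
```

Calling check_clones once before `crm_resource -C -r rsc_lcm_0:1` and once after gives the two snapshots compared in steps 2 and 5.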
My question is:
Since the result in step 5 is the same as in step 2, rsc_lcm_0:0 is still
running on node host_192.168.2.220. So why does heartbeat stop the resource
and restart it, rather than leaving rsc_lcm_0:0 running untouched? How can
I avoid this scenario?
My resource configuration is:
<clone id="rsc_lcm" ordered="false" interleave="false" notify="true">
  <instance_attributes id="ow_lcm_instance_attributes">
    <attributes>
      <nvpair id="ow_lcm_clone_max" name="clone_max" value="2"/>
      <nvpair id="ow_lcm_clone_node_max" name="clone_node_max" value="1"/>
    </attributes>
  </instance_attributes>
  <primitive class="ocf" type="owlcm-ra" provider="heartbeat" id="rsc_lcm_0" is_managed="true">
    <operations>
      <op id="rsc_lcm_op0" name="start" timeout="50s" disabled="false"/>
      <op id="rsc_lcm_op2" name="monitor" interval="5s" timeout="4s" disabled="false" role="Started"/>
    </operations>
    <instance_attributes id="ow_lcm_ra_instance_attributes">
      <attributes>
        <nvpair id="rsc_lcm_vip" name="vip" value="192.168.2.231 192.168.2.232"/>
      </attributes>
    </instance_attributes>
  </primitive>
</clone>
Thanks a lot.
On 2007-09-26, "Joseph Lamoree" <jlamoree at gmail.com> wrote:
Take a look at the crm_resource page:
http://www.linux-ha.org/v2/AdminTools/crm_resource
There is an example of stopping/starting resources there. Also, most
of the executables in heartbeat support a --help option to display a
lot of very useful information.
--
Joseph Lamoree
_______________________________________________
Linux-HA mailing list
Linux-HA at lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems