[Linux-HA] colocating resources on failed restart :CRM-Stable-4a0d4e40eeb0
Alex and Gill Strachan
asgks at yahoo.com
Mon Nov 6 04:19:21 MST 2006
I removed is_managed="true" from all of the resource primitives but kept it on the group.
With crm_verify -L -VVV, resource_sinfids3A_aims now shows up as unmanaged. Yippee!
More testing to do. Now I wonder what will happen when resource_sinfids3A_oracle becomes unmanaged... :-)
----- Original Message ----
From: Andreas Kurz <akurz at sms.at>
To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
Sent: Monday, 6 November, 2006 6:38:56 PM
Subject: Re: [Linux-HA] colocating resources on failed restart :CRM-Stable-4a0d4e40eeb0
Hello!
For system upgrades I prefer disabling the management (and so the
monitoring) of the affected resource. Works fine for me ... in your case
e.g:
crm_resource -p is_managed -r resource_sinfids3A_aims -t primitive -v off
... do your maintenance ....
crm_resource -p is_managed -r resource_sinfids3A_aims -t primitive -v on
Regards,
Andi
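
The two commands above can be wrapped so that the resource is handed back to the cluster even if the maintenance step fails. This is only a rough sketch, not something tested against this cluster: the function names and the CRM_RESOURCE override variable are made up for illustration, and only the crm_resource options already shown above are used.

```shell
#!/bin/sh
# Sketch: run a maintenance command while a resource is unmanaged.
# CRM_RESOURCE is a hypothetical override (e.g. for dry runs);
# by default the real crm_resource tool is called.
CRM_RESOURCE="${CRM_RESOURCE:-crm_resource}"

# set_managed <resource-id> <on|off>
# Uses the same invocation as in the mail above.
set_managed() {
    $CRM_RESOURCE -p is_managed -r "$1" -t primitive -v "$2"
}

# with_unmanaged <resource-id> <command> [args...]
# Turns management off, runs the command, then turns management
# back on regardless of whether the command succeeded.
with_unmanaged() {
    rsc="$1"
    shift
    set_managed "$rsc" off || return 1
    status=0
    "$@" || status=$?
    set_managed "$rsc" on || return 1
    return "$status"
}
```

Usage might look like `with_unmanaged resource_sinfids3A_aims ./upgrade_aims.sh`, where upgrade_aims.sh is a placeholder for whatever the manual update actually involves. Setting CRM_RESOURCE="echo crm_resource" first prints the commands instead of running them.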
> I converted the cib.xml back to using groups, which fixes the colocation problem, but how do I perform system upgrades?
>
> e.g.
> I run command
> crm_resource -r resource_sinfids3A_aims -p target_role -v stopped
> my hope here is that hb will stop the resource and stop monitoring, allowing the aims software to be manually updated/started/checked/stopped; then run
> crm_resource -r resource_sinfids3A_aims -p target_role -v started
> hb will start the resource if necessary then continue to monitor.
>
> What actually happens is that hb stops ALL resources within the group_sinfids3A - aarrrgh.
>
>
> I realise that I am very confused on how to configure the primitive restart_type, and the on_fail for the monitor operation. Any help on this would be wonderful.
>
>
> The scenarios are
>
> crm_resource -r resource_sinfids3A_aims -p target_role -v stopped [as above]
> stop aims - allow manual updates...
>
> crm_resource -r resource_sinfids3A_oralsnr -p target_role -v stopped
> stop oracle listener - allow manual updates
>
>
>
> crm_resource -r resource_sinfids3A_oracle -p target_role -v stopped
>
> stop aims, oracle listener, then oracle
>
>
> crm_resource -r resource_sinfids3A_smb -p target_role -v stopped
> stop samba - allow manual updates
>
>
> Node: sinfids3b1 (338afa76-8997-4d66-8381-fc36ec4b456b): online
> Node: sinfids3a2 (ec74bd17-2016-4d32-a694-0f6983121cd9): online
> Node: sinfids3a1 (b757aece-0e47-41e5-92b7-6a80b4f3eea7): online
>
> Resource Group: group_sinfids3
>     resource_sinfids3_vip (heartbeat::ocf:IPaddr): Started sinfids3a1
> Resource Group: group_sinfids3A
>     resource_sinfids3A_vip (heartbeat::ocf:IPaddr): Started sinfids3a1
>     resource_sinfids3A_drbd (heartbeat:drbddisk): Started sinfids3a1
>     resource_sinfids3A_fs (heartbeat::ocf:Filesystem): Started sinfids3a1
>     resource_sinfids3A_smb (lsb:smb): Started sinfids3a1
>     resource_sinfids3A_oracle (heartbeat::ocf:oracle): Started sinfids3a1
>     resource_sinfids3A_oralsnr (heartbeat::ocf:oralsnr): Started sinfids3a1
>     resource_sinfids3A_aims (lsb:aims): Started sinfids3a1
> Resource Group: group_sinfids3B
>     resource_sinfids3B_vip (heartbeat::ocf:IPaddr): Started sinfids3b1
>
>
> My current settings for the primitives, res_order and monitor:
>
> [root at sinfids3b1 ~]# egrep "primitive |monitor|rsc_ord|group " saved-cib.xml
> <group id="group_sinfids3" ordered="true" collocated="true" is_managed="true" restart_type="restart">
> <primitive id="resource_sinfids3_vip" class="ocf" type="IPaddr" provider="heartbeat" is_managed="true" restart_type="restart">
> <op id="IPaddr_sinfids3_vip_mon" interval="60s" name="monitor" timeout="15s" on_fail="restart"/>
> <group id="group_sinfids3A" ordered="true" collocated="true" is_managed="true" restart_type="ignore">
> <primitive id="resource_sinfids3A_vip" class="ocf" type="IPaddr" provider="heartbeat" is_managed="true" restart_type="restart">
> <op id="IPaddr_sinfids3A_vip_mon" interval="60s" name="monitor" timeout="15s" on_fail="restart"/>
> <primitive id="resource_sinfids3A_drbd" class="heartbeat" type="drbddisk" provider="heartbeat" is_managed="true" restart_type="restart">
> <op id="drbddisk_sinfids3A_drbd_mon" name="monitor" interval="60s" timeout="60s" on_fail="restart"/>
> <primitive class="ocf" type="Filesystem" provider="heartbeat" id="resource_sinfids3A_fs" is_managed="true" restart_type="restart">
> <op name="monitor" timeout="60s" id="Filesystem_sinfids3A_fs_mon" interval="60s" on_fail="restart"/>
> <primitive class="lsb" type="smb" id="resource_sinfids3A_smb" is_managed="true" restart_type="restart">
> <op name="monitor" timeout="60s" id="smb_sinfids3A_smb_mon" interval="30s" on_fail="restart"/>
> <primitive class="ocf" type="oracle" provider="heartbeat" id="resource_sinfids3A_oracle" is_managed="true" restart_type="restart">
> <op name="monitor" timeout="60s" id="oracle_sinfids3A_oracle_mon" interval="300s" on_fail="restart"/>
> <primitive class="ocf" type="oralsnr" provider="heartbeat" id="resource_sinfids3A_oralsnr" is_managed="true" restart_type="restart">
> <op name="monitor" timeout="60s" id="oralsnr_sinfids3A_oralsnr_mon" interval="300s" on_fail="restart"/>
> <primitive class="lsb" type="aims" id="resource_sinfids3A_aims" is_managed="true" restart_type="ignore">
> <op name="monitor" timeout="240s" id="aims_sinfids3A_aims_mon" interval="180s" on_fail="restart"/>
> <group id="group_sinfids3B" ordered="true" collocated="true" is_managed="true" restart_type="restart">
> <primitive id="resource_sinfids3B_vip" class="ocf" type="IPaddr" provider="heartbeat" is_managed="true" restart_type="restart">
> <op id="IPaddr_sinfids3B_vip_mon" interval="60s" name="monitor" timeout="15s" on_fail="restart"/>
> <rsc_order id="order_sinfids3_sinfids3A" from="group_sinfids3" action="start" type="after" to="group_sinfids3A" symmetrical="true"/>
> <rsc_order id="order_sinfids3_sinfids3B" from="group_sinfids3" action="start" type="after" to="group_sinfids3B" symmetrical="true"/>
> <rsc_order id="order_sinfids3A_vip" from="resource_sinfids3A_vip" action="start" type="before" to="resource_sinfids3A_drbd" symmetrical="true"/>
> <rsc_order id="order_sinfids3A_drbd" from="resource_sinfids3A_drbd" action="start" type="after" to="resource_sinfids3A_vip" symmetrical="true"/>
> <rsc_order id="order_sinfids3A_fs" from="resource_sinfids3A_fs" action="start" type="after" to="resource_sinfids3A_drbd" symmetrical="true"/>
> <rsc_order id="order_sinfids3A_smb" from="resource_sinfids3A_smb" action="start" type="after" to="resource_sinfids3A_fs" symmetrical="true"/>
> <rsc_order id="order_sinfids3A_oracle" from="resource_sinfids3A_oracle" action="start" type="after" to="resource_sinfids3A_fs" symmetrical="true"/>
> <rsc_order id="order_sinfids3A_oralsnr" from="resource_sinfids3A_oralsnr" action="start" type="after" to="resource_sinfids3A_oracle" symmetrical="true"/>
> <rsc_order id="order_sinfids3A_aims" from="resource_sinfids3A_aims" action="start" type="after" to="resource_sinfids3A_oralsnr" symmetrical="true"/>
>
>
>
>
> ----- Original Message ----
> From: Serge Dubrouski <sergeyfd at gmail.com>
> To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
> Sent: Sunday, 5 November, 2006 11:47:05 PM
> Subject: Re: [Linux-HA] colocating resources on failed restart :CRM-Stable-4a0d4e40eeb0
>
> Why not combine your resources into a group with collocated=true? In
> this case they'll always stick together for all operations:
> start/stop/move etc...
>
> On 11/4/06, Alex and Gill Strachan <asgks at yahoo.com> wrote:
>> I have a group of resources linked by the name 3A, these resources must always run together so I allocated large co-location scores.
>>
>> When the resource_sinfids3A_aims fails and it is moved to another node I need all of the 3A resources to move with it and to start before.
>>
>> e.g.
>> resource_sinfids3A_aims fails on node 3a2
>> hb restarts and reduces node weight for that node..
>> resource_sinfids3A_aims fails on node 3a2
>> hb is unable to restart on node 3a2 so decides to relocate to 3a1
>>
>> ...How do I inform hb to stop all the other 3A resources on 3a2 and move
>> ...everything to 3a1, also starting in a particular order.
>>
>> Why didn't the colocation scores help in keeping the 3A resources together?
>>
>>
>> I originally had colocation scores of INFINITY for the 3A group, but this prevents me from specifying that resource smb can fail 3 times while resource aims can only fail once.
>>
>>
>> I originally had this working by using groups and on_fail="fence" but it doesn't offer enough flexibility.
>>
>> e.g.
>> I would like heartbeat to restart smb on failure 3 times before moving to another node; using resource_stickiness. When using groups the restart of smb would trigger a stop of all higher resources, then start smb followed by start the higher resources. This behaviour was not wanted.
>>
>> ============
>> Last updated: Sun Nov 5 14:02:46 2006
>> Current DC: sinfids3a2 (ec74bd17-2016-4d32-a694-0f6983121cd9)
>> 3 Nodes configured.
>> 9 Resources configured.
>> ============
>>
>> Node: sinfids3b1 (338afa76-8997-4d66-8381-fc36ec4b456b): online
>>     resource_sinfids3B_vip (heartbeat::ocf:IPaddr)
>> Node: sinfids3a2 (ec74bd17-2016-4d32-a694-0f6983121cd9): online
>>     resource_sinfids3A_drbd (heartbeat:drbddisk)
>>     resource_sinfids3A_fs (heartbeat::ocf:Filesystem)
>>     resource_sinfids3A_smb (lsb:smb)
>>     resource_sinfids3A_vip (heartbeat::ocf:IPaddr)
>>     resource_sinfids3A_oralsnr (heartbeat::ocf:oralsnr)
>>     resource_sinfids3_vip (heartbeat::ocf:IPaddr)
>>     resource_sinfids3A_oracle (heartbeat::ocf:oracle)
>>     resource_sinfids3A_aims (lsb:aims)
>> Node: sinfids3a1 (b757aece-0e47-41e5-92b7-6a80b4f3eea7): online
>>
>>
>>
>> <rsc_order id="order_sinfids3_sinfids3A" from="resource_sinfids3_vip" type="after" to="resource_sinfids3A_vip"/>
>> <rsc_order id="order_sinfids3_sinfids3B" from="resource_sinfids3_vip" type="after" to="resource_sinfids3B_vip"/>
>> <rsc_order id="order_sinfids3A_drbd" from="resource_sinfids3A_drbd" type="after" to="resource_sinfids3A_vip"/>
>> <rsc_order id="order_sinfids3A_fs" from="resource_sinfids3A_fs" type="after" to="resource_sinfids3A_drbd"/>
>> <rsc_order id="order_sinfids3A_smb" from="resource_sinfids3A_smb" type="after" to="resource_sinfids3A_fs"/>
>> <rsc_order id="order_sinfids3A_oracle" from="resource_sinfids3A_oracle" type="after" to="resource_sinfids3A_fs"/>
>> <rsc_order id="order_sinfids3A_oralsnr" from="resource_sinfids3A_oralsnr" type="after" to="resource_sinfids3A_oracle"/>
>> <rsc_order id="order_sinfids3A_aims" from="resource_sinfids3A_aims" type="after" to="resource_sinfids3A_oralsnr"/>
>>
>> <rsc_colocation id="colocation_sinfids3_sinfids3A" from="resource_sinfids3_vip" to="resource_sinfids3A_vip" score="9000"/>
>> <rsc_colocation id="colocation_sinfids3_sinfids3B" from="resource_sinfids3_vip" to="resource_sinfids3B_vip" score="3000"/>
>>
>> <rsc_colocation id="colocation_sinfids3A_drbd" from="resource_sinfids3A_drbd" to="resource_sinfids3A_vip" score="100000"/>
>> <rsc_colocation id="colocation_sinfids3A_fs" from="resource_sinfids3A_fs" to="resource_sinfids3A_drbd" score="100000"/>
>> <rsc_colocation id="colocation_sinfids3A_smb" from="resource_sinfids3A_smb" to="resource_sinfids3A_fs" score="100000"/>
>> <rsc_colocation id="colocation_sinfids3A_oracle" from="resource_sinfids3A_oracle" to="resource_sinfids3A_fs" score="100000"/>
>> <rsc_colocation id="colocation_sinfids3A_oralsnr" from="resource_sinfids3A_oralsnr" to="resource_sinfids3A_oracle" score="100000"/>
>> <rsc_colocation id="colocation_sinfids3A_aims" from="resource_sinfids3A_aims" to="resource_sinfids3A_oralsnr" score="100000"/>
>>
>>
>> <primitive class="lsb" type="aims" id="resource_sinfids3A_aims" restart_type="restart">
>> <operations>
>> <op name="monitor" timeout="240s" id="aims_sinfids3A_aims_mon" interval="180s"/>
>> </operations>
>> <instance_attributes id="resource_sinfids3A_aims">
>> <attributes>
>> <nvpair id="resource_sinfids3A_aims-target_role" name="target_role" value="started"/>
>> </attributes>
>> </instance_attributes>
>> </primitive>
>>
>> Send instant messages to your online friends http://au.messenger.yahoo.com
>> _______________________________________________
>> Linux-HA mailing list
>> Linux-HA at lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
>>