[Linux-HA] failed resource in a group

Dominik Klein dk at in-telegence.net
Wed Jul 9 04:07:19 MDT 2008


Ehlers, Kolja wrote:
> I would like to use groups for my resources. But whenever I manually stop
> one of the resources in the group, all resources are shut down and
> restarted by heartbeat. I have tried to find out whether this
> behaviour is normal, but I have not found anything about it, only that the
> resources are started and stopped sequentially. Is this normal, and can I
> prevent it? The log does not tell me anything.
> 
> At this point I stopped tomcat_21:
> 
> crmd[32141]: 2008/07/09_11:53:14 info: process_lrm_event: LRM operation
> tomcat_21_monitor_5000 (call=99, rc=7) complete
> tengine[32148]: 2008/07/09_11:53:14 info: process_graph_event: Action
> tomcat_21_monitor_5000 arrived after a completed transition
> tengine[32148]: 2008/07/09_11:53:14 info: update_abort_priority: Abort
> priority upgraded to 1000000
> tengine[32148]: 2008/07/09_11:53:14 WARN: update_failcount: Updating
> failcount for tomcat_21 on 3a325e23-2184-46ed-9e88-42a11f28c2be after failed
> monitor: rc=7
> crmd[32141]: 2008/07/09_11:53:14 info: do_state_transition: State transition
> S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE
> origin=route_message ]
> crmd[32141]: 2008/07/09_11:53:14 info: do_state_transition: All 1 cluster
> nodes are eligible to run resources.
> pengine[32149]: 2008/07/09_11:53:14 info: determine_online_status: Node
> www1test is online
> pengine[32149]: 2008/07/09_11:53:14 WARN: unpack_rsc_op: Processing failed
> op tomcat_21_monitor_5000 on www1test: Error
> pengine[32149]: 2008/07/09_11:53:14 notice: group_print: Resource Group:
> group_1
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:
> IPaddr_192_168_11_25      (ocf::heartbeat:IPaddr):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     apache_2
> (ocf::heartbeat:apache):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: group_print: Resource Group:
> group_2
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_21
> (ocf::heartbeat:tomcat):        Started www1test FAILED
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_22
> (ocf::heartbeat:tomcat):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_22sdb
> (ocf::heartbeat:tomcat):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_30
> (ocf::heartbeat:tomcat):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_34
> (ocf::heartbeat:tomcat):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_35
> (ocf::heartbeat:tomcat):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_36
> (ocf::heartbeat:tomcat):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_37
> (ocf::heartbeat:tomcat):        Started www1test
> tengine[32148]: 2008/07/09_11:53:14 info: extract_event: Aborting on
> transient_attributes changes for 3a325e23-2184-46ed-9e88-42a11f28c2be
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_38
> (ocf::heartbeat:tomcat):        Started www1test
> tengine[32148]: 2008/07/09_11:53:14 WARN: notify_crmd: Delaying completion
> until all CIB updates complete
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> IPaddr_192_168_11_25   (www1test)
> tengine[32148]: 2008/07/09_11:53:14 info: te_update_diff: Aborting on
> transient_attributes deletions
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> apache_2       (www1test)
> tengine[32148]: 2008/07/09_11:53:14 WARN: notify_crmd: Delaying completion
> until all CIB updates complete
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Recover resource
> tomcat_21    (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: StopRsc:   www1test Stop
> tomcat_21
> pengine[32149]: 2008/07/09_11:53:14 notice: StartRsc:  www1test Start
> tomcat_21
> pengine[32149]: 2008/07/09_11:53:14 notice: RecurringOp: www1test
> tomcat_21_monitor_5000
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_22      (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_22sdb   (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_30      (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_34      (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_35      (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_36      (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_37      (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_38      (www1test)
> pengine[32149]: 2008/07/09_11:53:14 info: process_pe_message: Transition 10:
> PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-68.bz2
> pengine[32149]: 2008/07/09_11:53:14 info: determine_online_status: Node
> www1test is online
> pengine[32149]: 2008/07/09_11:53:14 WARN: unpack_rsc_op: Processing failed
> op tomcat_21_monitor_5000 on www1test: Error
> pengine[32149]: 2008/07/09_11:53:14 notice: group_print: Resource Group:
> group_1
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:
> IPaddr_192_168_11_25      (ocf::heartbeat:IPaddr):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     apache_2
> (ocf::heartbeat:apache):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: group_print: Resource Group:
> group_2
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_21
> (ocf::heartbeat:tomcat):        Started www1test FAILED
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_22
> (ocf::heartbeat:tomcat):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_22sdb
> (ocf::heartbeat:tomcat):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_30
> (ocf::heartbeat:tomcat):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_34
> (ocf::heartbeat:tomcat):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_35
> (ocf::heartbeat:tomcat):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_36
> (ocf::heartbeat:tomcat):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_37
> (ocf::heartbeat:tomcat):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: native_print:     tomcat_38
> (ocf::heartbeat:tomcat):        Started www1test
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> IPaddr_192_168_11_25   (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> apache_2       (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Recover resource
> tomcat_21    (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: StopRsc:   www1test Stop
> tomcat_21
> pengine[32149]: 2008/07/09_11:53:14 notice: StartRsc:  www1test Start
> tomcat_21
> pengine[32149]: 2008/07/09_11:53:14 notice: RecurringOp: www1test
> tomcat_21_monitor_5000
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_22      (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_22sdb   (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_30      (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_34      (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_35      (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_36      (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_37      (www1test)
> pengine[32149]: 2008/07/09_11:53:14 notice: NoRoleChange: Leave resource
> tomcat_38      (www1test)
> crmd[32141]: 2008/07/09_11:53:14 info: do_state_transition: State transition
> S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=route_message ]
> tengine[32148]: 2008/07/09_11:53:14 info: unpack_graph: Unpacked transition
> 11: 32 actions in 32 synapses
> tengine[32148]: 2008/07/09_11:53:14 info: te_pseudo_action: Pseudo action 43
> fired and confirmed
> tengine[32148]: 2008/07/09_11:53:14 info: send_rsc_command: Initiating
> action 39: tomcat_38_stop_0 on www1test
> crmd[32141]: 2008/07/09_11:53:14 info: do_lrm_rsc_op: Performing
> op=tomcat_38_stop_0 key=39:11:a5a5ae88-f0aa-4e5a-9c45-59cfb6304a70)
> lrmd[32138]: 2008/07/09_11:53:14 info: rsc:tomcat_38: stop
> 
> and here it is now stopping tomcat_38, tomcat_37 ... the whole group in
> reverse order.

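Groups are ordered by default: members are started in the order they are 
listed and stopped in reverse. This means that if the first resource in 
the group has to be stopped, all subsequent resources are stopped before 
it. In your log, tomcat_21 is listed first in group_2, so recovering it 
after the failed monitor would take the rest of the group down with it.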

If you do not want this and understand the change, you can set
<group id="whatever" ordered="false">
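As a rough sketch of where that attribute goes, assuming a Heartbeat 2 style CIB: the group and resource names and the 5 second monitor interval below are taken from your log, the op ids are made up for illustration, and everything else about your real configuration is unknown to me. Only the first two members are shown.

```xml
<!-- Sketch only: ids and the monitor interval are inferred from the posted
     log, not from the actual cib.xml. The op ids are hypothetical. -->
<group id="group_2" ordered="false">
  <primitive id="tomcat_21" class="ocf" provider="heartbeat" type="tomcat">
    <operations>
      <op id="tomcat_21_mon" name="monitor" interval="5s"/>
    </operations>
  </primitive>
  <primitive id="tomcat_22" class="ocf" provider="heartbeat" type="tomcat">
    <operations>
      <op id="tomcat_22_mon" name="monitor" interval="5s"/>
    </operations>
  </primitive>
  <!-- remaining tomcat_* primitives omitted -->
</group>
```

With ordering switched off, a restart of tomcat_21 should no longer force the other members to stop first. The price is that the cluster also stops guaranteeing any start or stop sequence within the group, so this only makes sense if the tomcat instances really do not depend on each other.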

If you post your cib.xml, we might confirm this assumption.

Regards
Dominik

