[Linux-HA] pingd failover in active/standby cluster

Matt Zagrabelny mzagrabe at d.umn.edu
Wed Oct 3 13:37:39 MDT 2007


Update:

I ran ptest on 'cody', the primary node. It looks like it is computing
the pingd values incorrectly: the multiplier seems to be only "1", when
the cib.xml file sets it to "100". As a result, ptest reports the
following comparison:

debug: native_assign_node: Color external_VIP, Node[0] cody: 51
debug: native_assign_node: Color external_VIP, Node[1] tim: 2

which is wrong; it should be:

debug: native_assign_node: Color external_VIP, Node[0] cody: 150
debug: native_assign_node: Color external_VIP, Node[1] tim: 200

and 'tim' should then get the resources. Also, shouldn't it be computing
the values for the group, "monolith_resources", rather than for
"external_VIP"?

With regard to the last question, I do not understand why the following
resource (internal_VIP) gets different values from those above (51 and
2), since it is in the same resource group (monolith_resources) as
external_VIP:

debug: native_assign_node: Color internal_VIP, Node[0] cody: 1000000
debug: native_assign_node: Color internal_VIP, Node[1] tim: -1000000
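A minimal Python sketch of the arithmetic behind the expected external_VIP scores above. It assumes that a node's score is the 50-point preference for cody plus the node's pingd attribute, and that pingd publishes (reachable ping nodes) times the multiplier. The names node_score, PREFERRED_NODE and PREFERENCE_SCORE are hypothetical and not part of Heartbeat.

```python
# Hypothetical names: PREFERRED_NODE, PREFERENCE_SCORE and node_score are
# illustrative only and do not come from Heartbeat itself.
PREFERRED_NODE = "cody"
PREFERENCE_SCORE = 50  # score of rule prefered_location_monolith_resources


def node_score(node: str, reachable_ping_nodes: int, multiplier: int) -> int:
    """Sum the two rsc_location contributions for one node.

    Assumes the pingd attribute equals reachable ping nodes * multiplier and
    that the connected rule adds that attribute value unchanged.
    """
    location = PREFERENCE_SCORE if node == PREFERRED_NODE else 0
    pingd = reachable_ping_nodes * multiplier
    return location + pingd


# cody can no longer reach 192.168.115.38; tim still reaches both ping nodes.
reachable = {"cody": 1, "tim": 2}

for multiplier in (1, 100):
    scores = {n: node_score(n, count, multiplier) for n, count in reachable.items()}
    winner = max(scores, key=scores.get)
    print(f"multiplier={multiplier}: {scores} -> {winner}")
```

With a multiplier of 1 the sketch reproduces the 51/2 split seen in ptest, so cody keeps the resources; with 100 it yields the expected 150/200 split, which would favour tim.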


Here is the ptest output I am referring to.

# ptest -VVVVVV -L
ptest[3876]: 2007/10/03_14:05:59 info: main: =#=#=#=#= Getting XML =#=#=#=#=
ptest[3876]: 2007/10/03_14:05:59 info: main: Reading XML from: live cluster
ptest[3876]: 2007/10/03_14:05:59 notice: main: Required feature set: 1.1
ptest[3876]: 2007/10/03_14:05:59 notice: cluster_option: Using default value '60s' for cluster option 'cluster-delay'
ptest[3876]: 2007/10/03_14:05:59 notice: cluster_option: Using default value '-1' for cluster option 'pe-error-series-max'
ptest[3876]: 2007/10/03_14:05:59 notice: cluster_option: Using default value '-1' for cluster option 'pe-warn-series-max'
ptest[3876]: 2007/10/03_14:05:59 debug: unpack_config: Default action timeout: 5s
ptest[3876]: 2007/10/03_14:05:59 debug: unpack_config: Default stickiness: 0
ptest[3876]: 2007/10/03_14:05:59 debug: unpack_config: Default failure stickiness: 0
ptest[3876]: 2007/10/03_14:05:59 debug: unpack_config: STONITH of failed nodes is disabled
ptest[3876]: 2007/10/03_14:05:59 debug: unpack_config: Cluster is symmetric - resources can run anywhere by default
ptest[3876]: 2007/10/03_14:05:59 debug: unpack_config: On loss of CCM Quorum: Stop ALL resources
ptest[3876]: 2007/10/03_14:05:59 info: determine_online_status: Node tim is online
ptest[3876]: 2007/10/03_14:05:59 info: determine_online_status: Node cody is online
ptest[3876]: 2007/10/03_14:05:59 info: unpack_find_resource: Internally renamed pingd-child:0 on cody to pingd-child:1
ptest[3876]: 2007/10/03_14:05:59 debug: get_node_score: Rule monolith_resources_connected_rule: node tim had value 2 for pingd
ptest[3876]: 2007/10/03_14:05:59 debug: get_node_score: Rule monolith_resources_connected_rule: node cody had value 1 for pingd
ptest[3876]: 2007/10/03_14:05:59 debug: get_node_score: Rule monolith_resources_connected_rule: node tim had value 2 for pingd
ptest[3876]: 2007/10/03_14:05:59 debug: get_node_score: Rule monolith_resources_connected_rule: node cody had value 1 for pingd
ptest[3876]: 2007/10/03_14:05:59 info: group_print: Resource Group: monolith_resources
ptest[3876]: 2007/10/03_14:05:59 info: native_print:     external_VIP        (heartbeat::ocf:IPaddr2):       Started cody
ptest[3876]: 2007/10/03_14:05:59 info: native_print:     internal_VIP        (heartbeat::ocf:IPaddr2):       Started cody
ptest[3876]: 2007/10/03_14:05:59 info: clone_print: Clone Set: pingd
ptest[3876]: 2007/10/03_14:05:59 info: native_print:     pingd-child:0       (heartbeat::ocf:pingd): Started tim
ptest[3876]: 2007/10/03_14:05:59 info: native_print:     pingd-child:1       (heartbeat::ocf:pingd): Started cody
ptest[3876]: 2007/10/03_14:05:59 debug: group_rsc_location: Processing rsc_location prefered_location_monolith_resources for monolith_resources
ptest[3876]: 2007/10/03_14:05:59 debug: group_rsc_location: Processing rsc_location monolith_resources_connected_rule for monolith_resources
ptest[3876]: 2007/10/03_14:05:59 debug: native_print: Allocating: external_VIP  (heartbeat::ocf:IPaddr2):       Started cody
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Color external_VIP, Node[0] cody: 51
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Color external_VIP, Node[1] tim: 2
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Assigning cody to external_VIP
ptest[3876]: 2007/10/03_14:05:59 debug: native_print: Allocating: internal_VIP  (heartbeat::ocf:IPaddr2):       Started cody
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Color internal_VIP, Node[0] cody: 1000000
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Color internal_VIP, Node[1] tim: -1000000
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Assigning cody to internal_VIP
ptest[3876]: 2007/10/03_14:05:59 notice: NoRoleChange: Leave resource external_VIP      (cody)
ptest[3876]: 2007/10/03_14:05:59 notice: NoRoleChange: Leave resource internal_VIP      (cody)
ptest[3876]: 2007/10/03_14:05:59 debug: native_print: Allocating: pingd-child:0 (heartbeat::ocf:pingd): Started tim
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Color pingd-child:0, Node[0] tim: 1
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Color pingd-child:0, Node[1] cody: 0
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Assigning tim to pingd-child:0
ptest[3876]: 2007/10/03_14:05:59 debug: native_print: Allocating: pingd-child:1 (heartbeat::ocf:pingd): Started cody
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Color pingd-child:1, Node[0] cody: 1
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Color pingd-child:1, Node[1] tim: -1000000
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Assigning cody to pingd-child:1
ptest[3876]: 2007/10/03_14:05:59 debug: clone_color: Allocated 2 pingd instances of a possible 2
ptest[3876]: 2007/10/03_14:05:59 notice: NoRoleChange: Leave resource pingd-child:0     (tim)
ptest[3876]: 2007/10/03_14:05:59 notice: NoRoleChange: Leave resource pingd-child:1     (cody)
ptest[3876]: 2007/10/03_14:05:59 debug: init_dotfile: PE_DOT:  digraph "g" {
ptest[3876]: 2007/10/03_14:05:59 debug: main: PE_DOT: }
ptest[3876]: 2007/10/03_14:05:59 info: unpack_graph: Unpacked transition 0: 0 actions in 0 synapses
ptest[3876]: 2007/10/03_14:05:59 info: set_default_graph_functions: Setting default graph functions
ptest[3876]: 2007/10/03_14:05:59 debug: run_graph: ====================================================
ptest[3876]: 2007/10/03_14:05:59 info: run_graph: Transition 0: (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0)
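When comparing several ptest runs it can help to pull the "Color" scores out of the verbose output mechanically. The Python sketch below is one way to do that; the regular expression is an assumption based only on the line format visible above, and assignment_scores is a hypothetical helper.

```python
import re

# Matches lines such as
#   ptest[3876]: ... debug: native_assign_node: Color external_VIP, Node[0] cody: 51
# Using \s+ between tokens also tolerates the mail-client wrapping seen in
# this post (a line break after "Color").
ASSIGN_RE = re.compile(
    r"native_assign_node:\s+Color\s+(?P<rsc>[^,\s]+),\s+"
    r"Node\[\d+\]\s+(?P<node>[^:\s]+):\s+(?P<score>-?\d+)"
)


def assignment_scores(log_text: str) -> dict[str, dict[str, int]]:
    """Return {resource: {node: score}} for every 'Color' line in the log."""
    scores: dict[str, dict[str, int]] = {}
    for m in ASSIGN_RE.finditer(log_text):
        scores.setdefault(m["rsc"], {})[m["node"]] = int(m["score"])
    return scores


sample = """\
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Color
external_VIP, Node[0] cody: 51
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Color
external_VIP, Node[1] tim: 2
ptest[3876]: 2007/10/03_14:05:59 debug: native_assign_node: Color internal_VIP, Node[1] tim: -1000000
"""
print(assignment_scores(sample))
```

On the three sample records it returns {'external_VIP': {'cody': 51, 'tim': 2}, 'internal_VIP': {'tim': -1000000}}.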

Again, any insight is much appreciated. Thanks for listening.

On Wed, 2007-10-03 at 11:39 -0500, Matt Zagrabelny wrote:
> Hello,
> 
> I have a problem with Heartbeat not failing over resources in my
> active/standby cluster when a ping node goes "down". I have spent the
> past couple of days reading the archives, and this looks like a fairly
> frequent question. Unfortunately, after reading them I am still unable
> to solve my problem.
> 
> Scenario:
> 
> Active/Standby firewall cluster.
> 
>                  Internet
> 
> +---eth0---+                 +---eth0---+
> |          |                 |          |
> |       eth2-----------------eth2       |
> |  (cody)  |     Heartbeat   |   (tim)  |
> | /dev/ttyS0-----------------/dev/ttyS0 |
> |          |                 |          |
> +---eth1---+                 +---eth1---+
> 
>                  Intranet
> 
> 
> 'cody' is the primary firewall box and 'tim' is the backup.
> 
> A single resource group:
> 
> Resource Group: monolith_resources
>     external_VIP        (heartbeat::ocf:IPaddr2):       Started cody
>     internal_VIP        (heartbeat::ocf:IPaddr2):       Started cody
> 
> There is a ping node on each interface's network to verify that
> interface's connectivity.
> 
> Some relevant configs:
> 
> # cat /etc/ha.d/ha.cf
> use_logd on
> 
> keepalive 1
> deadtime 5
> initdead 120
> 
> udpport 694
> baud 115200
> serial /dev/ttyS0
> bcast eth2
> 
> node cody
> node tim
> 
> ping 131.212.4.158
> ping 192.168.115.38
> 
> crm on
> 
> # cat /var/lib/heartbeat/crm/cib.xml
> <?xml version="1.0"?>
> <cib admin_epoch="0" epoch="0" num_updates="0">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <attributes>
>           <nvpair id="cib-bootstrap-options-symmetric-cluster"
> name="symmetric-cluster"                   value="true"/>
>           <nvpair id="cib-bootstrap-options-no-quorum-policy"
> name="no-quorum-policy"                    value="stop"/>
>           <nvpair id="cib-bootstrap-options-default-resource-stickiness"
> name="default-resource-stickiness"         value="0"/>
>           <nvpair
> id="cib-bootstrap-options-default-resource-failure-stickiness"
> name="default-resource-failure-stickiness" value="0"/>
>           <nvpair id="cib-bootstrap-options-stonith-enabled"
> name="stonith-enabled"                     value="false"/>
>           <nvpair id="cib-bootstrap-options-stonith-action"
> name="stonith-action"                      value="reboot"/>
>           <nvpair id="cib-bootstrap-options-startup-fencing"
> name="startup-fencing"                     value="true"/>
>           <nvpair id="cib-bootstrap-options-stop-orphan-resources"
> name="stop-orphan-resources"               value="true"/>
>           <nvpair id="cib-bootstrap-options-stop-orphan-actions"
> name="stop-orphan-actions"                 value="true"/>
>           <nvpair id="cib-bootstrap-options-remove-after-stop"
> name="remove-after-stop"                   value="false"/>
>           <nvpair id="cib-bootstrap-options-short-resource-names"
> name="short-resource-names"                value="true"/>
>           <nvpair id="cib-bootstrap-options-transition-idle-timeout"
> name="transition-idle-timeout"             value="5min"/>
>           <nvpair id="cib-bootstrap-options-default-action-timeout"
> name="default-action-timeout"              value="30s"/>
>           <nvpair id="cib-bootstrap-options-is-managed-default"
> name="is-managed-default"                  value="true"/>
>           <nvpair id="cib-bootstrap-options-pe-input-series-max"
> name="pe-input-series-max"                 value="400"/>
>         </attributes>
>       </cluster_property_set>
>     </crm_config>
>     <nodes/>
>     <resources>
>       <group id="monolith_resources">
>         <primitive id="external_VIP" class="ocf" provider="heartbeat"
> type="IPaddr2">
>           <operations>
>             <op id="external_VIP_mon" name="monitor" interval="5s"
> timeout="5s"/>
>           </operations>
>           <instance_attributes id="external_VIP_inst_attr">
>             <attributes>
>               <nvpair id="external_VIP_ip_assignment" name="ip"
> value="131.212.4.153"/>
>             </attributes>
>           </instance_attributes>
>         </primitive>
>         <primitive id="internal_VIP" class="ocf" provider="heartbeat"
> type="IPaddr2">
>           <operations>
>             <op id="internal_VIP_mon" name="monitor" interval="5s"
> timeout="5s"/>
>           </operations>
>           <instance_attributes id="internal_VIP_inst_attr">
>             <attributes>
>               <nvpair id="internal_VIP_ip_assignment" name="ip"
> value="192.168.115.33"/>
>             </attributes>
>           </instance_attributes>
>         </primitive>
>       </group>
>       <clone id="pingd" globally_unique="false">
>         <instance_attributes id="pingd_inst_attr">
>           <attributes>
>             <nvpair id="pingd-clone_max"      name="clone_max"
> value="2"/>
>             <nvpair id="pingd-clone_node_max" name="clone_node_max"
> value="1"/>
>             <nvpair id="pingd-dampen"         name="dampen"
> value="5s"/>
>             <nvpair id="pingd-multiplier"     name="multiplier"
> value="100"/>
>           </attributes>
>         </instance_attributes>
>         <primitive id="pingd-child" provider="heartbeat" class="ocf"
> type="pingd">
>           <operations>
>             <op id="pingd-child-monitor" name="monitor" interval="20s"
> timeout="40s" prereq="nothing"/>
>             <op id="pingd-child-start" name="start" prereq="nothing"/>
>           </operations>
>         </primitive>
>       </clone>
>     </resources>
>     <constraints>
>       <rsc_location id="monolith_resources_location"
> rsc="monolith_resources">
>         <rule id="prefered_location_monolith_resources" score="50">
>           <expression id="prefered_location_host_cody"
> attribute="#uname" operation="eq" value="cody"/>
>         </rule>
>       </rsc_location>
>       <rsc_location id="monolith_resources_connected"
> rsc="monolith_resources">
>         <rule id="monolith_resources_connected_rule"
> score_attribute="pingd" >
>           <expression id="connected_via_ping" attribute="pingd"
> operation="defined"/>
>         </rule>
>       </rsc_location>
>     </constraints>
>   </configuration>
>   <status/>
> </cib>
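To see how the two rsc_location rules above combine, here is a simplified Python evaluator for just this constraints section. It is a sketch under stated assumptions, not the policy engine's actual algorithm: only the "eq" and "defined" operations are handled, all expressions in a rule must match, a non-matching rule adds nothing, and a rule with score_attribute adds the node's attribute value as-is. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The two rsc_location constraints from the cib.xml above, unchanged apart
# from line wrapping.
CONSTRAINTS = """
<constraints>
  <rsc_location id="monolith_resources_location" rsc="monolith_resources">
    <rule id="prefered_location_monolith_resources" score="50">
      <expression id="prefered_location_host_cody" attribute="#uname"
                  operation="eq" value="cody"/>
    </rule>
  </rsc_location>
  <rsc_location id="monolith_resources_connected" rsc="monolith_resources">
    <rule id="monolith_resources_connected_rule" score_attribute="pingd">
      <expression id="connected_via_ping" attribute="pingd" operation="defined"/>
    </rule>
  </rsc_location>
</constraints>
"""


def expression_matches(expr: ET.Element, attrs: dict[str, str]) -> bool:
    """Evaluate the only two operations this configuration uses."""
    name, op = expr.get("attribute"), expr.get("operation")
    if op == "defined":
        return name in attrs
    if op == "eq":
        return attrs.get(name) == expr.get("value")
    raise ValueError(f"operation {op!r} is not modelled in this sketch")


def rule_score(rule: ET.Element, attrs: dict[str, str]) -> int:
    """Score a matching rule adds: its fixed score, or the node's score_attribute value."""
    # Simplification: every expression must match (boolean_op="and").
    if not all(expression_matches(e, attrs) for e in rule.findall("expression")):
        return 0
    score_attribute = rule.get("score_attribute")
    if score_attribute is not None:
        return int(attrs[score_attribute])
    return int(rule.get("score"))


def location_scores(xml: str, rsc: str, nodes: dict[str, dict[str, str]]) -> dict[str, int]:
    """Total rsc_location score for rsc on each node."""
    root = ET.fromstring(xml.strip())
    rules = [rule
             for loc in root.iter("rsc_location") if loc.get("rsc") == rsc
             for rule in loc.findall("rule")]
    return {node: sum(rule_score(r, attrs) for r in rules)
            for node, attrs in nodes.items()}


# pingd values as ptest reports them (tim 2, cody 1).
nodes = {
    "cody": {"#uname": "cody", "pingd": "1"},
    "tim": {"#uname": "tim", "pingd": "2"},
}
print(location_scores(CONSTRAINTS, "monolith_resources", nodes))
```

With the pingd values ptest reports it gives {'cody': 51, 'tim': 2}, the same numbers as the external_VIP lines in the ptest output; with pingd values of 100 and 200 it would give 150 and 200.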
> 
> 
> In a previous thread on the list, Andrew Beekhof wrote:
> 
> >> if you can get the cluster into a state where:
> >> - both nodes are online
> >> - both nodes can see each other
> >> - only one node can see the ping node
> >> - the resource isn't running on the machine that can see the ping node
> 
> >> then run "cibadmin -Q" and attach the results.
> >> that will be enough to tell me roughly where the problem lies.
> 
> So that is what I did. Here is the output of the "cibadmin -Q" command.
> Note that I ran it on the node that cannot see the ping node but still
> has the resources.
> 
> <cib admin_epoch="0" epoch="0" num_updates="32" generated="true"
> have_quorum="true" ignore_dtd="false" num_peers="2"
> cib-last-written="Tue Oct  2 16:14:04 2007" ccm_transition="2"
> cib_feature_revision="1.3"
> dc_uuid="e723a418-ba24-470e-9540-fbb568b9bcb4">
>    <configuration>
>      <crm_config>
>        <cluster_property_set id="cib-bootstrap-options">
>          <attributes>
>            <nvpair id="cib-bootstrap-options-symmetric-cluster"
> name="symmetric-cluster" value="true"/>
>            <nvpair id="cib-bootstrap-options-no-quorum-policy"
> name="no-quorum-policy" value="stop"/>
>            <nvpair
> id="cib-bootstrap-options-default-resource-stickiness"
> name="default-resource-stickiness" value="0"/>
>            <nvpair
> id="cib-bootstrap-options-default-resource-failure-stickiness"
> name="default-resource-failure-stickiness" value="0"/>
>            <nvpair id="cib-bootstrap-options-stonith-enabled"
> name="stonith-enabled" value="false"/>
>            <nvpair id="cib-bootstrap-options-stonith-action"
> name="stonith-action" value="reboot"/>
>            <nvpair id="cib-bootstrap-options-startup-fencing"
> name="startup-fencing" value="true"/>
>            <nvpair id="cib-bootstrap-options-stop-orphan-resources"
> name="stop-orphan-resources" value="true"/>
>            <nvpair id="cib-bootstrap-options-stop-orphan-actions"
> name="stop-orphan-actions" value="true"/>
>            <nvpair id="cib-bootstrap-options-remove-after-stop"
> name="remove-after-stop" value="false"/>
>            <nvpair id="cib-bootstrap-options-short-resource-names"
> name="short-resource-names" value="true"/>
>            <nvpair id="cib-bootstrap-options-transition-idle-timeout"
> name="transition-idle-timeout" value="5min"/>
>            <nvpair id="cib-bootstrap-options-default-action-timeout"
> name="default-action-timeout" value="5s"/>
>            <nvpair id="cib-bootstrap-options-is-managed-default"
> name="is-managed-default" value="true"/>
>            <nvpair id="cib-bootstrap-options-pe-input-series-max"
> name="pe-input-series-max" value="400"/>
>          </attributes>
>        </cluster_property_set>
>      </crm_config>
>      <nodes>
>        <node id="e723a418-ba24-470e-9540-fbb568b9bcb4" uname="tim"
> type="normal"/>
>        <node id="ccca855c-2191-4aa8-8707-88237b72112c" uname="cody"
> type="normal"/>
>      </nodes>
>      <resources>
>        <group id="monolith_resources">
>          <primitive id="external_VIP" class="ocf" provider="heartbeat"
> type="IPaddr2">
>            <operations>
>              <op id="external_VIP_mon" name="monitor" interval="5s"
> timeout="5s"/>
>            </operations>
>            <instance_attributes id="external_VIP_inst_attr">
>              <attributes>
>                <nvpair id="external_VIP_ip_assignment" name="ip"
> value="131.212.4.153"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
>          <primitive id="internal_VIP" class="ocf" provider="heartbeat"
> type="IPaddr2">
>            <operations>
>              <op id="internal_VIP_mon" name="monitor" interval="5s"
> timeout="5s"/>
>            </operations>
>            <instance_attributes id="internal_VIP_inst_attr">
>              <attributes>
>                <nvpair id="internal_VIP_ip_assignment" name="ip"
> value="192.168.115.33"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
>        </group>
>        <clone id="pingd" globally_unique="false">
>          <instance_attributes id="pingd_inst_attr">
>            <attributes>
>              <nvpair id="pingd-clone_max" name="clone_max" value="2"/>
>              <nvpair id="pingd-clone_node_max" name="clone_node_max"
> value="1"/>
>              <nvpair id="pingd-dampen" name="dampen" value="5s"/>
>              <nvpair id="pingd-multiplier" name="multiplier"
> value="100"/>
>            </attributes>
>          </instance_attributes>
>          <primitive id="pingd-child" provider="heartbeat" class="ocf"
> type="pingd">
>            <operations>
>              <op id="pingd-child-monitor" name="monitor" interval="20s"
> timeout="40s" prereq="nothing"/>
>              <op id="pingd-child-start" name="start" prereq="nothing"/>
>            </operations>
>          </primitive>
>        </clone>
>      </resources>
>      <constraints>
>        <rsc_location id="monolith_resources_location"
> rsc="monolith_resources">
>          <rule id="prefered_location_monolith_resources" score="50">
>            <expression id="prefered_location_host_cody"
> attribute="#uname" operation="eq" value="cody"/>
>          </rule>
>        </rsc_location>
>        <rsc_location id="monolith_resources_connected"
> rsc="monolith_resources">
>          <rule id="monolith_resources_connected_rule"
> score_attribute="pingd">
>            <expression id="connected_via_ping" attribute="pingd"
> operation="defined"/>
>          </rule>
>        </rsc_location>
>      </constraints>
>    </configuration>
>    <status>
>      <node_state id="e723a418-ba24-470e-9540-fbb568b9bcb4" uname="tim"
> crmd="online" crm-debug-origin="do_update_resource" shutdown="0"
> in_ccm="true" ha="active" join="member" expected="member">
>        <lrm id="e723a418-ba24-470e-9540-fbb568b9bcb4">
>          <lrm_resources>
>            <lrm_resource id="pingd-child:0" type="pingd" class="ocf"
> provider="heartbeat">
>              <lrm_rsc_op id="pingd-child:0_monitor_0"
> operation="monitor" crm-debug-origin="do_update_resource"
> transition_key="5:0:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:7;5:0:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="4" crm_feature_set="1.0.9" rc_code="7" op_status="0"
> interval="0" op_digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>              <lrm_rsc_op id="pingd-child:0_start_0" operation="start"
> crm-debug-origin="do_update_resource"
> transition_key="13:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:0;13:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="6" crm_feature_set="1.0.9" rc_code="0" op_status="0"
> interval="0" op_digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>              <lrm_rsc_op id="pingd-child:0_monitor_20000"
> operation="monitor" crm-debug-origin="do_update_resource"
> transition_key="14:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:0;14:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="7" crm_feature_set="1.0.9" rc_code="0" op_status="0"
> interval="20000" op_digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>            </lrm_resource>
>            <lrm_resource id="external_VIP" type="IPaddr2" class="ocf"
> provider="heartbeat">
>              <lrm_rsc_op id="external_VIP_monitor_0" operation="monitor"
> crm-debug-origin="do_update_resource"
> transition_key="3:0:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:7;3:0:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="2" crm_feature_set="1.0.9" rc_code="7" op_status="0"
> interval="0" op_digest="295d3e33fcc839a733bd86ee666491f6"/>
>            </lrm_resource>
>            <lrm_resource id="internal_VIP" type="IPaddr2" class="ocf"
> provider="heartbeat">
>              <lrm_rsc_op id="internal_VIP_monitor_0" operation="monitor"
> crm-debug-origin="do_update_resource"
> transition_key="4:0:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:7;4:0:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="3" crm_feature_set="1.0.9" rc_code="7" op_status="0"
> interval="0" op_digest="c309120cbcd5acf8308b6af00c3fd33c"/>
>            </lrm_resource>
>            <lrm_resource id="pingd-child:1" type="pingd" class="ocf"
> provider="heartbeat">
>              <lrm_rsc_op id="pingd-child:1_monitor_0"
> operation="monitor" crm-debug-origin="do_update_resource"
> transition_key="3:1:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:7;3:1:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="5" crm_feature_set="1.0.9" rc_code="7" op_status="0"
> interval="0" op_digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>            </lrm_resource>
>          </lrm_resources>
>        </lrm>
>        <transient_attributes id="e723a418-ba24-470e-9540-fbb568b9bcb4">
>          <instance_attributes
> id="status-e723a418-ba24-470e-9540-fbb568b9bcb4">
>            <attributes>
>              <nvpair
> id="status-e723a418-ba24-470e-9540-fbb568b9bcb4-probe_complete"
> name="probe_complete" value="true"/>
>              <nvpair
> id="status-e723a418-ba24-470e-9540-fbb568b9bcb4-pingd" name="pingd"
> value="2"/>
>            </attributes>
>          </instance_attributes>
>        </transient_attributes>
>      </node_state>
>      <node_state id="ccca855c-2191-4aa8-8707-88237b72112c" uname="cody"
> crmd="online" crm-debug-origin="do_update_resource" in_ccm="true"
> ha="active" join="member" expected="member" shutdown="0">
>        <lrm id="ccca855c-2191-4aa8-8707-88237b72112c">
>          <lrm_resources>
>            <lrm_resource id="external_VIP" type="IPaddr2" class="ocf"
> provider="heartbeat">
>              <lrm_rsc_op id="external_VIP_monitor_0" operation="monitor"
> crm-debug-origin="do_update_resource"
> transition_key="7:0:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:7;7:0:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="2" crm_feature_set="1.0.9" rc_code="7" op_status="0"
> interval="0" op_digest="295d3e33fcc839a733bd86ee666491f6"/>
>              <lrm_rsc_op id="external_VIP_start_0" operation="start"
> crm-debug-origin="do_update_resource"
> transition_key="5:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:0;5:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="7" crm_feature_set="1.0.9" rc_code="0" op_status="0"
> interval="0" op_digest="295d3e33fcc839a733bd86ee666491f6"/>
>              <lrm_rsc_op id="external_VIP_monitor_5000"
> operation="monitor" crm-debug-origin="do_update_resource"
> transition_key="6:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:0;6:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="9" crm_feature_set="1.0.9" rc_code="0" op_status="0"
> interval="5000" op_digest="295d3e33fcc839a733bd86ee666491f6"/>
>            </lrm_resource>
>            <lrm_resource id="pingd-child:0" type="pingd" class="ocf"
> provider="heartbeat">
>              <lrm_rsc_op id="pingd-child:0_monitor_0"
> operation="monitor" crm-debug-origin="do_update_resource"
> transition_key="9:0:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:7;9:0:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="4" crm_feature_set="1.0.9" rc_code="7" op_status="0"
> interval="0" op_digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>            </lrm_resource>
>            <lrm_resource id="internal_VIP" type="IPaddr2" class="ocf"
> provider="heartbeat">
>              <lrm_rsc_op id="internal_VIP_monitor_0" operation="monitor"
> crm-debug-origin="do_update_resource"
> transition_key="8:0:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:7;8:0:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="3" crm_feature_set="1.0.9" rc_code="7" op_status="0"
> interval="0" op_digest="c309120cbcd5acf8308b6af00c3fd33c"/>
>              <lrm_rsc_op id="internal_VIP_start_0" operation="start"
> crm-debug-origin="do_update_resource"
> transition_key="7:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:0;7:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="10" crm_feature_set="1.0.9" rc_code="0" op_status="0"
> interval="0" op_digest="c309120cbcd5acf8308b6af00c3fd33c"/>
>              <lrm_rsc_op id="internal_VIP_monitor_5000"
> operation="monitor" crm-debug-origin="do_update_resource"
> transition_key="8:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:0;8:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="12" crm_feature_set="1.0.9" rc_code="0" op_status="0"
> interval="5000" op_digest="c309120cbcd5acf8308b6af00c3fd33c"/>
>            </lrm_resource>
>            <lrm_resource id="pingd-child:1" type="pingd" class="ocf"
> provider="heartbeat">
>              <lrm_rsc_op id="pingd-child:1_monitor_0"
> operation="monitor" crm-debug-origin="do_update_resource"
> transition_key="5:1:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:7;5:1:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="6" crm_feature_set="1.0.9" rc_code="7" op_status="0"
> interval="0" op_digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>              <lrm_rsc_op id="pingd-child:1_start_0" operation="start"
> crm-debug-origin="do_update_resource"
> transition_key="15:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:0;15:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="8" crm_feature_set="1.0.9" rc_code="0" op_status="0"
> interval="0" op_digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>              <lrm_rsc_op id="pingd-child:1_monitor_20000"
> operation="monitor" crm-debug-origin="do_update_resource"
> transition_key="16:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> transition_magic="0:0;16:2:5dfc6976-0cfa-491f-b6c7-80b0c9e3f212"
> call_id="11" crm_feature_set="1.0.9" rc_code="0" op_status="0"
> interval="20000" op_digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>            </lrm_resource>
>          </lrm_resources>
>        </lrm>
>        <transient_attributes id="ccca855c-2191-4aa8-8707-88237b72112c">
>          <instance_attributes
> id="status-ccca855c-2191-4aa8-8707-88237b72112c">
>            <attributes>
>              <nvpair
> id="status-ccca855c-2191-4aa8-8707-88237b72112c-probe_complete"
> name="probe_complete" value="true"/>
>              <nvpair
> id="status-ccca855c-2191-4aa8-8707-88237b72112c-pingd" name="pingd"
> value="1"/>
>            </attributes>
>          </instance_attributes>
>        </transient_attributes>
>      </node_state>
>    </status>
>  </cib>
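A short Python sketch for reading each node's pingd transient attribute out of a dump like the one above. It assumes the raw `cibadmin -Q` output (without the "> " quoting added by the mail client); pingd_by_node and the trimmed STATUS string are illustrative, not part of any Heartbeat tool.

```python
import xml.etree.ElementTree as ET


def pingd_by_node(cib_xml: str) -> dict[str, int]:
    """Map node uname -> pingd transient attribute from a 'cibadmin -Q' dump."""
    root = ET.fromstring(cib_xml)
    values = {}
    for state in root.iter("node_state"):
        # Only look inside transient_attributes, where the pingd value is kept.
        for transient in state.findall("transient_attributes"):
            for nvpair in transient.iter("nvpair"):
                if nvpair.get("name") == "pingd":
                    values[state.get("uname")] = int(nvpair.get("value"))
    return values


# Trimmed copy of the <status> section above (lrm history omitted).
STATUS = """<cib><status>
  <node_state id="e723a418-ba24-470e-9540-fbb568b9bcb4" uname="tim">
    <transient_attributes id="e723a418-ba24-470e-9540-fbb568b9bcb4">
      <instance_attributes id="status-e723a418-ba24-470e-9540-fbb568b9bcb4">
        <attributes>
          <nvpair id="status-e723a418-ba24-470e-9540-fbb568b9bcb4-pingd" name="pingd" value="2"/>
        </attributes>
      </instance_attributes>
    </transient_attributes>
  </node_state>
  <node_state id="ccca855c-2191-4aa8-8707-88237b72112c" uname="cody">
    <transient_attributes id="ccca855c-2191-4aa8-8707-88237b72112c">
      <instance_attributes id="status-ccca855c-2191-4aa8-8707-88237b72112c">
        <attributes>
          <nvpair id="status-ccca855c-2191-4aa8-8707-88237b72112c-pingd" name="pingd" value="1"/>
        </attributes>
      </instance_attributes>
    </transient_attributes>
  </node_state>
</status></cib>"""

print(pingd_by_node(STATUS))
```

On the trimmed status it returns {'tim': 2, 'cody': 1}: the plain count of reachable ping nodes on each machine, consistent with the observation that the multiplier appears to be 1 rather than 100.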
> 
> 
> Here are some log file snippets too:
> 
> From cody (primary node)
> 
> heartbeat[29689]: 2007/10/03_10:49:18 WARN: node 192.168.115.38: is dead
> crmd[29708]: 2007/10/03_10:49:18 notice: crmd_ha_status_callback: Status update: Node 192.168.115.38 now has status [dead]
> heartbeat[29689]: 2007/10/03_10:49:18 info: Link 192.168.115.38:192.168.115.38 dead.
> pingd[29865]: 2007/10/03_10:49:18 notice: pingd_nstatus_callback: Status update: Ping node 192.168.115.38 now has status [dead]
> pingd[29865]: 2007/10/03_10:49:18 info: send_update: 1 active ping nodes
> pingd[29865]: 2007/10/03_10:49:18 notice: pingd_lstatus_callback: Status update: Ping node 192.168.115.38 now has status [dead]
> pingd[29865]: 2007/10/03_10:49:18 notice: pingd_nstatus_callback: Status update: Ping node 192.168.115.38 now has status [dead]
> pingd[29865]: 2007/10/03_10:49:18 info: send_update: 1 active ping nodes
> crmd[29708]: 2007/10/03_10:49:18 WARN: get_uuid: Could not calculate UUID for 192.168.115.38
> attrd[29707]: 2007/10/03_10:49:19 info: attrd_trigger_update: Sending flush op to all hosts for: pingd
> attrd[29707]: 2007/10/03_10:49:19 info: attrd_ha_callback: flush message from cody
> attrd[29707]: 2007/10/03_10:49:19 info: attrd_perform_update: Sent update 6: pingd=1
> cib[29704]: 2007/10/03_10:51:26 info: cib_stats: Processed 111 operations (4864.00us average, 0% utilization) in the last 10min
> 
> 
> From tim (backup node)
> 
> attrd[27093]: 2007/10/03_10:49:19 info: attrd_ha_callback: flush message from cody
> attrd[27093]: 2007/10/03_10:49:19 info: attrd_perform_update: Sent update 7: pingd=2
> tengine[27097]: 2007/10/03_10:49:20 info: extract_event: Aborting on transient_attributes changes for ccca855c-2191-4aa8-8707-88237b72112c
> tengine[27097]: 2007/10/03_10:49:20 info: update_abort_priority: Abort priority upgraded to 1000000
> tengine[27097]: 2007/10/03_10:49:20 info: te_update_diff: Aborting on transient_attributes deletions
> crmd[27094]: 2007/10/03_10:49:20 info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> crmd[27094]: 2007/10/03_10:49:20 info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> pengine[27098]: 2007/10/03_10:49:20 notice: cluster_option: Using default value '60s' for cluster option 'cluster-delay'
> pengine[27098]: 2007/10/03_10:49:20 notice: cluster_option: Using default value '-1' for cluster option 'pe-error-series-max'
> pengine[27098]: 2007/10/03_10:49:20 notice: cluster_option: Using default value '-1' for cluster option 'pe-warn-series-max'
> pengine[27098]: 2007/10/03_10:49:20 info: determine_online_status: Node tim is online
> pengine[27098]: 2007/10/03_10:49:20 info: determine_online_status: Node cody is online
> pengine[27098]: 2007/10/03_10:49:20 info: unpack_find_resource: Internally renamed pingd-child:0 on cody to pingd-child:1
> pengine[27098]: 2007/10/03_10:49:20 info: group_print: Resource Group: monolith_resources
> pengine[27098]: 2007/10/03_10:49:20 info: native_print:     external_VIP        (heartbeat::ocf:IPaddr2):       Started cody
> pengine[27098]: 2007/10/03_10:49:20 info: native_print:     internal_VIP        (heartbeat::ocf:IPaddr2):       Started cody
> pengine[27098]: 2007/10/03_10:49:20 info: clone_print: Clone Set: pingd
> pengine[27098]: 2007/10/03_10:49:20 info: native_print:     pingd-child:0       (heartbeat::ocf:pingd): Started tim
> pengine[27098]: 2007/10/03_10:49:20 info: native_print:     pingd-child:1       (heartbeat::ocf:pingd): Started cody
> pengine[27098]: 2007/10/03_10:49:20 notice: NoRoleChange: Leave resource external_VIP   (cody)
> pengine[27098]: 2007/10/03_10:49:20 notice: NoRoleChange: Leave resource internal_VIP   (cody)
> pengine[27098]: 2007/10/03_10:49:20 notice: NoRoleChange: Leave resource pingd-child:0  (tim)
> pengine[27098]: 2007/10/03_10:49:20 notice: NoRoleChange: Leave resource pingd-child:1  (cody)
> crmd[27094]: 2007/10/03_10:49:20 info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> tengine[27097]: 2007/10/03_10:49:20 info: unpack_graph: Unpacked transition 5: 0 actions in 0 synapses
> crmd[27094]: 2007/10/03_10:49:20 info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> tengine[27097]: 2007/10/03_10:49:20 info: run_graph: Transition 5: (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0)
> tengine[27097]: 2007/10/03_10:49:20 info: notify_crmd: Transition 5 status: te_complete - <null>
> pengine[27098]: 2007/10/03_10:49:20 info: process_pe_message: Transition 5: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-127.bz2
> cib[27090]: 2007/10/03_10:51:29 info: cib_stats: Processed 60 operations (11000.00us average, 0% utilization) in the last 10min
> 
> The "NoRoleChange" messages stick out as a possible problem, and I am
> guessing their cause has to do with the resource scores in the
> constraints section of the cib.xml. However, I am unable to determine
> exactly what the issue is.
> 
> I guess that is about it. Help is much appreciated, thank you.
> 
-- 
Matt Zagrabelny - mzagrabe at d.umn.edu - (218) 726 8844
University of Minnesota Duluth
Information Technology Systems & Services
PGP key 1024D/84E22DA2 2005-11-07
Fingerprint: 78F9 18B3 EF58 56F5 FC85  C5CA 53E7 887F 84E2 2DA2

He is not a fool who gives up what he cannot keep to gain what he cannot
lose.
-Jim Elliot