[Linux-HA] Ethernet link down NO failover;(

Dejan Muhamedagic dejanmm at fastmail.fm
Fri Sep 7 05:31:35 MDT 2007


On Thu, Sep 06, 2007 at 04:50:25PM +0000, Andre Heine wrote:
> Hello all,
> 
> I am testing my two-node cluster with Heartbeat 2.1.2 on SLES 9.
> 
> In my tests I removed the two network cables from the production
> NICs (bond0), so I expected a failover.
> 
> But there wasn't any reaction;(
> 
> When I bring all network interfaces down, the slave takes over the
> resources and everything works fine.

So there is a difference between having the cables removed (failover
doesn't work) and bringing the interfaces down (failover works)?

> bond1 and bond2 are direct crossover links, and I can't see any
> production IP via these two interfaces.
> 
> On bond0 there are two vlan interfaces. Over these interfaces I can 
> see the production network (see ping_group).
> 
> When I remove the link for bond0 (vlan 425/725), the master can't
> ping any IP in the ping group.
>
> On the master I get only these log entries:
> 
> -----------
> heartbeat[18143]: 2007/09/06_16:43:00 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:00 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:00 ERROR: write failure on 
> ping_group prod_lan.: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:02 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:02 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:02 ERROR: write failure on 
> ping_group prod_lan.: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:04 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:04 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:04 ERROR: write failure on 
> ping_group prod_lan.: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:06 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:06 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:06 ERROR: write failure on 
> ping_group prod_lan.: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:08 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:08 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:08 ERROR: write failure on 
> ping_group prod_lan.: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:10 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:10 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:10 ERROR: write failure on 
> ping_group prod_lan.: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:12 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:12 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:12 ERROR: write failure on 
> ping_group prod_lan.: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:14 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:14 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:14 ERROR: write failure on 
> ping_group prod_lan.: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:16 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:16 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:16 ERROR: write failure on 
> ping_group prod_lan.: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:18 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:18 ERROR: glib: Error sending 
> packet: Network is unreachable
> heartbeat[18143]: 2007/09/06_16:43:18 ERROR: write failure on 
> ping_group prod_lan.: Network is unreachable
> heartbeat[18132]: 2007/09/06_16:43:18 WARN: node prod_lan: is dead
> heartbeat[18132]: 2007/09/06_16:43:18 info: Link prod_lan:prod_lan 
> dead.
> crmd[18174]: 2007/09/06_16:43:18 notice: crmd_ha_status_callback: 
> Status update: Node prod_lan now has status [dead]
> heartbeat[18132]: 2007/09/06_16:43:18 info: Link nfs00002:vlan725 
> dead.
> heartbeat[18132]: 2007/09/06_16:43:18 info: Link nfs00002:vlan425 
> dead.
> crmd[18174]: 2007/09/06_16:43:18 WARN: get_uuid: Could not calculate 
> UUID for prod_lan

These log messages look normal given the circumstances.

> Something is wrong in my config...
> 
> Any hints?

You should check the pingd attribute in the node's status
section. If that doesn't help, please attach the full logs.
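
One thing worth comparing against the status section: the location
rule in the posted CIB scores on an attribute called default_ping_set,
while the pingd primitive sets no name parameter. Assuming the pingd
resource agent then publishes its value under its default attribute
name, pingd (an assumption, to be confirmed by what actually shows up
in the status section), the rule would never match and connectivity
would have no influence on placement. A minimal sketch of the same
constraint with only the attribute name changed:

```xml
<!-- Sketch only: assumes the attribute written by pingd is named "pingd".
     Use whatever name actually appears in the node's status section. -->
<rsc_location id="pingd:connected" rsc="group_NFS">
  <rule id="pingd:connected:rule" score_attribute="pingd">
    <expression id="pingd:connected:expr:defined"
                attribute="pingd" operation="defined"/>
  </rule>
</rsc_location>
```

Alternatively, if the agent in this version accepts a name parameter,
setting it to default_ping_set on the pingd primitive should give the
same match without touching the constraint. Note also that this rule
only adds the pingd value to the score; with
default-resource-stickiness set to INFINITY, a score difference alone
may not be enough to move the group.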

Thanks,

Dejan

> 
> Ciao
> 
> Andre
> 
> 
> --------------
> 
> # cat /etc/ha.d/ha.cf
> 
> keepalive 2
> 
> deadtime 15
> warntime 10
> 
> initdead 25
> 
> 
> bcast vlan725  # bond0
> bcast vlan425  # bond0
> 
> bcast bond1    # direct link 192.168.1.0
> bcast bond2    # direct link 192.168.2.0
> 
> node nfs00001 nfs00002
> 
> ping_group prod_lan 172.18.8.60 10.18.1.126 10.18.1.116
> 
> 
> crm yes
> use_logd yes
> 
> ###################################################
> 
> # cat cib.xml
>  <cib admin_epoch="0" generated="false" have_quorum="true" 
> ignore_dtd="false" num_peers="0" cib_feature_revision="1.3" 
> epoch="106" num_updates="30" cib-last-written="Thu Sep  6 16:41:03 
> 2007" ccm_transition="1">
>    <configuration>
>      <crm_config>
>        <cluster_property_set id="cib-bootstrap-options">
>          <attributes>
>            <nvpair id="cib-bootstrap-options-symmetric-cluster" 
> name="symmetric-cluster" value="true"/>
>            <nvpair id="cib-bootstrap-options-no-quorum-policy" 
> name="no-quorum-policy" value="stop"/>
>            <nvpair 
> id="cib-bootstrap-options-default-resource-stickiness" 
> name="default-resource-stickiness" value="INFINITY"/>
>            <nvpair 
> id="cib-bootstrap-options-default-resource-failure-stickiness" 
> name="default-resource-failure-stickiness" value="0"/>
>            <nvpair id="cib-bootstrap-options-stonith-enabled" 
> name="stonith-enabled" value="true"/>
>            <nvpair id="cib-bootstrap-options-stonith-action" 
> name="stonith-action" value="reboot"/>
>            <nvpair id="cib-bootstrap-options-stop-orphan-resources" 
> name="stop-orphan-resources" value="true"/>
>            <nvpair id="cib-bootstrap-options-stop-orphan-actions" 
> name="stop-orphan-actions" value="true"/>
>            <nvpair id="cib-bootstrap-options-remove-after-stop" 
> name="remove-after-stop" value="false"/>
>            <nvpair id="cib-bootstrap-options-short-resource-names" 
> name="short-resource-names" value="true"/>
>            <nvpair 
> id="cib-bootstrap-options-transition-idle-timeout" 
> name="transition-idle-timeout" value="5min"/>
>            <nvpair id="cib-bootstrap-options-default-action-timeout" 
> name="default-action-timeout" value="15s"/>
>            <nvpair id="cib-bootstrap-options-is-managed-default" 
> name="is-managed-default" value="true"/>
>            <nvpair id="cib-bootstrap-options-suppress_cib_writes" 
> name="suppress_cib_writes" value="false"/>
>            <nvpair id="cib-bootstrap-options-startup-fencing" 
> name="startup-fencing" value="true"/>
>          </attributes>
>        </cluster_property_set>
>      </crm_config>
>      <nodes>
>        <node uname="nfs00002" type="normal" 
> id="043eb3f2-485e-4647-9220-5fa850c6a4df">
>          <instance_attributes 
> id="nodes-043eb3f2-485e-4647-9220-5fa850c6a4df">
>            <attributes>
>              <nvpair name="standby" 
> id="standby-043eb3f2-485e-4647-9220-5fa850c6a4df" value="false"/>
>            </attributes>
>          </instance_attributes>
>        </node>
>        <node uname="nfs00001" type="normal" 
> id="c610a2fc-70db-4d20-90ad-d928685f1bd9">
>          <instance_attributes 
> id="nodes-c610a2fc-70db-4d20-90ad-d928685f1bd9">
>            <attributes>
>              <nvpair 
> id="standby-c610a2fc-70db-4d20-90ad-d928685f1bd9" name="standby" 
> value="false"/>
>            </attributes>
>          </instance_attributes>
>        </node>
>      </nodes>
>      <resources>
>        <clone id="PingHosts">
>          <instance_attributes id="pingd">
>            <attributes>
>              <nvpair id="pingd-clone_max_val" name="clone_max" 
> value="2"/>
>              <nvpair id="pingd-clone_node_max" name="clone_node_max" 
> value="1"/>
>            </attributes>
>          </instance_attributes>
>          <primitive id="pingd-child" provider="heartbeat" 
> class="ocf" type="pingd">
>            <operations>
>              <op id="pingd-child-monitor" name="monitor" 
> interval="20s" timeout="40s" prereq="nothing"/>
>              <op id="pingd-child-start" name="start" 
> prereq="nothing"/>
>            </operations>
>            <instance_attributes id="pingd_inst_attr">
>              <attributes>
>                <nvpair id="pingd-dampen" name="dampen" value="5s"/>
>                <nvpair id="pingd-multiplier" name="multiplier" 
> value="100"/>
>                <nvpair id="pingd-user" name="user" 
> value="hacluster"/>
>                <nvpair id="pingd-pid" name="pidfile" 
> value="/var/lib/heartbeat/cores/hacluster/pingd.pid"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
>        </clone>
>        <clone id="NodeFencing">
>          <instance_attributes id="clone_attrs">
>            <attributes>
>              <nvpair id="clone_max_val" name="clone_max" value="2"/>
>              <nvpair id="clone_node_max_val" name="clone_node_max" 
> value="1"/>
>            </attributes>
>          </instance_attributes>
>          <primitive id="child_NodeFencing" class="stonith" 
> type="external/CB_STONITH.pl" provider="heartbeat">
>            <operations>
>              <op id="child_NodeFencing_mon" name="monitor" 
> interval="5s" timeout="20s" prereq="nothing"/>
>              <op id="child_NodeFencing_start" name="start" 
> timeout="20s" prereq="nothing"/>
>            </operations>
>            <instance_attributes id="child_NodeFencing_attrs">
>              <attributes>
>                <nvpair id="cb_stonith_hostlist" name="hostlist" 
> value="nfs00001-nfs00002"/>
>                <nvpair id="cb_stonith_iloip" name="iloip" 
> value="10_18_2_32-10_18_2_33"/>
>                <nvpair id="cb_stonith_passlist" name="passlist" 
> value="XXXX-XXXX"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
>        </clone>
>        <group id="group_NFS">
>          <primitive class="ocf" id="VIP01_172_18_8_11" 
> provider="heartbeat" type="IPaddr2">
>            <operations>
>              <op id="VIP01_mon" interval="5s" name="monitor" 
> timeout="10s"/>
>            </operations>
>            <instance_attributes id="VIP01_inst_attr">
>              <attributes>
>                <nvpair id="VIP01_attr_0" name="ip" 
> value="172.18.8.11"/>
>                <nvpair id="VIP01_attr_1" name="cidr_netmask" 
> value="26"/>
>                <nvpair id="VIP01_attr_2" name="broadcast" 
> value="172.18.8.63"/>
>                <nvpair id="VIP01_attr_3" name="nic" 
> value="vlan425"/>
>                <nvpair id="VIP01_attr_4" name="iflabel" 
> value="vip01"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
>          <primitive class="ocf" id="VIP02_172_18_8_12" 
> provider="heartbeat" type="IPaddr2">
>            <operations>
>              <op id="VIP02_mon" interval="5s" name="monitor" 
> timeout="10s"/>
>            </operations>
>            <instance_attributes id="VIP02_inst_attr">
>              <attributes>
>                <nvpair id="VIP02_attr_0" name="ip" 
> value="172.18.8.12"/>
>                <nvpair id="VIP02_attr_1" name="cidr_netmask" 
> value="26"/>
>                <nvpair id="VIP02_attr_2" name="broadcast" 
> value="172.18.8.63"/>
>                <nvpair id="VIP02_attr_3" name="nic" 
> value="vlan425"/>
>                <nvpair id="VIP02_attr_4" name="iflabel" 
> value="vip02"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
>          <primitive class="ocf" id="VIP03_172_18_8_13" 
> provider="heartbeat" type="IPaddr2">
>            <operations>
>              <op id="VIP03_mon" interval="5s" name="monitor" 
> timeout="10s"/>
>            </operations>
>            <instance_attributes id="VIP03_inst_attr">
>              <attributes>
>                <nvpair id="VIP03_attr_0" name="ip" 
> value="172.18.8.13"/>
>                <nvpair id="VIP03_attr_1" name="cidr_netmask" 
> value="26"/>
>                <nvpair id="VIP03_attr_2" name="broadcast" 
> value="172.18.8.63"/>
>                <nvpair id="VIP03_attr_3" name="nic" 
> value="vlan425"/>
>                <nvpair id="VIP03_attr_4" name="iflabel" 
> value="vip03"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
>        </group>
>      </resources>
>      <constraints>
>        <rsc_location id="rsc_location_group_NFS" rsc="group_NFS">
>          <rule id="prefered_location_group_NFS" score="100">
>            <expression attribute="#uname" 
> id="prefered_location_group_NFS_expr" operation="eq" 
> value="nfs00001"/>
>          </rule>
>        </rsc_location>
>        <rsc_location id="pingd:connected" rsc="group_NFS">
>          <rule id="pingd:connected:rule" 
> score_attribute="default_ping_set">
>            <expression id="pingd:connected:expr:defined" 
> attribute="default_ping_set" operation="defined"/>
>          </rule>
>        </rsc_location>
>      </constraints>
>    </configuration>
>  </cib>
> 