[Linux-HA] a strange case of mixing ping nodes and member nodes

Dejan Muhamedagic dejanmm at fastmail.fm
Thu Sep 7 17:24:08 MDT 2006


Replying to myself, my favourite activity:

This seems to have been already reported and Andrew produced a
patch for it. Sorry for the noise.

Cheers,

Dejan

On Thu, Sep 07, 2006 at 05:45:45PM +0200, Dejan Muhamedagic wrote:
> 
> Hello,
> 
> The cluster has three nodes (sapcl01, sapcl02, sapcl03). Since
> last night, sapcl01 has had a rather strange status section: the
> node_state of sapcl02 has vanished and in its stead there is
> now a node_state for one of the ping nodes. This is the
> crm_diff of the global CIB and the sapcl01 CIB:
> 
>  <diff>
>    <diff-removed>
>      <cib>
>        <status>
>          <node_state uname="9.158.30.40" crmd="offline" in_ccm="false" ha="dead" join="down" id="bdcbaad6-5fdc-4880-a309-ccaaf70db357"/>
>        </status>
>      </cib>
>    </diff-removed>
>    <diff-added>
>      <cib>
>        <status>
>          <node_state uname="sapcl02" crmd="online" in_ccm="true" ha="active" join="member" id="bdcbaad6-5fdc-4880-a309-ccaaf70db357">
>            <transient_attributes id="bdcbaad6-5fdc-4880-a309-ccaaf70db357" __crm_diff_marker__="added:top">
>              <instance_attributes id="status-bdcbaad6-5fdc-4880-a309-ccaaf70db357">
>                <attributes>
>                  <nvpair id="status-bdcbaad6-5fdc-4880-a309-ccaaf70db357-pingd" name="pingd" value="300"/>
>                  <nvpair id="status-bdcbaad6-5fdc-4880-a309-ccaaf70db357-probe_complete" name="probe_complete" value="true"/>
>                </attributes>
>              </instance_attributes>
>            </transient_attributes>
>          </node_state>
>        </status>
>      </cib>
>    </diff-added>
>  </diff>
> 
> crm_mon on sapcl01 shows both that sapcl02 is the current DC
> and that it's offline.
> 
> The ping node 9.158.30.40 is running AIX and has no
> heartbeat installed.
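
The symptom in the diff above (one node_state id carrying two different
unames) can be checked for mechanically. What follows is only a rough
sketch in Python using the standard xml.etree module; the function names
are made up for illustration, and the embedded diff is a trimmed copy of
the one quoted above, not fresh output from crm_diff.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the crm_diff output quoted in this message.
DIFF = """<diff>
  <diff-removed>
    <cib><status>
      <node_state uname="9.158.30.40" crmd="offline" in_ccm="false"
                  ha="dead" join="down"
                  id="bdcbaad6-5fdc-4880-a309-ccaaf70db357"/>
    </status></cib>
  </diff-removed>
  <diff-added>
    <cib><status>
      <node_state uname="sapcl02" crmd="online" in_ccm="true"
                  ha="active" join="member"
                  id="bdcbaad6-5fdc-4880-a309-ccaaf70db357"/>
    </status></cib>
  </diff-added>
</diff>"""


def unames_by_id(section):
    """Map node_state id -> uname for one side of the diff."""
    if section is None:
        return {}
    return {ns.get("id"): ns.get("uname") for ns in section.iter("node_state")}


def conflicting_unames(diff_xml):
    """Return {id: (removed_uname, added_uname)} where the two sides disagree."""
    root = ET.fromstring(diff_xml)
    removed = unames_by_id(root.find("diff-removed"))
    added = unames_by_id(root.find("diff-added"))
    return {
        node_id: (removed[node_id], added[node_id])
        for node_id in removed.keys() & added.keys()
        if removed[node_id] != added[node_id]
    }


if __name__ == "__main__":
    for node_id, (old, new) in conflicting_unames(DIFF).items():
        print(f"{node_id}: uname differs ({old} vs {new})")
```

Any id reported here is used for two different unames in the two CIBs,
which is exactly the oddity described above.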
> 
> I'll attach the relevant stuff, though I can't see any clues
> in the logs. Hope that somebody out there will have more
> luck.
> 
> Cheers,
> 
> Dejan
> 
> 

> traditional_compression false
> coredumps true
> use_logd yes
> keepalive 2
> warntime 6
> deadtime 8
> initdead 10
> deadping 6
> mcast   eth0 225.0.0.1 694 1 0
> mcast   eth1 225.0.0.2 694 1 0
> auto_failback	legacy
> node sapcl01
> node sapcl02
> node sapcl03
> #node lingws
> ping 9.158.3.144 9.158.29.46 9.158.30.40
> crm	on
> respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s
> debug 1
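
For reference, the ha.cf above keeps the two kinds of hosts apart: member
nodes come from the "node" directives and ping nodes from the "ping"
directive. A small Python sketch that separates the two and classifies a
given uname could look like the following. The helper names are
hypothetical, and the parsing only assumes whitespace-separated directives
with "#" comments, as in the file shown.

```python
# Relevant lines from the ha.cf quoted above.
HA_CF = """\
node sapcl01
node sapcl02
node sapcl03
#node lingws
ping 9.158.3.144 9.158.29.46 9.158.30.40
crm on
"""


def parse_ha_cf(text):
    """Return (member_nodes, ping_nodes) found in ha.cf text."""
    nodes, pings = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split()
        if parts[0] == "node":
            nodes.extend(parts[1:])
        elif parts[0] == "ping":
            pings.extend(parts[1:])
    return nodes, pings


def classify(uname, text):
    """Say whether a uname is a member node, a ping node, or unknown."""
    nodes, pings = parse_ha_cf(text)
    if uname in nodes:
        return "member"
    if uname in pings:
        return "ping"
    return "unknown"


if __name__ == "__main__":
    print(classify("9.158.30.40", HA_CF))
    print(classify("sapcl02", HA_CF))
```

A uname that classifies as "ping" should presumably never show up in a
node_state entry in the status section.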


> 
> 
> ============
> Last updated: Thu Sep  7 17:28:50 2006
> Current DC: sapcl02 (bdcbaad6-5fdc-4880-a309-ccaaf70db357)
> 3 Nodes configured.
> 6 Resources configured.
> ============
> 
> Node: sapcl02 (bdcbaad6-5fdc-4880-a309-ccaaf70db357): OFFLINE
> Node: sapcl01 (cf68c349-b7ca-4495-9aaf-5e158969efef): online
> Node: sapcl03 (4b288449-8f13-4f2b-9e96-ba8c45f3ed8d): online
> 
> Resource Group: a1
>     IPaddr_10_1_1_22	(heartbeat::ocf:IPaddr2):	Started sapcl03
>     IPaddr_192_168_1_22	(heartbeat::ocf:IPaddr2):	Started sapcl03
>     apache_a1	(heartbeat::ocf:apache):	Started sapcl03
> Resource Group: a2
>     IPaddr_10_1_1_23	(heartbeat::ocf:IPaddr2):	Started sapcl01
>     IPaddr_192_168_1_23	(heartbeat::ocf:IPaddr2):	Started sapcl01
>     apache_a2	(heartbeat::ocf:apache):	Started sapcl01
> Resource Group: a4
>     IPaddr_10_1_1_25	(heartbeat::ocf:IPaddr2):	Started sapcl03
>     IPaddr_192_168_1_25	(heartbeat::ocf:IPaddr2):	Started sapcl03
>     apache_a4	(heartbeat::ocf:apache):	Started sapcl03
> Resource Group: a5
>     IPaddr_10_1_1_26	(heartbeat::ocf:IPaddr2):	Started sapcl01
>     IPaddr_192_168_1_26	(heartbeat::ocf:IPaddr2):	Started sapcl01
>     apache_a5	(heartbeat::ocf:apache):	Started sapcl01
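
The contradiction in the crm_mon output above (sapcl02 listed as the
current DC and as OFFLINE at the same time) can also be spotted with a few
lines of Python. This is only a sketch: the regular expressions assume the
"Current DC:" and "Node:" line layout exactly as printed above, and the
function name is made up.

```python
import re

# Header and node lines copied from the crm_mon output above.
CRM_MON = """\
Current DC: sapcl02 (bdcbaad6-5fdc-4880-a309-ccaaf70db357)
3 Nodes configured.
6 Resources configured.

Node: sapcl02 (bdcbaad6-5fdc-4880-a309-ccaaf70db357): OFFLINE
Node: sapcl01 (cf68c349-b7ca-4495-9aaf-5e158969efef): online
Node: sapcl03 (4b288449-8f13-4f2b-9e96-ba8c45f3ed8d): online
"""

DC_RE = re.compile(r"^Current DC:\s+(\S+)\s+\(([0-9a-f-]+)\)")
NODE_RE = re.compile(r"^Node:\s+(\S+)\s+\(([0-9a-f-]+)\):\s+(\S+)")


def offline_dc(crm_mon_text):
    """Return the DC's name if crm_mon also lists it as offline, else None."""
    dc = None
    status = {}
    for line in crm_mon_text.splitlines():
        line = line.strip()
        match = DC_RE.match(line)
        if match:
            dc = match.group(1)
            continue
        match = NODE_RE.match(line)
        if match:
            status[match.group(1)] = match.group(3).lower()
    if dc is not None and status.get(dc) == "offline":
        return dc
    return None


if __name__ == "__main__":
    name = offline_dc(CRM_MON)
    if name:
        print(f"inconsistent: DC {name} is reported offline")
```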

