[Linux-HA] a strange case of mixing ping nodes and member nodes
Dejan Muhamedagic
dejanmm at fastmail.fm
Thu Sep 7 17:24:08 MDT 2006
Replying to myself, my favourite activity:
This seems to have already been reported, and Andrew has produced a
patch for it. Sorry for the noise.
Cheers,
Dejan
On Thu, Sep 07, 2006 at 05:45:45PM +0200, Dejan Muhamedagic wrote:
> Hello,
>
> The cluster has three nodes (sapcl01, sapcl02, sapcl03), and since
> last night sapcl01 has had a rather strange status section. The
> node_state of sapcl02 has vanished, and in its stead there is
> now a node_state for one of the ping nodes. This is the
> crm_diff of the global CIB and the sapcl01 CIB:
>
> <diff>
>   <diff-removed>
>     <cib>
>       <status>
>         <node_state uname="9.158.30.40" crmd="offline" in_ccm="false" ha="dead" join="down" id="bdcbaad6-5fdc-4880-a309-ccaaf70db357"/>
>       </status>
>     </cib>
>   </diff-removed>
>   <diff-added>
>     <cib>
>       <status>
>         <node_state uname="sapcl02" crmd="online" in_ccm="true" ha="active" join="member" id="bdcbaad6-5fdc-4880-a309-ccaaf70db357">
>           <transient_attributes id="bdcbaad6-5fdc-4880-a309-ccaaf70db357" __crm_diff_marker__="added:top">
>             <instance_attributes id="status-bdcbaad6-5fdc-4880-a309-ccaaf70db357">
>               <attributes>
>                 <nvpair id="status-bdcbaad6-5fdc-4880-a309-ccaaf70db357-pingd" name="pingd" value="300"/>
>                 <nvpair id="status-bdcbaad6-5fdc-4880-a309-ccaaf70db357-probe_complete" name="probe_complete" value="true"/>
>               </attributes>
>             </instance_attributes>
>           </transient_attributes>
>         </node_state>
>       </status>
>     </cib>
>   </diff-added>
> </diff>
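
The mix-up in the diff, a node_state whose uname is a ping node address but whose id is the one sapcl02 uses, can be checked for mechanically. What follows is only a minimal sketch in Python, not anything heartbeat ships: the function name find_foreign_node_states, the MEMBERS set and the SAMPLE document are made up for illustration, and the sapcl01 entry in SAMPLE is assumed rather than copied from the real CIB.

```python
import xml.etree.ElementTree as ET

# Member node names, taken from the "node" lines in the report.
# Assumption: any other uname showing up in a node_state is suspect.
MEMBERS = {"sapcl01", "sapcl02", "sapcl03"}


def find_foreign_node_states(cib_xml, members=MEMBERS):
    """Return (uname, id) pairs for node_state entries that do not name a member."""
    root = ET.fromstring(cib_xml)
    return [
        (ns.get("uname"), ns.get("id"))
        for ns in root.iter("node_state")
        if ns.get("uname") not in members
    ]


# Illustrative status section: the first entry is the one from the diff,
# the second is an assumed, healthy entry for sapcl01.
SAMPLE = """
<cib>
  <status>
    <node_state uname="9.158.30.40" crmd="offline" in_ccm="false" ha="dead" join="down" id="bdcbaad6-5fdc-4880-a309-ccaaf70db357"/>
    <node_state uname="sapcl01" crmd="online" in_ccm="true" ha="active" join="member" id="cf68c349-b7ca-4495-9aaf-5e158969efef"/>
  </status>
</cib>
"""

if __name__ == "__main__":
    for uname, node_id in find_foreign_node_states(SAMPLE):
        print(f"unexpected node_state: uname={uname} id={node_id}")
```

Run against the sample, this reports only the 9.158.30.40 entry, together with the id that crm_mon shows for sapcl02.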
>
> crm_mon on sapcl01 shows both that sapcl02 is the current DC
> and that it's offline.
>
> The ping node 9.158.30.40 is running AIX and has no
> heartbeat installed.
>
> I'll attach the relevant stuff, though I can't see any clues
> in the logs. Hope that somebody out there will have more
> luck.
>
> Cheers,
>
> Dejan
>
>
> traditional_compression false
> coredumps true
> use_logd yes
> keepalive 2
> warntime 6
> deadtime 8
> initdead 10
> deadping 6
> mcast eth0 225.0.0.1 694 1 0
> mcast eth1 225.0.0.2 694 1 0
> auto_failback legacy
> node sapcl01
> node sapcl02
> node sapcl03
> #node lingws
> ping 9.158.3.144 9.158.29.46 9.158.30.40
> crm on
> respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s
> debug 1
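
To tell member nodes from ping nodes in a script, the two lists can be read straight from the configuration above. This is a rough sketch under assumptions: parse_ha_cf is a hypothetical helper, it treats everything after a "#" as a comment, and it looks only at the "node" and "ping" directives, ignoring the rest of the file. HA_CF is an excerpt of the configuration quoted above.

```python
def parse_ha_cf(text):
    """Collect the names given by 'node' and 'ping' directives, skipping comments."""
    nodes, pings = [], []
    for raw in text.splitlines():
        # Assumption: '#' starts a comment wherever it appears on the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        directive, *args = line.split()
        if directive == "node":
            nodes.extend(args)
        elif directive == "ping":
            pings.extend(args)
    return nodes, pings


# Excerpt of the configuration from the report.
HA_CF = """\
node sapcl01
node sapcl02
node sapcl03
#node lingws
ping 9.158.3.144 9.158.29.46 9.158.30.40
crm on
"""

if __name__ == "__main__":
    nodes, pings = parse_ha_cf(HA_CF)
    print("member nodes:", nodes)
    print("ping nodes:  ", pings)
```

The commented-out "node lingws" line is skipped, so only the three sapcl nodes count as members, and 9.158.30.40 lands in the ping list.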
>
>
> ============
> Last updated: Thu Sep 7 17:28:50 2006
> Current DC: sapcl02 (bdcbaad6-5fdc-4880-a309-ccaaf70db357)
> 3 Nodes configured.
> 6 Resources configured.
> ============
>
> Node: sapcl02 (bdcbaad6-5fdc-4880-a309-ccaaf70db357): OFFLINE
> Node: sapcl01 (cf68c349-b7ca-4495-9aaf-5e158969efef): online
> Node: sapcl03 (4b288449-8f13-4f2b-9e96-ba8c45f3ed8d): online
>
> Resource Group: a1
>     IPaddr_10_1_1_22 (heartbeat::ocf:IPaddr2): Started sapcl03
>     IPaddr_192_168_1_22 (heartbeat::ocf:IPaddr2): Started sapcl03
>     apache_a1 (heartbeat::ocf:apache): Started sapcl03
> Resource Group: a2
>     IPaddr_10_1_1_23 (heartbeat::ocf:IPaddr2): Started sapcl01
>     IPaddr_192_168_1_23 (heartbeat::ocf:IPaddr2): Started sapcl01
>     apache_a2 (heartbeat::ocf:apache): Started sapcl01
> Resource Group: a4
>     IPaddr_10_1_1_25 (heartbeat::ocf:IPaddr2): Started sapcl03
>     IPaddr_192_168_1_25 (heartbeat::ocf:IPaddr2): Started sapcl03
>     apache_a4 (heartbeat::ocf:apache): Started sapcl03
> Resource Group: a5
>     IPaddr_10_1_1_26 (heartbeat::ocf:IPaddr2): Started sapcl01
>     IPaddr_192_168_1_26 (heartbeat::ocf:IPaddr2): Started sapcl01
>     apache_a5 (heartbeat::ocf:apache): Started sapcl01
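
The contradiction in this output, sapcl02 being both the current DC and OFFLINE, can be spotted from the text alone. The sketch below is a hypothetical helper (dc_reported_offline is not part of crm_mon) and assumes the "Current DC:" and "Node:" lines look exactly as they do above, with the quote markers stripped. CRM_MON holds just those lines from the report.

```python
import re


def dc_reported_offline(crm_mon_text):
    """Return the DC's name if crm_mon lists that same node as OFFLINE, else None."""
    dc = re.search(r"^Current DC:\s+(\S+)", crm_mon_text, re.MULTILINE)
    if not dc:
        return None
    name = dc.group(1)
    node_line = re.compile(r"^Node:\s+(\S+)\s+\([^)]*\):\s+(\S+)", re.MULTILINE)
    for match in node_line.finditer(crm_mon_text):
        if match.group(1) == name and match.group(2).upper() == "OFFLINE":
            return name
    return None


# Header and node lines from the crm_mon output in the report.
CRM_MON = """\
Current DC: sapcl02 (bdcbaad6-5fdc-4880-a309-ccaaf70db357)
3 Nodes configured.
6 Resources configured.
Node: sapcl02 (bdcbaad6-5fdc-4880-a309-ccaaf70db357): OFFLINE
Node: sapcl01 (cf68c349-b7ca-4495-9aaf-5e158969efef): online
Node: sapcl03 (4b288449-8f13-4f2b-9e96-ba8c45f3ed8d): online
"""

if __name__ == "__main__":
    name = dc_reported_offline(CRM_MON)
    if name:
        print(f"inconsistent view: DC {name} is listed as OFFLINE")
```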