[Linux-HA] Node goes offline without any reason

Andrew Beekhof beekhof at gmail.com
Mon Nov 27 10:22:19 MST 2006


Can you include the complete logs from both nodes next time, please?
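
When going through the full ha.log from each node, it can help to first pull out everything that is not plain "info" noise. Below is a minimal sketch in Python, assuming every line follows the layout seen in the excerpt quoted in this thread (timestamp, host, daemon, pid in brackets, level, message). The names `parse_line` and `interesting` are made up for illustration and are not part of heartbeat.

```python
import re

# Assumed layout, inferred from the quoted ha.log excerpt:
#   "Nov 27 14:43:40 k237 heartbeat: [14441]: WARN: Late heartbeat: ..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<daemon>[\w-]+): \[(?P<pid>\d+)\]: "
    r"(?P<level>[A-Za-z]+): (?P<msg>.*)$"
)


def parse_line(line):
    """Split one log line into its fields; return None if it does not match."""
    # Drop a leading "> " quote prefix (as in a mail reply) and whitespace.
    m = LINE_RE.match(line.lstrip("> ").strip())
    return m.groupdict() if m else None


def interesting(lines, levels=("WARN", "ERROR", "notice")):
    """Yield parsed records whose level is one of `levels`."""
    for line in lines:
        rec = parse_line(line)
        if rec and rec["level"] in levels:
            yield rec


if __name__ == "__main__":
    sample = [
        "Nov 27 14:39:35 k237 cib: [14452]: info: cib_stats:main.c Processed 4 operations",
        "Nov 27 14:43:40 k237 heartbeat: [14441]: WARN: Late heartbeat: Node uplink: interval 6010 ms",
        "Nov 27 14:43:56 k237 cib: [14452]: notice: apply_xml_diff:xml.c Diff application failed!",
    ]
    for rec in interesting(sample):
        print(rec["stamp"], rec["host"], rec["daemon"], rec["level"], rec["msg"])
```

Running the same filter over the logs from both nodes and comparing the timestamps side by side is one way to see which node noticed a problem first.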

On 11/27/06, Marian Neubert <linux-ha at tesla-crew.de> wrote:
> Hi all,
>
> I have a strange problem. We're running a lot of heartbeat-2 clusters,
> most of them 2-node clusters. On one two-node cluster, one member
> sporadically goes offline without any helpful message in the logfile
> (helpful for me ;o).
>
> The ha.log is attached. (k237 is the node that went offline; k242 is
> the node that stayed online, and "uplink" is the "node" for pingd.)
>
> regards,
> marian
>
>
> Nov 27 14:29:35 k237 cib: [14452]: info: cib_stats:main.c Processed 4 operations (15000.00us average, 0% utilization) in the last 10min
> Nov 27 14:39:35 k237 cib: [14452]: info: cib_stats:main.c Processed 4 operations (12500.00us average, 0% utilization) in the last 10min
> Nov 27 14:43:40 k237 heartbeat: [14441]: WARN: Late heartbeat: Node uplink: interval 6010 ms
> Nov 27 14:43:50 k237 heartbeat: [14441]: WARN: Late heartbeat: Node uplink: interval 10020 ms
> Nov 27 14:43:54 k237 attrd: [14455]: info: attrd_ha_callback:attrd.c flush message from k242
> Nov 27 14:43:54 k237 attrd: [14455]: info: attrd_ha_callback:attrd.c Sent update 4: pingd=1
> Nov 27 14:43:55 k237 cib: [14452]: info: cib_diff_notify:notify.c Update (client: 23933, call:125): 0.29.2792 -> 0.29.2793 (ok)
> Nov 27 14:43:55 k237 cibmon: [14458]: info: cibmon_diff:cibmon.c [cib_diff_notify] cib_apply_diff confirmed
> Nov 27 14:43:55 k237 cibmon: [14458]: info: cib_apply_diff: Diff: --- 0.29.2792
> Nov 27 14:43:55 k237 cibmon: [14458]: info: cib_apply_diff: Diff: +++ 0.29.2793
> Nov 27 14:43:55 k237 cibmon: [14458]: info: cib_apply_diff: - <cib num_updates="2792"/>
> Nov 27 14:43:55 k237 cibmon: [14458]: info: cib_apply_diff: + <cib num_updates="2793"/>
> Nov 27 14:43:55 k237 cib: [7625]: info: write_cib_contents:io.c Wrote version 0.29.2793 of the CIB to disk (digest: 8b534aa54c270144f88319764bc2f69c)
> Nov 27 14:43:56 k237 cib: [14452]: notice: apply_xml_diff:xml.c Diff application failed!
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:actual_diff: - <cib num_updates="2793"/>
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:actual_diff: + <cib num_updates="2794"/>
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:input_diff: - <cib num_updates="2793"/>
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:input_diff: + <cib num_updates="2794">
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:input_diff: +   <status>
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:input_diff: +     <node_state id="9f398cb5-480c-4231-bd32-2756558b1b7b">
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:input_diff: +       <transient_attributes id="9f398cb5-480c-4231-bd32-2756558b1b7b" __crm_diff_marker__="added:top">
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:input_diff: +         <instance_attributes id="status-9f398cb5-480c-4231-bd32-2756558b1b7b">
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:input_diff: +           <attributes>
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:input_diff: +             <nvpair id="status-9f398cb5-480c-4231-bd32-2756558b1b7b-pingd" name="pingd" value="1"/>
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:input_diff: +           </attributes>
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:input_diff: +         </instance_attributes>
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:input_diff: +       </transient_attributes>
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:input_diff: +     </node_state>
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:input_diff: +   </status>
> Nov 27 14:43:56 k237 cib: [14452]: info: diff:input_diff: + </cib>
> Nov 27 14:43:56 k237 cib: [14452]: info: cib_process_diff:messages.c Diff 0.29.2793 -> 0.29.2794 not applied to 0.29.2793: Failed application of a global update.  Requesting full refresh.
> Nov 27 14:43:56 k237 cib: [14452]: info: cib_process_diff:messages.c Requesting re-sync from peer: Failed application of a global update.  Requesting full refresh.
> Nov 27 14:43:56 k237 cib: [14452]: WARN: do_cib_notify:notify.c cib_apply_diff of <diff > FAILED: Application of an update diff failed, requesting a full refresh
> Nov 27 14:43:56 k237 cib: [14452]: WARN: cib_process_request:callbacks.c cib_apply_diff operation failed: Application of an update diff failed, requesting a full refresh
> Nov 27 14:43:56 k237 attrd: [14455]: ERROR: attrd_cib_callback:attrd.c Update 4 for pingd failed: Application of an update diff failed, requesting a full refresh
> Nov 27 14:43:58 k237 cib: [14452]: info: cib_replace_notify:notify.c Replaced: 0.29.2793 -> 0.29.2794 from (null)
> Nov 27 14:43:58 k237 crmd: [14456]: info: populate_cib_nodes:control.c Requesting the list of configured nodes
> Nov 27 14:43:58 k237 cib: [14452]: info: activateCibXml:io.c CIB size is 274672 bytes (was 275964)
> Nov 27 14:43:58 k237 cib: [14452]: info: cib_diff_notify:notify.c Update (client: k237): 0.29.2793 -> 0.29.2794 (ok)
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cibmon_diff:cibmon.c [cib_diff_notify] cib_replace confirmed
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: Diff: --- 0.29.2793
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: Diff: +++ 0.29.2794
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: - <cib num_updates="2793">
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: -   <status>
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: -     <node_state uname="k237" crmd="online" in_ccm="true" ha="active" join="member" id="9f398cb5-480c-4231-bd32-2756558b1b7b">
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: -       <transient_attributes id="9f398cb5-480c-4231-bd32-2756558b1b7b">
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: -         <instance_attributes id="status-9f398cb5-480c-4231-bd32-2756558b1b7b">
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: -           <attributes>
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: -             <nvpair id="status-9f398cb5-480c-4231-bd32-2756558b1b7b-probe_complete" name="probe_complete" value="true" __crm_diff_marker__="removed:top"/>
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: -           </attributes>
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: -         </instance_attributes>
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: -       </transient_attributes>
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: -     </node_state>
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: -   </status>
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: - </cib>
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: + <cib num_updates="2794">
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: +   <status>
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: +     <node_state uname="uplink" crmd="offline" in_ccm="false" ha="dead" join="down" id="9f398cb5-480c-4231-bd32-2756558b1b7b"/>
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: +   </status>
> Nov 27 14:43:58 k237 cibmon: [14458]: info: cib_replace: + </cib>
> Nov 27 14:43:58 k237 cib: [7626]: info: write_cib_contents:io.c Wrote version 0.29.2794 of the CIB to disk (digest: 4c8dab13693e429e2e8a620bc7b2e9ff)
> Nov 27 14:43:59 k237 crmd: [14456]: notice: populate_cib_nodes:control.c Node: k242 (uuid: 596d5dcc-0000-4f95-82d1-7c801733dd81)
> Nov 27 14:43:59 k237 crmd: [14456]: notice: populate_cib_nodes:control.c Node: k237 (uuid: 9f398cb5-480c-4231-bd32-2756558b1b7b)
> Nov 27 14:43:59 k237 cib: [14452]: info: cib_diff_notify:notify.c Local-only Change (client:14456, call: 26): 0.29.2794 (ok)
> Nov 27 14:43:59 k237 cibmon: [14458]: info: cibmon_diff:cibmon.c [cib_diff_notify] cib_update confirmed
> Nov 27 14:43:59 k237 cibmon: [14458]: info: cib_update: Local-only Change: 0.29.2794
> Nov 27 14:43:59 k237 cibmon: [14458]: info: cib_update: - <cib>
> Nov 27 14:43:59 k237 cibmon: [14458]: info: cib_update: -   <status>
> Nov 27 14:43:59 k237 cibmon: [14458]: info: cib_update: -     <node_state uname="uplink" crmd="offline" in_ccm="false" id="9f398cb5-480c-4231-bd32-2756558b1b7b"/>
> Nov 27 14:43:59 k237 cibmon: [14458]: info: cib_update: -   </status>
> Nov 27 14:43:59 k237 cibmon: [14458]: info: cib_update: - </cib>
> Nov 27 14:43:59 k237 cibmon: [14458]: info: cib_update: + <cib>
> Nov 27 14:43:59 k237 cibmon: [14458]: info: cib_update: +   <status>
> Nov 27 14:43:59 k237 cibmon: [14458]: info: cib_update: +     <node_state uname="k237" crmd="online" in_ccm="true" id="9f398cb5-480c-4231-bd32-2756558b1b7b"/>
> Nov 27 14:43:59 k237 cibmon: [14458]: info: cib_update: +   </status>
> Nov 27 14:43:59 k237 cibmon: [14458]: info: cib_update: + </cib>
> Nov 27 14:43:59 k237 cib: [7627]: info: write_cib_contents:io.c Wrote version 0.29.2794 of the CIB to disk (digest: a2985bfec8921cd8418a0fd929d57f3f)
> Nov 27 14:49:35 k237 cib: [14452]: info: cib_stats:main.c Processed 11 operations (11818.00us average, 0% utilization) in the last 10min
> Nov 27 14:59:35 k237 cib: [14452]: info: cib_stats:main.c Processed 4 operations (10000.00us average, 0% utilization) in the last 10min
>
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
>

