[Linux-HA] heartbeat 2.0.8: pingd failover
greno at verizon.net
Sat Feb 3 00:18:08 MST 2007
I have heartbeat 2.0.8 running in a two-node active/passive setup, and I've configured a resource, 'ip_resource', in the CIB for it to monitor. Whenever I unplug the network cable from the active server, I see messages indicating that the resource is stopped and then started again, but on the same active server rather than on the failover server as I expected. Is there something wrong in my config files? Some details follow: the crm_mon output, ha.cf, the CIB, and the heartbeat logs from server1 and server2.
Refresh in 4s...
============
Last updated: Sat Feb 3 01:17:56 2007
Current DC: server2 (29626f17-db1f-4139-aa33-5a6b4110da51)
2 Nodes configured.
1 Resources configured.
============
ip_resource (heartbeat::ocf:IPaddr): Started server1
=====================
logfacility daemon
keepalive 1
deadtime 10
warntime 5
initdead 1208
udpport 694
ping 192.168.1.1
bcast eth0 eth1
auto_failback off
respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score
node server1
node server2
use_logd yes
compression bz2
compression_threshold 2
crm yes
=====================
<cib admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="1.3" generated="true" epoch="14" num_updates="171" cib-last-written="Sat Feb 3 01:18:23 2007" ccm_transition="2" dc_uuid="29626f17-db1f-4139-aa33-5a6b4110da51">
<configuration>
<crm_config/>
<nodes>
<node id="29626f17-db1f-4139-aa33-5a6b4110da51" uname="server2" type="normal"/>
<node id="67b0bfa7-0165-4a8c-9c0f-ec82e0ae2c91" uname="server1" type="normal"/>
</nodes>
<resources>
<primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
<instance_attributes id="ip_attributes">
<attributes>
<nvpair id="ip" name="ip" value="192.168.1.215"/>
</attributes>
</instance_attributes>
</primitive>
</resources>
<constraints>
<rsc_location id="run_ip_resource" rsc="ip_resource">
<rule id="pref_run_ip_resource1" score="100">
<expression id="expr1" attribute="#uname" operation="eq" value="server1"/>
</rule>
<rule id="pref_run_ip_resource2" score="000">
<expression id="expr2" attribute="#uname" operation="eq" value="server2"/>
</rule>
</rsc_location>
<rsc_location id="ip_resource:not_connected" rsc="ip_resource">
<rule id="ip_resource:not_connected:rule" score="-INFINITY">
<expression id="ip_resource:not_connected:expr" attribute="pingd_score" operation="not_defined"/>
</rule>
</rsc_location>
</constraints>
</configuration>
</cib>
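To reason about the two rsc_location constraints above, here is a rough Python model of how the scores might add up per node. It is only a sketch: the function names are made up for illustration, it assumes the rule scores for a node are simply summed, and it assumes pingd publishes pingd_score as (reachable ping nodes x the -m multiplier), so that an unreachable ping node yields 0 rather than an undefined attribute. That last assumption has not been verified against 2.0.8.

```python
NEG_INF = float("-inf")


def pingd_value(reachable_ping_nodes, multiplier=100):
    """ASSUMED pingd behaviour: attribute = reachable ping nodes * multiplier (-m)."""
    return reachable_ping_nodes * multiplier


def location_score(uname, attrs):
    """Combine the rule scores of the two rsc_location constraints for one node."""
    score = 0
    # run_ip_resource: score 100 on server1, score 0 on server2
    if uname == "server1":
        score += 100
    elif uname == "server2":
        score += 0
    # ip_resource:not_connected: -INFINITY only when pingd_score is not defined
    if "pingd_score" not in attrs:
        score = NEG_INF
    return score


def preferred_node(node_attrs):
    """Pick the highest-scoring node, skipping nodes that ended up at -INFINITY."""
    scores = {name: location_score(name, attrs) for name, attrs in node_attrs.items()}
    allowed = {name: s for name, s in scores.items() if s != NEG_INF}
    return max(allowed, key=allowed.get) if allowed else None


# server1 has lost the ping node, server2 still reaches it
cable_pulled = {
    "server1": {"pingd_score": pingd_value(0)},
    "server2": {"pingd_score": pingd_value(1)},
}
print(preferred_node(cable_pulled))
```

Under those assumptions the model keeps ip_resource on server1 after the cable is pulled, because pingd_score is still defined there (just 0) and the not_defined rule never matches. That would be consistent with what crm_mon shows, but whether the policy engine really evaluates it this way is part of what I am asking.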
=====================
Feb 3 01:12:32 server1 heartbeat: [6149]: info: Enabling logging daemon
Feb 3 01:12:32 server1 heartbeat: [6149]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
Feb 3 01:12:32 server1 heartbeat: [6149]: WARN: logd is enabled but logfile/debugfile/logfacility is still configured in ha.cf
Feb 3 01:12:32 server1 heartbeat: [6149]: info: **************************
Feb 3 01:12:32 server1 heartbeat: [6149]: info: Configuration validated. Starting heartbeat 2.0.8
Feb 3 01:12:32 server1 heartbeat: [6150]: info: heartbeat: version 2.0.8
Feb 3 01:12:32 server1 heartbeat: [6150]: info: Heartbeat generation: 13
Feb 3 01:12:32 server1 heartbeat: [6150]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb 3 01:12:32 server1 heartbeat: [6150]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb 3 01:12:32 server1 heartbeat: [6150]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: ping heartbeat started.
Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
Feb 3 01:12:32 server1 heartbeat: [6150]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Feb 3 01:12:32 server1 heartbeat: [6150]: info: Local status now set to: 'up'
Feb 3 01:12:33 server1 heartbeat: [6150]: info: Link 192.168.1.1:192.168.1.1 up.
Feb 3 01:12:33 server1 heartbeat: [6150]: info: Status update for node 192.168.1.1: status ping
Feb 3 01:12:34 server1 heartbeat: [6150]: info: Link server1:eth0 up.
Feb 3 01:12:34 server1 heartbeat: [6150]: info: Link server1:eth1 up.
Feb 3 01:12:45 server1 heartbeat: [6150]: info: Link server2:eth0 up.
Feb 3 01:12:45 server1 heartbeat: [6150]: info: Link server2:eth1 up.
Feb 3 01:12:45 server1 heartbeat: [6150]: info: Status update for node server2: status up
Feb 3 01:12:45 server1 heartbeat: [6150]: info: Comm_now_up(): updating status to active
Feb 3 01:12:45 server1 heartbeat: [6150]: info: Local status now set to: 'active'
Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" (0,0)
Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/ccm" (100,101)
Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/cib" (100,101)
Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/lrmd -r" (0,0)
Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/attrd" (100,101)
Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/crmd" (100,101)
Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/mgmtd -v" (0,0)
Feb 3 01:12:45 server1 heartbeat: [6150]: info: Status update for node server2: status active
Feb 3 01:12:45 server1 heartbeat: [6165]: info: Starting "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" as uid 0 gid 0 (pid 6165)
Feb 3 01:12:45 server1 heartbeat: [6166]: info: Starting "/usr/lib/heartbeat/ccm" as uid 100 gid 101 (pid 6166)
Feb 3 01:12:45 server1 heartbeat: [6167]: info: Starting "/usr/lib/heartbeat/cib" as uid 100 gid 101 (pid 6167)
Feb 3 01:12:45 server1 cib: [6167]: WARN: crm_is_writable: /var/lib/heartbeat/crm/cib.xml should be owned and r/w by group haclient
Feb 3 01:12:45 server1 cib: [6167]: info: readCibXmlFile: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml
Feb 3 01:12:45 server1 cib: [6167]: info: log_data_element: readCibXmlFile: [on-disk] <primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
Feb 3 01:12:45 server1 heartbeat: [6168]: info: Starting "/usr/lib/heartbeat/lrmd -r" as uid 0 gid 0 (pid 6168)
Feb 3 01:12:45 server1 heartbeat: [6169]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 6169)
Feb 3 01:12:45 server1 heartbeat: [6170]: info: Starting "/usr/lib/heartbeat/attrd" as uid 100 gid 101 (pid 6170)
Feb 3 01:12:45 server1 heartbeat: [6171]: info: Starting "/usr/lib/heartbeat/crmd" as uid 100 gid 101 (pid 6171)
Feb 3 01:12:45 server1 heartbeat: [6172]: info: Starting "/usr/lib/heartbeat/mgmtd -v" as uid 0 gid 0 (pid 6172)
Feb 3 01:12:45 server1 stonithd: [6169]: info: Signing in with heartbeat.
Feb 3 01:12:46 server1 stonithd: [6169]: notice: /usr/lib/heartbeat/stonithd start up successfully.
Feb 3 01:12:50 server1 heartbeat: [6150]: WARN: 1 lost packet(s) for [server2] [19:21]
Feb 3 01:12:50 server1 heartbeat: [6150]: info: No pkts missing from server2!
Feb 3 01:12:51 server1 heartbeat: [6150]: WARN: 1 lost packet(s) for [server2] [23:25]
Feb 3 01:12:51 server1 heartbeat: [6150]: info: No pkts missing from server2!
Feb 3 01:13:01 server1 heartbeat: [6150]: WARN: 1 lost packet(s) for [server2] [43:45]
Feb 3 01:13:01 server1 heartbeat: [6150]: info: No pkts missing from server2!
Feb 3 01:18:17 server1 heartbeat: [6150]: info: Link server2:eth0 dead.
Feb 3 01:18:18 server1 heartbeat: [6150]: WARN: node 192.168.1.1: is dead
Feb 3 01:18:18 server1 heartbeat: [6150]: info: Link 192.168.1.1:192.168.1.1 dead.
Feb 3 02:06:44 server1 heartbeat: [6150]: info: Link 192.168.1.1:192.168.1.1 up.
Feb 3 02:06:44 server1 heartbeat: [6150]: WARN: Late heartbeat: Node 192.168.1.1: interval 2917470 ms
Feb 3 02:06:44 server1 heartbeat: [6150]: info: Status update for node 192.168.1.1: status ping
Feb 3 02:06:45 server1 heartbeat: [6150]: info: Link server2:eth0 up.
=====================
Feb 3 01:12:43 server2 heartbeat: [6016]: info: Enabling logging daemon
Feb 3 01:12:43 server2 heartbeat: [6016]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
Feb 3 01:12:43 server2 heartbeat: [6016]: WARN: logd is enabled but logfile/debugfile/logfacility is still configured in ha.cf
Feb 3 01:12:43 server2 heartbeat: [6016]: info: **************************
Feb 3 01:12:43 server2 heartbeat: [6016]: info: Configuration validated. Starting heartbeat 2.0.8
Feb 3 01:12:43 server2 heartbeat: [6017]: info: heartbeat: version 2.0.8
Feb 3 01:12:43 server2 heartbeat: [6017]: info: Heartbeat generation: 14
Feb 3 01:12:43 server2 heartbeat: [6017]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb 3 01:12:43 server2 heartbeat: [6017]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb 3 01:12:43 server2 heartbeat: [6017]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: ping heartbeat started.
Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
Feb 3 01:12:43 server2 heartbeat: [6017]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Feb 3 01:12:44 server2 heartbeat: [6017]: info: Local status now set to: 'up'
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link 192.168.1.1:192.168.1.1 up.
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Status update for node 192.168.1.1: status ping
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link server1:eth0 up.
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Status update for node server1: status up
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link server2:eth0 up.
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link server1:eth1 up.
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link server2:eth1 up.
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Comm_now_up(): updating status to active
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Local status now set to: 'active'
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" (0,0)
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/ccm" (100,101)
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/cib" (100,101)
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/lrmd -r" (0,0)
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/attrd" (100,101)
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/crmd" (100,101)
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/mgmtd -v" (0,0)
Feb 3 01:12:45 server2 heartbeat: [6017]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 60 ms (> 50 ms) (GSource: 0x9f568d0)
Feb 3 01:12:45 server2 heartbeat: [6031]: info: Starting "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" as uid 0 gid 0 (pid 6031)
Feb 3 01:12:45 server2 heartbeat: [6032]: info: Starting "/usr/lib/heartbeat/ccm" as uid 100 gid 101 (pid 6032)
Feb 3 01:12:45 server2 heartbeat: [6033]: info: Starting "/usr/lib/heartbeat/cib" as uid 100 gid 101 (pid 6033)
Feb 3 01:12:45 server2 cib: [6033]: WARN: crm_is_writable: /var/lib/heartbeat/crm/cib.xml should be owned and r/w by group haclient
Feb 3 01:12:45 server2 cib: [6033]: info: readCibXmlFile: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml
Feb 3 01:12:45 server2 cib: [6033]: info: log_data_element: readCibXmlFile: [on-disk] <primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
Feb 3 01:12:45 server2 heartbeat: [6034]: info: Starting "/usr/lib/heartbeat/lrmd -r" as uid 0 gid 0 (pid 6034)
Feb 3 01:12:45 server2 heartbeat: [6035]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 6035)
Feb 3 01:12:45 server2 heartbeat: [6017]: info: Status update for node server1: status active
Feb 3 01:12:45 server2 heartbeat: [6036]: info: Starting "/usr/lib/heartbeat/attrd" as uid 100 gid 101 (pid 6036)
Feb 3 01:12:45 server2 heartbeat: [6037]: info: Starting "/usr/lib/heartbeat/crmd" as uid 100 gid 101 (pid 6037)
Feb 3 01:12:45 server2 heartbeat: [6038]: info: Starting "/usr/lib/heartbeat/mgmtd -v" as uid 0 gid 0 (pid 6038)
Feb 3 01:12:45 server2 stonithd: [6035]: info: Signing in with heartbeat.
Feb 3 01:12:45 server2 stonithd: [6035]: notice: /usr/lib/heartbeat/stonithd start up successfully.
Feb 3 01:12:50 server2 heartbeat: [6017]: WARN: 1 lost packet(s) for [server1] [45:47]
Feb 3 01:12:50 server2 heartbeat: [6017]: info: No pkts missing from server1!
Feb 3 01:12:52 server2 heartbeat: [6017]: WARN: 1 lost packet(s) for [server1] [54:56]
Feb 3 01:12:52 server2 heartbeat: [6017]: info: No pkts missing from server1!
Feb 3 01:12:57 server2 heartbeat: [6017]: WARN: 1 lost packet(s) for [server1] [64:66]
Feb 3 01:12:57 server2 heartbeat: [6017]: info: No pkts missing from server1!
Feb 3 01:15:04 server2 pengine: [6049]: info: native_print: ip_resource (heartbeat::ocf:IPaddr): Stopped
Feb 3 01:15:04 server2 pengine: [6049]: info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-29.bz2
Feb 3 01:15:05 server2 pengine: [6049]: info: native_print: ip_resource (heartbeat::ocf:IPaddr): Stopped
Feb 3 01:15:05 server2 pengine: [6049]: info: process_pe_message: Transition 1: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-30.bz2
Feb 3 01:15:07 server2 pengine: [6049]: info: native_print: ip_resource (heartbeat::ocf:IPaddr): Started server1
Feb 3 01:15:07 server2 pengine: [6049]: info: process_pe_message: Transition 2: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-31.bz2
Feb 3 01:18:17 server2 heartbeat: [6017]: info: Link server1:eth0 dead.
Feb 3 01:18:24 server2 pengine: [6049]: info: native_print: ip_resource (heartbeat::ocf:IPaddr): Started server1
Feb 3 01:18:24 server2 pengine: [6049]: info: process_pe_message: Transition 3: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-32.bz2
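To line up the link state changes across the two logs, a small Python filter along these lines can pull out just the "Link ... up/dead" messages. This is a sketch: the regular expression and the link_events helper are assumptions based only on the line format shown in this post, and the sample lines are copied from the logs above.

```python
import re

# Matches heartbeat lines such as:
#   Feb 3 01:18:17 server1 heartbeat: [6150]: info: Link server2:eth0 dead.
LINK_RE = re.compile(
    r"^(\w+ +\d+ [\d:]+) (\S+) heartbeat: \[\d+\]: info: Link (\S+) (up|dead)\.$"
)


def link_events(lines):
    """Return (timestamp, reporting host, link, state) for each link message."""
    events = []
    for line in lines:
        match = LINK_RE.match(line.strip())
        if match:
            events.append(match.groups())
    return events


sample = [
    "Feb 3 01:18:17 server1 heartbeat: [6150]: info: Link server2:eth0 dead.",
    "Feb 3 01:18:18 server1 heartbeat: [6150]: WARN: node 192.168.1.1: is dead",
    "Feb 3 01:18:18 server1 heartbeat: [6150]: info: Link 192.168.1.1:192.168.1.1 dead.",
    "Feb 3 01:18:17 server2 heartbeat: [6017]: info: Link server1:eth0 dead.",
]

for event in link_events(sample):
    print(event)
```

Filtered this way, the logs above only report eth0 and the ping node 192.168.1.1 as dead; neither node logs an eth1 link going down.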
=====================