[Linux-HA] heartbeat 2.0.8: pingd failover

greno at verizon.net greno at verizon.net
Sat Feb 3 00:18:08 MST 2007


I have heartbeat running in a 2-node active/passive setup, and I've configured a resource, 'ip_resource', to be monitored in the CIB. Whenever I unplug the network cable from the active server, I see messages indicating that the resource is stopped and then restarted, but on the active server rather than on the failover server as I expected. Is there something wrong in the config files? Some details follow: the crm_mon output, ha.cf, the CIB, and the logs from server1 and server2.

Refresh in 4s...

============
Last updated: Sat Feb  3 01:17:56 2007
Current DC: server2 (29626f17-db1f-4139-aa33-5a6b4110da51)
2 Nodes configured.
1 Resources configured.
============

ip_resource     (heartbeat::ocf:IPaddr):        Started server1

=====================

logfacility     daemon
keepalive 1
deadtime 10
warntime 5
initdead 1208
udpport 694
ping 192.168.1.1
bcast eth0 eth1
auto_failback off
respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s -a  pingd_score
node    server1
node    server2
use_logd yes
compression     bz2
compression_threshold 2
crm yes

=====================
<cib admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="1.3" generated="true" epoch="14" num_updates="171" cib-last-written="Sat Feb  3 01:18:23 2007" ccm_transition="2" dc_uuid="29626f17-db1f-4139-aa33-5a6b4110da51">
   <configuration>
     <crm_config/>
     <nodes>
       <node id="29626f17-db1f-4139-aa33-5a6b4110da51" uname="server2" type="normal"/>
       <node id="67b0bfa7-0165-4a8c-9c0f-ec82e0ae2c91" uname="server1" type="normal"/>
     </nodes>
     <resources>
       <primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
         <instance_attributes id="ip_attributes">
           <attributes>
             <nvpair id="ip" name="ip" value="192.168.1.215"/>
           </attributes>
         </instance_attributes>
       </primitive>
     </resources>
     <constraints>
       <rsc_location id="run_ip_resource" rsc="ip_resource">
         <rule id="pref_run_ip_resource1" score="100">
           <expression id="expr1" attribute="#uname" operation="eq" value="server1"/>
         </rule>
         <rule id="pref_run_ip_resource2" score="000">
           <expression id="expr2" attribute="#uname" operation="eq" value="server2"/>
         </rule>
       </rsc_location>
       <rsc_location id="ip_resource:not_connected" rsc="ip_resource">
         <rule id="ip_resource:not_connected:rule" score="-INFINITY">
           <expression id="ip_resource:not_connected:expr" attribute="pingd_score" operation="not_defined"/>
         </rule>
       </rsc_location>
     </constraints>
   </configuration>
 </cib>
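
One point of comparison for the 'ip_resource:not_connected' constraint above: its rule only matches while pingd_score is not defined at all. Below is a minimal sketch of a variant that also matches a defined value of 0 or less. It assumes (the logs below do not confirm this) that pingd keeps the attribute defined and sets it to 0 once the ping node becomes unreachable. The ids are hypothetical and the fragment has not been tested on this cluster.

```xml
<!-- Sketch only: would replace the existing ip_resource:not_connected constraint. -->
<!-- Assumption: pingd_score stays defined and drops to 0 when 192.168.1.1 is unreachable. -->
<rsc_location id="ip_resource:connected" rsc="ip_resource">
  <rule id="ip_resource:connected:rule" score="-INFINITY" boolean_op="or">
    <!-- Keeps the original check: attribute missing on this node -->
    <expression id="ip_resource:connected:expr:undefined" attribute="pingd_score" operation="not_defined"/>
    <!-- Added check: attribute present, but no ping node is reachable -->
    <expression id="ip_resource:connected:expr:zero" attribute="pingd_score" operation="lte" value="0"/>
  </rule>
</rsc_location>
```

With boolean_op="or", the -INFINITY score applies to a node if either expression is true for it.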

=====================
Feb  3 01:12:32 server1 heartbeat: [6149]: info: Enabling logging daemon 
Feb  3 01:12:32 server1 heartbeat: [6149]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
Feb  3 01:12:32 server1 heartbeat: [6149]: WARN: logd is enabled but logfile/debugfile/logfacility is still configured in ha.cf
Feb  3 01:12:32 server1 heartbeat: [6149]: info: **************************
Feb  3 01:12:32 server1 heartbeat: [6149]: info: Configuration validated. Starting heartbeat 2.0.8
Feb  3 01:12:32 server1 heartbeat: [6150]: info: heartbeat: version 2.0.8
Feb  3 01:12:32 server1 heartbeat: [6150]: info: Heartbeat generation: 13
Feb  3 01:12:32 server1 heartbeat: [6150]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb  3 01:12:32 server1 heartbeat: [6150]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb  3 01:12:32 server1 heartbeat: [6150]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Feb  3 01:12:32 server1 heartbeat: [6150]: info: glib: ping heartbeat started.
Feb  3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Feb  3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Feb  3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
Feb  3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
Feb  3 01:12:32 server1 heartbeat: [6150]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Feb  3 01:12:32 server1 heartbeat: [6150]: info: Local status now set to: 'up'
Feb  3 01:12:33 server1 heartbeat: [6150]: info: Link 192.168.1.1:192.168.1.1 up.
Feb  3 01:12:33 server1 heartbeat: [6150]: info: Status update for node 192.168.1.1: status ping
Feb  3 01:12:34 server1 heartbeat: [6150]: info: Link server1:eth0 up.
Feb  3 01:12:34 server1 heartbeat: [6150]: info: Link server1:eth1 up.
Feb  3 01:12:45 server1 heartbeat: [6150]: info: Link server2:eth0 up.
Feb  3 01:12:45 server1 heartbeat: [6150]: info: Link server2:eth1 up.
Feb  3 01:12:45 server1 heartbeat: [6150]: info: Status update for node server2: status up
Feb  3 01:12:45 server1 heartbeat: [6150]: info: Comm_now_up(): updating status to active
Feb  3 01:12:45 server1 heartbeat: [6150]: info: Local status now set to: 'active'
Feb  3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" (0,0)
Feb  3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/ccm" (100,101)
Feb  3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/cib" (100,101)
Feb  3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/lrmd -r" (0,0)
Feb  3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
Feb  3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/attrd" (100,101)
Feb  3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/crmd" (100,101)
Feb  3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/mgmtd -v" (0,0)
Feb  3 01:12:45 server1 heartbeat: [6150]: info: Status update for node server2: status active
Feb  3 01:12:45 server1 heartbeat: [6165]: info: Starting "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" as uid 0  gid 0 (pid 6165)
Feb  3 01:12:45 server1 heartbeat: [6166]: info: Starting "/usr/lib/heartbeat/ccm" as uid 100  gid 101 (pid 6166)
Feb  3 01:12:45 server1 heartbeat: [6167]: info: Starting "/usr/lib/heartbeat/cib" as uid 100  gid 101 (pid 6167)
Feb  3 01:12:45 server1 cib: [6167]: WARN: crm_is_writable: /var/lib/heartbeat/crm/cib.xml should be owned and r/w by group haclient
Feb  3 01:12:45 server1 cib: [6167]: info: readCibXmlFile: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml
Feb  3 01:12:45 server1 cib: [6167]: info: log_data_element: readCibXmlFile: [on-disk]       <primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
Feb  3 01:12:45 server1 heartbeat: [6168]: info: Starting "/usr/lib/heartbeat/lrmd -r" as uid 0  gid 0 (pid 6168)
Feb  3 01:12:45 server1 heartbeat: [6169]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0  gid 0 (pid 6169)
Feb  3 01:12:45 server1 heartbeat: [6170]: info: Starting "/usr/lib/heartbeat/attrd" as uid 100  gid 101 (pid 6170)
Feb  3 01:12:45 server1 heartbeat: [6171]: info: Starting "/usr/lib/heartbeat/crmd" as uid 100  gid 101 (pid 6171)
Feb  3 01:12:45 server1 heartbeat: [6172]: info: Starting "/usr/lib/heartbeat/mgmtd -v" as uid 0  gid 0 (pid 6172)
Feb  3 01:12:45 server1 stonithd: [6169]: info: Signing in with heartbeat.
Feb  3 01:12:46 server1 stonithd: [6169]: notice: /usr/lib/heartbeat/stonithd start up successfully.
Feb  3 01:12:50 server1 heartbeat: [6150]: WARN: 1 lost packet(s) for [server2] [19:21]
Feb  3 01:12:50 server1 heartbeat: [6150]: info: No pkts missing from server2!
Feb  3 01:12:51 server1 heartbeat: [6150]: WARN: 1 lost packet(s) for [server2] [23:25]
Feb  3 01:12:51 server1 heartbeat: [6150]: info: No pkts missing from server2!
Feb  3 01:13:01 server1 heartbeat: [6150]: WARN: 1 lost packet(s) for [server2] [43:45]
Feb  3 01:13:01 server1 heartbeat: [6150]: info: No pkts missing from server2!
Feb  3 01:18:17 server1 heartbeat: [6150]: info: Link server2:eth0 dead.
Feb  3 01:18:18 server1 heartbeat: [6150]: WARN: node 192.168.1.1: is dead
Feb  3 01:18:18 server1 heartbeat: [6150]: info: Link 192.168.1.1:192.168.1.1 dead.
Feb  3 02:06:44 server1 heartbeat: [6150]: info: Link 192.168.1.1:192.168.1.1 up.
Feb  3 02:06:44 server1 heartbeat: [6150]: WARN: Late heartbeat: Node 192.168.1.1: interval 2917470 ms
Feb  3 02:06:44 server1 heartbeat: [6150]: info: Status update for node 192.168.1.1: status ping
Feb  3 02:06:45 server1 heartbeat: [6150]: info: Link server2:eth0 up.


=====================
Feb  3 01:12:43 server2 heartbeat: [6016]: info: Enabling logging daemon 
Feb  3 01:12:43 server2 heartbeat: [6016]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
Feb  3 01:12:43 server2 heartbeat: [6016]: WARN: logd is enabled but logfile/debugfile/logfacility is still configured in ha.cf
Feb  3 01:12:43 server2 heartbeat: [6016]: info: **************************
Feb  3 01:12:43 server2 heartbeat: [6016]: info: Configuration validated. Starting heartbeat 2.0.8
Feb  3 01:12:43 server2 heartbeat: [6017]: info: heartbeat: version 2.0.8
Feb  3 01:12:43 server2 heartbeat: [6017]: info: Heartbeat generation: 14
Feb  3 01:12:43 server2 heartbeat: [6017]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb  3 01:12:43 server2 heartbeat: [6017]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb  3 01:12:43 server2 heartbeat: [6017]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Feb  3 01:12:43 server2 heartbeat: [6017]: info: glib: ping heartbeat started.
Feb  3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Feb  3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Feb  3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
Feb  3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
Feb  3 01:12:43 server2 heartbeat: [6017]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Feb  3 01:12:44 server2 heartbeat: [6017]: info: Local status now set to: 'up'
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Link 192.168.1.1:192.168.1.1 up.
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Status update for node 192.168.1.1: status ping
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Link server1:eth0 up.
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Status update for node server1: status up
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Link server2:eth0 up.
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Link server1:eth1 up.
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Link server2:eth1 up.
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Comm_now_up(): updating status to active
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Local status now set to: 'active'
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" (0,0)
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/ccm" (100,101)
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/cib" (100,101)
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/lrmd -r" (0,0)
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/attrd" (100,101)
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/crmd" (100,101)
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/mgmtd -v" (0,0)
Feb  3 01:12:45 server2 heartbeat: [6017]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 60 ms (> 50 ms) (GSource: 0x9f568d0)
Feb  3 01:12:45 server2 heartbeat: [6031]: info: Starting "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" as uid 0  gid 0 (pid 6031)
Feb  3 01:12:45 server2 heartbeat: [6032]: info: Starting "/usr/lib/heartbeat/ccm" as uid 100  gid 101 (pid 6032)
Feb  3 01:12:45 server2 heartbeat: [6033]: info: Starting "/usr/lib/heartbeat/cib" as uid 100  gid 101 (pid 6033)
Feb  3 01:12:45 server2 cib: [6033]: WARN: crm_is_writable: /var/lib/heartbeat/crm/cib.xml should be owned and r/w by group haclient
Feb  3 01:12:45 server2 cib: [6033]: info: readCibXmlFile: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml
Feb  3 01:12:45 server2 cib: [6033]: info: log_data_element: readCibXmlFile: [on-disk]       <primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
Feb  3 01:12:45 server2 heartbeat: [6034]: info: Starting "/usr/lib/heartbeat/lrmd -r" as uid 0  gid 0 (pid 6034)
Feb  3 01:12:45 server2 heartbeat: [6035]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0  gid 0 (pid 6035)
Feb  3 01:12:45 server2 heartbeat: [6017]: info: Status update for node server1: status active
Feb  3 01:12:45 server2 heartbeat: [6036]: info: Starting "/usr/lib/heartbeat/attrd" as uid 100  gid 101 (pid 6036)
Feb  3 01:12:45 server2 heartbeat: [6037]: info: Starting "/usr/lib/heartbeat/crmd" as uid 100  gid 101 (pid 6037)
Feb  3 01:12:45 server2 heartbeat: [6038]: info: Starting "/usr/lib/heartbeat/mgmtd -v" as uid 0  gid 0 (pid 6038)
Feb  3 01:12:45 server2 stonithd: [6035]: info: Signing in with heartbeat.
Feb  3 01:12:45 server2 stonithd: [6035]: notice: /usr/lib/heartbeat/stonithd start up successfully.
Feb  3 01:12:50 server2 heartbeat: [6017]: WARN: 1 lost packet(s) for [server1] [45:47]
Feb  3 01:12:50 server2 heartbeat: [6017]: info: No pkts missing from server1!
Feb  3 01:12:52 server2 heartbeat: [6017]: WARN: 1 lost packet(s) for [server1] [54:56]
Feb  3 01:12:52 server2 heartbeat: [6017]: info: No pkts missing from server1!
Feb  3 01:12:57 server2 heartbeat: [6017]: WARN: 1 lost packet(s) for [server1] [64:66]
Feb  3 01:12:57 server2 heartbeat: [6017]: info: No pkts missing from server1!
Feb  3 01:15:04 server2 pengine: [6049]: info: native_print: ip_resource   (heartbeat::ocf:IPaddr): Stopped 
Feb  3 01:15:04 server2 pengine: [6049]: info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-29.bz2
Feb  3 01:15:05 server2 pengine: [6049]: info: native_print: ip_resource   (heartbeat::ocf:IPaddr): Stopped 
Feb  3 01:15:05 server2 pengine: [6049]: info: process_pe_message: Transition 1: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-30.bz2
Feb  3 01:15:07 server2 pengine: [6049]: info: native_print: ip_resource   (heartbeat::ocf:IPaddr): Started server1
Feb  3 01:15:07 server2 pengine: [6049]: info: process_pe_message: Transition 2: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-31.bz2
Feb  3 01:18:17 server2 heartbeat: [6017]: info: Link server1:eth0 dead.
Feb  3 01:18:24 server2 pengine: [6049]: info: native_print: ip_resource   (heartbeat::ocf:IPaddr): Started server1
Feb  3 01:18:24 server2 pengine: [6049]: info: process_pe_message: Transition 3: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-32.bz2

=====================



