[Linux-HA] heartbeat 2.0.8: pingd failover
Andrew Beekhof
beekhof at gmail.com
Mon Feb 5 13:26:42 MST 2007
On 2/3/07, greno at verizon.net <greno at verizon.net> wrote:
> I have heartbeat running in a 2-node active/passive setup, and I've configured a resource 'ip_resource' to be monitored in the CIB. Whenever I unplug the network cable from the active server, I see messages indicating that the resource is stopped and then restarted, but on the active server again rather than on the failover server as I was expecting. Is there something wrong in the config files?
Hard to say, since you didn't include the logs you're talking about above,
but at a guess I'd suggest making the score here:
<rule id="pref_run_ip_resource1" score="100">
less than the pingd multiplier (-m 100) in ha.cf:
respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score
Otherwise both nodes will end up with the same preference and we'll
pick one (seemingly) at random.
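
To make the tie concrete, here is a rough sketch of the arithmetic in Python. It is a toy model, not heartbeat's actual placement code. It assumes the pingd_score attribute is added to each node's score (for example through a rule that uses it as a score attribute; the CIB as posted only tests whether pingd_score is defined). The function names and the replacement preference of 50 are made up for illustration.

```python
# Hypothetical model of how location scores combine; not heartbeat's real code.

def node_scores(location_pref, pingd_multiplier, reachable_ping_nodes):
    """Total score per node: static preference + multiplier * reachable ping nodes."""
    return {
        node: location_pref[node] + pingd_multiplier * reachable_ping_nodes[node]
        for node in location_pref
    }

def preferred(scores):
    """Nodes sharing the highest score; more than one entry means a tie."""
    best = max(scores.values())
    return sorted(n for n, s in scores.items() if s == best)

# server1 has lost its network cable, so it reaches 0 of the 1 ping nodes.
reachable = {"server1": 0, "server2": 1}

# Posted configuration: preference 100 for server1, pingd started with -m 100.
tied = node_scores({"server1": 100, "server2": 0}, 100, reachable)

# Suggested change: a preference below the multiplier (50 is an arbitrary pick).
fixed = node_scores({"server1": 50, "server2": 0}, 100, reachable)

print(tied, preferred(tied))
print(fixed, preferred(fixed))
```

Under those assumptions the posted values leave both nodes at 100, while any preference below the multiplier leaves server2 as the only top-scoring node once server1 loses connectivity.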
> Some details...
>
> Refresh in 4s...
>
> ============
> Last updated: Sat Feb 3 01:17:56 2007
> Current DC: server2 (29626f17-db1f-4139-aa33-5a6b4110da51)
> 2 Nodes configured.
> 1 Resources configured.
> ============
>
> ip_resource (heartbeat::ocf:IPaddr): Started server1
>
> =====================
>
> logfacility daemon
> keepalive 1
> deadtime 10
> warntime 5
> initdead 1208
> udpport 694
> ping 192.168.1.1
> bcast eth0 eth1
> auto_failback off
> respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score
> node server1
> node server2
> use_logd yes
> compression bz2
> compression_threshold 2
> crm yes
>
> =====================
> <cib admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="1.3" generated="true" epoch="14" num_updates="171" cib-last-written="Sat Feb 3 01:18:23 2007" ccm_transition="2" dc_uuid="29626f17-db1f-4139-aa33-5a6b4110da51">
>   <configuration>
>     <crm_config/>
>     <nodes>
>       <node id="29626f17-db1f-4139-aa33-5a6b4110da51" uname="server2" type="normal"/>
>       <node id="67b0bfa7-0165-4a8c-9c0f-ec82e0ae2c91" uname="server1" type="normal"/>
>     </nodes>
>     <resources>
>       <primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
>         <instance_attributes id="ip_attributes">
>           <attributes>
>             <nvpair id="ip" name="ip" value="192.168.1.215"/>
>           </attributes>
>         </instance_attributes>
>       </primitive>
>     </resources>
>     <constraints>
>       <rsc_location id="run_ip_resource" rsc="ip_resource">
>         <rule id="pref_run_ip_resource1" score="100">
>           <expression id="expr1" attribute="#uname" operation="eq" value="server1"/>
>         </rule>
>         <rule id="pref_run_ip_resource2" score="000">
>           <expression id="expr2" attribute="#uname" operation="eq" value="server2"/>
>         </rule>
>       </rsc_location>
>       <rsc_location id="ip_resource:not_connected" rsc="ip_resource">
>         <rule id="ip_resource:not_connected:rule" score="-INFINITY">
>           <expression id="ip_resource:not_connected:expr" attribute="pingd_score" operation="not_defined"/>
>         </rule>
>       </rsc_location>
>     </constraints>
>   </configuration>
> </cib>
>
> =====================
> Feb 3 01:12:32 server1 heartbeat: [6149]: info: Enabling logging daemon
> Feb 3 01:12:32 server1 heartbeat: [6149]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
> Feb 3 01:12:32 server1 heartbeat: [6149]: WARN: logd is enabled but logfile/debugfile/logfacility is still configured in ha.cf
> Feb 3 01:12:32 server1 heartbeat: [6149]: info: **************************
> Feb 3 01:12:32 server1 heartbeat: [6149]: info: Configuration validated. Starting heartbeat 2.0.8
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: heartbeat: version 2.0.8
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: Heartbeat generation: 13
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: G_main_add_TriggerHandler: Added signal manual handler
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: G_main_add_TriggerHandler: Added signal manual handler
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: ping heartbeat started.
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: Local status now set to: 'up'
> Feb 3 01:12:33 server1 heartbeat: [6150]: info: Link 192.168.1.1:192.168.1.1 up.
> Feb 3 01:12:33 server1 heartbeat: [6150]: info: Status update for node 192.168.1.1: status ping
> Feb 3 01:12:34 server1 heartbeat: [6150]: info: Link server1:eth0 up.
> Feb 3 01:12:34 server1 heartbeat: [6150]: info: Link server1:eth1 up.
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Link server2:eth0 up.
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Link server2:eth1 up.
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Status update for node server2: status up
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Comm_now_up(): updating status to active
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Local status now set to: 'active'
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" (0,0)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/ccm" (100,101)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/cib" (100,101)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/lrmd -r" (0,0)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/attrd" (100,101)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/crmd" (100,101)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/mgmtd -v" (0,0)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Status update for node server2: status active
> Feb 3 01:12:45 server1 heartbeat: [6165]: info: Starting "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" as uid 0 gid 0 (pid 6165)
> Feb 3 01:12:45 server1 heartbeat: [6166]: info: Starting "/usr/lib/heartbeat/ccm" as uid 100 gid 101 (pid 6166)
> Feb 3 01:12:45 server1 heartbeat: [6167]: info: Starting "/usr/lib/heartbeat/cib" as uid 100 gid 101 (pid 6167)
> Feb 3 01:12:45 server1 cib: [6167]: WARN: crm_is_writable: /var/lib/heartbeat/crm/cib.xml should be owned and r/w by group haclient
> Feb 3 01:12:45 server1 cib: [6167]: info: readCibXmlFile: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml
> Feb 3 01:12:45 server1 cib: [6167]: info: log_data_element: readCibXmlFile: [on-disk] <primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
> Feb 3 01:12:45 server1 heartbeat: [6168]: info: Starting "/usr/lib/heartbeat/lrmd -r" as uid 0 gid 0 (pid 6168)
> Feb 3 01:12:45 server1 heartbeat: [6169]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 6169)
> Feb 3 01:12:45 server1 heartbeat: [6170]: info: Starting "/usr/lib/heartbeat/attrd" as uid 100 gid 101 (pid 6170)
> Feb 3 01:12:45 server1 heartbeat: [6171]: info: Starting "/usr/lib/heartbeat/crmd" as uid 100 gid 101 (pid 6171)
> Feb 3 01:12:45 server1 heartbeat: [6172]: info: Starting "/usr/lib/heartbeat/mgmtd -v" as uid 0 gid 0 (pid 6172)
> Feb 3 01:12:45 server1 stonithd: [6169]: info: Signing in with heartbeat.
> Feb 3 01:12:46 server1 stonithd: [6169]: notice: /usr/lib/heartbeat/stonithd start up successfully.
> Feb 3 01:12:50 server1 heartbeat: [6150]: WARN: 1 lost packet(s) for [server2] [19:21]
> Feb 3 01:12:50 server1 heartbeat: [6150]: info: No pkts missing from server2!
> Feb 3 01:12:51 server1 heartbeat: [6150]: WARN: 1 lost packet(s) for [server2] [23:25]
> Feb 3 01:12:51 server1 heartbeat: [6150]: info: No pkts missing from server2!
> Feb 3 01:13:01 server1 heartbeat: [6150]: WARN: 1 lost packet(s) for [server2] [43:45]
> Feb 3 01:13:01 server1 heartbeat: [6150]: info: No pkts missing from server2!
> Feb 3 01:18:17 server1 heartbeat: [6150]: info: Link server2:eth0 dead.
> Feb 3 01:18:18 server1 heartbeat: [6150]: WARN: node 192.168.1.1: is dead
> Feb 3 01:18:18 server1 heartbeat: [6150]: info: Link 192.168.1.1:192.168.1.1 dead.
> Feb 3 02:06:44 server1 heartbeat: [6150]: info: Link 192.168.1.1:192.168.1.1 up.
> Feb 3 02:06:44 server1 heartbeat: [6150]: WARN: Late heartbeat: Node 192.168.1.1: interval 2917470 ms
> Feb 3 02:06:44 server1 heartbeat: [6150]: info: Status update for node 192.168.1.1: status ping
> Feb 3 02:06:45 server1 heartbeat: [6150]: info: Link server2:eth0 up.
>
>
> =====================
> Feb 3 01:12:43 server2 heartbeat: [6016]: info: Enabling logging daemon
> Feb 3 01:12:43 server2 heartbeat: [6016]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
> Feb 3 01:12:43 server2 heartbeat: [6016]: WARN: logd is enabled but logfile/debugfile/logfacility is still configured in ha.cf
> Feb 3 01:12:43 server2 heartbeat: [6016]: info: **************************
> Feb 3 01:12:43 server2 heartbeat: [6016]: info: Configuration validated. Starting heartbeat 2.0.8
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: heartbeat: version 2.0.8
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: Heartbeat generation: 14
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: G_main_add_TriggerHandler: Added signal manual handler
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: G_main_add_TriggerHandler: Added signal manual handler
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: ping heartbeat started.
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Feb 3 01:12:44 server2 heartbeat: [6017]: info: Local status now set to: 'up'
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link 192.168.1.1:192.168.1.1 up.
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Status update for node 192.168.1.1: status ping
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link server1:eth0 up.
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Status update for node server1: status up
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link server2:eth0 up.
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link server1:eth1 up.
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link server2:eth1 up.
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Comm_now_up(): updating status to active
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Local status now set to: 'active'
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" (0,0)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/ccm" (100,101)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/cib" (100,101)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/lrmd -r" (0,0)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/attrd" (100,101)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/crmd" (100,101)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/mgmtd -v" (0,0)
> Feb 3 01:12:45 server2 heartbeat: [6017]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 60 ms (> 50 ms) (GSource: 0x9f568d0)
> Feb 3 01:12:45 server2 heartbeat: [6031]: info: Starting "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" as uid 0 gid 0 (pid 6031)
> Feb 3 01:12:45 server2 heartbeat: [6032]: info: Starting "/usr/lib/heartbeat/ccm" as uid 100 gid 101 (pid 6032)
> Feb 3 01:12:45 server2 heartbeat: [6033]: info: Starting "/usr/lib/heartbeat/cib" as uid 100 gid 101 (pid 6033)
> Feb 3 01:12:45 server2 cib: [6033]: WARN: crm_is_writable: /var/lib/heartbeat/crm/cib.xml should be owned and r/w by group haclient
> Feb 3 01:12:45 server2 cib: [6033]: info: readCibXmlFile: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml
> Feb 3 01:12:45 server2 cib: [6033]: info: log_data_element: readCibXmlFile: [on-disk] <primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
> Feb 3 01:12:45 server2 heartbeat: [6034]: info: Starting "/usr/lib/heartbeat/lrmd -r" as uid 0 gid 0 (pid 6034)
> Feb 3 01:12:45 server2 heartbeat: [6035]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 6035)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Status update for node server1: status active
> Feb 3 01:12:45 server2 heartbeat: [6036]: info: Starting "/usr/lib/heartbeat/attrd" as uid 100 gid 101 (pid 6036)
> Feb 3 01:12:45 server2 heartbeat: [6037]: info: Starting "/usr/lib/heartbeat/crmd" as uid 100 gid 101 (pid 6037)
> Feb 3 01:12:45 server2 heartbeat: [6038]: info: Starting "/usr/lib/heartbeat/mgmtd -v" as uid 0 gid 0 (pid 6038)
> Feb 3 01:12:45 server2 stonithd: [6035]: info: Signing in with heartbeat.
> Feb 3 01:12:45 server2 stonithd: [6035]: notice: /usr/lib/heartbeat/stonithd start up successfully.
> Feb 3 01:12:50 server2 heartbeat: [6017]: WARN: 1 lost packet(s) for [server1] [45:47]
> Feb 3 01:12:50 server2 heartbeat: [6017]: info: No pkts missing from server1!
> Feb 3 01:12:52 server2 heartbeat: [6017]: WARN: 1 lost packet(s) for [server1] [54:56]
> Feb 3 01:12:52 server2 heartbeat: [6017]: info: No pkts missing from server1!
> Feb 3 01:12:57 server2 heartbeat: [6017]: WARN: 1 lost packet(s) for [server1] [64:66]
> Feb 3 01:12:57 server2 heartbeat: [6017]: info: No pkts missing from server1!
> Feb 3 01:15:04 server2 pengine: [6049]: info: native_print: ip_resource (heartbeat::ocf:IPaddr): Stopped
> Feb 3 01:15:04 server2 pengine: [6049]: info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-29.bz2
> Feb 3 01:15:05 server2 pengine: [6049]: info: native_print: ip_resource (heartbeat::ocf:IPaddr): Stopped
> Feb 3 01:15:05 server2 pengine: [6049]: info: process_pe_message: Transition 1: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-30.bz2
> Feb 3 01:15:07 server2 pengine: [6049]: info: native_print: ip_resource (heartbeat::ocf:IPaddr): Started server1
> Feb 3 01:15:07 server2 pengine: [6049]: info: process_pe_message: Transition 2: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-31.bz2
> Feb 3 01:18:17 server2 heartbeat: [6017]: info: Link server1:eth0 dead.
> Feb 3 01:18:24 server2 pengine: [6049]: info: native_print: ip_resource (heartbeat::ocf:IPaddr): Started server1
> Feb 3 01:18:24 server2 pengine: [6049]: info: process_pe_message: Transition 3: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-32.bz2
>
> =====================
>
>