[Linux-HA] heartbeat 2.0.8: pingd failover
Andreas Kurz
andreas.kurz at gmail.com
Sat Feb 3 15:53:10 MST 2007
Try to add a monitor operation to the ip_resource to check it regularly.
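
As an illustration, a monitor operation in the heartbeat 2.x CIB format could look roughly like the sketch below. The primitive and its nvpair are copied from the quoted cib; the op id, interval and timeout are placeholder values chosen for the example, not taken from the original configuration.

```xml
<primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
  <operations>
    <!-- hypothetical id and timings: run "monitor" every 10s, fail it after 20s -->
    <op id="ip_resource_monitor" name="monitor" interval="10s" timeout="20s"/>
  </operations>
  <instance_attributes id="ip_attributes">
    <attributes>
      <nvpair id="ip" name="ip" value="192.168.1.215"/>
    </attributes>
  </instance_attributes>
</primitive>
```

Without a recurring monitor op the cluster only learns the resource state at start and stop, so a failure in between can go unnoticed.
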
Is "pingd_score" removed from the cib entirely when the interface is
unplugged, or is the value on the disconnected node set to zero?
If it is set to zero, the attribute is still defined, so the
"not_defined" rule never matches. That would explain why the
constraint is not working correctly. In that case a constraint
depending on the value of pingd_score would be more helpful.
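
A value-based rule of that kind might be sketched as follows. It assumes that pingd leaves the attribute defined and sets it to 0 when no ping node is reachable (with "-m 100" and the single ping node in ha.cf, the value would presumably be 100 when connected). The attribute name pingd_score comes from the respawn line in ha.cf; the expression ids are made up for the example.

```xml
<rsc_location id="ip_resource:not_connected" rsc="ip_resource">
  <!-- ban the node if pingd_score is missing OR not greater than zero -->
  <rule id="ip_resource:not_connected:rule" score="-INFINITY" boolean_op="or">
    <expression id="ip_resource:not_connected:expr:undefined"
                attribute="pingd_score" operation="not_defined"/>
    <!-- type="number" so the value is compared numerically, not as a string -->
    <expression id="ip_resource:not_connected:expr:zero"
                attribute="pingd_score" operation="lte" value="0" type="number"/>
  </rule>
</rsc_location>
```

This would replace the existing "ip_resource:not_connected" constraint rather than sit next to it, since the ids must be unique within the cib.
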
Regards,
Andreas
On 2/3/07, greno at verizon.net <greno at verizon.net> wrote:
> I have heartbeat running in a 2-node active/passive setup and I've configured a resource 'ip_resource' to monitor in cib. Whenever I unplug the network cable from the active server, I see messages that indicate that the resource is stopped and then restarted again but on the active server rather than the failover server as I was expecting. Is there something wrong in the config files? Some details...
>
> Refresh in 4s...
>
> ============
> Last updated: Sat Feb 3 01:17:56 2007
> Current DC: server2 (29626f17-db1f-4139-aa33-5a6b4110da51)
> 2 Nodes configured.
> 1 Resources configured.
> ============
>
> ip_resource (heartbeat::ocf:IPaddr): Started server1
>
> =====================
>
> logfacility daemon
> keepalive 1
> deadtime 10
> warntime 5
> initdead 1208
> udpport 694
> ping 192.168.1.1
> bcast eth0 eth1
> auto_failback off
> respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score
> node server1
> node server2
> use_logd yes
> compression bz2
> compression_threshold 2
> crm yes
>
> =====================
> <cib admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="1.3" generated="true" epoch="14" num_updates="171" cib-last-written="Sat Feb 3 01:18:23 2007" ccm_transition="2" dc_uuid="29626f17-db1f-4139-aa33-5a6b4110da51">
> <configuration>
> <crm_config/>
> <nodes>
> <node id="29626f17-db1f-4139-aa33-5a6b4110da51" uname="server2" type="normal"/>
> <node id="67b0bfa7-0165-4a8c-9c0f-ec82e0ae2c91" uname="server1" type="normal"/>
> </nodes>
> <resources>
> <primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
> <instance_attributes id="ip_attributes">
> <attributes>
> <nvpair id="ip" name="ip" value="192.168.1.215"/>
> </attributes>
> </instance_attributes>
> </primitive>
> </resources>
> <constraints>
> <rsc_location id="run_ip_resource" rsc="ip_resource">
> <rule id="pref_run_ip_resource1" score="100">
> <expression id="expr1" attribute="#uname" operation="eq" value="server1"/>
> </rule>
> <rule id="pref_run_ip_resource2" score="000">
> <expression id="expr2" attribute="#uname" operation="eq" value="server2"/>
> </rule>
> </rsc_location>
> <rsc_location id="ip_resource:not_connected" rsc="ip_resource">
> <rule id="ip_resource:not_connected:rule" score="-INFINITY">
> <expression id="ip_resource:not_connected:expr" attribute="pingd_score" operation="not_defined"/>
> </rule>
> </rsc_location>
> </constraints>
> </configuration>
> </cib>
>
> =====================
> Feb 3 01:12:32 server1 heartbeat: [6149]: info: Enabling logging daemon
> Feb 3 01:12:32 server1 heartbeat: [6149]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
> Feb 3 01:12:32 server1 heartbeat: [6149]: WARN: logd is enabled but logfile/debugfile/logfacility is still configured in ha.cf
> Feb 3 01:12:32 server1 heartbeat: [6149]: info: **************************
> Feb 3 01:12:32 server1 heartbeat: [6149]: info: Configuration validated. Starting heartbeat 2.0.8
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: heartbeat: version 2.0.8
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: Heartbeat generation: 13
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: G_main_add_TriggerHandler: Added signal manual handler
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: G_main_add_TriggerHandler: Added signal manual handler
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: ping heartbeat started.
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Feb 3 01:12:32 server1 heartbeat: [6150]: info: Local status now set to: 'up'
> Feb 3 01:12:33 server1 heartbeat: [6150]: info: Link 192.168.1.1:192.168.1.1 up.
> Feb 3 01:12:33 server1 heartbeat: [6150]: info: Status update for node 192.168.1.1: status ping
> Feb 3 01:12:34 server1 heartbeat: [6150]: info: Link server1:eth0 up.
> Feb 3 01:12:34 server1 heartbeat: [6150]: info: Link server1:eth1 up.
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Link server2:eth0 up.
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Link server2:eth1 up.
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Status update for node server2: status up
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Comm_now_up(): updating status to active
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Local status now set to: 'active'
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" (0,0)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/ccm" (100,101)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/cib" (100,101)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/lrmd -r" (0,0)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/attrd" (100,101)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/crmd" (100,101)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Starting child client "/usr/lib/heartbeat/mgmtd -v" (0,0)
> Feb 3 01:12:45 server1 heartbeat: [6150]: info: Status update for node server2: status active
> Feb 3 01:12:45 server1 heartbeat: [6165]: info: Starting "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" as uid 0 gid 0 (pid 6165)
> Feb 3 01:12:45 server1 heartbeat: [6166]: info: Starting "/usr/lib/heartbeat/ccm" as uid 100 gid 101 (pid 6166)
> Feb 3 01:12:45 server1 heartbeat: [6167]: info: Starting "/usr/lib/heartbeat/cib" as uid 100 gid 101 (pid 6167)
> Feb 3 01:12:45 server1 cib: [6167]: WARN: crm_is_writable: /var/lib/heartbeat/crm/cib.xml should be owned and r/w by group haclient
> Feb 3 01:12:45 server1 cib: [6167]: info: readCibXmlFile: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml
> Feb 3 01:12:45 server1 cib: [6167]: info: log_data_element: readCibXmlFile: [on-disk] <primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
> Feb 3 01:12:45 server1 heartbeat: [6168]: info: Starting "/usr/lib/heartbeat/lrmd -r" as uid 0 gid 0 (pid 6168)
> Feb 3 01:12:45 server1 heartbeat: [6169]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 6169)
> Feb 3 01:12:45 server1 heartbeat: [6170]: info: Starting "/usr/lib/heartbeat/attrd" as uid 100 gid 101 (pid 6170)
> Feb 3 01:12:45 server1 heartbeat: [6171]: info: Starting "/usr/lib/heartbeat/crmd" as uid 100 gid 101 (pid 6171)
> Feb 3 01:12:45 server1 heartbeat: [6172]: info: Starting "/usr/lib/heartbeat/mgmtd -v" as uid 0 gid 0 (pid 6172)
> Feb 3 01:12:45 server1 stonithd: [6169]: info: Signing in with heartbeat.
> Feb 3 01:12:46 server1 stonithd: [6169]: notice: /usr/lib/heartbeat/stonithd start up successfully.
> Feb 3 01:12:50 server1 heartbeat: [6150]: WARN: 1 lost packet(s) for [server2] [19:21]
> Feb 3 01:12:50 server1 heartbeat: [6150]: info: No pkts missing from server2!
> Feb 3 01:12:51 server1 heartbeat: [6150]: WARN: 1 lost packet(s) for [server2] [23:25]
> Feb 3 01:12:51 server1 heartbeat: [6150]: info: No pkts missing from server2!
> Feb 3 01:13:01 server1 heartbeat: [6150]: WARN: 1 lost packet(s) for [server2] [43:45]
> Feb 3 01:13:01 server1 heartbeat: [6150]: info: No pkts missing from server2!
> Feb 3 01:18:17 server1 heartbeat: [6150]: info: Link server2:eth0 dead.
> Feb 3 01:18:18 server1 heartbeat: [6150]: WARN: node 192.168.1.1: is dead
> Feb 3 01:18:18 server1 heartbeat: [6150]: info: Link 192.168.1.1:192.168.1.1 dead.
> Feb 3 02:06:44 server1 heartbeat: [6150]: info: Link 192.168.1.1:192.168.1.1 up.
> Feb 3 02:06:44 server1 heartbeat: [6150]: WARN: Late heartbeat: Node 192.168.1.1: interval 2917470 ms
> Feb 3 02:06:44 server1 heartbeat: [6150]: info: Status update for node 192.168.1.1: status ping
> Feb 3 02:06:45 server1 heartbeat: [6150]: info: Link server2:eth0 up.
>
>
> =====================
> Feb 3 01:12:43 server2 heartbeat: [6016]: info: Enabling logging daemon
> Feb 3 01:12:43 server2 heartbeat: [6016]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
> Feb 3 01:12:43 server2 heartbeat: [6016]: WARN: logd is enabled but logfile/debugfile/logfacility is still configured in ha.cf
> Feb 3 01:12:43 server2 heartbeat: [6016]: info: **************************
> Feb 3 01:12:43 server2 heartbeat: [6016]: info: Configuration validated. Starting heartbeat 2.0.8
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: heartbeat: version 2.0.8
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: Heartbeat generation: 14
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: G_main_add_TriggerHandler: Added signal manual handler
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: G_main_add_TriggerHandler: Added signal manual handler
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: ping heartbeat started.
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
> Feb 3 01:12:43 server2 heartbeat: [6017]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Feb 3 01:12:44 server2 heartbeat: [6017]: info: Local status now set to: 'up'
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link 192.168.1.1:192.168.1.1 up.
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Status update for node 192.168.1.1: status ping
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link server1:eth0 up.
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Status update for node server1: status up
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link server2:eth0 up.
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link server1:eth1 up.
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Link server2:eth1 up.
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Comm_now_up(): updating status to active
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Local status now set to: 'active'
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" (0,0)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/ccm" (100,101)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/cib" (100,101)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/lrmd -r" (0,0)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/attrd" (100,101)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/crmd" (100,101)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Starting child client "/usr/lib/heartbeat/mgmtd -v" (0,0)
> Feb 3 01:12:45 server2 heartbeat: [6017]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 60 ms (> 50 ms) (GSource: 0x9f568d0)
> Feb 3 01:12:45 server2 heartbeat: [6031]: info: Starting "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" as uid 0 gid 0 (pid 6031)
> Feb 3 01:12:45 server2 heartbeat: [6032]: info: Starting "/usr/lib/heartbeat/ccm" as uid 100 gid 101 (pid 6032)
> Feb 3 01:12:45 server2 heartbeat: [6033]: info: Starting "/usr/lib/heartbeat/cib" as uid 100 gid 101 (pid 6033)
> Feb 3 01:12:45 server2 cib: [6033]: WARN: crm_is_writable: /var/lib/heartbeat/crm/cib.xml should be owned and r/w by group haclient
> Feb 3 01:12:45 server2 cib: [6033]: info: readCibXmlFile: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml
> Feb 3 01:12:45 server2 cib: [6033]: info: log_data_element: readCibXmlFile: [on-disk] <primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
> Feb 3 01:12:45 server2 heartbeat: [6034]: info: Starting "/usr/lib/heartbeat/lrmd -r" as uid 0 gid 0 (pid 6034)
> Feb 3 01:12:45 server2 heartbeat: [6035]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 6035)
> Feb 3 01:12:45 server2 heartbeat: [6017]: info: Status update for node server1: status active
> Feb 3 01:12:45 server2 heartbeat: [6036]: info: Starting "/usr/lib/heartbeat/attrd" as uid 100 gid 101 (pid 6036)
> Feb 3 01:12:45 server2 heartbeat: [6037]: info: Starting "/usr/lib/heartbeat/crmd" as uid 100 gid 101 (pid 6037)
> Feb 3 01:12:45 server2 heartbeat: [6038]: info: Starting "/usr/lib/heartbeat/mgmtd -v" as uid 0 gid 0 (pid 6038)
> Feb 3 01:12:45 server2 stonithd: [6035]: info: Signing in with heartbeat.
> Feb 3 01:12:45 server2 stonithd: [6035]: notice: /usr/lib/heartbeat/stonithd start up successfully.
> Feb 3 01:12:50 server2 heartbeat: [6017]: WARN: 1 lost packet(s) for [server1] [45:47]
> Feb 3 01:12:50 server2 heartbeat: [6017]: info: No pkts missing from server1!
> Feb 3 01:12:52 server2 heartbeat: [6017]: WARN: 1 lost packet(s) for [server1] [54:56]
> Feb 3 01:12:52 server2 heartbeat: [6017]: info: No pkts missing from server1!
> Feb 3 01:12:57 server2 heartbeat: [6017]: WARN: 1 lost packet(s) for [server1] [64:66]
> Feb 3 01:12:57 server2 heartbeat: [6017]: info: No pkts missing from server1!
> Feb 3 01:15:04 server2 pengine: [6049]: info: native_print: ip_resource (heartbeat::ocf:IPaddr): Stopped
> Feb 3 01:15:04 server2 pengine: [6049]: info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-29.bz2
> Feb 3 01:15:05 server2 pengine: [6049]: info: native_print: ip_resource (heartbeat::ocf:IPaddr): Stopped
> Feb 3 01:15:05 server2 pengine: [6049]: info: process_pe_message: Transition 1: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-30.bz2
> Feb 3 01:15:07 server2 pengine: [6049]: info: native_print: ip_resource (heartbeat::ocf:IPaddr): Started server1
> Feb 3 01:15:07 server2 pengine: [6049]: info: process_pe_message: Transition 2: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-31.bz2
> Feb 3 01:18:17 server2 heartbeat: [6017]: info: Link server1:eth0 dead.
> Feb 3 01:18:24 server2 pengine: [6049]: info: native_print: ip_resource (heartbeat::ocf:IPaddr): Started server1
> Feb 3 01:18:24 server2 pengine: [6049]: info: process_pe_message: Transition 3: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-32.bz2
>
> =====================
>
>