[Linux-HA] heartbeat 2.0.8: pingd failover
greno at verizon.net
Sun Feb 4 14:04:55 MST 2007
>From: greno at verizon.net
>Date: 2007/02/04 Sun AM 12:44:08 CST
>To: Andreas Kurz <andreas.kurz at gmail.com>,
General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
>Subject: Re: Re: [Linux-HA] heartbeat 2.0.8: pingd failover
>>From: Andreas Kurz <andreas.kurz at gmail.com>
>>Date: 2007/02/03 Sat PM 04:53:10 CST
>>To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
>>Subject: Re: [Linux-HA] heartbeat 2.0.8: pingd failover
>
>>Try to add a monitor operation to the ip_resource to check it regularly.
>>
>>Is the "pingd_score" completely removed from the cib, in the case of
>>an unplugged interface, or is the value of the disconnected node zero?
>>... This would explain why the constraint is not working correctly ...
>>then a constraint depending on the value of pingd_score would be more
>>helpful.
>>
>>Regards,
>>Andreas
>>
>
>Ok, I changed things around to try this with a resource group since that is what I really need to end up with.
>Same ha.cf and modified cib.xml:
>
>==================================
> <cib admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="1.3" generated="true" epoch="17" num_updates="213" cib-last-written="Sun Feb 4 01:00:05 2007" ccm_transition="4" dc_uuid="29626f17-db1f-4139-aa33-5a6b4110da51">
>   <configuration>
>     <crm_config/>
>     <nodes>
>       <node id="29626f17-db1f-4139-aa33-5a6b4110da51" uname="server2" type="normal"/>
>       <node id="67b0bfa7-0165-4a8c-9c0f-ec82e0ae2c91" uname="server1" type="normal"/>
>     </nodes>
>     <resources>
>       <group id="GRP_webserver_RG">
>         <primitive id="GRP_webserverip_R" class="ocf" type="IPaddr" provider="heartbeat">
>           <instance_attributes id="GRP_webserver_RA">
>             <attributes>
>               <nvpair id="GRP_webserverip_RA_ip" name="ip" value="192.168.1.215"/>
>             </attributes>
>           </instance_attributes>
>         </primitive>
>       </group>
>     </resources>
>     <constraints>
>       <rsc_location id="run_GRP_webserver_RG" rsc="GRP_webserver_RG">
>         <rule id="pref_run_GRP_webserver_RG" score="100">
>           <expression id="pref_run_GRP_webserver_RG_expr1" attribute="#uname" operation="eq" value="server1"/>
>         </rule>
>       </rsc_location>
>       <rsc_location id="GRP_webserver_RG:not_connected" rsc="GRP_webserver_RG">
>         <rule id="GRP_webserver_RG:not_connected:rule" score="-INFINITY">
>           <expression id="GRP_webserver_RG:not_connected:expr" attribute="pingd_score" operation="not_defined"/>
>         </rule>
>       </rsc_location>
>     </constraints>
>   </configuration>
> </cib>
>
>==================================
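
For reference, Andreas's suggestion to add a monitor operation to the IP resource could look roughly like the fragment below. This is only a sketch in the heartbeat 2.0.x CIB format: the op id and the interval and timeout values are assumptions chosen for illustration and are not part of the configuration above.

```xml
<primitive id="GRP_webserverip_R" class="ocf" type="IPaddr" provider="heartbeat">
  <!-- Assumed values: a recurring monitor every 10s with a 20s timeout.
       The op id is hypothetical; tune interval/timeout for the environment. -->
  <operations>
    <op id="GRP_webserverip_R_mon" name="monitor" interval="10s" timeout="20s"/>
  </operations>
  <instance_attributes id="GRP_webserver_RA">
    <attributes>
      <nvpair id="GRP_webserverip_RA_ip" name="ip" value="192.168.1.215"/>
    </attributes>
  </instance_attributes>
</primitive>
```

With a recurring monitor in place, the cluster re-checks the address periodically instead of only recording the result of the start action.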
>
>What I get now is that the resource fails over from server1 to server2. Then, when connectivity is restored to server1, the log on server2 shows "Link server1:eth0 up." and "Started server2", but the IP was actually started on server1, which I confirmed by logging in to the webserver virtual IP. So I got the failover, but with auto_failback disabled I expected the IP to remain on server2. The log says it was started on server2, but it is actually on server1.
>
>So now, with the IP on server1, I pulled the network cable. The log on server2 shows "Link server1:eth0 dead." followed by "Started server1". If I log in to the IP from server1 itself (which is disconnected at this point), the IP is on server1. This makes no sense.
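
Andreas's other suggestion, a constraint that depends on the value of pingd_score and not only on whether it is defined, might be written along the lines below. This is a sketch under assumptions: the constraint, rule and expression ids are made up, and it assumes pingd leaves the attribute at 0 (instead of removing it) when the ping node is unreachable. Whether a numeric comparison needs an explicit type attribute on the expression depends on the CIB DTD version, so none is set here.

```xml
<rsc_location id="GRP_webserver_RG:connected" rsc="GRP_webserver_RG">
  <!-- Forbid the group on any node where pingd_score is missing OR zero.
       boolean_op="or" makes either expression sufficient to match. -->
  <rule id="GRP_webserver_RG:connected:rule" score="-INFINITY" boolean_op="or">
    <expression id="GRP_webserver_RG:connected:expr:undefined"
                attribute="pingd_score" operation="not_defined"/>
    <expression id="GRP_webserver_RG:connected:expr:zero"
                attribute="pingd_score" operation="lte" value="0"/>
  </rule>
</rsc_location>
```

This would take the place of the existing GRP_webserver_RG:not_connected constraint, which only matches when the attribute is absent.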
>
Logs from latest session:
================================
Feb 4 00:59:39 server1 heartbeat: [10916]: info: Enabling logging daemon
Feb 4 00:59:39 server1 heartbeat: [10916]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
Feb 4 00:59:39 server1 heartbeat: [10916]: WARN: logd is enabled but logfile/debugfile/logfacility is still configured in ha.cf
Feb 4 00:59:39 server1 heartbeat: [10916]: info: **************************
Feb 4 00:59:39 server1 heartbeat: [10916]: info: Configuration validated. Starting heartbeat 2.0.8
Feb 4 00:59:39 server1 heartbeat: [10917]: info: heartbeat: version 2.0.8
Feb 4 00:59:39 server1 heartbeat: [10917]: info: Heartbeat generation: 16
Feb 4 00:59:39 server1 heartbeat: [10917]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb 4 00:59:39 server1 heartbeat: [10917]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb 4 00:59:39 server1 heartbeat: [10917]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Feb 4 00:59:39 server1 heartbeat: [10917]: info: glib: ping heartbeat started.
Feb 4 00:59:39 server1 heartbeat: [10917]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Feb 4 00:59:39 server1 heartbeat: [10917]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Feb 4 00:59:39 server1 heartbeat: [10917]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
Feb 4 00:59:39 server1 heartbeat: [10917]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
Feb 4 00:59:39 server1 heartbeat: [10917]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Feb 4 00:59:39 server1 heartbeat: [10917]: info: Local status now set to: 'up'
Feb 4 00:59:40 server1 heartbeat: [10917]: info: Link 192.168.1.1:192.168.1.1 up.
Feb 4 00:59:40 server1 heartbeat: [10917]: info: Status update for node 192.168.1.1: status ping
Feb 4 00:59:40 server1 heartbeat: [10917]: info: Link server2:eth0 up.
Feb 4 00:59:40 server1 heartbeat: [10917]: info: Status update for node server2: status active
Feb 4 00:59:40 server1 heartbeat: [10917]: info: Link server1:eth0 up.
Feb 4 00:59:40 server1 heartbeat: [10917]: info: Link server2:eth1 up.
Feb 4 00:59:40 server1 heartbeat: [10917]: info: Link server1:eth1 up.
Feb 4 00:59:41 server1 heartbeat: [10917]: info: Comm_now_up(): updating status to active
Feb 4 00:59:41 server1 heartbeat: [10917]: info: Local status now set to: 'active'
Feb 4 00:59:41 server1 heartbeat: [10917]: info: Starting child client "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" (0,0)
Feb 4 00:59:41 server1 heartbeat: [10917]: info: Starting child client "/usr/lib/heartbeat/cibmon -d" (100,101)
Feb 4 00:59:41 server1 heartbeat: [10917]: info: Starting child client "/usr/lib/heartbeat/ccm" (100,101)
Feb 4 00:59:41 server1 heartbeat: [10917]: info: Starting child client "/usr/lib/heartbeat/cib" (100,101)
Feb 4 00:59:41 server1 heartbeat: [10917]: info: Starting child client "/usr/lib/heartbeat/lrmd -r" (0,0)
Feb 4 00:59:41 server1 heartbeat: [10917]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
Feb 4 00:59:41 server1 heartbeat: [10917]: info: Starting child client "/usr/lib/heartbeat/attrd" (100,101)
Feb 4 00:59:41 server1 heartbeat: [10917]: info: Starting child client "/usr/lib/heartbeat/crmd" (100,101)
Feb 4 00:59:41 server1 heartbeat: [10917]: info: Starting child client "/usr/lib/heartbeat/mgmtd -v" (0,0)
Feb 4 00:59:41 server1 heartbeat: [10931]: info: Starting "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" as uid 0 gid 0 (pid 10931)
Feb 4 00:59:41 server1 heartbeat: [10932]: info: Starting "/usr/lib/heartbeat/cibmon -d" as uid 100 gid 101 (pid 10932)
Feb 4 00:59:41 server1 heartbeat: [10933]: info: Starting "/usr/lib/heartbeat/ccm" as uid 100 gid 101 (pid 10933)
Feb 4 00:59:41 server1 heartbeat: [10934]: info: Starting "/usr/lib/heartbeat/cib" as uid 100 gid 101 (pid 10934)
Feb 4 00:59:41 server1 heartbeat: [10936]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 10936)
Feb 4 00:59:41 server1 heartbeat: [10937]: info: Starting "/usr/lib/heartbeat/attrd" as uid 100 gid 101 (pid 10937)
Feb 4 00:59:41 server1 heartbeat: [10938]: info: Starting "/usr/lib/heartbeat/crmd" as uid 100 gid 101 (pid 10938)
Feb 4 00:59:41 server1 heartbeat: [10939]: info: Starting "/usr/lib/heartbeat/mgmtd -v" as uid 0 gid 0 (pid 10939)
Feb 4 00:59:41 server1 heartbeat: [10935]: info: Starting "/usr/lib/heartbeat/lrmd -r" as uid 0 gid 0 (pid 10935)
Feb 4 00:59:41 server1 stonithd: [10936]: info: Signing in with heartbeat.
Feb 4 00:59:41 server1 cib: [10934]: WARN: crm_is_writable: /var/lib/heartbeat/crm/cib.xml should be owned and r/w by group haclient
Feb 4 00:59:41 server1 cib: [10934]: info: readCibXmlFile: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml
Feb 4 00:59:41 server1 cib: [10934]: info: log_data_element: readCibXmlFile: [on-disk] <primitive id="GRP_webserverip_R" class="ocf" type="IPaddr" provider="heartbeat">
Feb 4 00:59:41 server1 stonithd: [10936]: notice: /usr/lib/heartbeat/stonithd start up successfully.
Feb 4 00:59:52 server1 cibmon: [10932]: info: log_data_element: cib_replace: + <lrm_resource id="GRP_webserverip_R" type="IPaddr" class="ocf" provider="heartbeat">
Feb 4 00:59:53 server1 cibmon: [10932]: info: log_data_element: cib_replace: + <lrm_resource id="GRP_webserverip_R" type="IPaddr" class="ocf" provider="heartbeat">
Feb 4 00:59:53 server1 cibmon: [10932]: info: log_data_element: cib_apply_diff: - <lrm_resource id="GRP_webserverip_R" type="IPaddr" class="ocf" provider="heartbeat">
Feb 4 00:59:55 server1 cibmon: [10932]: info: log_data_element: cib_apply_diff: + <lrm_resource id="GRP_webserverip_R" type="IPaddr" class="ocf" provider="heartbeat">
================================
Feb 4 00:50:43 server2 heartbeat: [10332]: info: Enabling logging daemon
Feb 4 00:50:43 server2 heartbeat: [10332]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
Feb 4 00:50:43 server2 heartbeat: [10332]: WARN: logd is enabled but logfile/debugfile/logfacility is still configured in ha.cf
Feb 4 00:50:43 server2 heartbeat: [10332]: info: **************************
Feb 4 00:50:43 server2 heartbeat: [10332]: info: Configuration validated. Starting heartbeat 2.0.8
Feb 4 00:50:43 server2 heartbeat: [10333]: info: heartbeat: version 2.0.8
Feb 4 00:50:43 server2 heartbeat: [10333]: info: Heartbeat generation: 16
Feb 4 00:50:43 server2 heartbeat: [10333]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb 4 00:50:43 server2 heartbeat: [10333]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb 4 00:50:43 server2 heartbeat: [10333]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Feb 4 00:50:43 server2 heartbeat: [10333]: info: glib: ping heartbeat started.
Feb 4 00:50:43 server2 heartbeat: [10333]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Feb 4 00:50:43 server2 heartbeat: [10333]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Feb 4 00:50:43 server2 heartbeat: [10333]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
Feb 4 00:50:43 server2 heartbeat: [10333]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
Feb 4 00:50:43 server2 heartbeat: [10333]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Feb 4 00:50:43 server2 heartbeat: [10333]: info: Local status now set to: 'up'
Feb 4 00:50:44 server2 heartbeat: [10333]: info: Link 192.168.1.1:192.168.1.1 up.
Feb 4 00:50:44 server2 heartbeat: [10333]: info: Status update for node 192.168.1.1: status ping
Feb 4 00:50:44 server2 heartbeat: [10333]: info: Link server1:eth0 up.
Feb 4 00:50:44 server2 heartbeat: [10333]: info: Status update for node server1: status up
Feb 4 00:50:44 server2 heartbeat: [10333]: info: Link server2:eth0 up.
Feb 4 00:50:44 server2 heartbeat: [10333]: info: Link server1:eth1 up.
Feb 4 00:50:44 server2 heartbeat: [10333]: info: Link server2:eth1 up.
Feb 4 00:50:45 server2 heartbeat: [10333]: info: Comm_now_up(): updating status to active
Feb 4 00:50:45 server2 heartbeat: [10333]: info: Local status now set to: 'active'
Feb 4 00:50:45 server2 heartbeat: [10333]: info: Starting child client "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" (0,0)
Feb 4 00:50:45 server2 heartbeat: [10333]: info: Starting child client "/usr/lib/heartbeat/cibmon -d" (100,101)
Feb 4 00:50:45 server2 heartbeat: [10333]: info: Starting child client "/usr/lib/heartbeat/ccm" (100,101)
Feb 4 00:50:45 server2 heartbeat: [10333]: info: Starting child client "/usr/lib/heartbeat/cib" (100,101)
Feb 4 00:50:45 server2 heartbeat: [10333]: info: Starting child client "/usr/lib/heartbeat/lrmd -r" (0,0)
Feb 4 00:50:45 server2 heartbeat: [10333]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
Feb 4 00:50:45 server2 heartbeat: [10333]: info: Starting child client "/usr/lib/heartbeat/attrd" (100,101)
Feb 4 00:50:45 server2 heartbeat: [10333]: info: Starting child client "/usr/lib/heartbeat/crmd" (100,101)
Feb 4 00:50:45 server2 heartbeat: [10333]: info: Starting child client "/usr/lib/heartbeat/mgmtd -v" (0,0)
Feb 4 00:50:45 server2 heartbeat: [10333]: info: Status update for node server1: status active
Feb 4 00:50:45 server2 heartbeat: [10347]: info: Starting "/usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd_score" as uid 0 gid 0 (pid 10347)
Feb 4 00:50:45 server2 heartbeat: [10348]: info: Starting "/usr/lib/heartbeat/cibmon -d" as uid 100 gid 101 (pid 10348)
Feb 4 00:50:45 server2 heartbeat: [10349]: info: Starting "/usr/lib/heartbeat/ccm" as uid 100 gid 101 (pid 10349)
Feb 4 00:50:45 server2 heartbeat: [10350]: info: Starting "/usr/lib/heartbeat/cib" as uid 100 gid 101 (pid 10350)
Feb 4 00:50:45 server2 heartbeat: [10351]: info: Starting "/usr/lib/heartbeat/lrmd -r" as uid 0 gid 0 (pid 10351)
Feb 4 00:50:45 server2 heartbeat: [10352]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 10352)
Feb 4 00:50:45 server2 heartbeat: [10353]: info: Starting "/usr/lib/heartbeat/attrd" as uid 100 gid 101 (pid 10353)
Feb 4 00:50:45 server2 heartbeat: [10354]: info: Starting "/usr/lib/heartbeat/crmd" as uid 100 gid 101 (pid 10354)
Feb 4 00:50:45 server2 heartbeat: [10355]: info: Starting "/usr/lib/heartbeat/mgmtd -v" as uid 0 gid 0 (pid 10355)
Feb 4 00:50:45 server2 cib: [10350]: WARN: crm_is_writable: /var/lib/heartbeat/crm/cib.xml should be owned and r/w by group haclient
Feb 4 00:50:45 server2 cib: [10350]: info: readCibXmlFile: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml
Feb 4 00:50:45 server2 cib: [10350]: info: log_data_element: readCibXmlFile: [on-disk] <primitive id="GRP_webserverip_R" class="ocf" type="IPaddr" provider="heartbeat">
Feb 4 00:50:45 server2 stonithd: [10352]: info: Signing in with heartbeat.
Feb 4 00:50:45 server2 stonithd: [10352]: notice: /usr/lib/heartbeat/stonithd start up successfully.
Feb 4 00:50:52 server2 heartbeat: [10333]: WARN: 1 lost packet(s) for [server1] [47:49]
Feb 4 00:50:52 server2 heartbeat: [10333]: info: No pkts missing from server1!
Feb 4 00:50:53 server2 heartbeat: [10333]: WARN: 1 lost packet(s) for [server1] [54:56]
Feb 4 00:50:53 server2 heartbeat: [10333]: info: No pkts missing from server1!
Feb 4 00:51:01 server2 heartbeat: [10333]: WARN: 1 lost packet(s) for [server1] [68:70]
Feb 4 00:51:01 server2 heartbeat: [10333]: info: No pkts missing from server1!
Feb 4 00:53:00 server2 pengine: [10366]: info: native_print: GRP_webserverip_R (heartbeat::ocf:IPaddr): Stopped
Feb 4 00:53:00 server2 pengine: [10366]: info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-37.bz2
Feb 4 00:53:00 server2 cibmon: [10348]: info: log_data_element: cib_update: + <lrm_resource id="GRP_webserverip_R" type="IPaddr" class="ocf" provider="heartbeat">
Feb 4 00:53:00 server2 cibmon: [10348]: info: log_data_element: cib_update: + <lrm_resource id="GRP_webserverip_R" type="IPaddr" class="ocf" provider="heartbeat">
Feb 4 00:53:01 server2 pengine: [10366]: info: native_print: GRP_webserverip_R (heartbeat::ocf:IPaddr): Stopped
Feb 4 00:53:01 server2 pengine: [10366]: info: process_pe_message: Transition 1: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-38.bz2
Feb 4 00:53:03 server2 pengine: [10366]: info: native_print: GRP_webserverip_R (heartbeat::ocf:IPaddr): Started server1
Feb 4 00:53:03 server2 pengine: [10366]: info: process_pe_message: Transition 2: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-39.bz2
Feb 4 00:57:19 server2 pengine: [10366]: info: native_print: GRP_webserverip_R (heartbeat::ocf:IPaddr): Started server1
Feb 4 00:57:19 server2 pengine: [10366]: info: process_pe_message: Transition 3: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-40.bz2
Feb 4 00:57:33 server2 heartbeat: [10333]: WARN: node server1: is dead
Feb 4 00:57:33 server2 heartbeat: [10333]: info: Link server1:eth0 dead.
Feb 4 00:57:33 server2 heartbeat: [10333]: info: Link server1:eth1 dead.
Feb 4 00:57:34 server2 pengine: [10366]: info: native_print: GRP_webserverip_R (heartbeat::ocf:IPaddr): Started server2
Feb 4 00:57:34 server2 pengine: [10366]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-41.bz2
Feb 4 00:59:40 server2 heartbeat: [10333]: info: Heartbeat restart on node server1
Feb 4 00:59:40 server2 heartbeat: [10333]: info: Link server1:eth0 up.
Feb 4 00:59:40 server2 heartbeat: [10333]: info: Status update for node server1: status init
Feb 4 00:59:40 server2 heartbeat: [10333]: info: Link server1:eth1 up.
Feb 4 00:59:40 server2 heartbeat: [10333]: info: Status update for node server1: status up
Feb 4 00:59:40 server2 heartbeat: [10333]: info: all clients are now paused
Feb 4 00:59:41 server2 heartbeat: [10333]: info: Status update for node server1: status active
Feb 4 00:59:41 server2 heartbeat: [10333]: info: all clients are now resumed
Feb 4 00:59:46 server2 heartbeat: [10333]: WARN: 1 lost packet(s) for [server1] [17:19]
Feb 4 00:59:46 server2 heartbeat: [10333]: info: No pkts missing from server1!
Feb 4 00:59:47 server2 heartbeat: [10333]: WARN: 1 lost packet(s) for [server1] [21:23]
Feb 4 00:59:47 server2 heartbeat: [10333]: info: No pkts missing from server1!
Feb 4 00:59:53 server2 cibmon: [10348]: info: log_data_element: cib_update: - <lrm_resource id="GRP_webserverip_R" type="IPaddr" class="ocf" provider="heartbeat">
Feb 4 00:59:53 server2 pengine: [10366]: info: native_print: GRP_webserverip_R (heartbeat::ocf:IPaddr): Started server2
Feb 4 00:59:53 server2 pengine: [10366]: info: process_pe_message: Transition 5: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-42.bz2
Feb 4 00:59:55 server2 cibmon: [10348]: info: log_data_element: cib_update: + <lrm_resource id="GRP_webserverip_R" type="IPaddr" class="ocf" provider="heartbeat">
Feb 4 00:59:56 server2 pengine: [10366]: info: native_print: GRP_webserverip_R (heartbeat::ocf:IPaddr): Started server2
Feb 4 00:59:56 server2 pengine: [10366]: info: process_pe_message: Transition 6: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-43.bz2
Feb 4 00:59:57 server2 heartbeat: [10333]: WARN: 1 lost packet(s) for [server1] [42:44]
Feb 4 00:59:57 server2 heartbeat: [10333]: info: No pkts missing from server1!
Feb 4 01:00:04 server2 pengine: [10366]: info: native_print: GRP_webserverip_R (heartbeat::ocf:IPaddr): Started server2
Feb 4 01:00:04 server2 pengine: [10366]: info: process_pe_message: Transition 7: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-44.bz2
Feb 4 01:28:15 server2 heartbeat: [10333]: info: Link server1:eth0 dead.
Feb 4 01:28:21 server2 pengine: [10366]: info: native_print: GRP_webserverip_R (heartbeat::ocf:IPaddr): Started server1
Feb 4 01:28:21 server2 pengine: [10366]: info: process_pe_message: Transition 8: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-45.bz2
================================