[Linux-HA] heartbeat version 1 behavior

Johan De Meersman jdm at operamail.com
Fri Feb 10 02:14:26 MST 2006


ResourceManager[3363]:  2006/02/08_12:48:32 info: Running /etc/init.d/squid  start
ResourceManager[3363]:  2006/02/08_12:48:32 ERROR: Return code 1 from /etc/init.d/squid
ResourceManager[3363]:  2006/02/08_12:48:32 CRIT: Giving up resources due to failure of squid


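My first guess would be that the squid init script does not return success
(exit code 0) when it is asked to start squid while squid is already
running. In your log, /etc/init.d/squid start runs at 12:43:44 and again
at 12:48:32 with no squid stop in between, and it is the second start
that returns 1.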
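If that is the cause, the cleanest fix is to make the init script's start
action exit 0 when squid is already running, which is what the LSB
init-script conventions ask for. Failing that, a wrapper can enforce it.
Below is a minimal sketch in Python, assuming that `/etc/init.d/squid status`
exits 0 when squid is running and non-zero otherwise; the wrapper itself,
its name and its location are hypothetical and not part of heartbeat.
Since heartbeat v1 resolves haresources entries from /etc/ha.d/resource.d/
as well as /etc/init.d/ (your log uses both), it could be saved as, say,
/etc/ha.d/resource.d/squid-safe and listed in haresources in place of squid.

```python
#!/usr/bin/env python3
# Hypothetical wrapper around /etc/init.d/squid that gives start/stop
# LSB-style idempotent exit codes. Name and location are assumptions.
import subprocess
import sys

INIT_SCRIPT = "/etc/init.d/squid"


def run_init(action):
    """Run the real init script with one action; return its exit code."""
    return subprocess.call([INIT_SCRIPT, action])


def idempotent(action, run=run_init):
    """Return an exit code for `action` where redundant start/stop succeed.

    Assumes the wrapped script's `status` exits 0 when the service runs.
    """
    if action not in ("start", "stop"):
        # status, restart, etc. are passed through unchanged.
        return run(action)
    running = run("status") == 0
    if action == "start" and running:
        return 0  # already running: treat start as a success
    if action == "stop" and not running:
        return 0  # already stopped: treat stop as a success
    return run(action)


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(idempotent(sys.argv[1]))
```

The same check also makes stop harmless on a service that is already
stopped, which matters when heartbeat releases the whole group after a
failure, as it does in your log between 12:48:32 and 12:48:40.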


Pamela Rock wrote:

>I have a weird scenario (HB version 1 config) that I'm hoping
>someone can help me with. I'm testing heartbeat, and all but one
>of my tests work. The one test that fails is bringing up the
>secondary node while the primary is active: heartbeat then fails.
>
>If both nodes are up and I shut down the primary, the
>secondary takes over as expected. If I bring the
>secondary back online, the primary takes over as
>expected. If I shut down the secondary, the primary
>node remains operational (again as expected). But
>after I try to bring the secondary node back up,
>heartbeat stops working. To me this means that the
>secondary remains a single point of failure for the
>entire system.
>
>I'm not sure if this is relevant, but we are using a
>second NIC for the heartbeat. My better refuses to
>use the recommended serial cable.
>
>The error in /var/log/ha-log is ERROR: Return code 1
>from /etc/init.d/squid (complete error log below)
>
>(Incidentally, Squid works fine by itself; that is, the
>start, stop, and status actions work as expected.)
>
>I hope someone can help.  The following is some info
>regarding my setup and config.
>
>Running RH ES3 with HB version 2.0.2.
>
>The config on the primary node (the secondary's is very
>similar) is:
>
>haresources:
>server1 10.15.0.15 squid sfagent_control sfserver_control
>
>ha.cf:
>debugfile /var/log/ha-debug
>logfile /var/log/ha-log
>logfacility     local0
>keepalive 2
>deadtime 30
>warntime 10
>initdead 120
>bcast   eth1 
>auto_failback on
>node    server1
>node    server2
>ping 10.15.0.254
>respawn hacluster /usr/lib/heartbeat/ipfail
>
>ha-log on the primary node:
>
>heartbeat[2582]: 2006/02/08_12:43:33 info: Link server2:eth1 up.
>heartbeat[2599]: 2006/02/08_12:43:33 info: pid 2599 locked in memory.
>heartbeat[2600]: 2006/02/08_12:43:33 info: pid 2600 locked in memory.
>heartbeat[2582]: 2006/02/08_12:43:33 info: Status update for node server2: status active
>heartbeat[2582]: 2006/02/08_12:43:33 info: Link 10.15.0.254:10.15.0.254 up.
>heartbeat[2582]: 2006/02/08_12:43:33 info: Status update for node 10.15.0.254: status ping
>heartbeat[2582]: 2006/02/08_12:43:33 info: Local status now set to: 'active'
>heartbeat[2582]: 2006/02/08_12:43:33 info: Starting child client "/usr/lib/heartbeat/ipfail" (502,502)
>harc[2607]:     2006/02/08_12:43:33 info: Running /etc/ha.d/rc.d/status status
>heartbeat[2608]: 2006/02/08_12:43:34 info: Starting "/usr/lib/heartbeat/ipfail" as uid 502  gid 502 (pid 2608)
>heartbeat[2582]: 2006/02/08_12:43:34 info: Link server1:eth1 up.
>heartbeat[2582]: 2006/02/08_12:43:34 info: remote resource transition completed.
>heartbeat[2582]: 2006/02/08_12:43:34 info: remote resource transition completed.
>heartbeat[2582]: 2006/02/08_12:43:34 info: Local Resource acquisition completed. (none)
>heartbeat[2582]: 2006/02/08_12:43:34 info: server2 wants to go standby [foreign]
>heartbeat[2582]: 2006/02/08_12:43:42 info: standby: acquire [foreign] resources from server2
>heartbeat[2687]: 2006/02/08_12:43:42 info: acquire local HA resources (standby).
>ResourceManager[2697]:  2006/02/08_12:43:42 info: Acquiring resource group: server1 10.15.0.15 squid sfagent_control sfserver_control
>ResourceManager[2697]:  2006/02/08_12:43:42 info: Running /etc/ha.d/resource.d/IPaddr 10.15.0.15 start
>IPaddr[2755]:   2006/02/08_12:43:43 info: /sbin/ifconfig eth0:0 10.15.0.15 netmask 255.255.0.0 broadcast 10.15.255.255
>IPaddr[2755]:   2006/02/08_12:43:43 info: Sending Gratuitous Arp for 10.15.0.15 on eth0:0 [eth0]
>IPaddr[2755]:   2006/02/08_12:43:43 /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-10.15.0.15 eth0 10.15.0.15 auto 10.15.0.15 ffffffffffff
>ResourceManager[2697]:  2006/02/08_12:43:44 info: Running /etc/init.d/squid  start
>ResourceManager[2697]:  2006/02/08_12:43:46 info: Running /etc/init.d/sfagent_control  start
>ResourceManager[2697]:  2006/02/08_12:43:47 info: Running /etc/init.d/sfserver_control  start
>heartbeat[2687]: 2006/02/08_12:43:49 info: local HA resource acquisition completed (standby).
>heartbeat[2582]: 2006/02/08_12:43:49 info: Standby resource acquisition done [foreign].
>heartbeat[2582]: 2006/02/08_12:43:50 info: Initial resource acquisition complete (auto_failback)
>heartbeat[2582]: 2006/02/08_12:43:50 info: remote resource transition completed.
>heartbeat[2582]: 2006/02/08_12:45:53 WARN: node server2: is dead
>heartbeat[2582]: 2006/02/08_12:45:53 WARN: No STONITH device configured.
>heartbeat[2582]: 2006/02/08_12:45:53 WARN: Shared disks are not protected.
>heartbeat[2582]: 2006/02/08_12:45:53 info: Resources being acquired from server2.
>heartbeat[2582]: 2006/02/08_12:45:53 info: Link server2:eth1 dead.
>harc[3236]:     2006/02/08_12:45:53 info: Running /etc/ha.d/rc.d/status status
>mach_down[3248]:        2006/02/08_12:45:53 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
>heartbeat[2582]: 2006/02/08_12:45:53 info: mach_down takeover complete.
>mach_down[3248]:        2006/02/08_12:45:53 info: mach_down takeover complete for node server2.
>heartbeat[3238]: 2006/02/08_12:45:53 info: Local Resource acquisition completed.
>heartbeat[2582]: 2006/02/08_12:48:19 info: Heartbeat restart on node server2
>heartbeat[2582]: 2006/02/08_12:48:19 info: Link server2:eth1 up.
>heartbeat[2582]: 2006/02/08_12:48:19 info: Status update for node server2: status init
>heartbeat[2582]: 2006/02/08_12:48:19 info: Status update for node server2: status up
>harc[3313]:     2006/02/08_12:48:19 info: Running /etc/ha.d/rc.d/status status
>harc[3323]:     2006/02/08_12:48:19 info: Running /etc/ha.d/rc.d/status status
>heartbeat[2582]: 2006/02/08_12:48:20 info: Status update for node server2: status active
>heartbeat[2582]: 2006/02/08_12:48:20 info: remote resource transition completed.
>heartbeat[2582]: 2006/02/08_12:48:20 info: server1 wants to go standby [foreign]
>heartbeat[2582]: 2006/02/08_12:48:20 info: standby: server2 can take our foreign resources
>heartbeat[3334]: 2006/02/08_12:48:20 info: give up foreign HA resources (standby).
>harc[3333]:     2006/02/08_12:48:20 info: Running /etc/ha.d/rc.d/status status
>heartbeat[3334]: 2006/02/08_12:48:20 info: foreign HA resource release completed (standby).
>heartbeat[2582]: 2006/02/08_12:48:20 info: Local standby process completed [foreign].
>heartbeat[2582]: 2006/02/08_12:48:21 WARN: 1 lost packet(s) for [server2] [10:12]
>heartbeat[2582]: 2006/02/08_12:48:21 info: remote resource transition completed.
>heartbeat[2582]: 2006/02/08_12:48:21 info: No pkts missing from server2!
>heartbeat[2582]: 2006/02/08_12:48:21 info: Other node completed standby takeover of foreign resources.
>heartbeat[2582]: 2006/02/08_12:48:27 info: server2 wants to go standby [foreign]
>heartbeat[2582]: 2006/02/08_12:48:32 info: standby: acquire [foreign] resources from server2
>heartbeat[3353]: 2006/02/08_12:48:32 info: acquire local HA resources (standby).
>ResourceManager[3363]:  2006/02/08_12:48:32 info: Acquiring resource group: server1 10.15.0.15 squid sfagent_control sfserver_control
>ResourceManager[3363]:  2006/02/08_12:48:32 info: Running /etc/init.d/squid  start
>ResourceManager[3363]:  2006/02/08_12:48:32 ERROR: Return code 1 from /etc/init.d/squid
>ResourceManager[3363]:  2006/02/08_12:48:32 CRIT: Giving up resources due to failure of squid
>ResourceManager[3363]:  2006/02/08_12:48:32 info: Releasing resource group: server1 10.15.0.15 squid sfagent_control sfserver_control
>ResourceManager[3363]:  2006/02/08_12:48:32 info: Running /etc/init.d/sfserver_control  stop
>ResourceManager[3363]:  2006/02/08_12:48:39 info: Running /etc/init.d/sfagent_control  stop
>ResourceManager[3363]:  2006/02/08_12:48:40 info: Running /etc/init.d/squid  stop
>ResourceManager[3363]:  2006/02/08_12:48:40 info: Running /etc/ha.d/resource.d/IPaddr 10.15.0.15 stop
>IPaddr[3571]:   2006/02/08_12:48:40 info: /sbin/route -n del -host 10.15.0.15
>IPaddr[3571]:   2006/02/08_12:48:40 info: /sbin/ifconfig eth0:0 down
>IPaddr[3571]:   2006/02/08_12:48:40 info: IP Address 10.15.0.15 released
>heartbeat[3353]: 2006/02/08_12:48:40 info: local HA resource acquisition completed (standby).
>heartbeat[2582]: 2006/02/08_12:48:40 info: Standby resource acquisition done [foreign].
>heartbeat[2582]: 2006/02/08_12:48:40 info: remote resource transition completed.
>heartbeat[2582]: 2006/02/08_12:50:53 WARN: Shutdown delayed until current resource activity finishes.
>heartbeat[2582]: 2006/02/08_12:50:54 info: Heartbeat shutdown in progress. (2582)
>heartbeat[3708]: 2006/02/08_12:50:54 info: Giving up all HA resources.
>heartbeat[2582]: 2006/02/08_12:50:54 info: Received shutdown notice from 'server2'.
>heartbeat[2582]: 2006/02/08_12:50:54 info: Resource takeover cancelled - shutdown in progress.
>ResourceManager[3718]:  2006/02/08_12:50:54 info: Releasing resource group: server1 10.15.0.15 squid sfagent_control sfserver_control
>ResourceManager[3718]:  2006/02/08_12:50:54 info: Running /etc/init.d/sfserver_control  stop
>ResourceManager[3718]:  2006/02/08_12:50:54 info: Running /etc/init.d/sfagent_control  stop
>ResourceManager[3718]:  2006/02/08_12:50:54 info: Running /etc/init.d/squid  stop
>ResourceManager[3718]:  2006/02/08_12:50:54 info: Running /etc/ha.d/resource.d/IPaddr 10.15.0.15 stop
>heartbeat[3708]: 2006/02/08_12:50:55 info: All HA resources relinquished.
>heartbeat[2582]: 2006/02/08_12:50:56 info: killing /usr/lib/heartbeat/ipfail process group 2608 with signal 15
>heartbeat[2582]: 2006/02/08_12:50:57 info: killing HBFIFO process 2596 with signal 15
>heartbeat[2582]: 2006/02/08_12:50:57 info: killing HBWRITE process 2597 with signal 15
>heartbeat[2582]: 2006/02/08_12:50:57 info: killing HBREAD process 2598 with signal 15
>heartbeat[2582]: 2006/02/08_12:50:57 info: killing HBWRITE process 2599 with signal 15
>heartbeat[2582]: 2006/02/08_12:50:57 info: killing HBREAD process 2600 with signal 15
>heartbeat[2582]: 2006/02/08_12:50:57 info: Core process 2596 exited. 5 remaining
>heartbeat[2582]: 2006/02/08_12:50:57 info: Core process 2597 exited. 4 remaining
>heartbeat[2582]: 2006/02/08_12:50:57 info: Core process 2598 exited. 3 remaining
>heartbeat[2582]: 2006/02/08_12:50:57 info: Core process 2599 exited. 2 remaining
>heartbeat[2582]: 2006/02/08_12:50:57 info: Core process 2600 exited. 1 remaining
>heartbeat[2582]: 2006/02/08_12:50:57 info: Heartbeat shutdown complete.
>
>
>Thanks in advance...
>
>Pamela
>


-- 
Johan De Meersman
System Engineer
Belgacom Skynet
