[Linux-HA] Problem with Heartbeat

Dejan Muhamedagic dejanmm at fastmail.fm
Mon Oct 29 06:53:09 MDT 2007


Hi,

On Sun, Oct 28, 2007 at 11:19:32PM -0300, welisson at conectcor.com.br wrote:
> Hi all.
> 
> 
> I have 2 servers configured as firewalls, with the following
> configuration.
> 
> Server Master
> P4 3.0HT
> 2GB Ram
> 4 HD (2 used for the system and 2 for the squid cache; firewall, shaper and BGP-4)
> Motherboard Intel
> 
> 
> Server Slave
> P4 2.0
> 1GB Ram
> 2 HD
> Motherboard Intel, without squid but used for firewall, shaper and BGP-4
> 
> What happens is the following: I have heartbeat installed on both
> servers, and for the past few days heartbeat has been going down and
> coming back up, as shown in the log below, recorded on the main
> server:
> 
> 
> Oct 22 21:10:53 gateway heartbeat[19084]: WARN: Late heartbeat: Node 
> gateway2.domain.com.br: interval 12530 ms
> Oct 22 22:20:37 gateway heartbeat[19084]: WARN: node 
> gateway2.domain.com.br: is dead
> Oct 22 22:20:37 gateway heartbeat[19084]: WARN: No STONITH device 
> configured.
> Oct 22 22:20:37 gateway heartbeat[19084]: WARN: Shared disks are not 
> protected.
> Oct 22 22:20:37 gateway heartbeat[19084]: info: Resources being 
> acquired from gateway2.domain.com.br.
> Oct 22 22:20:37 gateway heartbeat[19084]: info: Link 
> gateway2.domain.com.br:/dev/ttyS0 dead.
> Oct 22 22:20:38 gateway heartbeat: info: Running /etc/ha.d/rc.d/status 
> status
> Oct 22 22:20:38 gateway heartbeat: info: /usr/lib/heartbeat/mach_down: 
> nice_failback: foreign resources acquired
> Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Cluster node 
> gateway2.domain.com.br returning after partition.
> Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Deadtime value may be 
> too small.
> Oct 22 22:20:42 gateway heartbeat[19084]: info: See documentation for 
> information on tuning deadtime.
> Oct 22 22:20:42 gateway heartbeat[19084]: info: Link 
> gateway2.domain.com.br:/dev/ttyS0 up.
> Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Late heartbeat: Node 
> gateway2.domain.com.br: interval 35790 ms

This indicates one of three possible problems: flaky
communications, high load, or a kernel scheduler problem.
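
To see how often and how badly heartbeats arrive late, it can help to
pull the "Late heartbeat" intervals out of the log and compare them
with the configured deadtime. The following is only a rough sketch:
the function names are made up for illustration, and the 30000 ms
threshold is an assumed value, since the actual deadtime setting is
not shown in this thread.

```python
import re

# Matches heartbeat's "Late heartbeat" warnings as they appear in the log above.
LATE_RE = re.compile(r"Late heartbeat: Node\s+(\S+):\s+interval\s+(\d+)\s+ms")


def late_heartbeats(log_text):
    """Return (node, interval_ms) pairs for every late-heartbeat warning."""
    # Mail clients prefix quoted lines with "> " and wrap long log lines,
    # so strip the quote marker and rejoin the wrapped lines first.
    unquoted = re.sub(r"^> ?", "", log_text, flags=re.MULTILINE)
    joined = re.sub(r"\s*\n\s*", " ", unquoted)
    return [(node, int(ms)) for node, ms in LATE_RE.findall(joined)]


def exceeding(pairs, deadtime_ms):
    """Keep only the intervals longer than the given deadtime (in ms)."""
    return [(node, ms) for node, ms in pairs if ms > deadtime_ms]


sample = """\
> Oct 22 21:10:53 gateway heartbeat[19084]: WARN: Late heartbeat: Node
> gateway2.domain.com.br: interval 12530 ms
> Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Late heartbeat: Node
> gateway2.domain.com.br: interval 35790 ms
"""

ASSUMED_DEADTIME_MS = 30000  # assumption for illustration, not from the thread

pairs = late_heartbeats(sample)
for node, ms in pairs:
    flag = "exceeds assumed deadtime" if ms > ASSUMED_DEADTIME_MS else "late"
    print(f"{node}: {ms} ms ({flag})")
```

If the intervals cluster at times of heavy traffic or CPU use, load is
the more likely culprit; if they look random, the serial link itself
deserves a closer look.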

Thanks,

Dejan

> Oct 22 22:20:42 gateway heartbeat[19084]: info: Status update for node 
> gateway2.domain.com.br: status active
> Oct 22 22:20:42 gateway heartbeat[19084]: info: mach_down takeover complete.
> Oct 22 22:20:42 gateway heartbeat: info: mach_down takeover complete 
> for node gateway2.domain.com.br.
> Oct 22 22:20:42 gateway heartbeat[14883]: info: Local Resource 
> acquisition completed.
> Oct 22 22:20:42 gateway heartbeat: info: Running /etc/ha.d/rc.d/status 
> status
> Oct 22 22:20:44 gateway heartbeat[19084]: info: Heartbeat shutdown in 
> progress. (19084)
> Oct 22 22:20:44 gateway heartbeat[16667]: info: Giving up all HA resources.
> Oct 22 22:20:44 gateway heartbeat: info: Releasing resource group: 
> gateway.domain.com.br 200.xxx.xxx.xxx/30/eth0 200.xxx.xxx.x6/30/eth1 
> 200.xxx.xxx.x7/29/eth2 firewall shaper
> Oct 22 22:20:44 gateway heartbeat: info: Running /etc/init.d/shaper stop
> Oct 22 22:20:46 gateway heartbeat: info: Running /etc/init.d/firewall stop
> Oct 22 22:20:46 gateway heartbeat: info: Running 
> /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x7/29/eth2 stop
> Oct 22 22:20:47 gateway heartbeat: info: Running 
> /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x6/30/eth1 stop
> Oct 22 22:20:47 gateway heartbeat: info: /sbin/route -n del -host 
> 200.xxx.xxx.x6
> Oct 22 22:20:47 gateway heartbeat: info: /sbin/ifconfig eth1:0 down
> Oct 22 22:20:47 gateway heartbeat: info: IP Address 200.xxx.xxx.x6 released
> Oct 22 22:20:47 gateway heartbeat: info: Running 
> /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.xxx/30/eth0 stop
> Oct 22 22:20:47 gateway heartbeat[16667]: info: All HA resources 
> relinquished.
> Oct 22 22:20:47 gateway heartbeat[19084]: WARN: 1 lost packet(s) for 
> [gateway2.domain.com.br] [239455:239457]
> Oct 22 22:20:47 gateway heartbeat[19084]: info: No pkts missing from 
> gateway2.domain.com.br!
> Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBFIFO process 
> 19086 with signal 15
> Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBWRITE process 
> 19087 with signal 15
> Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBREAD process 
> 19088 with signal 15
> Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19088 
> exited. 3 remaining
> Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19086 
> exited. 2 remaining
> Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19087 
> exited. 1 remaining
> Oct 22 22:20:48 gateway heartbeat[19084]: info: Heartbeat shutdown complete.
> Oct 22 22:20:48 gateway heartbeat[19084]: info: Heartbeat restart triggered.
> Oct 22 22:20:48 gateway heartbeat[19084]: info: Restarting heartbeat.
> Oct 22 22:20:48 gateway heartbeat[19084]: info: Performing heartbeat 
> restart exec.
> Oct 22 22:21:19 gateway heartbeat[19084]: info: **************************
> Oct 22 22:21:19 gateway heartbeat[19084]: info: Configuration 
> validated. Starting heartbeat 1.2.5
> Oct 22 22:21:19 gateway heartbeat[19947]: info: heartbeat: version 1.2.5
> Oct 22 22:21:19 gateway heartbeat[19947]: info: Heartbeat generation: 23
> Oct 22 22:21:20 gateway heartbeat[19947]: info: Starting serial 
> heartbeat on tty /dev/ttyS0 (19200 baud)
> Oct 22 22:21:20 gateway heartbeat[19947]: info: pid 19947 locked in memory.
> Oct 22 22:21:20 gateway heartbeat[19947]: info: Local status now set to: 
> 'up'
> Oct 22 22:21:21 gateway heartbeat[19949]: info: pid 19949 locked in memory.
> Oct 22 22:21:21 gateway heartbeat[19950]: info: pid 19950 locked in memory.
> Oct 22 22:21:21 gateway heartbeat[19951]: info: pid 19951 locked in memory.
> Oct 22 22:21:21 gateway heartbeat[19947]: WARN: string2msg_ll: node 
> [gateway2.domain.com.br] failed authentication
> Oct 22 22:21:22 gateway heartbeat[19947]: info: Link 
> gateway2.domain.com.br:/dev/ttyS0 up.
> Oct 22 22:21:22 gateway heartbeat[19947]: info: Status update for node 
> gateway2.domain.com.br: status active
> Oct 22 22:21:22 gateway heartbeat[19947]: info: Local status now set 
> to: 'active'
> Oct 22 22:21:22 gateway heartbeat: info: Running /etc/ha.d/rc.d/status 
> status
> Oct 22 22:21:22 gateway heartbeat[19947]: info: remote resource 
> transition completed.
> Oct 22 22:21:22 gateway heartbeat[19947]: info: remote resource 
> transition completed.
> Oct 22 22:21:22 gateway heartbeat[19947]: info: Local Resource 
> acquisition completed. (none)
> Oct 22 22:21:23 gateway heartbeat[19947]: info: gateway2.domain.com.br 
> wants to go standby [foreign]
> Oct 22 22:21:35 gateway heartbeat[19947]: info: standby: acquire 
> [foreign] resources from gateway2.domain.com.br
> Oct 22 22:21:35 gateway heartbeat[19956]: info: acquire local HA 
> resources (standby).
> Oct 22 22:21:35 gateway heartbeat: info: Acquiring resource group: 
> gateway.domain.com.br 200.xxx.xxx.xxx/30/eth0 200.xxx.xxx.x6/30/eth1 
> 200.xxx.xxx.x7/29/eth2 firewall shaper
> Oct 22 22:21:35 gateway heartbeat: info: Running 
> /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.xxx/30/eth0 start
> Oct 22 22:21:35 gateway heartbeat: info: /sbin/ifconfig eth0:0 
> 200.xxx.xxx.xxx netmask 255.255.255.252 broadcast 200.208.220.131
> Oct 22 22:21:35 gateway heartbeat: info: Sending Gratuitous Arp for 
> 200.xxx.xxx.xxx on eth0:0 [eth0]
> Oct 22 22:21:35 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010 
> -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.xxx 
> eth0 200.xxx.xxx.xxx auto 200.xxx.xxx.xxx ffffffffffff
> Oct 22 22:21:35 gateway heartbeat: info: Running 
> /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x6/30/eth1 start
> Oct 22 22:21:35 gateway heartbeat: info: /sbin/ifconfig eth1:0 
> 200.xxx.xxx.x6 netmask 255.255.255.252 broadcast 200.208.223.67
> Oct 22 22:21:35 gateway heartbeat: info: Sending Gratuitous Arp for 
> 200.xxx.xxx.x6 on eth1:0 [eth1]
> Oct 22 22:21:35 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010 
> -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.x6 eth1 
> 200.xxx.xxx.x6 auto 200.xxx.xxx.x6 ffffffffffff
> Oct 22 22:21:36 gateway heartbeat: info: Running 
> /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x7/29/eth2 start
> Oct 22 22:21:36 gateway heartbeat: info: /sbin/ifconfig eth2:0 
> 200.xxx.xxx.x7 netmask 255.255.255.248 broadcast 200.208.220.151
> Oct 22 22:21:36 gateway heartbeat: info: Sending Gratuitous Arp for 
> 200.xxx.xxx.x7 on eth2:0 [eth2]
> Oct 22 22:21:36 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010 
> -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.x7 eth2 
> 200.xxx.xxx.x7 auto 200.xxx.xxx.x7 ffffffffffff
> Oct 22 22:21:36 gateway heartbeat: info: Running /etc/init.d/firewall start
> Oct 22 22:21:36 gateway heartbeat: info: Running /etc/init.d/shaper start
> Oct 22 22:21:41 gateway heartbeat[19956]: info: local HA resource 
> acquisition completed (standby).
> Oct 22 22:21:41 gateway heartbeat[19947]: info: Standby resource 
> acquisition done [foreign].
> Oct 22 22:21:41 gateway heartbeat[19947]: info: Initial resource 
> acquisition complete (auto_failback)
> Oct 22 22:21:41 gateway heartbeat[19947]: info: remote resource 
> transition completed.
> 
> ----------------------------------------------------------------
> Conectcor - velocidade com qualidade
> www.conectcor.com.br


