heartbeat restarting every morning?

Matt Stockdale mstockda at logicworks.net
Wed Dec 18 10:13:29 MST 2002


Wonderful. What's the most recent Red Hat kernel that's known to work?

I'm assuming that if I compile a 2.4.20 kernel from source, I won't have this problem?

On Wed, Dec 18, 2002 at 09:59:03AM -0600, Brian Tinsley wrote:
>    Sounds like the now infamous Red Hat kernel bug:
> 
>    https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=77058
> 
>    Although the report never explicitly mentions 2.4.18-10, I can personally
>    attest to the fact that it also suffers from this problem.
> 
>    Matt Stockdale wrote:
>
>  Red Hat 7.3, with the Red Hat 2.4.18-10 kernel.
>
>  On Wed, Dec 18, 2002 at 08:27:01AM -0600, Brian Tinsley wrote:
>
>  What kernel/distribution are you using?
>
>  Matt Stockdale wrote:
>
>  I've got a fairly simple HA setup for a firewall, but I'm seeing some strange behaviour every morning at 5:55 am (give or take a few seconds).
>
>  A few snippets from the secondary machine's log:
> 
>  heartbeat: 2002/12/18_05:54:09 info: Daily informational memory statistics
>  heartbeat: 2002/12/18_05:54:09 info: MSG stats: 100/85563 age 1 [pid25180/CONTROL]
>  heartbeat: 2002/12/18_05:54:09 info: ha_malloc stats: 2500/2224638  92800/49900 [pid25180/CONTROL]
>  heartbeat: 2002/12/18_05:54:09 info: RealMalloc stats: 94064 total malloc bytes. pid [25180/CONTROL]
>  heartbeat: 2002/12/18_05:54:09 info: MSG stats: 0/85563 age 1 [pid25182/HBWRITE]
>  heartbeat: 2002/12/18_05:54:09 info: ha_malloc stats: 0/2224638  0/0 [pid25182/HBWRITE]
>  heartbeat: 2002/12/18_05:54:09 info: RealMalloc stats: 1264 total malloc bytes. pid [25182/HBWRITE]
>  heartbeat: 2002/12/18_05:54:09 info: MSG stats: 0/128554 age 1 [pid25183/HBREAD]
>  heartbeat: 2002/12/18_05:54:09 info: ha_malloc stats: 0/3342398  0/0 [pid25183/HBREAD]
>  heartbeat: 2002/12/18_05:54:09 info: RealMalloc stats: 1264 total malloc bytes. pid [25183/HBREAD]
>  heartbeat: 2002/12/18_05:54:09 info: MSG stats: 0/299680 age 1 [pid25184/MST_STATUS]
>  heartbeat: 2002/12/18_05:54:09 info: ha_malloc stats: 0/6379673  0/0 [pid25184/MST_STATUS]
>  heartbeat: 2002/12/18_05:54:09 info: RealMalloc stats: 1696 total malloc bytes. pid [25184/MST_STATUS]
>  heartbeat: 2002/12/18_05:54:09 info: These are nothing to worry about.
>  heartbeat: 2002/12/18_05:55:22 WARN: node mailpat-pri: is dead
>  heartbeat: 2002/12/18_05:55:22 info: Resources being acquired from mailpat-pri.
>  heartbeat: 2002/12/18_05:55:22 WARN: node mailpat-sec: is dead
>  heartbeat: 2002/12/18_05:55:22 ERROR: No local heartbeat. Forcing shutdown.
>  heartbeat: 2002/12/18_05:55:22 info: Link mailpat-pri:eth2 dead.
>  heartbeat: 2002/12/18_05:55:22 WARN: Cluster node mailpat-prireturning after partition
>  heartbeat: 2002/12/18_05:55:22 info: giveup_resources: current status: active
>  heartbeat: 2002/12/18_05:55:22 info: killing notify world process group 27118 with signal 9
>  heartbeat: 2002/12/18_05:55:22 info: Heartbeat shutdown in progress. (25184)
>  heartbeat: 2002/12/18_05:55:22 info: Link mailpat-pri:eth2 up.
>  heartbeat: 2002/12/18_05:55:22 WARN: Late heartbeat: Node mailpat-pri: interval 5570 ms
>  heartbeat: 2002/12/18_05:55:22 info: Status update for node mailpat-pri: status active
>  heartbeat: 2002/12/18_05:55:22 info: Giving up all HA resources.
>  heartbeat: 2002/12/18_05:55:22 info: Heartbeat shutdown already underway.
>  heartbeat: 2002/12/18_05:55:22 WARN: node mailpat-sec: is dead
>  heartbeat: 2002/12/18_05:55:22 ERROR: No local heartbeat. Forcing shutdown.
>  heartbeat: 2002/12/18_05:55:22 info: heartbeat: version 0.4.9e
>  heartbeat: 2002/12/18_05:55:22 info: Running /etc/ha.d/rc.d/status status
>  heartbeat: 2002/12/18_05:55:22 info: Running /etc/ha.d/rc.d/status status
>  heartbeat: 2002/12/18_05:55:22 WARN: node mailpat-sec: is dead
>  heartbeat: 2002/12/18_05:55:22 ERROR: No local heartbeat. Forcing shutdown.
>  heartbeat: 2002/12/18_05:55:22 info: Taking over resource group IPaddr::206.252.135.253
> 
>  It's worrying that it sees mailpat-pri (the master node) as up, yet it continues to take over the IP resource anyway.
>
>  I can't see anything that's triggering a shutdown/restart. Is heartbeat coded to do this at 5:55 am? It's been happening ever since I brought up the cluster a few days ago, and heartbeat does this on both machines almost simultaneously. It's usually not a problem, as it gets back to normal in 10 seconds or so, but this morning the secondary machine didn't relinquish the resources, even after the primary machine took them back.
>
>  What does "ERROR: No local heartbeat. Forcing shutdown." mean? This always seems to happen about a minute and 10 seconds after it prints out the daily informational memory statistics.
>
>  There's not a lot of configuration info here; if no one has seen anything similar, I'd be happy to go into more detail.
> 
>  Thanks,
>   Matt
>
>  --
> 
>  -[========================]-
>  -[      Brian Tinsley     ]-
>  -[ Chief Systems Engineer ]-
>  -[        Emageon         ]-
>  -[========================]-
>
>  --
> 
>  -[========================]-
>  -[      Brian Tinsley     ]-
>  -[ Chief Systems Engineer ]-
>  -[        Emageon         ]-
>  -[========================]-
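
To put a number on the "about a minute and 10 seconds" observation in the quoted message, here is a minimal Python sketch that measures the gap between the daily memory statistics line and the first "No local heartbeat" error. It assumes the log line layout seen in the excerpt above (`heartbeat: <date>_<time> <level>: <message>`); the helper names `parse_line` and `gap_seconds` are made up for illustration and are not part of heartbeat. It only measures the timing and says nothing about the cause.

```python
from datetime import datetime

# Two lines copied verbatim from the log excerpt quoted above.
LOG_LINES = [
    "heartbeat: 2002/12/18_05:54:09 info: Daily informational memory statistics",
    "heartbeat: 2002/12/18_05:55:22 ERROR: No local heartbeat. Forcing shutdown.",
]


def parse_line(line):
    """Split one log line into (timestamp, level, message).

    Assumes the 'heartbeat: <date>_<time> <level>: <message>' layout
    shown in the excerpt; other heartbeat versions may log differently.
    """
    _prefix, stamp, rest = line.split(" ", 2)
    level, message = rest.split(": ", 1)
    when = datetime.strptime(stamp, "%Y/%m/%d_%H:%M:%S")
    return when, level, message


def gap_seconds(lines, first_marker, second_marker):
    """Seconds between the first line containing first_marker and the
    next later line containing second_marker, or None if not found."""
    start = None
    for line in lines:
        when, _level, message = parse_line(line)
        if start is None and first_marker in message:
            start = when
        elif start is not None and second_marker in message:
            return (when - start).total_seconds()
    return None


if __name__ == "__main__":
    gap = gap_seconds(
        LOG_LINES,
        "Daily informational memory statistics",
        "No local heartbeat",
    )
    print(gap)  # 73.0 for the two lines above
```

Run against a full log (one line per list entry, with any mail quoting prefix stripped), the same function can be used to check whether the gap is the same every morning.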

-- 
---------------------------------------------------------------
Matt Stockdale            Sr. Network Engineer - logicworks.net
mstockda at logicworks.net            "Dura lex, sed lex"



