[Linux-HA] Re: Heartbeat - all nodes pause after about 7 mins on FC3

Guochun Shi gshi at ncsa.uiuc.edu
Mon Mar 21 11:09:09 MST 2005


Jason, 

It is a bug: heartbeat does not handle ping nodes in its flow control. It is now fixed in CVS, and I have attached the patch at the end of this email.
Thanks a lot for testing and reporting the problem.

-Guochun

Index: heartbeat.c
===================================================================
RCS file: /home/cvs/linux-ha/linux-ha/heartbeat/heartbeat.c,v
retrieving revision 1.381
retrieving revision 1.382
diff -u -r1.381 -r1.382
--- heartbeat.c 21 Mar 2005 17:51:57 -0000      1.381
+++ heartbeat.c 21 Mar 2005 18:05:06 -0000      1.382
@@ -2,7 +2,7 @@
  * TODO:
  * 1) Man page update
  */
-/* $Id: heartbeat.c,v 1.381 2005/03/21 17:51:57 gshi Exp $ */
+/* $Id: heartbeat.c,v 1.382 2005/03/21 18:05:06 gshi Exp $ */
 /*
  * heartbeat: Linux-HA heartbeat code
  *
@@ -1960,17 +1960,15 @@
                int     minidx;
                int     i;

+               hist->lowest_acknode = NULL;
                minidx = -1;
                minseq = 0;
                for (i = 0; i < config->nodecount; i++){
                        struct node_info* hip = &config->nodes[i];

                        if (STRNCMP_CONST(hip->status,DEADSTATUS) == 0
-                       ||      STRNCMP_CONST(hip->status, INITSTATUS) == 0){
-
-                               if (hist->lowest_acknode == hip){
-                                       hist->lowest_acknode = NULL;
-                               }
+                           || STRNCMP_CONST(hip->status, INITSTATUS) == 0
+                           || hip->nodetype == PINGNODE_I){
                                continue;
                        }

@@ -5192,6 +5190,10 @@





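For readers following the patch, the changed loop can be sketched in isolation. This is a minimal, self-contained model and not the actual heartbeat source: the struct node type, the enum, the status strings, and lowest_ack_node() are stand-ins invented for illustration. As in revision 1.382, the lowest-ack pointer is reset before every scan, and dead nodes, nodes still initializing, and ping nodes are skipped, on the assumption that none of them can be expected to acknowledge packets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for heartbeat's node table; not the real definitions. */
enum node_type { NORMAL_NODE, PING_NODE };

struct node {
	const char     *name;
	const char     *status;  /* e.g. "dead", "init", "active" (assumed values) */
	enum node_type  type;
	unsigned long   ackseq;  /* highest sequence number this node has acked */
};

/*
 * Return the cluster member with the lowest acknowledged sequence number,
 * or NULL if no node qualifies. Dead nodes, nodes still in init, and ping
 * nodes are skipped, mirroring the condition added by the patch.
 */
static const struct node *
lowest_ack_node(const struct node *nodes, size_t count)
{
	const struct node *lowest = NULL;  /* reset on every scan, as in the patch */
	size_t i;

	assert(nodes != NULL || count == 0);

	for (i = 0; i < count; i++) {
		const struct node *n = &nodes[i];

		if (strcmp(n->status, "dead") == 0
		    || strcmp(n->status, "init") == 0
		    || n->type == PING_NODE) {
			continue;
		}
		if (lowest == NULL || n->ackseq < lowest->ackseq) {
			lowest = n;
		}
	}
	return lowest;
}
```

In this model, a table holding two active nodes and one ping node yields the active node with the smaller ackseq and never the ping node, so the flow-control bookkeeping does not wait on a gateway that will never send a sequence number.
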
At 11:07 AM 3/21/2005 -0500, you wrote:
>Hi Guochun!
> 
>Thanks for the reply. As soon as I turned on the extra debugging, I think I saw the issue. I set up a node to be pinged for ipfail; it was the default gateway. It appeared that heartbeat was looking for a sequence number from the gateway! I deleted this node from the config, and the pausing has stopped. Unfortunately, failover doesn't work now, since heartbeat no longer has a node to ping (thus ipfail is not working).
> 
>Am I doing something wrong? Could you pass along a sample config? I've pored over different configs, but I don't see what my error would be. As I understand the ping directive, it just needs to be a highly available node, not necessarily a heartbeat node. I tried pointing the pings at each node's neighbor, but that didn't work. I also tried unicast/bcast heartbeats, but since the ping directive wasn't pointing to anything, nothing happened.
> 
>Would you like me to re-run the debugs, or does this info help? Thanks very much for your assistance!
> 
> 
>-Jason
> 
> 
>--
>Jason Whiteaker
>Sr. Network Engineer
>STAR Financial Bank Service Center
>6230 Bluffton Road
>P.O. Box 11409
>Fort Wayne, IN 46858-1409
>Tel:  +1.260.479.2572
>Fax:  +1.260.479.2573
>E-mail:  jason.whiteaker at starfinancial.com
> 