[Linux-ha-dev] Using ha_logd can cause blocking in clients

Andrew Beekhof (GMail) beekhof at gmail.com
Mon Apr 4 05:13:59 MDT 2005


I think that many of the delayed-message problems we've been having are
related to how clients use ha_logd.

Right now, if a client overflows its queue to ha_logd, it falls back to
a direct log (which in my case calls syslog directly), and it is quite
plausible that this call will block.
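
To make the failure mode concrete, here is a minimal sketch of the
fallback described above. It is not the cl_log.c code: log_queue,
queue_try_send, log_message and QUEUE_CAPACITY are hypothetical names
made up for illustration, and the direct path only reports which route
was taken instead of actually calling syslog.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity; kept small so overflow is easy to trigger. */
#define QUEUE_CAPACITY 4

enum log_path { VIA_DAEMON, VIA_DIRECT };

struct log_queue {
    size_t pending;   /* messages queued but not yet drained by the daemon */
};

/* Non-blocking enqueue: returns 1 on success, 0 when the queue is full. */
static int queue_try_send(struct log_queue *q)
{
    assert(q != NULL);
    if (q->pending >= QUEUE_CAPACITY) {
        return 0;
    }
    q->pending++;
    return 1;
}

/*
 * Models the current behaviour: a full queue falls back to the direct
 * path. In the real client that path ends in syslog, which may block,
 * and its output can overtake messages still sitting in the queue.
 */
static enum log_path log_message(struct log_queue *q)
{
    if (queue_try_send(q)) {
        return VIA_DAEMON;
    }
    return VIA_DIRECT;
}
```

Once the queue holds QUEUE_CAPACITY undrained messages, every further
call takes the direct route until the daemon catches up.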

It probably also explains why messages are logged out of sync (if you 
remember my bringing that up a while back) since logs sent directly to 
syslog can easily come out before previous ones in the queue to 
ha_logd.

It would also explain why, once it starts, it gets progressively worse:
blocking causes late messages, which cause more logs, which block in
syslog, which cause more late messages, and so on.

I'm about to try with the following patch and expect it to help 
(basically it discards the overflow logs).  Gshi, maybe you can improve 
on it and commit something like it.  Perhaps we need a ha.cf directive 
for determining the behavior when the log queue is full... 
(discard/block/other?).
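
For the directive idea, a rough sketch of how the configured value
might be mapped to a policy. No such ha.cf directive exists today; the
enum, the function names and the "direct" value (standing in for the
current fall-back-to-syslog behaviour) are all assumptions for
illustration only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical policies for a full log queue. */
enum overflow_policy {
    OVERFLOW_DISCARD,   /* drop the message                         */
    OVERFLOW_BLOCK,     /* wait until the daemon has room           */
    OVERFLOW_DIRECT,    /* assumed name for today's direct logging  */
    OVERFLOW_INVALID    /* unrecognised configuration value         */
};

/* Map the text of a (hypothetical) config value to a policy. */
static enum overflow_policy parse_overflow_policy(const char *value)
{
    assert(value != NULL);
    if (strcmp(value, "discard") == 0) {
        return OVERFLOW_DISCARD;
    }
    if (strcmp(value, "block") == 0) {
        return OVERFLOW_BLOCK;
    }
    if (strcmp(value, "direct") == 0) {
        return OVERFLOW_DIRECT;
    }
    return OVERFLOW_INVALID;
}

/* Returns 1 if the policy can stall the calling client, 0 otherwise. */
static int policy_may_block(enum overflow_policy policy)
{
    return policy == OVERFLOW_BLOCK || policy == OVERFLOW_DIRECT;
}
```

Only the discard policy is guaranteed never to stall the caller, which
is why the patch below takes that route.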

While on the topic, I think it would be helpful to include some subtle
way of indicating whether a message was logged via ha_logd or directly
via syslog.  That would show up problems like this a lot earlier.
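
One possible shape for such a marker, purely as a sketch: prefix each
line with a short tag naming the path it took. The function
tag_log_line and the "[d]"/"[s]" tags are hypothetical; nothing like
this exists in cl_log.c at the moment.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "<tag> <msg>" into out, where the tag is "[d]" for messages
 * that went through the logging daemon and "[s]" for messages sent
 * straight to syslog. Returns what snprintf returns: the length the
 * full line needs, which exceeds out_len - 1 if it was truncated.
 */
static int tag_log_line(char *out, size_t out_len,
                        const char *msg, int via_daemon)
{
    assert(out != NULL && msg != NULL);
    return snprintf(out, out_len, "%s %s",
                    via_daemon ? "[d]" : "[s]", msg);
}
```

A run of "[s]" lines in the middle of "[d]" output would then point at
queue overflow straight away.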

Andrew

Btw, the -dev mailing list seems to be misbehaving, hence the CCs.

--- cl_log.c	17 Mar 2005 09:16:06 +0100	1.43
+++ cl_log.c	04 Apr 2005 13:11:44 +0200	
@@ -330,6 +330,8 @@
   * non-blocking IPC.
   */

+gboolean last_log_failed = FALSE;
+
  /* Cluster logging function */
  void
  cl_log(int priority, const char * fmt, ...)
@@ -376,15 +378,27 @@
  		return_to_orig_privs();
  	}
  	
-	if ( use_logging_daemon &&
-	     cl_log_depth <= 1 &&
-	     LogToLoggingDaemon(priority, buf, nbytes + 1, TRUE) == HA_OK){
-		goto LogDone;
+	if ( use_logging_daemon && cl_log_depth <= 1) {
+		if(LogToLoggingDaemon(priority, buf, nbytes + 1, TRUE) != HA_OK){
+			/* uhm? */
+			char msg[] = "Logging overflow,"
+				" discarding logs until congestion eases";
+			if(last_log_failed == FALSE) {
+				cl_direct_log(LOG_WARNING, msg, TRUE, NULL,
+					      cl_process_pid, NULLTIME);
+				last_log_failed = TRUE;
+			}
+			
+		} else {
+			last_log_failed = FALSE;
+		}
+		
+		
  	}else {
+		/* this may cause blocking... maybe should make it optional? */
  		cl_direct_log(priority, buf, TRUE, NULL, cl_process_pid, NULLTIME);
  	}
  	
- LogDone:
  	cl_log_depth--;
  	return;
  }
  	
--
Andrew Beekhof

"No means no, and no means yes, and everything in between and all the 
rest" - TISM


