[Linux-HA] possible race condition in OCF apache RA monitor

Lars Ellenberg lars.ellenberg at linbit.com
Fri Oct 15 13:04:30 MDT 2010


On Fri, Oct 15, 2010 at 01:16:21PM -0400, Buckingham, Brett wrote:
> >> The race condition is that when apache is started, it is possible
> >> for it to have written its PID file, but not yet completed its
> >> initialization to the point where the wget would succeed.  I was
> >> able to work around this problem by placing a simple "sleep 5"
> >> between starting httpd and the first call to monitor_apache().
> 
> >If that's the case, then the start action should loop on
> >monitor_apache internally until that returns Ok.
> >That way, start will only return once monitoring does actually work.
> >Bonus: you get a start failure already, if monitoring is not configured
> >properly.
> 
> >... looking at the code ...
> >Wait. It does that already, since May 2007.
> 
> start_apache() only loops monitoring if monitor_apache() returns
> $OCF_NOT_RUNNING (7).  monitor_apache() returned 1 ($OCF_ERR_GENERIC)
> due to the control flow described above.
> 
> I think what is needed is specific monitoring logic for apache startup
> which allows for the PID file to be there but some period of time before
> an HTTP request is returned.  Once apache is running, I agree that the
> monitor_apache() function, which requires the PID file, process matching
> the pid, and a successful wget is OK.

Ok, so the problem is that monitor_apache_basic
(and monitor_apache_extended) return one of $OCF_ERR_CONFIGURED,
$OCF_SUCCESS or $OCF_ERR_GENERIC (or even $OCF_ERR_ARGS, if you get grep to
exit with 2 for a bad parameter...),
but the loop breaks for anything != $OCF_NOT_RUNNING.
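
To make the control flow concrete, here is a minimal stand-alone sketch
of the pre-patch loop shape. It is not the RA code itself: fake_monitor
and old_start_loop are made-up names, and the stub simply pretends that
the PID file is already there while the status URL does not answer yet.

```shell
#!/bin/sh
# Stand-ins for the OCF return codes (values as in the OCF spec).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical stub: the first two calls behave like "PID file exists,
# but the HTTP check fails"; later calls succeed.
calls=0
fake_monitor() {
    calls=$((calls + 1))
    if [ "$calls" -le 2 ]; then
        return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}

# Same shape as the pre-patch loop: only OCF_NOT_RUNNING keeps waiting,
# every other return code leaves the loop immediately.
old_start_loop() {
    while :; do
        fake_monitor
        ec=$?
        if [ "$ec" -eq "$OCF_NOT_RUNNING" ]; then
            sleep 1
        else
            break
        fi
    done
    return "$ec"
}

old_start_loop
echo "old loop returned $? after $calls monitor call(s)"
```

This prints "old loop returned 1 after 1 monitor call(s)": the loop
gives up on the very first generic error instead of waiting for the
stub to become healthy.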

What about the patchlet below?

diff --git a/heartbeat/apache b/heartbeat/apache
--- a/heartbeat/apache
+++ b/heartbeat/apache
@@ -404,19 +404,11 @@ start_apache() {
     return $OCF_SUCCESS
   fi
   ocf_run $HTTPD $HTTPDOPTS $OPTIONS -f $CONFIGFILE
-  ...
+  # loop until we are killed because of start action timeout,
+  # or monitor returns success, whatever comes first.
+  while ! monitor_apache; do
+	ocf_log info "waiting for apache $CONFIGFILE to come up"
+	sleep 1
   done
-  ...

possible todo:
 * log should probably not be done every loop iteration
 * exit early for $OCF_ERR_CONFIGURED
   * bonus points:
     OCF_ERR_CONFIGURED is only a start failure,
     if a monitor action is configured?
 * maybe don't even enter the loop, if ocf_run exits != 0?
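
A rough sketch of what the last item could look like, untested against
the real RA. launch_httpd and fake_monitor are hypothetical stand-ins
for the ocf_run call and monitor_apache, and returning $OCF_ERR_GENERIC
for a failed launch is an assumption, not something decided above.

```shell
#!/bin/sh
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical stand-in for "ocf_run $HTTPD ...": exits with the
# status passed as its first argument.
launch_httpd() { return "$1"; }

# Hypothetical stand-in for monitor_apache: always healthy.
fake_monitor() { return $OCF_SUCCESS; }

start_sketch() {
    # If the launch command itself failed, do not enter the wait loop;
    # polling would only run into the start action timeout.
    launch_httpd "$1" || return $OCF_ERR_GENERIC
    while ! fake_monitor; do
        sleep 1
    done
    return $OCF_SUCCESS
}

start_sketch 1; echo "failed launch -> $?"
start_sketch 0; echo "good launch   -> $?"
```

The first call returns 1 without ever polling the monitor, the second
returns 0 after the monitor reports success.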

Second iteration below, just written down, not tested even once.

Please comment.

diff --git a/heartbeat/apache b/heartbeat/apache
--- a/heartbeat/apache
+++ b/heartbeat/apache
@@ -404,24 +404,46 @@ start_apache() {
     return $OCF_SUCCESS
   fi
   ocf_run $HTTPD $HTTPDOPTS $OPTIONS -f $CONFIGFILE
+  # loop until we are killed because of start action timeout,
+  # or monitor returns a final exit code, whatever comes first.
   tries=0
-  while :  # wait until the user set timeout
-  do
+  while :; do
     monitor_apache
-	ec=$?
-	if [ $ec -eq $OCF_NOT_RUNNING ]
-	then
-		tries=`expr $tries + 1`
-		ocf_log info "waiting for apache $CONFIGFILE to come up"
-		sleep 1
-	else
-		break
-	fi
+    rc=$?
+    case $rc in
+    $OCF_SUCCESS)
+      return $OCF_SUCCESS
+      ;;
+    $OCF_ERR_CONFIGURED)
+      # Is only returned if silent_status was ok already, that means
+      # the apache process is at least (or, was...) running,
+      # but monitor_apache_basic failed.
+      #
+      # Possibly the parameters necessary for monitor_apache_basic
+      # are wrong, but maybe they are just missing.
+      #
+      # If the user has not configured a monitor action, that's not fatal.
+      # If he has, then the next run of that monitor action will pick
+      # it up anyways. So we just return success here.
+      return $OCF_SUCCESS
+      ;;
+    $OCF_NOT_RUNNING|$OCF_ERR_GENERIC)
+      # not ready yet
+      ;;
+    *)
+      # should not happen. But treat as "not ready yet", anyways.
+      ;;
+    esac
+    
+    # even though this may look like a bashism, it is not,
+    # but POSIX since at least 1997.
+    : $((tries=tries + 1))
+    if [ $((tries % 20)) = 2 ] ; then
+      ocf_log info "waiting for apache $CONFIGFILE to come up"
+    fi
+    sleep 1
   done
-	if [ $ec -ne 0 ] && silent_status; then
-		stop_apache
-	fi
-	return $ec
+  # not reached.
 }
 
 stop_apache() {
@@ -496,7 +518,7 @@ monitor_apache_extended() {
   fixtesturl
   is_testconf_sane ||
     return $OCF_ERR_CONFIGURED
-  $whattorun "$test_url" | grep -Ei "$test_regex" > /dev/null
+  $whattorun "$test_url" | grep -Eie "$test_regex" > /dev/null
 }
 monitor_apache_basic() {
   if [ -z "$STATUSURL" ]; then
@@ -506,7 +528,7 @@ monitor_apache_basic() {
     ocf_log err "could not find a http client; make sure that either wget or curl is available"
 	return $OCF_ERR_CONFIGURED
   fi
-  ${ourhttpclient}_func "$STATUSURL" | grep -Ei "$TESTREGEX" > /dev/null
+  ${ourhttpclient}_func "$STATUSURL" | grep -Eie "$TESTREGEX" > /dev/null
 }
 monitor_apache() {
   silent_status
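
On the grep -Ei to grep -Eie change in the two monitor functions: with
-e, the next argument is always taken as the pattern, so a test regex
that starts with a dash cannot be mistaken for options. A small
illustration, with a made-up regex and page content:

```shell
#!/bin/sh
# Hypothetical test regex that happens to start with a dash.
regex='-+ok'
page='status: --ok'

# Without -e, grep tries to parse the pattern as options and fails
# with a usage error instead of matching.
printf '%s\n' "$page" | grep -Ei "$regex" >/dev/null 2>&1
echo "without -e: exit $?"

# With -e, the same string is used as the pattern and matches.
printf '%s\n' "$page" | grep -Eie "$regex" >/dev/null 2>&1
echo "with -e:    exit $?"
```

The first call exits non-zero even though the page contains a match,
the second exits 0.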

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


