[Linux-HA] possible race condition in OCF apache RA monitor

Dejan Muhamedagic dejanmm at fastmail.fm
Fri Oct 15 04:54:26 MDT 2010


Hi,

On Thu, Oct 14, 2010 at 04:15:12PM -0400, Buckingham, Brett wrote:
> I'm new to Pacemaker / OpenAIS / Corosync / Linux-HA, and have been
> going through the "Clusters from Scratch v2" tutorial at
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf.
> 
> I could not get the CRM to start apache, but could start it directly via
> "/etc/init.d/httpd start". My symptoms were identical to those
> described in the Pacemaker mailing list post:
> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg02479.html
> 
> /var/log/httpd/error.log seemed to suggest that apache was starting,
> then receiving a SIGTERM one second later.
> 
> I tried the suggested solutions (ensuring that apache's PidFile and
> ExtendedStatus were enabled), but this didn't work.
> 
> I debugged the RA directly and found what I believe to be a race
> condition in the monitor_apache() function. The first thing it does is
> call silent_status(), which checks that a PID file for apache exists
> and that a process with that PID is running. If so, it then calls
> monitor_apache_basic(), which, with my configuration, fetches
> http://localhost:80 with wget. If the wget fails, return code 1
> (a generic error) is passed upwards.
> 
> The race condition is that when apache is started, it can write its
> PID file before it has finished initializing to the point where the
> wget would succeed. I was able to work around this problem by placing
> a simple "sleep 5" between starting httpd and the first call to
> monitor_apache().
> 
> I'm running Fedora 13 as a VirtualBox VM guest on a Win7 host, Pacemaker
> 1.1.3, Corosync 1.2.8, and apache RA from resource-agents 3.0.16.
> 
> I would appreciate it if someone more familiar with this RA could
> double-check my theory.

Sounds plausible. Interestingly, nobody has reported this so far.
The usual problem with apache is that people don't load or set up
the status module correctly. I assume that's not the case here.
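
For illustration only, here is a minimal Python sketch of the check sequence described above (PID file and live process first, then an HTTP probe), plus a start-up wait that polls the same check instead of sleeping a fixed 5 seconds. It is not the RA's code (the RA is a shell script). The names pid_is_running(), http_ok(), monitor() and wait_until_ready() are hypothetical stand-ins for silent_status() and monitor_apache_basic(), and the 30-second timeout and 0.5-second interval are arbitrary assumptions. The return codes follow the OCF convention (0 = running, 1 = generic error, 7 = not running).

```python
import os
import time
import urllib.error
import urllib.request

# OCF return codes: 0 = running, 1 = generic error, 7 = not running.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def pid_is_running(pidfile):
    """Hypothetical analogue of silent_status(): PID file exists and that PID is alive."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # no PID file yet, or unreadable/garbled content
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but is owned by another user
    return True


def http_ok(url, timeout=2.0):
    """Hypothetical analogue of monitor_apache_basic(): succeed on an HTTP 2xx answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, or HTTP error status


def monitor(pidfile, url):
    """Same order as described: process check first, then the HTTP probe."""
    if not pid_is_running(pidfile):
        return OCF_NOT_RUNNING
    return OCF_SUCCESS if http_ok(url) else OCF_ERR_GENERIC


def wait_until_ready(pidfile, url, timeout=30.0, interval=0.5):
    """Poll monitor() after start instead of a fixed 'sleep 5'.

    Returns OCF_SUCCESS as soon as the probe succeeds, otherwise the
    last status seen once the timeout has expired.
    """
    deadline = time.monotonic() + timeout
    status = monitor(pidfile, url)
    while status != OCF_SUCCESS and time.monotonic() < deadline:
        time.sleep(interval)
        status = monitor(pidfile, url)
    return status
```

Polling bounds the wait by an explicit timeout rather than a guessed delay: a fast start returns as soon as the HTTP probe answers, and a slow one is not reported as failed merely because 5 seconds were too few.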

Thanks,

Dejan

> Cheers,
> Brett
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
