[Linux-HA] Don't start all groups at once?

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Oct 2 12:52:52 MDT 2007


On Tue, Oct 02, 2007 at 08:48:51AM -0700, Kelly Byrd wrote:
> Thank you both for your suggestions. The real problem with my home-grown
> resource is that the "start" completes in a few seconds, but the initial
> boot of a VM can take 30 seconds or so.

In your case the resource is a virtual machine. Typically, this
is the last resource to be started, but it would still be better
to wait until the VM is really up _before_ telling the cluster
that the resource is started. I don't know whether VMware offers
such a service. Perhaps you could ping the VM's network address,
or check in some other way that it is operational. Some resource
agents simply keep retrying the monitor operation in their start
action until it succeeds. It's OK for the agent not to time out
on its own; the cluster will do that for you.
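
A minimal sketch of that "keep trying until it is really up" idea,
in Python purely for illustration (real agents are usually shell
scripts). Everything here is an assumption rather than part of any
existing agent: the function names, the choice of a TCP connect
instead of a ping, and the default port are all hypothetical.

```python
import socket
import time


def vm_is_up(host, port=22, connect_timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical health check: assumes the guest runs a service
    (here SSH on port 22) that only answers once boot has finished.
    """
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False


def wait_until_up(check, interval=2.0, sleep=time.sleep):
    """Call check() repeatedly until it returns True.

    There is deliberately no deadline here: the cluster enforces the
    timeout of the start operation and will fail the start if this
    loop runs for too long. Returns the number of attempts made.
    """
    attempts = 1
    while not check():
        sleep(interval)
        attempts += 1
    return attempts
```

The start action would launch the VM and then call something like
wait_until_up(lambda: vm_is_up(vm_address)) before reporting
success, where vm_address is whatever address the guest is
configured with.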

> I don't really see how or why
> heartbeat should know about that. Except maybe if the behavior of this "4
> at once" queue were configurable.

It's not.

> I'll play with these values and see how it works. BTW, what's the process
> for getting a new resource script reviewed and submitted? Of course, I'm
> assuming you all even want the VMwareServerVM resource.

Of course we do. You should talk to alanr at unix.sh to clear the
legal issues.

> One thing I did find interesting was that I had to make my "validate-all"
> function pretty worthless. It was validating that a file exists, but that
> file only exists if another resource (the FS resource in the same group)
> has been started. It wasn't clear to me that validate-all should make its
> decisions based on information available when everything is stopped.

The validate-all action should verify that the parameters passed
are good. It is optional. See


and in particular the OCF spec. I don't know when (and if)
the CRM invokes validate-all.
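
As a rough illustration of checking only the parameters, and not
state that depends on other resources being started, a validate-all
could look like the sketch below. It is written in Python for
illustration only. The parameter name vmxpath and the ".vmx" suffix
rule are hypothetical; the OCF_RESKEY_ prefix for parameters and
the two exit codes are the standard OCF conventions.

```python
import os

# Standard OCF exit codes used by this sketch.
OCF_SUCCESS = 0
OCF_ERR_CONFIGURED = 6


def validate_all(env=None):
    """Check that the configured parameters are well formed.

    Resource parameters reach the agent as OCF_RESKEY_<name>
    environment variables; "vmxpath" is a made-up parameter name.
    """
    env = os.environ if env is None else env
    path = env.get("OCF_RESKEY_vmxpath", "")

    # The parameter must be set at all.
    if not path:
        return OCF_ERR_CONFIGURED

    # It must look like an absolute path to a .vmx file.
    if not os.path.isabs(path) or not path.endswith(".vmx"):
        return OCF_ERR_CONFIGURED

    # Intentionally no os.path.exists(path) check: the file lives on
    # a filesystem that another resource in the group mounts, so it
    # may legitimately be absent while everything is stopped.
    return OCF_SUCCESS
```

A check that the file actually exists fits better in start or
monitor, which run only after the filesystem resource is up.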



