[Linux-HA] the return code of failing start action
Andrew Beekhof
beekhof at gmail.com
Fri Oct 5 04:26:23 MDT 2007
On 10/4/07, Junko IKEDA <ikedaj at intellilink.co.jp> wrote:
> Hi,
>
> when I tried the following case,
> the return code of the start action was strange.
>
> 1) There are two nodes: an active node and a standby node
> 2) one resource is running on the active node
> 3) a split brain occurred!
did you create the split brain, or did it occur on its own?
> 4) the resource would then try to start on both nodes,
you don't have STONITH configured, right?
because this is exactly why two-node clusters, particularly
ones without STONITH configured, are a seriously bad idea.
at least configure pingd so that only one side will try to run the resources.
> I deliberately made it fail on the standby node,
> so the return code of the start action was -1 on the standby node.
> (that part worked as expected)
-1 means "timed out"... that's not a good value to return from an RA (resource agent)
the whole concept of trying to handle this in a resource's start
action is a horrible substitute for a correctly configured cluster.
continuing down this path will only lead to pain.
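For illustration, here is a minimal sketch of an OCF-style start action. OCF agents are usually shell scripts; Python is used here only to show the shape. The exit code values are the standard OCF ones, while the resource itself and its `already_running` and `launch` hooks are hypothetical stand-ins for real logic. The point is that a failed start reports an ordinary non-negative OCF code such as OCF_ERR_GENERIC and leaves timeout detection to the cluster.

```python
# Standard OCF resource agent exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_ARGS = 2
OCF_ERR_UNIMPLEMENTED = 3
OCF_ERR_PERM = 4
OCF_ERR_INSTALLED = 5
OCF_ERR_CONFIGURED = 6
OCF_NOT_RUNNING = 7


def start(already_running, launch):
    """Start a (hypothetical) resource and return an OCF exit code.

    already_running: callable returning True if the resource is active.
    launch: callable that tries to start it and returns True on success.
    """
    if already_running():
        # Starting a resource that is already running should succeed.
        return OCF_SUCCESS
    if launch():
        return OCF_SUCCESS
    # A genuine failure is reported as a generic error. The agent does not
    # invent a "timed out" value; detecting timeouts is the cluster's job.
    return OCF_ERR_GENERIC


def main(argv):
    """Dispatch on the action name; a real agent would pass the result to sys.exit()."""
    action = argv[1] if len(argv) > 1 else ""
    if action == "start":
        # Hypothetical resource that is not running and fails to launch.
        return start(lambda: False, lambda: False)
    return OCF_ERR_UNIMPLEMENTED
```

Note that on POSIX systems a process exit status is an unsigned 8-bit value, so an agent that tried to exit with -1 would actually be seen exiting with 255. That is another reason negative values don't belong in an RA.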
> 5) after recovering from the split brain, the return code on the standby node was "-2"...
> and crm_mon on the active node also showed it as -2.
>
> Why is it incremented?
i'm not sure i follow this anymore... which return code are you talking about?
if you're talking about the one from the start action, it is never
modified in any way
> The fail count for this action was reset.
actions don't have failcounts... resources do, so again, i'm not 100%
sure what you're talking about here
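To make the distinction concrete, a toy model might track these two things separately. This is not Heartbeat's actual implementation. Each operation's return code is stored exactly as the agent returned it, while a single fail count is kept per resource per node and bumped by whichever action fails. The names `ClusterState`, `record` and `reset_fail_count` are hypothetical, and treating every non-zero code as a failure is a simplification.

```python
from collections import defaultdict


class ClusterState:
    """Toy model (not Heartbeat's code): op results vs. per-resource fail counts."""

    def __init__(self):
        # Last result of each operation, keyed by (resource, node, action).
        self.last_rc = {}
        # Fail counts belong to a resource on a node; actions have none.
        self.fail_count = defaultdict(int)

    def record(self, resource, node, action, rc):
        # The agent's return code is stored exactly as received, never modified.
        self.last_rc[(resource, node, action)] = rc
        if rc != 0:
            # Any failing action bumps the same per-resource counter.
            self.fail_count[(resource, node)] += 1

    def reset_fail_count(self, resource, node):
        # Clearing the fail count does not touch the recorded return codes.
        self.fail_count.pop((resource, node), None)
```

For example, a failed start followed by a failed monitor on the standby node gives that resource a fail count of 2. The start's recorded return code stays at whatever the agent returned, and resetting the fail count leaves the recorded codes alone.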
> Is a failure of the start action special?
not in any way that would result in it being incremented, if that's what you mean