[Linux-HA] the return code of failing start action

Andrew Beekhof beekhof at gmail.com
Fri Oct 5 04:26:23 MDT 2007


On 10/4/07, Junko IKEDA <ikedaj at intellilink.co.jp> wrote:
> Hi,
>
> when I tried the following case,
> the return code of the start action was strange.
>
> 1) There are two nodes: an active node and a standby node
> 2) one resource is running on the active node
> 3) SplitBrain came up!

did you create the split brain, or did it occur on its own?

> 4) the resource then tried to start on both nodes,

you don't have stonith configured, right?

because this is exactly the reason why two-node clusters, particularly
ones without stonith configured, are a seriously bad idea.

at least configure pingd so that only one side will try and run the resources

>    I made it fail on purpose on the standby node.
>    so, the return code of start action would be -1 on standby.
>    (it worked well)

-1 means "timed out"... that's not a good value to return from an RA
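
As a rough illustration of returning a meaningful value instead, here is a minimal Python sketch of a start action that maps its outcome onto the standard OCF exit codes. The `start` helper and the command passed to it are hypothetical, not taken from any real RA, and the exception mapping is just one reasonable choice.

```python
import subprocess

# Standard OCF resource agent exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_ARGS = 2
OCF_ERR_UNIMPLEMENTED = 3
OCF_ERR_PERM = 4
OCF_ERR_INSTALLED = 5
OCF_ERR_CONFIGURED = 6
OCF_NOT_RUNNING = 7


def start(command):
    """Run a (hypothetical) start command and return an OCF exit code.

    `command` is an argument list such as ["/usr/sbin/mydaemon", "--start"];
    the name is illustrative only.
    """
    try:
        result = subprocess.run(command, check=False)
    except FileNotFoundError:
        # The program the resource depends on is not installed on this node.
        return OCF_ERR_INSTALLED
    except PermissionError:
        # The program exists but we are not allowed to execute it.
        return OCF_ERR_PERM

    # Any non-zero exit from the command is reported as a generic failure,
    # never as a negative or otherwise undefined value.
    return OCF_SUCCESS if result.returncode == 0 else OCF_ERR_GENERIC
```

An RA built this way would finish with something like `sys.exit(start([...]))`, so the cluster sees a defined code such as 1 (generic error) rather than -1.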

the whole concept of trying to handle this in a resource's start
action is a horrible substitute for a correctly configured cluster.
continuing down this path will only lead to pain.

> 5) after recovering from the split brain, the return code on standby node was "-2"...
>    and crm_mon on the active node also showed it as -2.
>
> Why is it incremented?

i'm not sure i follow this anymore... which return code are you talking about?
if you're talking about the one from the start action, it is never
modified in any way

> The fail count for this action was reset.

actions don't have failcounts... resources do, so again, i'm not 100%
sure what you're talking about here

> Is the fail for start action special?

not in any way that would result in it being incremented, if that's what you mean

