[Linux-HA] standby does not take over on multiple power failure

Thomas Åkerblom (HF/EBC) thomas.akerblom at ericsson.com
Mon Jun 4 02:19:20 MDT 2007


Hi Andrew.
I'm using 2.0.8-0.15, but I have seen the same behavior in 2.0.7.
In this case ha-9 is DC and also the standby server.
ha-8 has no power, but the standby server does not take over.
The logs begin right before I pulled the power cord.

Actually I do know how to get around this problem now, but I also have some new questions.
If I remove the line:
<nvpair id="default_resource_failure_stickiness" name="default_resource_failure_stickiness" value="-INFINITY"/>
from the cib file, the problem disappears.
I wouldn't expect that parameter to have this effect, rather the opposite.
Is this a known/expected correlation?

I would like to set that parameter in order to be able to use the failure counters.
Furthermore I am not able to read and reset the counters using:

crm_failcount -G -U ha-8 -r rsc_lim8
	The result is always 0

crm_failcount -D -U ha-8 -r rsc_lim8
	Error performing operation: The object/attribute does not exist.

Anyway:
The initial problem can be considered to be solved (for my part).
BR.
Thomas

 *** Thomas
This communication is confidential and intended solely for the addressee(s). Any unauthorized review, use, disclosure or distribution is prohibited. If you believe this message has been sent to you in error, please notify the sender by replying to this transmission and delete the message without disclosing it. Thank you.
E-mail including attachments is susceptible to data corruption, interruption, unauthorized amendment, tampering and viruses, and we only send and receive e-mails on the basis that we are not liable for any such corruption, interception, amendment, tampering or viruses or any consequences thereof.


-----Original Message-----
From: linux-ha-bounces at lists.linux-ha.org [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Andrew Beekhof
Sent: 29 May 2007 10:04
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] standby does not take over on multiple power failure

On 5/21/07, Thomas Åkerblom (HF/EBC) <thomas.akerblom at ericsson.com> wrote:
> Hi.
> I have a system with nodes ha-1, ha-2, ha-3, ha-7, ha-8 and standby server ha-9.
> When I pull the power cord to ha-7, ha-9 will take over.
> Then I insert the power cord again and wait for ha-7 to start.
> Sometime during start, close to the start of heartbeat, I pull the power cord to ha-8.
> ha-9 tries to take over but fails.
> Eventually ha-7 is up and running, ha-8 is down (no power) and ha-9 is in standby mode.
>
> If I insert the power cord to ha-8 it will come up but from this point ha-9 will never take over if ha-8 fails again.
> If I restart ha-9 the situation is normalized and ha-9 will take over if any of the nodes fail.
> This example is valid for all nodes in my system. The priority for ha-9 is to prefer the node with the lowest number (if ha-7 and ha-8 fail simultaneously, it will take over ha-7).
> The problem as described seems to occur when pulling the cord to the node with the lower number first (7) and then the node with the higher number (8).
> It does not occur the other way around (8 first and then 7), or at least it is harder to reproduce.

first question - version?

>
> I attach ha.cf, cib.xml, ha-log, ha-debug and /var/log/messages from the standby ha-9.

unfortunately i need the logs from the machine making the decisions - the DC.
you can find out which machine this is by running:
   crmadmin -D

if you send me the logs from there i'll be able to help you more.

>  <<power 2.zip>>

and in case you're interested

       <rsc_location id="run_group_lim8" rsc="group_lim8">
         <rule id="pref_run_group_lim8" score="100" boolean_op="and">
           <expression id="exp_pref_run_group_lim8" attribute="#uname" operation="eq" value="ha-8"/>
         </rule>
       </rsc_location>
       <rsc_location id="run_group_lim8_Sby1" rsc="group_lim8">
         <rule id="pref_run_group_lim8_Sby1" score="50" boolean_op="and">
           <expression id="exp_pref_run_group_lim8_Sby1" attribute="#uname" operation="eq" value="ha-9"/>
         </rule>
       </rsc_location>

can be written slightly shorter as:

       <rsc_location id="run_group_lim8" rsc="group_lim8">
         <rule id="pref_run_group_lim8" score="100" boolean_op="and">
           <expression id="exp_pref_run_group_lim8" attribute="#uname" operation="eq" value="ha-8"/>
         </rule>
         <rule id="pref_run_group_lim8_Sby1" score="50" boolean_op="and">
           <expression id="exp_pref_run_group_lim8_Sby1" attribute="#uname" operation="eq" value="ha-9"/>
         </rule>
       </rsc_location>
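
For anyone who wants to convince themselves that the two forms state the same preferences, here is a rough Python sketch. It only compares the (resource, node, score) triples found in the XML; it says nothing about how the CRM actually evaluates them. The <constraints> wrapper element and the helper name preferences() are assumptions made for this illustration, not something taken from the attached cib.xml.

```python
import xml.etree.ElementTree as ET

# Two separate rsc_location constraints, one rule each (the original form).
# The <constraints> wrapper is only here so the fragment parses as one document.
LONG_FORM = """
<constraints>
  <rsc_location id="run_group_lim8" rsc="group_lim8">
    <rule id="pref_run_group_lim8" score="100" boolean_op="and">
      <expression id="exp_pref_run_group_lim8" attribute="#uname" operation="eq" value="ha-8"/>
    </rule>
  </rsc_location>
  <rsc_location id="run_group_lim8_Sby1" rsc="group_lim8">
    <rule id="pref_run_group_lim8_Sby1" score="50" boolean_op="and">
      <expression id="exp_pref_run_group_lim8_Sby1" attribute="#uname" operation="eq" value="ha-9"/>
    </rule>
  </rsc_location>
</constraints>
"""

# One rsc_location constraint carrying both rules (the shorter form).
SHORT_FORM = """
<constraints>
  <rsc_location id="run_group_lim8" rsc="group_lim8">
    <rule id="pref_run_group_lim8" score="100" boolean_op="and">
      <expression id="exp_pref_run_group_lim8" attribute="#uname" operation="eq" value="ha-8"/>
    </rule>
    <rule id="pref_run_group_lim8_Sby1" score="50" boolean_op="and">
      <expression id="exp_pref_run_group_lim8_Sby1" attribute="#uname" operation="eq" value="ha-9"/>
    </rule>
  </rsc_location>
</constraints>
"""


def preferences(xml_text):
    """Return the set of (resource, node name, score) triples in the fragment.

    Hypothetical helper: it only understands simple '#uname eq <node>' rules
    with integer scores, which is all the example above uses.
    """
    root = ET.fromstring(xml_text)
    prefs = set()
    for loc in root.iter("rsc_location"):
        for rule in loc.iter("rule"):
            for expr in rule.iter("expression"):
                if expr.get("attribute") == "#uname" and expr.get("operation") == "eq":
                    prefs.add((loc.get("rsc"), expr.get("value"), int(rule.get("score"))))
    return prefs


if __name__ == "__main__":
    print(sorted(preferences(LONG_FORM)))
    print(preferences(LONG_FORM) == preferences(SHORT_FORM))
```

Both fragments yield the same set of triples, so the only difference is how the rules are grouped under rsc_location elements.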

_______________________________________________
Linux-HA mailing list
Linux-HA at lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logHa.zip
Type: application/x-zip-compressed
Size: 16709 bytes
Desc: logHa.zip
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20070604/49be4d58/logHa-0001.bin

