[Linux-HA] failover test and behavior

FG fabrice.grelaud at u-bordeaux1.fr
Thu Sep 6 10:47:16 MDT 2007


Hi,

I use heartbeat 2.1.1 in an active/passive configuration.

I'm testing different failover scenarios and need some explanations:

My nodes are castor (active) and pollux (standby).

I'm testing process failover with monitoring. My configuration uses
default_stickiness = "200" and default_failure_stickiness = "-200", plus an
rsc_location constraint preferring castor with a score of "200".
With these options, I can tolerate 5 process failures before all services
fail over from castor to pollux.
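
For reference, the relevant pieces of my CIB look roughly like this (a trimmed
sketch only, not the full configuration; the ids are shortened, "group_1" stands
for my resource group, and the exact option names may differ slightly from what
heartbeat 2.1 expects; the real settings are in the attached cibadmin output):

  <crm_config>
    <cluster_property_set id="cib-bootstrap-options">
      <attributes>
        <!-- each running resource adds 200 to the node it is currently on -->
        <nvpair id="opt-stickiness" name="default_resource_stickiness" value="200"/>
        <!-- each monitor failure subtracts 200 on the node where it failed -->
        <nvpair id="opt-failure-stickiness" name="default_resource_failure_stickiness" value="-200"/>
      </attributes>
    </cluster_property_set>
  </crm_config>

  <constraints>
    <!-- prefer castor with a score of 200 -->
    <rsc_location id="loc-prefer-castor" rsc="group_1">
      <rule id="loc-prefer-castor-rule" score="200">
        <expression id="loc-prefer-castor-expr" attribute="#uname" operation="eq" value="castor"/>
      </rule>
    </rsc_location>
  </constraints>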

It works like a charm... :-)

The score on castor decreases from 1000 (4 resources x 200 + constraint
score 200) to 0, and with the sixth failure the resources fail over.
The scores after the failover are: castor (-1000) and pollux (800).
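
If I understand the scoring correctly, the arithmetic before the failover is
(assuming pollux has no preference of its own, i.e. it starts at 0):

\[
\mathrm{score}_{\mathrm{castor}}(n)
  = \underbrace{4 \times 200}_{\text{stickiness}}
  + \underbrace{200}_{\text{rsc\_location}}
  - 200\,n
  = 1000 - 200\,n
\]

so after n = 5 failures castor is at 0, and the sixth failure drops it below
pollux and triggers the failover. Once the resources run on pollux, the
4 x 200 of stickiness counts for pollux instead, which would explain the 800
there and the -1000 (200 - 6 x 200) left on castor in the ptest output below.
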
[root at castor crm]# ptest -L -VVVVVVVVVVVVVVVVVVVVV 2>&1|grep assign
ptest[31985]: 2007/09/06_15:57:25 debug: debug5: do_calculations: assign
nodes to colors
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color
IPaddr_147_210_36_7, Node[0] pollux: 800
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color
IPaddr_147_210_36_7, Node[1] castor: -1000
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Assigning
pollux to IPaddr_147_210_36_7
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color
Filesystem_2, Node[0] pollux: 1000000
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color
Filesystem_2, Node[1] castor: -1000000
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Assigning
pollux to Filesystem_2
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color
cyrus-imapd_3, Node[0] pollux: 1000000
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color
cyrus-imapd_3, Node[1] castor: -1000000
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Assigning
pollux to cyrus-imapd_3
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color
saslauthd_4, Node[0] pollux: 1000000
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color
saslauthd_4, Node[1] castor: -1000000
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Assigning
pollux to saslauthd_4
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color
pingd-child:0, Node[0] castor: 1
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color
pingd-child:0, Node[1] pollux: 0
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Assigning
castor to pingd-child:0
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color
pingd-child:1, Node[0] pollux: 1
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color
pingd-child:1, Node[1] castor: -1000000
ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Assigning
pollux to pingd-child:1

Now, as a further test, I unplug the network card on pollux. I expected a
new failover back to the first node (castor), but nothing happens...
So I check my scores and my logs:

[root at castor crm]# ptest -L -VVVVVVVVVVVVVVVVVVVVV 2>&1|grep assign
ptest[32467]: 2007/09/06_16:17:11 debug: debug5: do_calculations: assign
nodes to colors
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color
IPaddr_147_210_36_7, Node[0] castor: -1000
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color
IPaddr_147_210_36_7, Node[1] pollux: -1000000
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: All nodes
for resource IPaddr_147_210_36_7 are unavailable, unclean or shutting down
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color
Filesystem_2, Node[0] castor: -1000000
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color
Filesystem_2, Node[1] pollux: -1000000
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: All nodes
for resource Filesystem_2 are unavailable, unclean or shutting down
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color
cyrus-imapd_3, Node[0] castor: -1000000
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color
cyrus-imapd_3, Node[1] pollux: -1000000
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: All nodes
for resource cyrus-imapd_3 are unavailable, unclean or shutting down
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color
saslauthd_4, Node[0] castor: -1000000
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color
saslauthd_4, Node[1] pollux: -1000000
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: All nodes
for resource saslauthd_4 are unavailable, unclean or shutting down
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color
pingd-child:0, Node[0] castor: 1
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color
pingd-child:0, Node[1] pollux: 0
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Assigning
castor to pingd-child:0
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color
pingd-child:1, Node[0] pollux: 1
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color
pingd-child:1, Node[1] castor: -1000000
ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Assigning
pollux to pingd-child:1

pengine[20890]: 2007/09/06_16:00:23 WARN: native_color: Resource
IPaddr_147_210_36_7 cannot run anywhere
pengine[20890]: 2007/09/06_16:00:23 WARN: native_color: Resource
Filesystem_2 cannot run anywhere
pengine[20890]: 2007/09/06_16:00:23 WARN: native_color: Resource
cyrus-imapd_3 cannot run anywhere
pengine[20890]: 2007/09/06_16:00:23 WARN: native_color: Resource
saslauthd_4 cannot run anywhere

Could someone explain to me what is happening? Is this a split-brain?
Since pingd failed and my rule sets score="-INFINITY", I think the scores
on pollux are logical, aren't they? But in the end the resources have the
same score on both nodes, so they cannot run anywhere.
How can I avoid this behavior?
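
For what it's worth, my connectivity rule is essentially the usual pingd
pattern, roughly like this (sketch only; the resource name "group_1" and the
attribute name "pingd" are placeholders, the real rule is in the attached CIB):

  <rsc_location id="loc-connectivity" rsc="group_1">
    <!-- push the score to -INFINITY when connectivity is lost or unknown -->
    <rule id="loc-connectivity-rule" score="-INFINITY" boolean_op="or">
      <expression id="loc-connectivity-undef" attribute="pingd" operation="not_defined"/>
      <expression id="loc-connectivity-zero" attribute="pingd" operation="lte" value="0"/>
    </rule>
  </rsc_location>

Given such a rule, the -1000000 on pollux once its network is gone seems
logical to me.
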
I attach my settings (cibadmin -Q in a normal state); could you please help
me verify them?

Thanks, regards

Fabrice


