[Linux-HA] Scoring system question

Zoltan Boszormenyi zb at cybertec.at
Thu Feb 21 09:18:54 MST 2008


Hi,

We have a problem with automatic IPaddr failback on a system.
There are two nodes, and IPaddr is preferred to run on the "master" node.
The static score for that is 20. Resource stickiness for IPaddr is 40.
Pingd is set up the way the documentation describes; ha.cf has this:

respawn root /usr/lib64/heartbeat/pingd -m 100 -d 5s

Also, the node that loses the network connection to the ping node
gives up its IPaddr, again from the docs:

         <rule id="virt_ip_connected" score_attribute="pingd">
           <expression id="virt_ip_connected_defined" attribute="pingd" operation="defined"/>
         </rule>
         <rule id="virt_ip_unconnected" score="-INFINITY" boolean_op="or">
           <expression id="virt_ip_unconnected_undefined" attribute="pingd" operation="not_defined"/>
           <expression id="virt_ip_unconnected_zero" attribute="pingd" operation="lte" value="0"/>
         </rule>
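
As a rough sketch of how these two rules are expected to score a node, here is a simplified Python model. It is not Heartbeat's actual code; the function name and the dict of node attributes are made up for illustration, and the assumption is that the scores of all matching rules are simply added.

```python
import math

NEG_INF = -math.inf  # stands in for the CRM's -INFINITY score


def pingd_rule_score(attrs):
    """Expected score from the two pingd rules for one node (simplified model)."""
    value = attrs.get("pingd")  # None models a "not_defined" attribute
    score = 0.0
    # Rule virt_ip_connected: attribute defined -> score is the attribute's value
    if value is not None:
        score += float(value)
    # Rule virt_ip_unconnected: not defined OR <= 0 -> -INFINITY
    if value is None or float(value) <= 0:
        score += NEG_INF
    return score


connected = pingd_rule_score({"pingd": 100})   # one reachable ping node, -m 100
disconnected = pingd_rule_score({"pingd": 0})  # ping node unreachable
unknown = pingd_rule_score({})                 # pingd attribute not set at all
```

Under this model a node with a reachable ping node gets 100 from these rules, and a node with no connectivity (or no pingd attribute) can never host IPaddr.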

This _should_ mean the following scoring matrix and transition flow:

                            master                       slave
                            static  stickiness  pingd    static  stickiness  pingd
IPaddr not running          20      0           100      0       0           100

decision is to run IPaddr on master

IPaddr running on master    20      40          100      0       0           100

master loses connection

IPaddr running on master    20      40          0        0       0           100

IPaddr migrated to slave

IPaddr running on slave     20      0           0        0       40          100

master restores connection

IPaddr running on slave     20      0           100      0       40          100

So, at this point, master has 120 points and slave has 140 points,
so IPaddr should stay on the slave. But it doesn't stay; it's migrated
back to master. By trial and error, I raised resource_stickiness
to 200 and now it stays on the slave. But unfortunately only
on my testing setup. On the real machines IPaddr is migrated back
to the master at both resource_stickiness values.
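
The arithmetic behind the expected totals can be written out as a small Python sketch. It assumes the three components are simply added per node, which is the model used in this mail and not necessarily what the CRM really computes; the helper name is hypothetical.

```python
def expected_total(static, stickiness, pingd):
    """Sum the three score components for one node (assumed additive model)."""
    return static + stickiness + pingd


STICKINESS = 40  # resource stickiness, counted only on the node running IPaddr

# State after the master has restored its connection, with IPaddr on the slave
master = expected_total(static=20, stickiness=0, pingd=100)
slave = expected_total(static=0, stickiness=STICKINESS, pingd=100)

print("master:", master, "slave:", slave)
print("expected location:", "slave" if slave > master else "master")
```

With these inputs the slave is ahead by 20 points, so under this model no failback would be expected.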

The machines are running SLES10 SP1, heartbeat package is 2.1.3-0.6
coming from SuSE/Novell. It's a preview package from SLES10 SP2.

Can someone explain it to me?

Best regards,



-- 
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/




