[Linux-HA] ipfail in V2

Alan Robertson alanr at unix.sh
Fri Oct 21 07:46:20 MDT 2005

David Lang wrote:
> On Thu, 20 Oct 2005, Andrew Beekhof wrote:
>>> You MUST know what everyone else's values are - before they update the
>>> CRM.
>> for the perfect algorithm, yes.  but i never said this was trying to be.
> the problem is that in most cases failing over is a fairly traumatic 
> event. in some cases (stateless packet filtering firewall for example) 
> the outage during the failover may be fast enough that you don't care 
> about it, but as your boxes start to actually do more things a failover 
> event causes more grief.
> also the tendency is that as your box is doing more things the process 
> of failing over also takes longer, causing even more impact.
> and heaven help you if you have stonith configured, you have boxes 
> actually power down and require manual intervention to get them back up.

FYI: this would never cause a STONITH (nor do ping nodes in R1).

> this is a case where a poor implementation is in most cases worse than 
> no implementation.
> This being said I want to raise one additional possibility for people to 
> consider as a future enhancement.
> once there is the ability to have health values that get propagated 
> around then the possibility arises to do even fancier things when you 
> have multiple resources.
> for example you could put the 5 min loadave into the CIB and have 20 
> resources on 3 machines and tell the system that when the difference in 
> the load gets to be >x migrate some resources away from the heavily 
> loaded machine
> now doing this sort of thing will require changes to the CIB from what I 
> am reading, but the concept is very powerful and adding it is probably 
> worth the effort involved.

I think this should be possible as soon as we get our ducks in a row 
here.  It's what we're aiming at.

> If I am understanding things properly, currently the CIB has a health 
> value that is acted on immediately.

That's not quite the case.  There are attribute values -- which programs 
can easily change -- whose meanings are determined only by the rules 
you write.  And there can be as many of those as your heart desires.

> this would require a second health value with the following 
> characteristics (or possibly allow for an arbitrary number of such values)
> 1. like the normal health value it needs to be updated regularly and if 
> not updated for a sufficient time period needs to be considered to be bad.
> 2. there needs to be a configurable delay before acting on a difference 
> in the value
> 3. there needs to be a configurable delta that's acceptable (i.e. in 
> some cases health values of 78 from one machine and 79 from another 
> should not trigger an action)
> 4. the action to be taken when a difference exceeds the threshold needs 
> to be able to be specified (either a built-in function like 'fail the 
> node' or an external script gets run)
> 5. after an action takes place it should be possible to raise a flag 
> that will prevent further actions for a configurable time period (in my 
> example above, you move resources off a loaded machine, now you need to 
> let the loadave settle again before you decide if you need to move more 
> off)
> note that with a value of 0 for #2, 1 for #3, 'fail the node' for #4 and 
> 'no delay' for #5 this degenerates down to the existing health value 
> (which may be the way to go for it instead of implementing two 
> completely different types of things)

This isn't quite accurate.  Let me tell you what we _are_ looking at, 
and you can see if this will do the trick for you...  I think it's 
mostly more than what you're talking about, but in some ways it's less.

You can have a fc_health attribute, which you set to values with your 
program.  And, you can write rules that give weights to various nodes 
depending on this attribute value for the nodes.  And, those rules 
should only affect the placement of resources which need the fiber 
channel connection.

I would typically recommend that the fc_health attribute have a very 
small number of values - like "-INFINITY, 0, 10" or "red", "yellow", 
"green".  The rules to use it change depending on what values you choose.

You can have another attribute called "ei_ping" for external Internet 
health which you set to the number of visible ping nodes scaled by 10 (for 
example).  So, if you have no nodes visible, it gets 0, if you have one 
node visible, you get 10, if two, you get 20, etc.  You can write a rule 
which uses this attribute value as the weight for where to run your web 
server (for example).
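
As a rough illustration of the ei_ping scaling just described, here is a 
minimal Python sketch.  The function name ei_ping_value and the PING_SCALE 
constant are hypothetical and are not part of Heartbeat; how a monitor 
program would actually write the value into the CIB is left out.

```python
# Hypothetical helper, not part of Heartbeat or the CRM.
PING_SCALE = 10  # scale factor taken from the example in the text


def ei_ping_value(visible_ping_nodes: int) -> int:
    """Return the ei_ping attribute value for a count of visible ping nodes."""
    if visible_ping_nodes < 0:
        raise ValueError("ping node count cannot be negative")
    return visible_ping_nodes * PING_SCALE


# Show the value a monitor would set for 0, 1 and 2 visible ping nodes.
for count in range(3):
    print(count, ei_ping_value(count))
```

A rule could then use this number directly as the weight for the web 
server's placement.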

If you need both the FC AND the web server, you have to scale the values 
so that the "most important" criterion wins when they're summed.
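
One way to picture the scaling, as a sketch only: assume FC health is the 
more important criterion and that there is a known upper bound on the 
number of ping nodes.  The names node_score, FC_WEIGHT and MAX_PING_NODES 
are made up for this example and the numbers are arbitrary.

```python
# Hypothetical weights, chosen only to illustrate the scaling argument.
MAX_PING_NODES = 5   # assumed upper bound on configured ping nodes
PING_SCALE = 10      # ei_ping = visible ping nodes * 10
# The FC weight must exceed the largest possible ei_ping value (50 here).
FC_WEIGHT = MAX_PING_NODES * PING_SCALE + 1


def node_score(fc_ok: bool, visible_ping_nodes: int) -> int:
    """Sum the two criteria so that FC health always dominates."""
    ei_ping = min(visible_ping_nodes, MAX_PING_NODES) * PING_SCALE
    fc_score = FC_WEIGHT if fc_ok else 0
    return fc_score + ei_ping


# A node with working FC but no visible ping nodes still outranks
# a node without FC that can see every ping node.
assert node_score(True, 0) > node_score(False, MAX_PING_NODES)
```

If the FC weight were smaller than the largest ei_ping value, good 
connectivity could outvote a dead FC link, which is the case the scaling 
is meant to rule out.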

Or you can do something more custom, like if FC == TRUE and ping count > 
2, give it a weight of 1000.  If FC == TRUE and ping count == 1 give it 
a weight of 10.  If FC == FALSE OR ping count ==0 give it a weight of 
-INFINITY.   What you can do here is amazingly flexible - even for very 
simple programs doing the measurement.
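
The custom rule above could look roughly like this if written out as a 
Python function.  This is only a sketch of the logic, not CRM rule syntax; 
custom_weight is a hypothetical name.  Note that the rule as stated does 
not say what happens when exactly two ping nodes are visible, so the 
sketch makes an assumption there and marks it in the code.

```python
import math

NEG_INFINITY = -math.inf  # stands in for the -INFINITY weight


def custom_weight(fc_ok: bool, ping_count: int) -> float:
    """Weight for a node, following the example rule in the text."""
    if ping_count < 0:
        raise ValueError("ping count cannot be negative")
    if not fc_ok or ping_count == 0:
        return NEG_INFINITY   # FC == FALSE OR ping count == 0
    if ping_count > 2:
        return 1000           # FC == TRUE and ping count > 2
    if ping_count == 1:
        return 10             # FC == TRUE and ping count == 1
    # ping_count == 2 is not covered by the rule as written.
    # Assumption for this sketch only: treat it like the one-node case.
    return 10


# A node without FC can never run the resource, whatever it can ping.
assert custom_weight(False, 5) == NEG_INFINITY
assert custom_weight(True, 3) > custom_weight(True, 1)
```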

If you want to write more complicated health measures, and are willing 
to write something as complex as ipfail (< 1K lines), there is virtually 
no limit on what you can do.

Each node would run a monitor process which updates the CIB when these 
(coarse-granularity) values change.  When it does this, it gives the 
CIB a delay (we call it a hysteresis value) which tells it to wait a 
while before notifying the CRM that this value has changed.  This gives 
other nodes a chance to observe the same thing and keeps resources from 
being bounced around.
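
To make the hysteresis idea concrete, here is a small Python model of one 
attribute with a delay.  It is a sketch under assumptions, not how the CIB 
is implemented: the class DelayedAttribute and its methods are 
hypothetical, time is passed in explicitly to keep the example 
deterministic, and the choice to restart the wait on every new value is 
my assumption.

```python
from typing import Optional


class DelayedAttribute:
    """Hypothetical model of one attribute with a hysteresis delay."""

    def __init__(self, initial: int, delay: float) -> None:
        self.notified = initial              # last value passed on to the CRM
        self.pending: Optional[int] = None   # changed value still waiting
        self.due = 0.0                       # time at which pending is reported
        self.delay = delay

    def update(self, value: int, now: float) -> None:
        """Record a new value and restart the wait (assumption of this sketch)."""
        if value == self.notified:
            self.pending = None              # back to what the CRM already knows
        else:
            self.pending = value
            self.due = now + self.delay

    def poll(self, now: float) -> Optional[int]:
        """Return the value to report once the delay has passed, else None."""
        if self.pending is not None and now >= self.due:
            self.notified = self.pending
            self.pending = None
            return self.notified
        return None


attr = DelayedAttribute(initial=20, delay=5.0)
attr.update(10, now=0.0)             # one ping node drops out
assert attr.poll(now=2.0) is None    # still inside the delay
attr.update(20, now=3.0)             # it comes back before the delay ends
assert attr.poll(now=10.0) is None   # the CRM never hears about the blip
```

A short outage that clears up inside the delay never reaches the CRM, so 
nothing gets moved for it.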

But, if you want this to work, and not constantly move things around, 
you need to be careful in how you choose to scale and modify these 
values - and in choosing the hysteresis interval.

     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
