I stumbled across the linux-ha site yesterday, and I thought I might
send an email to the list regarding a related project I'm involved in.

It's 'Lm_sensors'.  What it does is provide access (via some kernel
drivers) to 'hardware health monitoring hardware' which does things
like monitor fan speeds, temperatures, power supply voltages, case
intrusion sensors, etc. 

By having access to these things, you can help prevent a failure
before it happens.  For example, if the CPU fan stops or slows way
down, you can use a can of air to blow out the dust bunny which
stopped the fan *before* the CPU fries.
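
To make that concrete, here is a rough sketch of a fan check in Python.
It is only an illustration, not part of Lm_sensors itself.  It assumes
a kernel that exposes fan readings as fan*_input files (in RPM) under
/sys/class/hwmon, which may not match your setup.  The function name
find_slow_fans and the 1000 RPM threshold are made up for the example;
pick a threshold that suits your fans.

```python
import glob
import os


def find_slow_fans(hwmon_root="/sys/class/hwmon", min_rpm=1000):
    """Return (path, rpm) pairs for fans reporting fewer than min_rpm.

    hwmon_root and min_rpm are assumptions for this sketch; adjust
    them to match where your kernel exposes sensor readings.
    """
    slow = []
    pattern = os.path.join(hwmon_root, "hwmon*", "fan*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                rpm = int(f.read().strip())
        except (OSError, ValueError):
            # Sensor missing or unreadable; skip it rather than guess.
            continue
        if rpm < min_rpm:
            slow.append((path, rpm))
    return slow


if __name__ == "__main__":
    for path, rpm in find_slow_fans():
        print(f"WARNING: {path} reports only {rpm} RPM")
```

Run from cron every few minutes, something like this could mail a
warning while there is still time to get at the fan.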

There are various daemons, X apps, and web interfaces out there for
Lm_sensors, depending on your needs.

I helped start the project to help me keep a mission-critical server
up at work.  Unfortunately we couldn't afford a redundant server to
'roll over to' in the event of a disaster, so we settled for a sort
of 'home build' (easy to replace parts) with an outboard RAID, UPS,
and hardware health monitoring (oh! and, Linux of course ;'). 

I really like the idea of roll-over servers using something like
'heart-beat' but I know that there are some applications which make
this difficult (like firewall machines, routers, dialup servers, VoIP
servers, etc.).  In any case, with or without a roll-over mechanism,
some hardware health monitoring is an extra insurance policy (and free
in several respects).

