Cluster manager structure for Linux-HA (?)

Harald Milz hm at
Sat Mar 21 04:14:00 MST 1998

alanr at wrote:
> But the ability to monitor "things", detect non-working facilities and
> services, invoke recovery actions (and in a future release) handle dependencies
> seems like a good place to start, rather than re-inventing the wheel.

This is why I proposed to unify SCSI adapter error reporting in the first
place. Michael Neuffer said he wanted to implement this during the rewrite
of the SCSI mid-layer code, but I am afraid we aren't very likely to see
this before the 2.3 kernels.

Alan, you're right with your proposal. What we need is a generic, unified
error reporting mechanism in the kernel itself, as AIX has, so that
device drivers can report error codes in accordance with an error code
table, which in turn could be easily parsed by the cluster manager. One
_could_ also set up syslogd (together with klogd) to write to a pipe and
let the cluster manager read from that pipe, but if a device driver
developer decides to slightly modify his error strings, you will have a
hard time parsing these strings with regexps :-(
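
To make the difference concrete, here is a rough sketch in Python. Note
that everything in it is made up for illustration: the record format
"facility:code:device", the table entries, the sample message text and the
function names are all assumptions, not an existing kernel or syslog
interface.

```python
import re

# Hypothetical error code table. The facilities, codes and meanings are
# invented for this sketch; no such kernel table exists today.
ERROR_TABLE = {
    ("scsi", 17): "adapter timeout",
    ("scsi", 23): "parity error",
    ("net", 4): "link down",
}


def parse_coded(record):
    """Parse an assumed 'facility:code:device' record against the table."""
    facility, code, device = record.strip().split(":", 2)
    meaning = ERROR_TABLE.get((facility, int(code)), "unknown error")
    return facility, int(code), device, meaning


# The fragile alternative: match the free-form text that one driver
# happens to print (this message format is invented as well).
SYSLOG_PATTERN = re.compile(r"^scsi(\d+): adapter timeout on (\S+)$")


def parse_text(line):
    """Return a tuple for a matching log line, or None if it does not match."""
    m = SYSLOG_PATTERN.match(line.strip())
    if m is None:
        return None
    return "scsi", m.group(2), "adapter timeout"
```

With the table, parse_coded("scsi:17:sda") keeps working no matter how the
human-readable text is worded. With the regexp, a driver that changes
"adapter timeout" to "adapter timed out" makes parse_text() return None,
and the cluster manager silently misses the error.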

I don't know mon, but does it work differently from this approach?

As far as HA specifically is concerned, what we need is asynchronous error
detection, so that errors are noticed as soon as they are reported rather
than at the next polling interval. The "voting" part would be a function
of the CM daemon.
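
By "asynchronous" I mean something like the following sketch, again in
Python and again only under assumptions: the daemon sleeps on a file
descriptor (for instance the read end of a pipe fed with error records)
and wakes up only when something arrives. The function name
wait_for_errors() and the line-per-record format are hypothetical, and
the voting part is left out entirely.

```python
import os
import selectors


def wait_for_errors(fd, handle, timeout=None):
    """Sleep until error records arrive on fd, then pass each line to handle.

    Returns the number of records handled. 0 means the timeout expired
    or the writer closed its end without sending anything.
    """
    sel = selectors.DefaultSelector()
    sel.register(fd, selectors.EVENT_READ)
    try:
        # No busy polling: select() blocks until fd becomes readable.
        if not sel.select(timeout):
            return 0
        data = os.read(fd, 4096)
    finally:
        sel.close()
    # A real daemon would have to buffer a partial last line; this
    # sketch assumes whole records arrive in one read.
    lines = data.decode("ascii", "replace").splitlines()
    for line in lines:
        handle(line)
    return len(lines)
```

A cluster manager would call this in a loop and hand each record to its
recovery logic. The point is only that the reaction time is bounded by
the wakeup, not by a polling interval.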

