[Linux-HA] external Stonith device...

Peter Weiss Peter.Weiss at ConSol.de
Tue Mar 15 03:18:25 MST 2005


Hi Gurus....

I still can't get the external Stonith device to work, and we now want to
move on to a more serious phase of the project.  I'm stuck on how to pass
parameters to the external script.  There is a hint that says to set an
environment variable EXTERNAL, but I can't find where to set it or what
value it expects (the script path only, or its parameters as well?).

Asking stonith for the device status seems to work:

# stonith -t external -S itaibi09
stonith: external STONITH device device OK.
INFO: Host itaibi09 external-reset initiating
Running /etc/stonith.ibmrsa
Power: On
State: Booting OS or in unsupported OS

The last lines of that output come from my script, which I can also run
directly:

# /etc/ha.d/stonith.ibmrsa -h rsa-itaibi12 -c "power state"
Running /etc/stonith.ibmrsa
Power: On
State: Booting OS or in unsupported OS

Eventually I'll change "power state" to "power reset" so that the machine
really gets stonith'd.

How do I have to set up ha.cf so that the script is launched with different
parameters for the status check and for a real reset?
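
To make the question concrete, this is roughly the kind of dispatch I am
after.  It is only a sketch under assumptions: the action names "status"
and "reset", the functions rsa_command_for and run_rsa, and the RSA_TOOL
variable are made up for illustration.  I do not know whether the external
plugin hands any action to the script at all.

```shell
#!/bin/sh
# Hypothetical wrapper: map an abstract action to the command string
# that stonith.ibmrsa already accepts via -c.
rsa_command_for() {
    case "$1" in
        status) echo "power state" ;;
        reset)  echo "power reset" ;;
        *)      echo "unknown action: $1" >&2; return 1 ;;
    esac
}

# RSA_TOOL is an assumption of this sketch; it defaults to the script
# path used elsewhere in this mail.
RSA_TOOL="${RSA_TOOL:-/etc/ha.d/stonith.ibmrsa}"

run_rsa() {
    # $1 = RSA adapter host name, $2 = action (status or reset)
    cmd=$(rsa_command_for "$2") || return 1
    "$RSA_TOOL" -h "$1" -c "$cmd"
}
```

With this, "run_rsa rsa-itaibi09 status" would only query the power state and
"run_rsa rsa-itaibi09 reset" would do the real reset, so ha.cf would not
need to carry the -c argument itself.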

/etc/ha.d/ha.cf:
stonith_host itaibi09 external /etc/ha.d/stonith.ibmrsa -h rsa-itaibi12 -c "power state"
stonith_host itaibi12 external /etc/ha.d/stonith.ibmrsa -h rsa-itaibi09 -c "power state"

/etc/ha.d/rpc.cfg:
rsa-itaibi09 /etc/ha.d/stonith.ibmrsa -h rsa-itaibi09 -u USERID -p bmw-linux -c "power state"
rsa-itaibi12 /etc/ha.d/stonith.ibmrsa -h rsa-itaibi12 -u USERID -p bmw-linux -c "power state"

Of course stonith -L lists the external plugin.  From the stonith -h
output:

[...]
STONITH Device: external - EXTERNAL-program based host reset
Set environment variable $EXTERNAL to the proper reset script.

Config info [-p] syntax for external:
        hostname ...
host names are white-space delimited.

Config file [-F] syntax for external:
        hostname...
host names are white-space delimited.  All host names must be on one line.  Blank lines and lines beginning with # are ignored
[...]

Where do I have to set $EXTERNAL?
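
The only approach I can think of is to export the variable in the shell
that starts the program, as sketched below.  This is a guess, not a known
answer: it assumes the plugin reads $EXTERNAL from the environment it
inherits, and whether the value may contain arguments as well as the script
path is exactly what I don't know.  Here sh merely stands in for
stonith/heartbeat to show that a child process sees the value.

```shell
# Assumption: the plugin looks up EXTERNAL in its inherited environment.
EXTERNAL=/etc/ha.d/stonith.ibmrsa
export EXTERNAL

# Any child started from this shell inherits the exported variable;
# sh is only a stand-in for the real stonith or heartbeat process.
sh -c 'echo "child sees EXTERNAL=$EXTERNAL"'
```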

When I kill all heartbeat processes on the active node, the failover fails
at the stonith step: the log below reports that the command failed and the
host was not reset.  The two segfaults at the end also bother me a little :-(

[...]
Mar 15 10:43:31 itaibi12 stonith: external STONITH device device OK.
Mar 15 10:44:14 itaibi12 heartbeat[8632]: WARN: node itaibi09: is dead
Mar 15 10:44:14 itaibi12 heartbeat[8632]: info: Link itaibi09:eth0 dead.
Mar 15 10:44:14 itaibi12 heartbeat[8632]: info: Link itaibi09:eth1 dead.
Mar 15 10:44:14 itaibi12 ipfail[8643]: info: Link Status update: Link itaibi09/eth0 now has status dead
Mar 15 10:44:14 itaibi12 ipfail[8643]: debug: Found ping node 10.250.22.104!
Mar 15 10:44:14 itaibi12 ipfail[8643]: info: Asking other side for ping node count.
Mar 15 10:44:14 itaibi12 ipfail[8643]: debug: Message [num_ping] sent.
Mar 15 10:44:14 itaibi12 ipfail[8643]: info: Checking remote count of ping nodes.
Mar 15 10:44:14 itaibi12 ipfail[8643]: info: Link Status update: Link itaibi09/eth1 now has status dead
Mar 15 10:44:14 itaibi12 ipfail[8643]: debug: Found ping node 10.250.22.104!
Mar 15 10:44:14 itaibi12 ipfail[8643]: info: Asking other side for ping node count.
Mar 15 10:44:14 itaibi12 ipfail[8643]: debug: Message [num_ping] sent.
Mar 15 10:44:14 itaibi12 ipfail[8643]: info: Checking remote count of ping nodes.
Mar 15 10:44:14 itaibi12 heartbeat[8676]: info: Resetting node itaibi09 with [external STONITH device]
Mar 15 10:44:14 itaibi12 heartbeat[8676]: info: Host itaibi09 external-reset initiating
Mar 15 10:44:14 itaibi12 heartbeat[8676]: ERROR: command '-h rsa-itaibi09 -c "power state"' failed
Mar 15 10:44:14 itaibi12 heartbeat[8676]: ERROR: Host itaibi09 not reset!
Mar 15 10:44:14 itaibi12 heartbeat[8632]: WARN: Exiting STONITH itaibi09 process 8676 returned rc 1.

[...]

Mar 15 10:44:43 itaibi12 heartbeat: info: Releasing resource group: itaibi09 10.250.22.199 apache::/etc/httpd/httpd.conf
Mar 15 10:44:43 itaibi12 heartbeat: info: Running /etc/ha.d/resource.d/apache /etc/httpd/httpd.conf stop
Mar 15 10:44:43 itaibi12 heartbeat: debug: Starting /etc/ha.d/resource.d/apache /etc/httpd/httpd.conf stop
Mar 15 10:44:44 itaibi12 heartbeat: apache is not running.
Mar 15 10:44:44 itaibi12 heartbeat: debug: /etc/ha.d/resource.d/apache /etc/httpd/httpd.conf stop done. RC=0
Mar 15 10:44:44 itaibi12 heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 10.250.22.199 stop
Mar 15 10:44:44 itaibi12 heartbeat: debug: Starting /etc/ha.d/resource.d/IPaddr 10.250.22.199 stop
Mar 15 10:44:44 itaibi12 heartbeat: debug: /etc/ha.d/resource.d/IPaddr 10.250.22.199 stop done. RC=0
Mar 15 10:44:44 itaibi12 heartbeat[8697]: info: killing /usr/lib64/heartbeat/ipfail process group 8643 with signal 15
Mar 15 10:44:44 itaibi12 heartbeat[8697]: info: All HA resources relinquished.
Mar 15 10:44:44 itaibi12 heartbeat[8632]: info: killing /usr/lib64/heartbeat/ipfail process group 8643 with signal 15
Mar 15 10:44:44 itaibi12 kernel: heartbeat[8632]: segfault at 00000000006d6755 rip 0000002a95c2dda0 rsp 0000007fbfffd128 error 4
Mar 15 10:44:44 itaibi12 kernel: heartbeat[8810]: segfault at 00000000006d6755 rip 0000002a95c2dda0 rsp 0000007fbfffc5d8 error 4
Mar 15 10:59:00 itaibi12 /USR/SBIN/CRON[8843]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly) 

Any help appreciated -- Peter

-- 
BMW Group
Peter Weiss
FZ-33  IT Lösungen - Hardware- und  Netzwerkkomponenten
Telefon: +49-89-382-11538
mailto: Peter.WA.Weiss at Partner.bmw.de



