[Linux-HA] external Stonith device...

Alan Robertson alanr at unix.sh
Tue Mar 15 06:36:53 MST 2005


Peter Weiss wrote:
> Hi Gurus....
> 
> Still can't get that Stonith device to work.  Now we want to move into a more
> serious phase of the work.  I'm stuck on how to set parameters for
> the external script.  There's a hint that says to set an environment
> variable EXTERNAL, but I found nowhere to set it and no indication of what
> values (parameters included?) it takes.
> 
> Asking stonith for the state seems to work:
> 
> # stonith -t external -S itaibi09
> stonith: external STONITH device device OK.
> INFO: Host itaibi09 external-reset initiating
> Running /etc/stonith.ibmrsa
> Power: On
> State: Booting OS or in unsupported OS
> 
> The last lines of the output are from my script:
> 
> # /etc/ha.d/stonith.ibmrsa -h rsa-itaibi12 -c "power state"
> Running /etc/stonith.ibmrsa
> Power: On
> State: Booting OS or in unsupported OS
> 
> I'll change "power state" to "power reset" so the machine really gets
> stonith'd...
> 
> How do I have to set up ha.cf to launch the script with different parameters for
> state determination and a real reset?
> 
> /etc/ha.d/ha.cf:
> stonith_host itaibi09 external /etc/ha.d/stonith.ibmrsa -h rsa-itaibi12 -c "power state"
> stonith_host itaibi12 external /etc/ha.d/stonith.ibmrsa -h rsa-itaibi09 -c "power state"


For release 1.2.x:

The external plugin looks pretty strange...

Status always returns OK, no matter what, so status is meaningless.
poweron, poweroff and reset are all treated the same.

It just runs the same command string you gave it regardless of whether the 
command given to it is:
	power off
	power on
	reset

This means this command had better do a reset (or power off) no matter what...
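Since the 1.2.x plugin never tells the command which operation it wants, one way to live with that is to point the directive at a small wrapper that always asks the RSA card for a reset.  This is only a sketch: the wrapper name (/etc/ha.d/stonith-reset-itaibi09), the RSA_SCRIPT and RSA_HOST variables, and the assumption that stonith.ibmrsa accepts -c "power reset" (as Peter plans above) are all hypothetical, and your helper may also need the -u/-p credentials shown in rpc.cfg.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. installed as /etc/ha.d/stonith-reset-itaibi09
# (the name is an assumption, not something heartbeat provides).
#
# The 1.2.x "external" plugin runs the same fixed command for power on,
# power off and reset, so this wrapper ignores whatever it is asked to do
# and always requests a hard reset from the RSA adapter.

# Assumed defaults; adjust for each node to be shot.
RSA_SCRIPT=${RSA_SCRIPT:-/etc/ha.d/stonith.ibmrsa}
RSA_HOST=${RSA_HOST:-rsa-itaibi09}

stonith_reset() {
    # Any arguments ("$@") are deliberately ignored: every request is a reset.
    # The helper's exit status becomes the function's return value, so a
    # failed reset is reported back to the caller as a failure.
    "$RSA_SCRIPT" -h "$RSA_HOST" -c "power reset"
}

# Only perform the reset when invoked under the wrapper's installed name,
# so the function can be sourced and tested without shooting anything.
case "$0" in
    */stonith-reset-*)
        stonith_reset "$@"
        exit $?
        ;;
esac
```

Ignoring the requested operation is deliberate: with this plugin, a command that only reports state (like "power state") can never actually fence a node.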

There is no way for an external STONITH device like this to work for more
than one host.  The host list never works for the external plugin.

Now, if your STONITH -p option references a command, that command might
take options, but a given stonith directive will never give that command
different options (so you can't have * for a host name in stonith_host).

There is no reference to any environment variables in the code.  The 
comment about $EXTERNAL appears to be incorrect.

If you use stonith_host, then rpc.cfg isn't used by heartbeat.  It will get
used by the manual stonith command, unless you invoke it with a -p or -F
option.

> /etc/ha.d/rpc.cfg:
> rsa-itaibi09 /etc/ha.d/stonith.ibmrsa -h rsa-itaibi09 -u USERID -p bmw-linux -c "power state"
> rsa-itaibi12 /etc/ha.d/stonith.ibmrsa -h rsa-itaibi12 -u USERID -p bmw-linux -c "power state"
> 
> Of course, stonith -L lists the external plugin.  From the stonith -h
> output:
> 
> [...]
> STONITH Device: external - EXTERNAL-program based host reset
> Set environment variable $EXTERNAL to the proper reset script.
> 
> Config info [-p] syntax for external:
>         hostname ...
> host names are white-space delimited.
> 
> Config file [-F] syntax for external:
>         hostname...
> host names are white-space delimited.  All host names must be on one line.  Blank lines and lines beginning with # are ignored
> [...]
> 
> Where do I have to set $EXTERNAL?
> 
> When I kill all heartbeat processes on the active node, the failover fails to
> stonith it.  These two segfaults at the end also bother me a little :-(

It doesn't need to stonith if there's been a graceful shutdown.

> [...]
> Mar 15 10:43:31 itaibi12 stonith: external STONITH device device OK.
> Mar 15 10:44:14 itaibi12 heartbeat[8632]: WARN: node itaibi09: is dead
> Mar 15 10:44:14 itaibi12 heartbeat[8632]: info: Link itaibi09:eth0 dead.
> Mar 15 10:44:14 itaibi12 heartbeat[8632]: info: Link itaibi09:eth1 dead.
> Mar 15 10:44:14 itaibi12 ipfail[8643]: info: Link Status update: Link itaibi09/eth0 now has status dead
> Mar 15 10:44:14 itaibi12 ipfail[8643]: debug: Found ping node 10.250.22.104!
> Mar 15 10:44:14 itaibi12 ipfail[8643]: info: Asking other side for ping node count.
> Mar 15 10:44:14 itaibi12 ipfail[8643]: debug: Message [num_ping] sent.
> Mar 15 10:44:14 itaibi12 ipfail[8643]: info: Checking remote count of ping nodes.
> Mar 15 10:44:14 itaibi12 ipfail[8643]: info: Link Status update: Link itaibi09/eth1 now has status dead
> Mar 15 10:44:14 itaibi12 ipfail[8643]: debug: Found ping node 10.250.22.104!
> Mar 15 10:44:14 itaibi12 ipfail[8643]: info: Asking other side for ping node count.
> Mar 15 10:44:14 itaibi12 ipfail[8643]: debug: Message [num_ping] sent.
> Mar 15 10:44:14 itaibi12 ipfail[8643]: info: Checking remote count of ping nodes.
> Mar 15 10:44:14 itaibi12 heartbeat[8676]: info: Resetting node itaibi09 with [external STONITH device]
> Mar 15 10:44:14 itaibi12 heartbeat[8676]: info: Host itaibi09 external-reset initiating
> Mar 15 10:44:14 itaibi12 heartbeat[8676]: ERROR: command '-h rsa-itaibi09 -c "power state"' failed
> Mar 15 10:44:14 itaibi12 heartbeat[8676]: ERROR: Host itaibi09 not reset!
> Mar 15 10:44:14 itaibi12 heartbeat[8632]: WARN: Exiting STONITH itaibi09 process 8676 returned rc 1.
> 
> [...]
> 
> Mar 15 10:44:43 itaibi12 heartbeat: info: Releasing resource group: itaibi09 10.250.22.199 apache::/etc/httpd/httpd.conf
> Mar 15 10:44:43 itaibi12 heartbeat: info: Running /etc/ha.d/resource.d/apache /etc/httpd/httpd.conf stop
> Mar 15 10:44:43 itaibi12 heartbeat: debug: Starting /etc/ha.d/resource.d/apache /etc/httpd/httpd.conf stop
> Mar 15 10:44:44 itaibi12 heartbeat: apache is not running.
> Mar 15 10:44:44 itaibi12 heartbeat: debug: /etc/ha.d/resource.d/apache /etc/httpd/httpd.conf stop done. RC=0
> Mar 15 10:44:44 itaibi12 heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 10.250.22.199 stop
> Mar 15 10:44:44 itaibi12 heartbeat: debug: Starting /etc/ha.d/resource.d/IPaddr 10.250.22.199 stop
> Mar 15 10:44:44 itaibi12 heartbeat: debug: /etc/ha.d/resource.d/IPaddr 10.250.22.199 stop done. RC=0
> Mar 15 10:44:44 itaibi12 heartbeat[8697]: info: killing /usr/lib64/heartbeat/ipfail process group 8643 with signal 15
> Mar 15 10:44:44 itaibi12 heartbeat[8697]: info: All HA resources relinquished.
> Mar 15 10:44:44 itaibi12 heartbeat[8632]: info: killing /usr/lib64/heartbeat/ipfail process group 8643 with signal 15
> Mar 15 10:44:44 itaibi12 kernel: heartbeat[8632]: segfault at 00000000006d6755 rip 0000002a95c2dda0 rsp 0000007fbfffd128 error 4
> Mar 15 10:44:44 itaibi12 kernel: heartbeat[8810]: segfault at 00000000006d6755 rip 0000002a95c2dda0 rsp 0000007fbfffc5d8 error 4


This is bad news.  What kernel is this running on?  What architecture?

Did you get a core dump?


-- 
     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions." - William Wilberforce