[Linux-HA] ha.cf stonith command question

Chun Tian (binghe) binghe.lisp at gmail.com
Fri Feb 22 21:42:17 MST 2008


Hi, Doug

> Did you change this yourself to get it working or did your
> distro/build include the scripts with $() instead of `` ?

No. I didn't change this, and I don't think it's the key.
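
A minimal sketch of why the two forms should not matter here. It uses
`command -v sh` in place of `which ipmitool` so it runs on a machine without
ipmitool; `UNSET_TOOL_PATH` is a hypothetical variable name, and the default
path is only the one from my test output quoted below.

```shell
# Both forms run the command and capture its standard output.
new_style=$(command -v sh 2>/dev/null)
old_style=`command -v sh 2>/dev/null`

if [ "$new_style" = "$old_style" ]; then
    echo "same result: $new_style"
fi

# ${...} is parameter expansion, not command substitution. It only makes
# sense with a variable name, for example to supply a fallback value.
# UNSET_TOOL_PATH is a hypothetical variable used for illustration.
tool=${UNSET_TOOL_PATH:-/usr/bin/ipmitool}
echo "$tool"
```

So `$(...)` and backticks are interchangeable for this purpose, while
`${which ipmitool ...}` is not valid parameter-expansion syntax at all.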

Anyway, I should modify external/ipmi to enable logging and try to
find the REAL return value when stonith reports `256'. At present, I think
this is a bug in stonith, and I'll try to prove this theory.
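
A rough sketch of the kind of logging meant here, not the actual
external/ipmi code. `run_logged`, `decode_status`, `IPMI_DEBUG_LOG` and the
log path are hypothetical names. It also assumes, without proof so far, that
the 256 in the syslog line is a raw wait(2)-style status; if so, it would
correspond to exit code 1 (256 >> 8), which would match the "exited with
return code 1" message in Doug's log.

```shell
# Hypothetical log location; the real plugin does not define this variable.
LOGFILE=${IPMI_DEBUG_LOG:-/tmp/ipmi-$$.log}

# Run a command, record who ran it, the PATH it saw and its exit code,
# then pass the exit code through unchanged.
run_logged() {
    "$@"
    rc=$?
    echo "$(date) uid=$(id -u) PATH=$PATH cmd=$* rc=$rc" >> "$LOGFILE"
    return $rc
}

# Split a raw wait(2)-style status into exit code (bits 8-15) and
# terminating signal (low 7 bits). ASSUMPTION: the logged value is such
# a raw status.
decode_status() {
    echo "exit_code=$(( ($1 >> 8) & 255 )) signal=$(( $1 & 127 ))"
}

run_logged true
decode_status 256    # prints: exit_code=1 signal=0
```

Logging PATH next to the exit code may also show whether `which` had any
chance of finding ipmitool in the environment the plugin is started from.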

Regards,

Chun TIAN (binghe)
NetEase.com, Inc.

>
>
> Yeah my eyes are going bad.  I am so used to seeing ${...}  It sucks
> getting old :)
>
> thanks,
>
> Doug
>
> On Fri, Feb 22, 2008 at 2:15 PM, Chun Tian (binghe)
> <binghe.lisp at gmail.com> wrote:
>> Hi, there
>>
>>
>>> I have verified that it is being run as root so I am a bit confused.
>>> I also remember reading that the preferred way of doing command
>>> expressions now in BASH is not with backticks but with ${} so I
>>> substituted
>>>
>>> IPMITOOL=`which ipmitool 2</dev/null`   with
>>> IPMITOOL="${which ipmitool 2</dev/null}"
>>
>> Not ${} but $(), like this test:
>>
>> binghe at binghe-mac:~$ echo $(which ipmitool 2</dev/null)
>> /usr/bin/ipmitool
>> binghe at binghe-mac:~$ echo `which ipmitool 2</dev/null`
>> /usr/bin/ipmitool
>>
>>
>>
>>>
>>>
>>> But that did not do the trick.
>>>
>>> I have it working when I hard code the location of the ipmitool so I
>>> will stick with that for now.  After debugging I have determined that
>>> the problem is that the IPMITOOL variable is not being set by the
>>> `which ipmitool 2</dev/null`  command.
>>>
>>> many thanks
>>>
>>> Doug
>>> On Fri, Feb 22, 2008 at 11:14 AM, Dejan Muhamedagic
>>> <dejanmm at fastmail.fm> wrote:
>>>> Hi,
>>>>
>>>>
>>>> On Fri, Feb 22, 2008 at 10:39:29AM -0500, Doug Lochart wrote:
>>>>> After hacking up the /usr/lib64/stonith/plugins/external/ipmi script
>>>>> I have discovered what the problem is.  In the script the IPMITOOL
>>>>> shell variable is being set by the `which ipmitool 2>/dev/null`
>>>>> command.  This returns the proper location when run by a regular
>>>>> user or by root.  However I just checked and saw that heartbeat is
>>>>> running as user nobody.
>>>>
>>>> There are many processes and some of them do run as nobody most
>>>> of the time. They all have the ability to resume the original (root)
>>>> user id. At least in this case, the stonithd should run a stonith
>>>> plugin (ipmi) as root.
>>>>
>>>>
>>>>> I installed heartbeat from an RPM so I did no other
>>>>> configuration for it because usually all the user accounts are
>>>>> added for you in an RPM.  Is heartbeat supposed to run as user
>>>>> nobody?  If so am I right in my assumption that the 'which' command
>>>>> is not returning the path to the ipmitool _because_ it is being run
>>>>> by nobody?
>>>>
>>>> Don't think so, but you can try it yourself by inserting something like:
>>>>
>>>> id > /tmp/ipmi-$$.id
>>>>
>>>> in the ipmi script. To fully debug you can also try:
>>>>
>>>> set -x
>>>> exec 2>/tmp/ipmi-$$.debug
>>>>
>>>> BTW, I just checked some old ha.cf files of mine and they all have
>>>> something similar to what you used. It must be that the problem is
>>>> within the script.
>>>>
>>>> Thanks,
>>>>
>>>> Dejan
>>>>
>>>>
>>>>
>>>>> I can hardcode the variable in my script but I would rather get it
>>>>> working the way it is supposed to work.  This little issue may
>>>>> foreshadow other more major ones if I don't get it straight now.
>>>>>
>>>>> thanks,
>>>>>
>>>>> regards
>>>>>
>>>>> Doug
>>>>>
>>>>>
>>>>> On Fri, Feb 22, 2008 at 10:02 AM, Doug Lochart
>>>>> <dlochart at gmail.com> wrote:
>>>>>> I tried Dejan's suggestion but I received the same result.  Chun,
>>>>>> have you had the cluster working and stonith _not_ complaining for
>>>>>> a few days and _then_ it started to complain about the 256 code?
>>>>>> Or maybe you are just now seeing that stonith is giving this
>>>>>> message?
>>>>>>
>>>>>> I believe the problem may be how stonith is interpreting the ipmi
>>>>>> script's return value.
>>>>>>
>>>>>> One thing I have seen in my googling is that there are not many
>>>>>> examples of people using stonith and ipmi, at least not in the
>>>>>> non-crm (version 1) way.  Can anyone reading this post acknowledge
>>>>>> that they are using it and offer any suggestions?
>>>>>>
>>>>>> In case it matters (because it might) I am running on CentOS 5.1
>>>>>> (fully patched as of last week) on an x86_64 system.
>>>>>>
>>>>>> thanks and regards,
>>>>>>
>>>>>> Doug
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 22, 2008 at 8:47 AM, Chun Tian (binghe)
>>>>>> <binghe.lisp at gmail.com> wrote:
>>>>>>> Hi, there
>>>>>>>
>>>>>>>
>>>>>>>> I have tested stonith from the command line and was able to reset
>>>>>>>> the target PC.  On the command line I used the following:
>>>>>>>>
>>>>>>>> stonith -t external/ipmi -T reset -p "capestor2 10.43.120.134 ADMIN mypassword" capestor2
>>>>>>>>
>>>>>>>> This worked marvelously!  So then I moved the stuff into the ha.cf.
>>>>>>>> Not having much in the way of examples for ipmi, this was my best
>>>>>>>> guess:
>>>>>>>>
>>>>>>>> stonith_host capestor1 external/ipmi capestor2 10.43.120.134 ADMIN mypassword
>>>>>>>>
>>>>>>>> Syslog gives me this:
>>>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4133]: info: Checking status of STONITH device [IPMI STONITH device ]
>>>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4133]: info: glib: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ipmi status' returned 256
>>>>>>>
>>>>>>> I ran into this too.
>>>>>>>
>>>>>>> I guess this call actually returns 0 but SOMETIMES stonith/external
>>>>>>> thinks the return value is 256...
>>>>>>>
>>>>>>> I have a running 4-node Heartbeat cluster with stonith enabled. A few
>>>>>>> days ago, I got a return value of 256 when 'ipmi status' was called.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4133]: ERROR: STONITH device IPMI STONITH device  not operational!
>>>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4111]: WARN: Managed STONITH-stat process 4133 exited with return code 1.
>>>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4111]: ERROR: STONITH status operation failed.
>>>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4111]: info: This may mean that the STONITH device has failed!
>>>>>>>>
>>>>>>>> I even went so far as to copy the ipmi plugin to test-ipmi and
>>>>>>>> hardcoded the values for the variables that are passed in my
>>>>>>>> heartbeat.  That worked.
>>>>>>>>
>>>>>>>> Any ideas as to what I may be doing wrong?
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> Doug
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> What profits a man if he gains the whole world yet loses his soul?
>>>>>>>
>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Linux-HA mailing list
>>>>>>>> Linux-HA at lists.linux-ha.org
>>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>>> See also: http://linux-ha.org/ReportingProblems
>>>>>>>


