[Linux-HA] ha.cf stonith command question
Chun Tian (binghe)
binghe.lisp at gmail.com
Fri Feb 22 21:42:17 MST 2008
Hi, Doug
> Did you change this yourself to get it working or did your
> distro/build include the scripts with $() instead of `` ?
No, I didn't change this, and I don't think it's the key.
Anyway, I should modify external/ipmi to enable logging, and try to
find the REAL return value when stonith got `256'. At present, I think
this is a bug in stonith and I'll try to prove this theory.
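
One thing worth checking while logging is whether 256 is a raw wait
status rather than an exit code. If stonith prints the value it gets
back from pclose() or waitpid() unchanged (an assumption, not something
confirmed in this thread), then 256 is simply exit code 1 shifted left
by 8 bits. A minimal sketch, where decode_status and run_logged are
hypothetical helper names:

```shell
#!/bin/sh
# Convert a raw wait status (as returned by pclose()/waitpid()) into the
# exit code the child passed to exit(). Only meaningful when the child
# terminated normally, not when it was killed by a signal.
decode_status() {
    echo $(( ($1 >> 8) & 255 ))
}

# Hypothetical helper: run a command, append its real exit code to a log
# file, and pass that exit code through to the caller.
run_logged() {
    log=$1; shift
    "$@"
    rc=$?
    echo "$(date) '$*' exited with $rc" >> "$log"
    return $rc
}

decode_status 256   # prints 1
decode_status 0     # prints 0
```

Calling the plugin through run_logged from a test copy would record the
real exit code next to whatever stonith reports.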
Regards,
Chun TIAN (binghe)
NetEase.com, Inc.
>
>
> Yeah, my eyes are going bad. I am so used to seeing ${...}. It sucks
> getting old :)
>
> thanks,
>
> Doug
>
> On Fri, Feb 22, 2008 at 2:15 PM, Chun Tian (binghe)
> <binghe.lisp at gmail.com> wrote:
>> Hi, there
>>
>>
>>> I have verified that it is being run as root so I am a bit confused.
>>> I also remember reading that the preferred way of doing command
>>> expressions now in BASH is not with backticks but with ${}, so I
>>> substituted
>>>
>>> IPMITOOL=`which ipmitool 2>/dev/null` with
>>> IPMITOOL="${which ipmitool 2>/dev/null}"
>>
>> Not ${} but $(), like this test:
>>
>> binghe at binghe-mac:~$ echo $(which ipmitool 2>/dev/null)
>> /usr/bin/ipmitool
>> binghe at binghe-mac:~$ echo `which ipmitool 2>/dev/null`
>> /usr/bin/ipmitool
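
To illustrate the difference: $(...) and backticks both run a command
and capture its output, while ${...} only expands a variable, so
"${which ipmitool ...}" is rejected by bash as a bad substitution. A
small sketch, using command -v and ls as stand-ins (both are
assumptions, chosen so the example runs on a machine without which or
ipmitool):

```shell
#!/bin/sh
tool=ls

# Command substitution: runs the command and captures its standard output.
new_style=$(command -v "$tool" 2>/dev/null)
old_style=`command -v "$tool" 2>/dev/null`

# Parameter expansion: looks up a variable and runs nothing.
name=${tool}

echo "new: $new_style"
echo "old: $old_style"
echo "var: $name"
```

The first two lines of output should be identical, which is why
switching from backticks to $() would not be expected to change what
the ipmi script sees.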
>>
>>
>>
>>>
>>>
>>> But that did not do the trick.
>>>
>>> I have it working when I hard code the location of the ipmitool so I
>>> will stick with that for now. After debugging I have determined
>>> that the problem is that the IPMITOOL variable is not being set by
>>> the `which ipmitool 2>/dev/null` command.
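
A possible way to avoid hard-coding a single location is to try PATH
first and then fall back to a short list of directories. This is only a
sketch: find_tool is a hypothetical helper, and the directory list is a
guess rather than anything taken from the ipmi plugin.

```shell
#!/bin/sh
# Hypothetical helper: print the first usable path for a tool, trying PATH
# first and then the fallback directories given as extra arguments.
find_tool() {
    name=$1; shift
    found=$(command -v "$name" 2>/dev/null) && { echo "$found"; return 0; }
    for dir in "$@"; do
        if [ -x "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    return 1
}

# The fallback directories here are assumptions; adjust them as needed.
IPMITOOL=$(find_tool ipmitool /usr/bin /usr/sbin /usr/local/bin) ||
    echo "ipmitool not found" >&2
```

If the lookup only fails when heartbeat starts the script, the fallback
list would hide the symptom, so it is still worth finding out why PATH
differs.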
>>>
>>> many thanks
>>>
>>> Doug
>>> On Fri, Feb 22, 2008 at 11:14 AM, Dejan Muhamedagic
>>> <dejanmm at fastmail.fm> wrote:
>>>> Hi,
>>>>
>>>>
>>>> On Fri, Feb 22, 2008 at 10:39:29AM -0500, Doug Lochart wrote:
>>>>> After hacking up the /usr/lib64/stonith/plugins/external/ipmi script
>>>>> I have discovered what the problem is. In the script the IPMITOOL
>>>>> shell variable is being set by the `which ipmitool 2>/dev/null`
>>>>> command. This returns the proper location when run by a regular
>>>>> user or by root. However, I just checked and saw that heartbeat is
>>>>> running as user nobody.
>>>>
>>>> There are many processes and some of them do run as nobody most
>>>> of the time. They all have the ability to resume the original (root)
>>>> user id. At least in this case, stonithd should run a stonith
>>>> plugin (ipmi) as root.
>>>>
>>>>
>>>>> I installed heartbeat from an RPM so I did no other configuration
>>>>> for it, because usually all the user accounts are added for you in
>>>>> an RPM. Is heartbeat supposed to run as user nobody? If so, am I
>>>>> right in my assumption that the 'which' command is not returning
>>>>> the path to the ipmitool _because_ it is being run by nobody?
>>>>
>>>> I don't think so, but you can try it yourself by inserting
>>>> something like:
>>>>
>>>> id > /tmp/ipmi-$$.id
>>>>
>>>> in the ipmi script. To fully debug you can also try:
>>>>
>>>> set -x
>>>> exec 2>/tmp/ipmi-$$.debug
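
Since which only searches the directories in PATH, it may also help to
record PATH alongside the user id. A sketch along the same lines, where
dump_context and the output file name are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: write the caller's identity and lookup environment
# to a file so it can be inspected after the daemon has run the script.
dump_context() {
    out=$1
    {
        echo "id:   $(id)"
        echo "PATH: $PATH"
        echo "tool: $(command -v ipmitool 2>/dev/null || echo 'not found')"
    } > "$out"
}

# One file per process, next to the other debug output.
dump_context "${TMPDIR:-/tmp}/ipmi-$$.context"
```

Comparing the file written under heartbeat with one written from a root
shell should show whether the two environments really differ.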
>>>>
>>>> BTW, I just checked some old ha.cf of mine and they all have
>>>> something similar to what you used. It must be that the problem is
>>>> within the script.
>>>>
>>>> Thanks,
>>>>
>>>> Dejan
>>>>
>>>>
>>>>
>>>>> I can hardcode the variable in my script but I would rather get it
>>>>> working the way it is supposed to work. This little issue may
>>>>> foreshadow other more major ones if I don't get it straight now.
>>>>>
>>>>> thanks,
>>>>>
>>>>> regards
>>>>>
>>>>> Doug
>>>>>
>>>>>
>>>>> On Fri, Feb 22, 2008 at 10:02 AM, Doug Lochart
>>>>> <dlochart at gmail.com> wrote:
>>>>>> I tried Dejan's suggestion but I received the same result. Chun,
>>>>>> have you had the cluster working and stonith _not_ complaining for
>>>>>> a few days and _then_ it started to complain about the 256 code?
>>>>>> Or maybe you are just now seeing that stonith is giving this
>>>>>> message?
>>>>>>
>>>>>> I believe the problem may be how stonith is interpreting the ipmi
>>>>>> script's return value.
>>>>>>
>>>>>> One thing I have seen in my googling is that there are not many
>>>>>> examples of people using stonith and ipmi, at least not in the
>>>>>> non-crm (version 1) way. Can anyone reading this post acknowledge
>>>>>> that they are using it and offer any suggestions?
>>>>>>
>>>>>> In case it matters (because it might) I am running on CentOS 5.1
>>>>>> (fully patched as of last week) on x86_64 system.
>>>>>>
>>>>>> thanks and regards,
>>>>>>
>>>>>> Doug
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 22, 2008 at 8:47 AM, Chun Tian (binghe)
>>>>>> <binghe.lisp at gmail.com> wrote:
>>>>>>> Hi, there
>>>>>>>
>>>>>>>
>>>>>>>> I have tested stonith from the command line and was able to reset
>>>>>>>> the target PC. On the command line I used the following:
>>>>>>>>
>>>>>>>> stonith -t external/ipmi -T reset -p "capestor2 10.43.120.134 ADMIN mypassword" capestor2
>>>>>>>>
>>>>>>>> This worked marvelously! So then I moved the stuff into ha.cf.
>>>>>>>> Not having much in the way of examples for ipmi, this was my best
>>>>>>>> guess:
>>>>>>>>
>>>>>>>> stonith_host capestor1 external/ipmi capestor2 10.43.120.134 ADMIN mypassword
>>>>>>>>
>>>>>>>> Syslog gives me this:
>>>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4133]: info: Checking status of STONITH device [IPMI STONITH device ]
>>>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4133]: info: glib: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ipmi status' returned 256
>>>>>>>
>>>>>>> I ran into this too.
>>>>>>>
>>>>>>> I guess this call actually returns 0, but SOMETIMES stonith/external
>>>>>>> thinks the return value is 256...
>>>>>>>
>>>>>>> I have a running 4-node Heartbeat cluster with stonith enabled. A
>>>>>>> few days ago, I got a return value of 256 when 'ipmi status' was
>>>>>>> called.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4133]: ERROR: STONITH device IPMI STONITH device not operational!
>>>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4111]: WARN: Managed STONITH-stat process 4133 exited with return code 1.
>>>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4111]: ERROR: STONITH status operation failed.
>>>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4111]: info: This may mean that the STONITH device has failed!
>>>>>>>>
>>>>>>>> I even went so far as to copy the ipmi plugin to test-ipmi and
>>>>>>>> hardcoded the values for the variables that are passed in my
>>>>>>>> heartbeat. That worked.
>>>>>>>>
>>>>>>>> Any ideas as to what I may be doing wrong?
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> Doug
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> What profits a man if he gains the whole world yet loses his soul?
>>>>>>>
>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Linux-HA mailing list
>>>>>>>> Linux-HA at lists.linux-ha.org
>>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>>> See also: http://linux-ha.org/ReportingProblems