[Linux-HA] ha.cf stonith command question
Chun Tian (binghe)
binghe.lisp at gmail.com
Fri Feb 22 12:15:05 MST 2008
Hi, there
> I have verified that it is being run as root so I am a bit confused.
> I also remember reading that the preferred way of doing command
> substitution in Bash now is not with backticks but with ${}, so I
> replaced
>
> IPMITOOL=`which ipmitool 2>/dev/null` with
> IPMITOOL="${which ipmitool 2>/dev/null}"
Not ${} but $(), like this test:
binghe@binghe-mac:~$ echo $(which ipmitool 2>/dev/null)
/usr/bin/ipmitool
binghe@binghe-mac:~$ echo `which ipmitool 2>/dev/null`
/usr/bin/ipmitool
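
One possible explanation, which nothing in this thread confirms, is that the plugin runs with a narrower PATH under heartbeat than in a login shell, so `which` prints nothing and IPMITOOL ends up empty. Below is a minimal sketch of a more defensive lookup. The function name find_tool and the fallback directory list are hypothetical; they are not part of the shipped external/ipmi plugin.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the external/ipmi plugin:
# resolve a tool name to a path even when PATH is minimal.
find_tool() {
    tool=$1
    # $( ) is command substitution; ${ } is parameter expansion and
    # cannot run a command. command -v is the POSIX lookup.
    found=$(command -v "$tool" 2>/dev/null) || found=""
    if [ -z "$found" ]; then
        # Assumed fallback locations; adjust for the distribution in use.
        for dir in /usr/bin /usr/sbin /usr/local/bin /usr/local/sbin /bin /sbin; do
            if [ -x "$dir/$tool" ]; then
                found="$dir/$tool"
                break
            fi
        done
    fi
    if [ -n "$found" ]; then
        printf '%s\n' "$found"
    fi
}

IPMITOOL=$(find_tool ipmitool)
if [ -z "$IPMITOOL" ]; then
    # Report the PATH the script actually saw, to help with debugging.
    echo "ipmitool not found; PATH is: $PATH" >&2
fi
```

If IPMITOOL is still empty when the script runs under heartbeat, the PATH printed on stderr should show whether the environment is the culprit.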
>
>
> But that did not do the trick.
>
> I have it working when I hard code the location of the ipmitool so I
> will stick with that for now. After debugging I have determined that
> the problem is that the IPMITOOL variable is not being set by the
> `which ipmitool 2>/dev/null` command.
>
> many thanks
>
> Doug
> On Fri, Feb 22, 2008 at 11:14 AM, Dejan Muhamedagic <dejanmm at fastmail.fm
> > wrote:
>> Hi,
>>
>>
>> On Fri, Feb 22, 2008 at 10:39:29AM -0500, Doug Lochart wrote:
>>> After hacking up the /usr/lib64/stonith/plugins/external/ipmi script
>>> I have discovered what the problem is. In the script the IPMITOOL
>>> shell variable is set by the `which ipmitool 2>/dev/null` command.
>>> This returns the proper location when run by a regular user or by
>>> root. However, I just checked and saw that heartbeat is running as
>>> user nobody.
>>
>> There are many processes and some of them do run as nobody most
>> of the time. They all have the ability to resume the original (root)
>> user id. At least in this case, stonithd should run a stonith
>> plugin (ipmi) as root.
>>
>>
>>> I installed heartbeat from an RPM, so I did no other
>>> configuration for it because usually all the user accounts are added
>>> for you by an RPM. Is heartbeat supposed to run as user nobody? If
>>> so, am I right in my assumption that the 'which' command is not
>>> returning the path to the ipmitool _because_ it is being run by
>>> nobody?
>>
>> Don't think so, but you can try it yourself by inserting something like:
>>
>> id > /tmp/ipmi-$$.id
>>
>> in the ipmi script. To fully debug you can also try:
>>
>> set -x
>> exec 2>/tmp/ipmi-$$.debug
>>
>> BTW, I just checked some old ha.cf files of mine and they all have
>> something similar to what you used. It must be that the problem is
>> within the script.
>>
>> Thanks,
>>
>> Dejan
>>
>>
>>
>>> I can hardcode the variable in my script but I would rather get it
>>> working the way it is supposed to work. This little issue may
>>> foreshadow other more major ones if I don't get it straight now.
>>>
>>> thanks,
>>>
>>> regards
>>>
>>> Doug
>>>
>>>
>>> On Fri, Feb 22, 2008 at 10:02 AM, Doug Lochart
>>> <dlochart at gmail.com> wrote:
>>>> I tried Dejan's suggestion but I received the same result. Chun,
>>>> have you had the cluster working and stonith _not_ complaining for a
>>>> few days and _then_ it started to complain about the 256 code? Or
>>>> maybe you are just now seeing that stonith is giving this message?
>>>>
>>>> I believe the problem may be how stonith is interpreting the ipmi
>>>> script's return value.
>>>>
>>>> One thing I have seen in my googling is that there are not many
>>>> examples of people using stonith and ipmi, at least not in the
>>>> non-crm (version 1) way. Can anyone reading this post confirm that
>>>> they are using it and offer any suggestions?
>>>>
>>>> In case it matters (because it might) I am running on CentOS 5.1
>>>> (fully patched as of last week) on x86_64 system.
>>>>
>>>> thanks and regards,
>>>>
>>>> Doug
>>>>
>>>>
>>>>
>>>> On Fri, Feb 22, 2008 at 8:47 AM, Chun Tian (binghe)
>>>> <binghe.lisp at gmail.com> wrote:
>>>>> Hi, there
>>>>>
>>>>>
>>>>>> I have tested stonith from the command line and was able to reset
>>>>>> the target PC. On the command line I used the following:
>>>>>>
>>>>>> stonith -t external/ipmi -T reset -p "capestor2 10.43.120.134 ADMIN mypassword" capestor2
>>>>>>
>>>>>> This worked marvelously! So then I moved the stuff into the ha.cf.
>>>>>> Not having much in the way of examples for ipmi, this was my best
>>>>>> guess:
>>>>>>
>>>>>> stonith_host capestor1 external/ipmi capestor2 10.43.120.134 ADMIN mypassword
>>>>>>
>>>>>> Syslog gives me this:
>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4133]: info: Checking status of STONITH device [IPMI STONITH device ]
>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4133]: info: glib: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ipmi status' returned 256
>>>>>
>>>>> I ran into this too.
>>>>>
>>>>> My guess is that the call actually returns 0 but SOMETIMES
>>>>> stonith/external thinks the return value is 256...
>>>>>
>>>>> I have a running 4-node Heartbeat cluster with stonith enabled; a
>>>>> few days ago, I got a return value of 256 when 'ipmi status' was
>>>>> called.
>>>>>
>>>>>
>>>>>>
>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4133]: ERROR: STONITH device IPMI STONITH device not operational!
>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4111]: WARN: Managed STONITH-stat process 4133 exited with return code 1.
>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4111]: ERROR: STONITH status operation failed.
>>>>>> Feb 21 17:20:24 capestor2 heartbeat: [4111]: info: This may mean that the STONITH device has failed!
>>>>>>
>>>>>> I even went so far as to copy the ipmi plugin to test-ipmi and
>>>>>> hardcoded the values for the variables that are passed in my
>>>>>> heartbeat. That worked.
>>>>>>
>>>>>> Any ideas as to what I may be doing wrong?
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>> Doug
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> What profits a man if he gains the whole world yet loses his
>>>>>> soul?
>>>>>
>>>>>
>>>>>> _______________________________________________
>>>>>> Linux-HA mailing list
>>>>>> Linux-HA at lists.linux-ha.org
>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>> See also: http://linux-ha.org/ReportingProblems