[Linux-HA] auto_failback off, but the resource group still
dejanmm at fastmail.fm
Fri Jan 11 07:40:22 MST 2008
On Thu, Jan 10, 2008 at 05:29:42PM -0500, Jason Price wrote:
> Wow. Ok, after dealing with many other sundry issues, I've finally gotten
> back to this challenge.
> I did up the logging as you suggested, and dove through that. The clue I
> needed was when the log mentioned what the standard error was from
> 'resource_samba_storage_monitor_0'. For some unknown reason heartbeat calls
> the 'status' keyword of the RC script. That was coded to do some
> useful/human readable stuff, but did NOT conform to the error codes of 0, 1,
> or 7. I never bothered to fix it because no docs indicated that the script
> would call 'status', only 'monitor'.
Yes, heartbeat does call status. Though v2 mentions only monitor,
that operation is actually translated to status for the LSB and
heartbeat resource agents. See the docs about resource agents at
linux-ha.org.
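
To illustrate the monitor-to-status translation, here is a minimal sketch of an RC script that sends both keywords down one code path. It is not your script: the function names, the pidfile path and the pidfile-based liveness check are all hypothetical. The return values 0 and 7 are the ones you mention; the LSB spec itself lists 3 for "not running", so the value is left adjustable here.

```shell
#!/bin/sh
# Hypothetical sketch: names, pidfile path and liveness check are assumptions.

PIDFILE="${PIDFILE:-/var/run/samba_storage.pid}"   # hypothetical path
NOT_RUNNING_RC="${NOT_RUNNING_RC:-7}"              # 7 as in the thread; LSB lists 3

# Return 0 if the recorded process is alive, NOT_RUNNING_RC otherwise.
# Prints nothing, so the caller sees only the exit code.
service_monitor() {
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        return 0
    fi
    return "$NOT_RUNNING_RC"
}

# status and monitor share one implementation, so whichever keyword
# the cluster manager uses, it gets the same exit code.
dispatch() {
    case "$1" in
        monitor|status)
            service_monitor
            ;;
        *)
            echo "usage: {monitor|status}" >&2
            return 1
            ;;
    esac
}
```

A real script would end with something like `dispatch "$1"; exit $?` and would also handle start and stop.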
> I've made 'status' identical to 'monitor', and the error went away. One
> issue resolved.
> The only other issue is why the resource group wants to run on node2.
> Jan 9 22:30:11 node1 lrmd: : info: RA output: (resource_samba_storage:stop:stdout) FAILED TO MOUNT FS /viz/maint WITH ERROR attempting umount -f (is SAMBA still running??)
That's really bad: a failed stop operation. If you had stonith,
which you should btw, that node would have been reset. Please
first make sure that your resource agent runs correctly.
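
As a rough sketch of what a well-behaved stop could look like for a filesystem resource: it succeeds when the mount point is already unmounted, and reports failure only when the unmount really fails. The function names and the /proc/mounts check are assumptions for illustration, not taken from your script.

```shell
#!/bin/sh
# Hypothetical sketch: function names and the /proc/mounts check are assumptions.

# Return 0 if the given directory appears as a mount point in /proc/mounts.
is_mounted() {
    grep -qsF " $1 " /proc/mounts
}

# Stop action for a filesystem: return 0 when the filesystem is not
# mounted afterwards, 1 when the unmount failed.
fs_stop() {
    mnt="$1"
    if ! is_mounted "$mnt"; then
        # Already stopped: stopping a stopped resource must succeed.
        return 0
    fi
    if umount "$mnt"; then
        return 0
    fi
    echo "failed to unmount $mnt" >&2
    return 1
}
```

Running the stop action by hand on both nodes, with the resource started and with it already stopped, and checking `$?` each time is a quick way to verify this before letting the cluster manage it.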
> Attached are the debug logs. They may have some extra stuff in them, but
> included is the debug level of syslog.