[Linux-HA] 4 Node cluster insanity.

John Lange john.lange at open-it.ca
Fri Feb 9 12:28:50 MST 2007


Just a quick follow-up to clarify something. If I simply reboot a node, it
does not crash the cluster.

If I do a "heartbeat stop" on a node, it causes the active node to
crash and reboot.

Here are the logs of a "heartbeat stop" on node vs3. At the time, vs1
was up and running just fine and had hundreds of clients mounted via
NFS. Why would stopping heartbeat on vs3 cause vs1 to try to
unmount /data/cameras?

Feb  9 13:22:11 vs3 cib: [4372]: info: cib_diff_notify: Update (client: 4372, call:74): 0.209.9481 -> 0.209.9482 (ok)
Feb  9 13:22:11 vs3 cib: [4372]: info: cib_diff_notify: Update (client: 4372, call:74): 0.209.9481 -> 0.209.9482 (ok)
Feb  9 13:22:11 vs3 cib: [7481]: info: write_cib_contents: Wrote version 0.209.9482 of the CIB to disk (digest: cffb6c3f9fcbf6078e77ddfa9b5ab0b1)
Feb  9 13:22:11 vs3 cib: [7481]: info: write_cib_contents: Wrote version 0.209.9482 of the CIB to disk (digest: cffb6c3f9fcbf6078e77ddfa9b5ab0b1)
Feb  9 13:22:12 vs1 lrmd: [4369]: info: RA output: (imagestoreclone:3:stop:stderr) umount: /data/cameras: device is busy umount: /data/cameras: device is busy 
Feb  9 13:22:12 vs1 Filesystem[16513]: [16585]: ERROR: Couldn't unmount /data/cameras; trying cleanup with SIGKILL
Feb  9 13:22:12 vs1 Filesystem[16513]: [16587]: INFO: No processes on /data/cameras were signalled
Feb  9 13:22:13 vs1 lrmd: [4369]: info: RA output: (imagestoreclone:3:stop:stderr) umount: /data/cameras: device is busy umount: /data/cameras: device is busy 
Feb  9 13:22:13 vs1 Filesystem[16513]: [16590]: ERROR: Couldn't unmount /data/cameras; trying cleanup with SIGKILL
Feb  9 13:22:13 vs1 Filesystem[16513]: [16592]: INFO: No processes on /data/cameras were signalled
Feb  9 13:22:14 vs1 Filesystem[16513]: [16594]: ERROR: Couldn't unmount /data/cameras, giving up!

John

On Fri, 2007-02-09 at 13:16 -0600, John Lange wrote:

[....]



