[Linux-HA] behavior of lrmd/crmd when lrmd process is killed
Junko IKEDA
ikedaj at intellilink.co.jp
Sun Jun 29 23:42:09 MDT 2008
> Lars, what do you think about having the IPC polling code do a wait()
> on the farside PID?
> Because heartbeat (which does the wait) seems to be able to notice the
> lrmd died but whatever mechanism the IPC code is using doesn't seem
> able to.
By the way, this behavior occurs when the lrmd process is killed on a node
that is not the DC.
If lrmd is killed on the DC, the lrmd disconnection is dispatched to crmd
successfully.
* DC
hb_report-2/opensuse11-b/ha-log:line 1275
heartbeat[7634]: 2008/06/30_14:13:58 WARN: Managed /usr/lib/heartbeat/lrmd -r process 7644 killed by signal 9 [SIGKILL - Kill, unblockable].
crmd[7647]: 2008/06/30_14:13:58 CRIT: lrm_dispatch: LRM Connection failed
* not DC
hb_report-2/opensuse11-a/ha-log:line 776
heartbeat[5644]: 2008/06/30_14:17:59 WARN: Managed /usr/lib/heartbeat/lrmd -r process 5654 killed by signal 9 [SIGKILL - Kill, unblockable].
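
For reference, here is a rough sketch of why the parent notices the kill
while another process may not. It is not the heartbeat or IPC code; the
names peer_alive, spawn_simple_daemon and kill_and_reap are made up for
illustration. waitpid() only works for the parent of the dead process, so a
non-parent would have to probe the farside PID some other way, for example
with kill(pid, 0), which sends no signal and only checks that the PID exists.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a process with this PID still exists,
 * 0 otherwise. kill(pid, 0) delivers no signal; it only performs the
 * existence and permission checks, so it also works for non-children. */
static int peer_alive(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    /* EPERM means the process exists but belongs to another user. */
    return errno == EPERM;
}

/* Hypothetical stand-in for the killed daemon: a child that only sleeps.
 * Returns the child's PID in the parent, or -1 if fork() failed. */
static pid_t spawn_simple_daemon(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        for (;;)
            sleep(1);
    }
    return pid;
}

/* Parent side: send SIGKILL, then reap the child with waitpid().
 * Returns the signal that terminated the child, or -1 on any error. */
static int kill_and_reap(pid_t pid)
{
    int status;

    if (kill(pid, SIGKILL) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Note that peer_alive() keeps returning 1 for a killed child until its parent
has reaped it, so this kind of probe would have to be polled and can only
ever be a hint next to the IPC channel itself.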
Thanks,
Junko
>
> On Fri, Jun 27, 2008 at 12:51, Junko IKEDA <ikedaj at intellilink.co.jp> wrote:
> >> > It might be worth seeing if you can repeat the result with a resource
> >> > based on a simple daemon process ( while(1) { sleep(1); } ).
> >>
> >> A simple daemon process showed the same result as pgsql.
> >> See attached;
> >>
> >> hb_report-simpledaemon/x3650a/ha-log.txt:line 528
> >> heartbeat[4019]: 2008/06/27_15:28:04 WARN: Managed /usr/lib64/heartbeat/lrmd -r process 4037 killed by signal 9 [SIGKILL - Kill, unblockable].
> >>
> >> hb_report-simpledaemon/x3650a/ha-log.txt:line 569
> >> crmd[4040]: 2008/06/27_15:35:45 CRIT: lrm_connection_destroy: LRM Connection failed
> >
> >
> > The same result can be reproduced on another platform.
> >
> > # cat /etc/SuSE-release
> > openSUSE 11.0 (i586)
> > VERSION = 11.0
> >
> > # rpm -qa | grep heartbeat
> > heartbeat-resources-2.1.3-22.4
> > pacemaker-heartbeat-0.6.5-7.1
> > heartbeat-2.1.3-22.4
> > heartbeat-common-2.1.3-22.4
> >
> > # pgrep -lf lrmd
> > 12789 /usr/lib/heartbeat/lrmd -r
> > # kill -9 12789; date
> > Fri Jun 27 19:22:48 JST 2008
> >
> > heartbeat[12778]: 2008/06/27_19:22:48 WARN: Managed /usr/lib/heartbeat/lrmd -r process 12789 killed by signal 9 [SIGKILL - Kill, unblockable].
> >
> > # pgrep -lf simpledaemon
> > 12823 /root/tmp/bin/simpledaemon
> > # kill -9 12823; date
> > Fri Jun 27 19:30:58 JST 2008
> >
> > heartbeat[12778]: 2008/06/27_19:30:58 debug: Signing client 12792 off
> > ccm[12787]: 2008/06/27_19:30:58 info: client (pid=12792) removed from ccm
> > crmd[12792]: 2008/06/27_19:30:58 CRIT: lrm_dispatch: LRM Connection failed
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb_report-2.tar.gz
Type: application/octet-stream
Size: 92002 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20080630/efa23fb5/hb_report-2.tar-0001.obj