[Linux-HA] behavior of lrmd/crmd when lrmd process is killed
Andrew Beekhof
beekhof at gmail.com
Fri Jun 27 07:18:48 MDT 2008
Lars, what do you think about having the IPC polling code do a wait()
on the far-side PID?
heartbeat (which does the wait) notices that the lrmd died, but whatever
mechanism the IPC code is using doesn't seem able to.
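
To make the idea concrete, here is a rough sketch of what such a check
could look like. It is not the heartbeat IPC code, and every function
name in it is made up for illustration. One caveat it tries to show:
wait()/waitpid() only works on your own children, so a client that is
not the parent of the far side (in the logs below it is heartbeat that
manages lrmd) would get ECHILD and would need something like a signal-0
probe instead. The forked sleeper stands in for the simple daemon
suggested earlier in the thread.

```python
import os
import signal
import time


def child_has_exited(pid):
    """Non-blocking wait() on pid. Only meaningful for our own children.

    Returns True if the child has exited (and is now reaped), False if it
    is still running, None if pid is not our child (ECHILD).
    """
    try:
        reaped, _status = os.waitpid(pid, os.WNOHANG)
    except ChildProcessError:
        return None
    return reaped == pid


def pid_is_gone(pid):
    """Signal-0 probe: works for any PID, child or not.

    Note: an unreaped (zombie) child still counts as existing here.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return True
    except PermissionError:
        return False  # process exists but belongs to another user
    return False


def spawn_sleeper():
    """Fork a stand-in for the far side: while(1) { sleep(1); }."""
    pid = os.fork()
    if pid == 0:
        while True:
            time.sleep(1)
    return pid


def wait_for_exit(pid, timeout=5.0):
    """Poll child_has_exited() the way a polling loop might."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if child_has_exited(pid):
            return True
        time.sleep(0.05)
    return False


def demo():
    """Kill the sleeper with SIGKILL and report whether polling noticed."""
    pid = spawn_sleeper()
    os.kill(pid, signal.SIGKILL)  # same effect as `kill -9 <pid>`
    return wait_for_exit(pid)
```

Calling demo() forks the sleeper, kills it, and polls until the exit is
seen. For a process that is not a child, pid_is_gone() is the only one
of the two checks that can answer, and it is subject to PID reuse.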
On Fri, Jun 27, 2008 at 12:51, Junko IKEDA <ikedaj at intellilink.co.jp> wrote:
>> > It might be worth seeing if you can repeat the result with a resource
>> > based on a simple daemon process ( while(1) { sleep(1); } ).
>>
>> A simple daemon process showed the same result as pgsql.
>> See attached;
>>
>> hb_report-simpledaemon/x3650a/ha-log.txt:line 528
>> heartbeat[4019]: 2008/06/27_15:28:04 WARN: Managed /usr/lib64/heartbeat/lrmd
>> -r process 4037 killed by signal 9 [SIGKILL - Kill, unblockable].
>>
>> hb_report-simpledaemon/x3650a/ha-log.txt:line 569
>> crmd[4040]: 2008/06/27_15:35:45 CRIT: lrm_connection_destroy: LRM Connection
>> failed
>
>
> The same result can be reproduced on another platform.
>
> # cat /etc/SuSE-release
> openSUSE 11.0 (i586)
> VERSION = 11.0
>
> # rpm -qa | grep heartbeat
> heartbeat-resources-2.1.3-22.4
> pacemaker-heartbeat-0.6.5-7.1
> heartbeat-2.1.3-22.4
> heartbeat-common-2.1.3-22.4
>
> # pgrep -lf lrmd
> 12789 /usr/lib/heartbeat/lrmd -r
> # kill -9 12789; date
> Fri Jun 27 19:22:48 JST 2008
>
> heartbeat[12778]: 2008/06/27_19:22:48 WARN: Managed /usr/lib/heartbeat/lrmd
> -r process 12789 killed by signal 9 [SIGKILL - Kill, unblockable].
>
> # pgrep -lf simpledaemon
> 12823 /root/tmp/bin/simpledaemon
> # kill -9 12823; date
> Fri Jun 27 19:30:58 JST 2008
>
> heartbeat[12778]: 2008/06/27_19:30:58 debug: Signing client 12792 off
> ccm[12787]: 2008/06/27_19:30:58 info: client (pid=12792) removed from ccm
> crmd[12792]: 2008/06/27_19:30:58 CRIT: lrm_dispatch: LRM Connection failed