[Linux-HA] behavior of lrmd/crmd when lrmd process is killed

Junko IKEDA ikedaj at intellilink.co.jp
Thu Jun 26 22:48:33 MDT 2008


Hi,

While checking the following bug with the latest heartbeat-dev and
pacemaker-dev,
http://developerbugs.linux-foundation.org/show_bug.cgi?id=1924

I found some weird behavior.

There are five resources, all in one resource group:

============
Last updated: Fri Jun 27 13:07:11 2008
Current DC: x3650b (db1e4cef-d242-419e-9393-bf5113384744)
2 Nodes configured.
1 Resources configured.
============

Node: x3650a (ce2caf3f-c150-4394-916d-3b4b635394d7): online
Node: x3650b (db1e4cef-d242-419e-9393-bf5113384744): online

Resource Group: grpPostgreSQLDB
    prmFsPostgreSQLDB1  (ocf::heartbeat:Filesystem):    Started x3650a
    prmFsPostgreSQLDB2  (ocf::heartbeat:Filesystem):    Started x3650a
    prmFsPostgreSQLDB3  (ocf::heartbeat:Filesystem):    Started x3650a
    prmIpPostgreSQLDB   (ocf::heartbeat:IPaddr):        Started x3650a
    prmApPostgreSQLDB   (ocf::heartbeat:pgsql): Started x3650a


When "lrmd" is killed, crmd does not notice the event, possibly because of
a glib problem.
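For context, here is a minimal sketch of how a client would normally notice that its peer died. It assumes, without confirmation from the heartbeat sources, that crmd talks to lrmd over a Unix stream socket: once every copy of lrmd's end is closed, crmd's end becomes readable and recv() returns end-of-file. The function name lrmd_death_is_visible() and the forked stand-in "lrmd" are hypothetical; this is not heartbeat code.

```python
import os
import signal
import socket


def lrmd_death_is_visible() -> bool:
    """Fork a stand-in "lrmd", SIGKILL it, and report whether the
    stand-in "crmd" end of the socket then sees end-of-file."""
    # Hypothetical model: one Unix stream socket between crmd and lrmd.
    crmd_end, lrmd_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child = stand-in lrmd: keep its end open until it is killed.
        crmd_end.close()
        signal.pause()
        os._exit(0)
    lrmd_end.close()              # the parent keeps only the crmd end
    os.kill(pid, signal.SIGKILL)  # cf. "killed by signal 9" in the log
    os.waitpid(pid, 0)
    crmd_end.settimeout(2.0)
    try:
        # recv() returning b"" means every copy of the peer end is closed.
        return crmd_end.recv(1) == b""
    except socket.timeout:
        return False              # peer end still held open somewhere
    finally:
        crmd_end.close()


if __name__ == "__main__":
    print(lrmd_death_is_visible())
```

On Linux this prints True: with no other process holding lrmd's end, the kill is visible to the other side right away.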

hb_report-10/x3650a:line 616
heartbeat[24311]: 2008/06/27_12:57:55 WARN: Managed
/usr/lib64/heartbeat/lrmd -r process 24327 killed by signal 9 [SIGKILL -
Kill, unblockable].

But if I stop pgsql like this,

# su - postgres
$ pg_ctl stop
waiting for server to shut down.... done
server stopped

the frozen process resumes, and crmd finally logs the lost LRM connection:

hb_report-10/x3650a:line 657
crmd[24330]: 2008/06/27_13:09:36 CRIT: lrm_connection_destroy: LRM
Connection failed

Heartbeat 2.1.3 behaves the same way.
I wonder why the state of Postgres affects this.
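One possible mechanism, which this report does not confirm: if the Postgres processes started under lrmd inherited a copy of lrmd's end of the crmd connection, killing lrmd would not close that socket, and crmd would see end-of-file only when Postgres exits. The sketch below reproduces that pattern with a stand-in "postgres" grandchild (all names hypothetical, Linux only). In the real daemons the service is started through exec, so a leak like this would also require the descriptor not to be marked close-on-exec; that is likewise an assumption.

```python
import os
import signal
import socket


def _eof_within(sock: socket.socket, seconds: float) -> bool:
    """True if recv() reports end-of-file within the given time."""
    sock.settimeout(seconds)
    try:
        return sock.recv(1) == b""
    except socket.timeout:
        return False


def simulate_inherited_fd() -> tuple[bool, bool]:
    crmd_end, lrmd_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    stop_r, stop_w = os.pipe()  # closing stop_w plays the role of "pg_ctl stop"
    lrmd_pid = os.fork()
    if lrmd_pid == 0:
        # Stand-in lrmd.
        crmd_end.close()
        if os.fork() == 0:
            # Stand-in postgres: never uses lrmd_end but holds an inherited copy.
            os.close(stop_w)
            os.read(stop_r, 1)  # returns b"" once every write end is closed
            os._exit(0)
        lrmd_end.sendall(b"R")  # tell the parent the service is running
        signal.pause()
        os._exit(0)
    lrmd_end.close()
    os.close(stop_r)
    if crmd_end.recv(1) != b"R":
        raise RuntimeError("stand-in lrmd did not start the service")
    os.kill(lrmd_pid, signal.SIGKILL)  # kill lrmd; "postgres" keeps running
    os.waitpid(lrmd_pid, 0)
    eof_while_service_runs = _eof_within(crmd_end, 0.5)
    os.close(stop_w)  # "stop the service": the grandchild's read() returns
    eof_after_service_stops = _eof_within(crmd_end, 5.0)
    crmd_end.close()
    return eof_while_service_runs, eof_after_service_stops


if __name__ == "__main__":
    print(simulate_inherited_fd())
```

On Linux this prints (False, True): no end-of-file while the stand-in service holds the inherited descriptor, and end-of-file as soon as it exits. That matches the symptom above, but it remains a hypothesis to check against the real processes (for example by listing the open descriptors of the postgres processes).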

Thanks,
Junko
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb_report-10.tar.gz
Type: application/octet-stream
Size: 74398 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20080627/ff095cd8/hb_report-10.tar-0001.obj

