[Linux-HA] crm_mon confused?

Ian Turner vectro at vectro.org
Thu Nov 1 12:44:08 MDT 2007


List,

I seem to have found the cause of the mysteriously curt output of 
cibadmin -Ql and crm_mon -r1 on one of my two nodes. For some reason the 
problematic machine isn't able to retrieve an updated version of the CIB.

From Anthony (the good node):
Nov  1 11:34:23 anthony heartbeat: [2785]: info: Retransmitting pkt 98802
Nov  1 11:34:23 anthony heartbeat: [2785]: info: msg size =3773, type=cib
Nov  1 11:34:23 anthony heartbeat: [2785]: debug: rexmit request from node brutus for msg(98802-98802)
Nov  1 11:34:24 anthony last message repeated 4 times
Nov  1 11:34:24 anthony heartbeat: [2785]: info: Retransmitting pkt 98802
Nov  1 11:34:24 anthony heartbeat: [2785]: info: msg size =3773, type=cib
Nov  1 11:34:24 anthony heartbeat: [2785]: debug: rexmit request from node brutus for msg(98802-98802)

This goes on, and on, and on.
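
To put a number on the loop, the rexmit requests can be tallied per 
sequence range straight from the syslog. What follows is only a minimal 
Python sketch: it assumes each log entry sits on a single line (the excerpt 
above is wrapped by the mail client), and the function name 
count_rexmit_requests and the sample lines are illustrative, not part of 
heartbeat.

```python
import re
from collections import Counter

# Matches the "rexmit request" debug lines as they appear in the excerpt above.
REXMIT_RE = re.compile(r"rexmit request from node\s+(\S+)\s+for msg\((\d+)-(\d+)\)")


def count_rexmit_requests(lines):
    """Count rexmit requests per (requesting node, first seq, last seq)."""
    counts = Counter()
    for line in lines:
        match = REXMIT_RE.search(line)
        if match:
            node = match.group(1)
            first, last = int(match.group(2)), int(match.group(3))
            counts[(node, first, last)] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the log excerpt, unwrapped.
    sample = [
        "Nov  1 11:34:23 anthony heartbeat: [2785]: debug: rexmit request from node brutus for msg(98802-98802)",
        "Nov  1 11:34:24 anthony heartbeat: [2785]: debug: rexmit request from node brutus for msg(98802-98802)",
    ]
    for (node, first, last), n in count_rexmit_requests(sample).most_common():
        print(f"{node} asked for {first}-{last} {n} time(s)")
```

Because syslog collapses duplicates into "last message repeated N times", 
a count obtained this way is a lower bound on the real number of requests.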

Running cibadmin -Ss on brutus (the problematic node) sometimes hangs and 
sometimes completes. Killing the cib process on brutus has no apparent effect.

There are no firewalls installed (the iptables module is not even present), 
the machines are on a common Ethernet, and ifconfig reports no receive or 
transmit errors. The network shows zero packet loss and excellent bandwidth 
and latency.
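
For anyone wanting to repeat the interface check on both nodes without 
reading ifconfig output by eye, here is a rough Python sketch. It assumes a 
Linux kernel that exposes the usual per-interface counters in /proc/net/dev; 
the function name parse_net_dev is made up for this example.

```python
def parse_net_dev(text):
    """Return {interface: error/drop counters} parsed from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # unexpected layout, skip rather than guess
        # Columns 0-7 are receive counters, 8-15 are transmit counters.
        stats[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return stats


if __name__ == "__main__":
    try:
        with open("/proc/net/dev") as handle:
            counters = parse_net_dev(handle.read())
    except OSError as exc:
        print(f"could not read /proc/net/dev: {exc}")
    else:
        for iface, values in sorted(counters.items()):
            print(iface, values)
```

Running it on each node and comparing the counters before and after a burst 
of rexmit requests would show whether any of them move.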

Any other thoughts about this?

--Ian Turner

