[Linux-HA] pingd does not work for connection loss
Andrew Beekhof
beekhof at gmail.com
Mon Aug 7 00:54:40 MDT 2006
On 8/4/06, Yann Zurcher <yann.zurcher at gmail.com> wrote:
> Hi,
>
> Setting up a cluster with two servers and two NICs each works pretty well,
> and all the configuration seems to be correct.
> Unfortunately, I have a problem with the pingd resource (set up as a clone).
>
> If I take down the eth0 interface, it fails over. If I just unplug the
> network cable on the switch side, it doesn't fail over.
> Monitoring with the crm_mon tool, I can see that pingd is stopped (see below),
> but if I look at the process list, pingd is running.
>
> Do I have a configuration problem, or is this just a bug in the pingd
> service?
>
> Thanks for your help!
>
> Regards,
> Yann
>
> Configuration:
> ha80
> eth0 192.168.1.180/24
> eth1 192.168.138.80/24
>
> ha81
> eth0 192.168.1.181/24
> eth1 192.168.138.81/24
>
> crm_mon output (identical on both servers):
> Refresh in 4s...
> ============
> Last updated: Thu Aug 3 19:51:24 2006
> Current DC: ha81.netsapiens.com (983a6ebd-8436-44d7-b184-2cabb7f4fde7)
> 2 Nodes configured.
> 2 Resources configured.
> ============
>
> Node: ha80.netsapiens.com (9b778648-333b-4fa4-a0ee-10d0d8bc27dd): online
> Node: ha81.netsapiens.com (983a6ebd-8436-44d7-b184-2cabb7f4fde7): online
>
> Full list of resources:
>
> Resource Group: group_1
> IPaddr_192_168_1_177 (heartbeat::ocf:IPaddr): Started ha80.netsapiens.com
> netsapiens_enotify_2 (lsb:netsapiens_enotify): Started ha80.netsapiens.com
> netsapiens_ncs_3 (lsb:netsapiens_ncs): Started ha80.netsapiens.com
> netsapiens_nms_4 (lsb:netsapiens_nms): Started ha80.netsapiens.com
> Clone Set: connect
> pingd:0 (heartbeat::ocf:pingd): Stopped
> pingd:1 (heartbeat::ocf:pingd): Stopped
>
>
> Attached are the cib.xml, ha.cf and my log files.
So I found a few things in your logs and configuration.
First, heartbeat doesn't seem to like your ping node directive:
heartbeat[3013]: 2006/08/03_18:47:33 ERROR: Illegal ping [ping membership] in config file [192.168.1.1.]
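Note that the value in brackets, 192.168.1.1., ends with a stray period. A quick pre-flight check for ping lines in ha.cf could look like the sketch below. It is a hypothetical helper, not heartbeat's own parser: it only accepts literal IPv4 addresses (heartbeat may accept other forms) and simply validates every value that follows the `ping` keyword.

```python
import ipaddress


def check_ping_directives(ha_cf_text):
    """Return (value, is_valid_ipv4) for each value on every `ping` line.

    Hypothetical pre-flight helper: only literal IPv4 addresses count as
    valid, and this is not how heartbeat itself parses ha.cf.
    """
    results = []
    for raw in ha_cf_text.splitlines():
        # Drop '#' comments, then split the directive on whitespace.
        fields = raw.split("#", 1)[0].split()
        # Skip blank lines and every directive other than `ping`.
        if not fields or fields[0] != "ping":
            continue
        for value in fields[1:]:
            try:
                ipaddress.IPv4Address(value)
                results.append((value, True))
            except ipaddress.AddressValueError:
                results.append((value, False))
    return results


print(check_ping_directives("ping 192.168.1.1.\n"))
# -> [('192.168.1.1.', False)]
```

Dropping the trailing period (`ping 192.168.1.1`) makes the same check pass, assuming 192.168.1.1 is the ping node you intended.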
Secondly, I see:
heartbeat[3045]: 2006/08/03_18:48:14 info: respawn directive: hacluster /usr/lib/heartbeat/pingd -m 100 -d 5s
So you're telling both heartbeat (via the respawn directive in ha.cf) and
the CRM (via the pingd clone in the CIB) to start it. Pick just one.
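To spot this kind of double start, a small lint over ha.cf and a saved copy of cib.xml is sketched below. It is hypothetical: it assumes the `respawn <user> <path> [args]` form seen in the log above, and it treats any CIB primitive with `class="ocf"` and `type="pingd"` as the CRM-managed copy.

```python
import xml.etree.ElementTree as ET


def pingd_start_sources(ha_cf_text, cib_xml_text):
    """List every component configured to start pingd (hypothetical lint)."""
    sources = []
    for raw in ha_cf_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        # "respawn <user> <path> [args...]": heartbeat starts and restarts it.
        if len(fields) >= 3 and fields[0] == "respawn" and fields[2].endswith("/pingd"):
            sources.append("heartbeat: ha.cf respawn")
    root = ET.fromstring(cib_xml_text)
    # iter() also finds primitives nested inside <clone> elements.
    for prim in root.iter("primitive"):
        if prim.get("class") == "ocf" and prim.get("type") == "pingd":
            sources.append("CRM: primitive " + str(prim.get("id")))
    return sources
```

If this returns more than one entry, pingd is being started twice; keep either the respawn line or the CRM clone, not both.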
The third thing I'm noticing is that there are no log entries from the
CRM... very odd.
And the fourth thing:
<nvpair id="cib-bootstrap-options-default_action_timeout"
name="default_action_timeout" value="5s"/>
That's really quite short; you might want to increase it to 20s or 30s.
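As a rough sketch, bumping that value in a saved copy of the CIB XML could look like this. The helper is hypothetical and only rewrites text; cib.xml generally should not be hand-edited while heartbeat is running, so on a live cluster the change would normally go through the CIB admin tools (such as cibadmin) instead.

```python
import xml.etree.ElementTree as ET


def set_default_action_timeout(cib_xml_text, new_value="30s"):
    """Return (updated_xml, count) with each default_action_timeout set to new_value.

    Hypothetical helper operating on a saved copy of the CIB, not the live one.
    """
    root = ET.fromstring(cib_xml_text)
    count = 0
    for nv in root.iter("nvpair"):
        if nv.get("name") == "default_action_timeout":
            nv.set("value", new_value)  # e.g. "5s" -> "30s"
            count += 1
    return ET.tostring(root, encoding="unicode"), count
```

Passing `new_value="20s"` gives the lower of the two suggested values.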