[Linux-HA] pingd strange problem
Andrew Beekhof
beekhof at gmail.com
Wed Dec 27 12:06:48 MST 2006
pingd had some problems in 2.0.7. They have been fixed in time for 2.0.8.
On 12/27/06, Carlos Velasco <heartbeat at nimastelecom.com> wrote:
> Hello,
>
> I'm using HB 2.0.7 on SUSE Linux Enterprise Server 10.
>
> My setup is two nodes (nas-a, nas-b) with 2 resource groups (group_1,
> group_2).
>
> For resource location, group_1 should run on nas-a as long as that node
> is healthy, and group_2 should run on nas-b as long as that node is healthy.
>
> There's a pingd RA clone configured that checks the router availability.
>
> With HB running on both nodes and stable, group_1 is running in nas-a,
> group_2 is running in nas-b.
>
> Then nas-a lost connectivity to the router (as detected by pingd).
> The group_1 resources fail over to nas-b, as expected.
> But nas-a still has the group_1 resources running.
> Moreover, nas-a is still shown as DC, even though it is reported OFFLINE.
>
> crm_mon shows:
>
> =====================
> Current DC: nas-a (65ad0c4d-e58b-44ba-9784-1a6c7e3e24f0)
> 2 Nodes configured.
> 3 Resources configured.
> ============
>
> Node: nas-a (65ad0c4d-e58b-44ba-9784-1a6c7e3e24f0): OFFLINE
> Node: nas-b (6c882c2a-9ef0-4d5d-b6b4-f344af6a0e74): online
>
> Clone Set: pingd
> pingd-child:0 (heartbeat::ocf:pingd): Stopped
> pingd-child:1 (heartbeat::ocf:pingd): Started nas-b
> Resource Group: group_1
> LVM_group_1 (heartbeat::ocf:LVM_Nimas): Started nas-b
> Filesystem_group_1 (heartbeat::ocf:Filesystem): Started nas-b
> IPaddr2_group_1 (heartbeat::ocf:IPaddr2): Started nas-b
> Resource Group: group_2
> LVM_group_2 (heartbeat::ocf:LVM_Nimas): Started nas-b
> Filesystem_group_2 (heartbeat::ocf:Filesystem): Started nas-b
> IPaddr2_group_2 (heartbeat::ocf:IPaddr2): Started nas-b
> =====================
>
>
> Reading the logs for nas-a, I think the shutdown event for nas-a is not
> being matched when pingd reports the failure:
>
> ===
> heartbeat[19747]: 2006/12/27_15:06:42 WARN: node 10.10.112.1: is dead
> heartbeat[19747]: 2006/12/27_15:06:42 info: Link 10.10.112.1:10.10.112.1
> dead.
> crmd[19761]: 2006/12/27_15:06:42 notice:
> crmd_ha_status_callback:callbacks.c Status update: Node 10.10.112.1 now
> has status [dead]
> pingd[19957]: 2006/12/27_15:06:42 notice: pingd_nstatus_callback:pingd.c
> Status update: Ping node 10.10.112.1 now has status [dead]
> pingd[19957]: 2006/12/27_15:06:42 info: send_update:pingd.c 0 active
> ping nodes
> pingd[19957]: 2006/12/27_15:06:42 notice: pingd_lstatus_callback:pingd.c
> Status update: Ping node 10.10.112.1 now has status [dead]
> pingd[19957]: 2006/12/27_15:06:42 notice: pingd_nstatus_callback:pingd.c
> Status update: Ping node 10.10.112.1 now has status [dead]
> pingd[19957]: 2006/12/27_15:06:42 info: send_update:pingd.c 0 active
> ping nodes
> crmd[19761]: 2006/12/27_15:06:42 WARN: get_uuid:utils.c Could not
> calculate UUID for 10.10.112.1
> cib[19757]: 2006/12/27_15:06:42 info: activateCibXml:io.c CIB size is
> 345680 bytes (was 355188)
> cib[19757]: 2006/12/27_15:06:42 info: cib_diff_notify:notify.c
> Local-only Change (client:19761, call: 58): 0.45.1362 (ok)
> crmd[19761]: 2006/12/27_15:06:42 info: do_state_transition:fsa.c nas-a:
> State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_IPC_MESSAGE origin=route_message ]
> tengine[19771]: 2006/12/27_15:06:42 info: te_update_diff:callbacks.c
> Processing diff (cib_update): 0.45.1362 -> 0.45.1362
> crmd[19761]: 2006/12/27_15:06:42 info: do_state_transition:fsa.c All 2
> cluster nodes are eligable to run resources.
> tengine[19771]: 2006/12/27_15:06:42 WARN: match_down_event:events.c No
> match for shutdown action on 65ad0c4d-e58b-44ba-9784-1a6c7e3e24f0
> tengine[19771]: 2006/12/27_15:06:42 info: extract_event:events.c
> Stonith/shutdown event not matched
> ===
>
> I have attached the CIBs, the ha.cf files, logs of the startup and of the
> induced ping failure, and the output of cibadmin -Q taken while nas-a is
> failing ping.
>
> Is there something wrong in the configuration?
>
> Regards,
> Carlos Velasco
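
The inconsistency in the quoted crm_mon output (nas-a listed as Current DC while also reported OFFLINE) can be checked for mechanically. What follows is only a minimal sketch: it assumes the text layout shown in this message (a "Current DC:" line and "Node: name (uuid): status" lines), which may differ in other Heartbeat versions. The function names parse_status and dc_is_offline are hypothetical helpers and not part of any Heartbeat tool.

```python
import re

# Sample copied from the crm_mon output quoted in this thread.
SAMPLE = """\
Current DC: nas-a (65ad0c4d-e58b-44ba-9784-1a6c7e3e24f0)
2 Nodes configured.
3 Resources configured.

Node: nas-a (65ad0c4d-e58b-44ba-9784-1a6c7e3e24f0): OFFLINE
Node: nas-b (6c882c2a-9ef0-4d5d-b6b4-f344af6a0e74): online
"""

# Assumed line formats, taken only from the output shown above.
DC_RE = re.compile(r"^Current DC:\s+(\S+)", re.MULTILINE)
NODE_RE = re.compile(r"^Node:\s+(\S+)\s+\([^)]*\):\s+(\S+)", re.MULTILINE)


def parse_status(text):
    """Return (dc_name, {node_name: lowercase_status}) from crm_mon-style text."""
    dc = DC_RE.search(text)
    nodes = {name: status.lower() for name, status in NODE_RE.findall(text)}
    return (dc.group(1) if dc else None), nodes


def dc_is_offline(text):
    """True when the node named as DC is itself reported offline."""
    dc, nodes = parse_status(text)
    return dc is not None and nodes.get(dc) == "offline"


if __name__ == "__main__":
    dc, nodes = parse_status(SAMPLE)
    print("DC:", dc, "nodes:", nodes)
    print("DC reported offline:", dc_is_offline(SAMPLE))
```

To run it against a live cluster, the text would have to come from a one-shot crm_mon invocation saved to a file or captured by the caller. That step is left out here because the exact command-line options depend on the installed version.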