[Linux-HA] pingd problems
Andrew Beekhof
beekhof at gmail.com
Thu May 24 00:53:41 MDT 2007
On 5/23/07, fabiomm at br.ibm.com <fabiomm at br.ibm.com> wrote:
> Hi Everyone!
>
> I sent an e-mail earlier about a problem with a DB2 resource. Now
> I'm writing because I have problems with pingd.
>
> I'm configuring a cluster on SLES 10 on xSeries servers (x86_64) running
> heartbeat 2.0.5 to manage DRBD, IP, filesystem, and DB2 resources.
Please apply the SLES 10 updates to get the latest heartbeat version.
>
> The node s0580crmdb2pr1 is the active node and the node s0580crmdb2pr2 is
> the passive node (where no resources are running).
>
> The /etc/ha.d/ha.cf is configured as follows:
>
>
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility local0
> keepalive 2
> deadtime 30
> warntime 10
> initdead 120
> udpport 694
> bcast eth1
> auto_failback on
> watchdog /dev/watchdog
> ping 10.226.0.100
> respawn root /usr/lib64/heartbeat/pingd -m 200 -d 5s
> apiauth ping gid=root uid=root
> crm yes
> node s0580crmdb2pr1 s0580crmdb2pr2
>
> Here the IP address 10.226.0.100 is the gateway for both servers. My
> cib.xml has a location rule configured for pingd as follows:
>
> <constraints>
>   <rsc_location id="place_db2pr1" rsc="group_db2pr1">
>     <rule id="prefered_place_db2pr1" score="100">
>       <expression attribute="#uname"
>         id="745c2e82-e6cb-4c56-9611-797a2533a47d" operation="eq"
>         value="s0580crmdb2pr1"/>
>     </rule>
>     <rule id="prefered_place_db2pr1_connected"
>       score_attribute="pingd">
>       <expression id="5042183e-b115-474e-9596-51d1581c0032"
>         attribute="pingd" operation="defined"/>
>     </rule>
>   </rsc_location>
> </constraints>
> </configuration>
> </cib>
>
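For reference, here is a rough sketch of how these two rules would add up
per node. It assumes that pingd publishes (reachable ping nodes x the -m
multiplier) as the pingd attribute, that the scores of all matching rules
are summed, and that resource stickiness is ignored. The function name and
the simplified model are illustrative only, not the policy engine's code.

```python
# Simplified model of the two location rules quoted above.
# Assumption: scores from every matching rule are summed per node.
PREFERRED_NODE = "s0580crmdb2pr1"
PREFERRED_SCORE = 100   # score="100" on the #uname rule
MULTIPLIER = 200        # pingd -m 200


def location_score(node, reachable_ping_nodes):
    """Return the summed location score for group_db2pr1 on `node`.

    `reachable_ping_nodes` is None when the pingd attribute was never set
    (the "did not have a value for pingd" case), otherwise a count >= 0.
    """
    score = 0
    if node == PREFERRED_NODE:
        score += PREFERRED_SCORE
    if reachable_ping_nodes is not None:
        # operation="defined" matches, so score_attribute="pingd"
        # contributes the attribute's value.
        score += reachable_ping_nodes * MULTIPLIER
    return score


if __name__ == "__main__":
    cases = [
        ("s0580crmdb2pr1", 1),
        ("s0580crmdb2pr2", 1),
        ("s0580crmdb2pr1", 0),
        ("s0580crmdb2pr1", None),
        ("s0580crmdb2pr2", None),
    ]
    for node, reachable in cases:
        print(node, reachable, location_score(node, reachable))
```

Under this model, if only s0580crmdb2pr2 can reach the gateway it scores
200 against 100 for s0580crmdb2pr1, so the group would be expected to move.
While pingd is unset on both nodes, only the #uname rule contributes.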
> But every time I start heartbeat, I get the following error
> messages:
>
>
> attrd[6143]: 2007/05/23_16:13:05 info: attrd_timer_callback:attrd.c
> Sending flush op to all hosts for: pingd
> attrd[6143]: 2007/05/23_16:13:05 info: attrd_ha_callback:attrd.c Sent
> update 6: pingd=0
> tengine[9698]: 2007/05/23_16:14:14 info: te_crm_command:actions.c Skipping
> wait for 21
> heartbeat[6114]: 2007/05/23_16:14:31 info: killing
> /usr/lib64/heartbeat/pingd -m 200 -d 5s process group 6138 with signal 15
> pingd[6138]: 2007/05/23_16:14:31 info: send_update:pingd.c 0 active ping
> nodes
> pingd[6138]: 2007/05/23_16:14:31 ERROR: crm_send_ipc_message:ipc.c IPC
> Channel to 6143 is not connected
> pingd[6138]: 2007/05/23_16:14:31 WARN: #========= IPC[outbound] message
> start ==========#
> pingd[6138]: 2007/05/23_16:14:31 WARN: MSG: Dumping message with 6 fields
> pingd[6138]: 2007/05/23_16:14:31 WARN: MSG[0] : [t=attrd]
> pingd[6138]: 2007/05/23_16:14:31 WARN: MSG[1] : [src=pingd]
> pingd[6138]: 2007/05/23_16:14:31 WARN: MSG[2] : [task=update]
> pingd[6138]: 2007/05/23_16:14:31 WARN: MSG[3] : [attr_name=pingd]
> pingd[6138]: 2007/05/23_16:14:31 WARN: MSG[4] : [attr_value=0]
> pingd[6138]: 2007/05/23_16:14:31 WARN: MSG[5] : [attr_dampening=5s]
> pingd[6138]: 2007/05/23_16:14:31 ERROR: send_update:pingd.c Could not send
> update
> heartbeat[30416]: 2007/05/23_16:16:45 info: glib: ping heartbeat started.
> heartbeat[30416]: 2007/05/23_16:16:46 info: Status update for node
> 10.226.0.100: status ping
> heartbeat[30416]: 2007/05/23_16:18:46 info: Starting child client
> "/usr/lib64/heartbeat/pingd -m 200 -d 5s" (0,0)
> heartbeat[30580]: 2007/05/23_16:18:46 info: Starting
> "/usr/lib64/heartbeat/pingd -m 200 -d 5s" as uid 0 gid 0 (pid 30580)
> cib[30582]: 2007/05/23_16:18:46 info: readCibXmlFile: [on-disk] <rule
> id="prefered_place_db2pr1_connected" score_attribute="pingd">
> cib[30582]: 2007/05/23_16:18:46 info: readCibXmlFile: [on-disk]
> <expression id="5042183e-b115-474e-9596-51d1581c0032" attribute="pingd"
> operation="defined"/>
> heartbeat[30416]: 2007/05/23_16:19:01 WARN: Client [pingd] pid 30580
> failed authorization [client failed authorization]
> heartbeat[30416]: 2007/05/23_16:19:01 ERROR: api_process_registration_msg:
> cannot add client(pingd)
> pengine[30673]: 2007/05/23_16:19:55 WARN: generate_location_rule:unpack.c
> node s0580crmdb2pr2 did not have a value for pingd
> pengine[30673]: 2007/05/23_16:19:55 WARN: generate_location_rule:unpack.c
> node s0580crmdb2pr1 did not have a value for pingd
> pengine[30673]: 2007/05/23_16:19:58 WARN: generate_location_rule:unpack.c
> node s0580crmdb2pr2 did not have a value for pingd
> pengine[30673]: 2007/05/23_16:19:58 WARN: generate_location_rule:unpack.c
> node s0580crmdb2pr1 did not have a value for pingd
> pengine[30673]: 2007/05/23_16:20:00 WARN: generate_location_rule:unpack.c
> node s0580crmdb2pr2 did not have a value for pingd
> pengine[30673]: 2007/05/23_16:20:00 WARN: generate_location_rule:unpack.c
> node s0580crmdb2pr1 did not have a value for pingd
>
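The "failed authorization" message for client [pingd] looks consistent with
the apiauth line in ha.cf naming "ping" rather than "pingd". Here is a small
sketch of a check for that kind of mismatch. It assumes heartbeat matches the
apiauth name against the basename of the respawned binary (the log suggests
the client registers as pingd). The helper name is made up and the parsing
is deliberately naive.

```python
import os


def unauthorized_clients(ha_cf_text):
    """Return names of respawned clients with no matching apiauth line."""
    respawned = []
    authorized = set()
    for raw in ha_cf_text.splitlines():
        # Drop trailing comments, then split into whitespace-separated fields.
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "respawn":
            # respawn <user> <path> [args...]
            respawned.append(os.path.basename(fields[2]))
        elif len(fields) >= 2 and fields[0] == "apiauth":
            # apiauth <client-name> [uid=...] [gid=...]
            authorized.add(fields[1])
    return [name for name in respawned if name not in authorized]


# Relevant lines from the ha.cf quoted earlier in this message.
HA_CF = """\
ping 10.226.0.100
respawn root /usr/lib64/heartbeat/pingd -m 200 -d 5s
apiauth ping gid=root uid=root
crm yes
"""

if __name__ == "__main__":
    print(unauthorized_clients(HA_CF))
```

If the check reports pingd, changing the directive to name pingd would be
the first thing to try, though that is an inference from the log rather
than something confirmed on this system.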
> crm_verify -LV also shows warnings related to pingd:
>
> s0580crmdb2pr1:~ # crm_verify -LV
> crm_verify[30641]: 2007/05/23_16:19:22 WARN:
> generate_location_rule:unpack.c node s0580crmdb2pr2 did not have a value
> for pingd
> crm_verify[30641]: 2007/05/23_16:19:22 WARN:
> generate_location_rule:unpack.c node s0580crmdb2pr1 did not have a value
> for pingd
>
> I'd appreciate some help with this problem.
>
> Best Regards,
> Fabio Martins