[Linux-HA] pingd problems

fabiomm at br.ibm.com fabiomm at br.ibm.com
Wed May 23 13:44:51 MDT 2007


Hi Everyone!

I sent an e-mail earlier about a problem with a DB2 resource. Now I'm 
writing because I'm having problems with pingd.

I'm configuring a cluster on SLES 10 on xSeries servers (x86_64) running 
heartbeat 2.0.5 to manage DRBD, IP, filesystem and DB2 resources.

The node s0580crmdb2pr1 is the active node and the node s0580crmdb2pr2 is 
the passive node (where no resources are running).

The /etc/ha.d/ha.cf is configured as follows:


debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
bcast   eth1
auto_failback on
watchdog /dev/watchdog
ping 10.226.0.100
respawn root /usr/lib64/heartbeat/pingd -m 200 -d 5s
apiauth ping gid=root uid=root
crm yes
node s0580crmdb2pr1 s0580crmdb2pr2
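
One thing that can be checked mechanically in a file like this is whether every client started with respawn also has an apiauth entry under its own name. The sketch below is only an illustration, not a heartbeat tool: it assumes that apiauth entries are matched against the name a client registers with, that this name equals the basename of the respawned binary, and it ignores any built-in apiauth defaults heartbeat may apply. The function names are hypothetical.

```python
def parse_ha_cf(text):
    """Return (respawned client names, apiauth client names) from ha.cf text."""
    respawned, authorized = [], set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if fields[0] == "respawn" and len(fields) >= 3:
            # respawn <user> <command> [args...]; keep the command's basename
            respawned.append(fields[2].rsplit("/", 1)[-1])
        elif fields[0] == "apiauth" and len(fields) >= 2:
            # apiauth <client-name> [uid=...] [gid=...]
            authorized.add(fields[1])
    return respawned, authorized


def missing_apiauth(text):
    """List respawned clients with no apiauth line under the same name."""
    respawned, authorized = parse_ha_cf(text)
    return [name for name in respawned if name not in authorized]


# The relevant lines of the ha.cf shown above.
HA_CF = """\
ping 10.226.0.100
respawn root /usr/lib64/heartbeat/pingd -m 200 -d 5s
apiauth ping gid=root uid=root
crm yes
"""

print(missing_apiauth(HA_CF))
```

With the configuration above this reports pingd, because the only apiauth entry is for a client named ping. Whether that matters depends on how heartbeat 2.0.5 resolves client names and defaults, which the sketch does not model.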

Here the IP address 10.226.0.100 is the gateway for both servers. My 
cib.xml has a location constraint with a pingd rule, configured as follows:

     <constraints>
       <rsc_location id="place_db2pr1" rsc="group_db2pr1">
         <rule id="prefered_place_db2pr1" score="100">
           <expression attribute="#uname" id="745c2e82-e6cb-4c56-9611-797a2533a47d" operation="eq" value="s0580crmdb2pr1"/>
         </rule>
         <rule id="prefered_place_db2pr1_connected" score_attribute="pingd">
           <expression id="5042183e-b115-474e-9596-51d1581c0032" attribute="pingd" operation="defined"/>
         </rule>
       </rsc_location>
     </constraints>
   </configuration>
 </cib>
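
To make the intent of the two rules concrete, here is a small arithmetic sketch of the location score each node would get. It assumes that pingd publishes (number of reachable ping nodes x the -m multiplier) as the pingd attribute, that score_attribute uses that value as the rule's score, and that the scores of all matching rules in one rsc_location are added together. Stickiness and other constraints are ignored, and the function names are hypothetical.

```python
def pingd_value(reachable_ping_nodes, multiplier=200):
    """Value pingd would publish, assuming -m 200 as in ha.cf."""
    return reachable_ping_nodes * multiplier


def location_score(node, pingd_attr, preferred="s0580crmdb2pr1", preference=100):
    """Sum the scores of the rules that match for this node.

    pingd_attr is None when the node has no pingd attribute at all.
    """
    score = 0
    if node == preferred:        # rule prefered_place_db2pr1 (score="100")
        score += preference
    if pingd_attr is not None:   # expression operation="defined"
        score += pingd_attr      # score_attribute="pingd"
    return score


# Both nodes reach the gateway: pr1 wins, 300 against 200.
print(location_score("s0580crmdb2pr1", pingd_value(1)),
      location_score("s0580crmdb2pr2", pingd_value(1)))

# pr1 loses the gateway: pr2 wins, 200 against 100.
print(location_score("s0580crmdb2pr1", pingd_value(0)),
      location_score("s0580crmdb2pr2", pingd_value(1)))

# Attribute missing on both nodes: only the static preference counts.
print(location_score("s0580crmdb2pr1", None),
      location_score("s0580crmdb2pr2", None))
```

Under these assumptions, the connectivity rule only has an effect once the pingd attribute actually exists for a node; while it is undefined, placement falls back to the static preference of 100 for s0580crmdb2pr1.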

But every time I start heartbeat, I get the following error messages:


attrd[6143]: 2007/05/23_16:13:05 info: attrd_timer_callback:attrd.c Sending flush op to all hosts for: pingd
attrd[6143]: 2007/05/23_16:13:05 info: attrd_ha_callback:attrd.c Sent update 6: pingd=0
tengine[9698]: 2007/05/23_16:14:14 info: te_crm_command:actions.c Skipping wait for 21
heartbeat[6114]: 2007/05/23_16:14:31 info: killing /usr/lib64/heartbeat/pingd -m 200 -d 5s process group 6138 with signal 15
pingd[6138]: 2007/05/23_16:14:31 info: send_update:pingd.c 0 active ping nodes
pingd[6138]: 2007/05/23_16:14:31 ERROR: crm_send_ipc_message:ipc.c IPC Channel to 6143 is not connected
pingd[6138]: 2007/05/23_16:14:31 WARN: #========= IPC[outbound] message start ==========#
pingd[6138]: 2007/05/23_16:14:31 WARN: MSG: Dumping message with 6 fields
pingd[6138]: 2007/05/23_16:14:31 WARN: MSG[0] : [t=attrd]
pingd[6138]: 2007/05/23_16:14:31 WARN: MSG[1] : [src=pingd]
pingd[6138]: 2007/05/23_16:14:31 WARN: MSG[2] : [task=update]
pingd[6138]: 2007/05/23_16:14:31 WARN: MSG[3] : [attr_name=pingd]
pingd[6138]: 2007/05/23_16:14:31 WARN: MSG[4] : [attr_value=0]
pingd[6138]: 2007/05/23_16:14:31 WARN: MSG[5] : [attr_dampening=5s]
pingd[6138]: 2007/05/23_16:14:31 ERROR: send_update:pingd.c Could not send update
heartbeat[30416]: 2007/05/23_16:16:45 info: glib: ping heartbeat started.
heartbeat[30416]: 2007/05/23_16:16:46 info: Status update for node 10.226.0.100: status ping
heartbeat[30416]: 2007/05/23_16:18:46 info: Starting child client "/usr/lib64/heartbeat/pingd -m 200 -d 5s" (0,0)
heartbeat[30580]: 2007/05/23_16:18:46 info: Starting "/usr/lib64/heartbeat/pingd -m 200 -d 5s" as uid 0  gid 0 (pid 30580)
cib[30582]: 2007/05/23_16:18:46 info: readCibXmlFile: [on-disk] <rule id="prefered_place_db2pr1_connected" score_attribute="pingd">
cib[30582]: 2007/05/23_16:18:46 info: readCibXmlFile: [on-disk] <expression id="5042183e-b115-474e-9596-51d1581c0032" attribute="pingd" operation="defined"/>
heartbeat[30416]: 2007/05/23_16:19:01 WARN: Client [pingd] pid 30580 failed authorization [client failed authorization]
heartbeat[30416]: 2007/05/23_16:19:01 ERROR: api_process_registration_msg: cannot add client(pingd)
pengine[30673]: 2007/05/23_16:19:55 WARN: generate_location_rule:unpack.c node s0580crmdb2pr2 did not have a value for pingd
pengine[30673]: 2007/05/23_16:19:55 WARN: generate_location_rule:unpack.c node s0580crmdb2pr1 did not have a value for pingd
pengine[30673]: 2007/05/23_16:19:58 WARN: generate_location_rule:unpack.c node s0580crmdb2pr2 did not have a value for pingd
pengine[30673]: 2007/05/23_16:19:58 WARN: generate_location_rule:unpack.c node s0580crmdb2pr1 did not have a value for pingd
pengine[30673]: 2007/05/23_16:20:00 WARN: generate_location_rule:unpack.c node s0580crmdb2pr2 did not have a value for pingd
pengine[30673]: 2007/05/23_16:20:00 WARN: generate_location_rule:unpack.c node s0580crmdb2pr1 did not have a value for pingd

crm_verify -LV also shows warnings related to pingd:

s0580crmdb2pr1:~ # crm_verify -LV
crm_verify[30641]: 2007/05/23_16:19:22 WARN: generate_location_rule:unpack.c node s0580crmdb2pr2 did not have a value for pingd
crm_verify[30641]: 2007/05/23_16:19:22 WARN: generate_location_rule:unpack.c node s0580crmdb2pr1 did not have a value for pingd

I'd appreciate any help with this problem.

Best Regards,
Fabio Martins

