[Linux-HA] Heartbeat Shutdown issues

Raoul Bhatia [IPAX] r.bhatia at ipax.at
Thu Oct 11 13:55:47 MDT 2007


Andrew Beekhof wrote:
> On 10/10/07, Raoul Bhatia [IPAX] <r.bhatia at ipax.at> wrote:
>> hi,
>>
>> every now and then i encounter shutdown issues with heartbeat.
>> right now for example:
>>
>>> crmd[23092]: 2007/10/10_22:38:41 info: do_state_transition: (Re)Issuing shutdown request now that we are the DC
>>> crmd[23092]: 2007/10/10_22:38:41 info: do_shutdown_req: Sending shutdown request to DC: webcluster01
>>> crmd[23092]: 2007/10/10_22:38:41 info: do_shutdown_req: Processing shutdown locally
>>> crmd[23092]: 2007/10/10_22:38:41 info: handle_shutdown_request: Creating shutdown request for webcluster01
>>> pengine[23100]: 2007/10/10_22:38:41 WARN: process_pe_message: Transition 14: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-warn-3782.bz2
>>> pengine[23100]: 2007/10/10_22:38:41 info: process_pe_message: Configuration WARNINGs found during PE processing.  Please run "crm_verify -L" to identify issues.
>> as i do not know why this could happen, i am emailing the cib.xml and
>> the pe-warn-3782.bz2 files.
> 
> ptest[25187]: 2007/10/11_08:11:09 WARN: unpack_rsc_op: Processing
> failed op drbd_www:0_stop_0 on webcluster01: Error
> ptest[25187]: 2007/10/11_08:11:09 WARN: unpack_rsc_op: Processing
> failed op drbd_mysql:0_stop_0 on webcluster01: Error
> 
> which leads to
> 
> ptest[25187]: 2007/10/11_08:11:09 WARN: should_dump_action: action 13
> (drbd_www:0_stop_0) was for an unmanaged resource (drbd_www:0)
> ptest[25187]: 2007/10/11_08:11:09 WARN: should_dump_action: action 13
> (drbd_www:0_stop_0) was for an unmanaged resource (drbd_www:0)
> 
> 
> shutting down with (partially) active resources isn't a good idea,
> hence the warnings
> 
> 
>> as far as i can see, when i issue some "kills" i get a core dump in
>> /var/lib/heartbeat/cores/hacluster/ - please find it attached.
> 
> these are only useful on the machine that generated them
> what i need instead is the stack-trace
> 
> and why (and to whom) are you issuing "kills"?

good question. i thought i had narrowed it down to the "heartbeat: master
control process", as i think it said a couple of times that it was
waiting for this pid.

most of the time, after waiting for 5-10 minutes, i simply do a
"killall heartbeat" so that i can restart heartbeat after some kind of
test/failure/bug, and then manually clean up leftover processes such as
lrmd, ha_logd and, on one occasion, crmd.

i don't know of any other way to get heartbeat restarted.
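
for reference, a minimal python sketch for pulling the failed operations
out of log lines like the ptest output quoted above. the regular
expression and the `failed_ops` helper are my own assumptions, based only
on the two "unpack_rsc_op" lines shown (joined back onto one line each),
not on a documented log format:

```python
import re

# Assumed pattern, inferred only from the quoted "unpack_rsc_op" lines.
FAILED_OP = re.compile(
    r"unpack_rsc_op: Processing failed op (\S+) on (\S+): (.+)$"
)


def failed_ops(lines):
    """Yield (resource, action, interval, node, status) per failed op."""
    for line in lines:
        match = FAILED_OP.search(line.strip())
        if match is None:
            continue
        op, node, status = match.groups()
        # An op id such as "drbd_www:0_stop_0" is read here as
        # <resource>_<action>_<interval>. Split from the right because
        # the resource name itself may contain underscores.
        parts = op.rsplit("_", 2)
        if len(parts) != 3:
            continue  # does not look like the assumed op id layout
        resource, action, interval = parts
        yield resource, action, interval, node, status


if __name__ == "__main__":
    sample = [
        "ptest[25187]: 2007/10/11_08:11:09 WARN: unpack_rsc_op: "
        "Processing failed op drbd_www:0_stop_0 on webcluster01: Error",
        "ptest[25187]: 2007/10/11_08:11:09 WARN: unpack_rsc_op: "
        "Processing failed op drbd_mysql:0_stop_0 on webcluster01: Error",
    ]
    for entry in failed_ops(sample):
        print(entry)
```

this only lists which resource/action pairs failed on which node; it says
nothing about why the stop failed.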

cheers,
raoul
-- 
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc.          email.          r.bhatia at ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OEG         web.          http://www.ipax.at
Barawitzkagasse 10/2/2/11           email.            office at ipax.at
1190 Wien                           tel.               +43 1 3670030
FN 277995t HG Wien                  fax.            +43 1 3670030 15
____________________________________________________________________


