[Linux-HA] Newbie's resources not flying

J. B. Schatz jschatz at linux-nexus.com
Sat Mar 12 08:02:04 MST 2005


> On 2005-03-09T12:19:25, "J. B. Schatz" <jschatz at linux-nexus.com> wrote:
>
>> All observations are **greatly** appreciated. TIA.
>>
>> ha.cf file:
>> debugfile /var/heartbeat/ha-debug
>> logfacility local7
>> keepalive 2
>> deadtime 30
>> warntime 10
>> initdead 120
>> udpport 694
>> #baud 19200
>> #serial /dev/ttyS0
>> bcast eth2
>> ping 191.250.1.1#ruteador
>> auto_failback off
>> node ns1
>> node ns2
>> respawn cluster /usr/lib/heartbeat/ipfail
>
> Well, that one looks well enough.
>
>>
>> haresources file:
>> ns1 201.225.35.18/28/eth0 drbddisk::r0
>> Filesystem::/dev/drbd/0::/home::reiserfs samba
>
> I assume that in practice they are both on one line?
>
> Now, with the IPaddr syntax you used, you might have to specify it as
> IPaddr::201.225.35.18/28/eth0 (I _think_ the heartbeat heuristic only
> auto-completes with the IPaddr prefix for pure dot-quad notation, but
> this might be a RedHerring).
>
> I also assume the file is exactly the same on both nodes? ie, "md5sum
> /etc/ha.d/haresources" reports the same on both sides...? ha.cf is too?
>
> The debugfile might have more information. Maybe you can spot what's
> going on if you set debug 1 in ha.cf.
>
>
> Sincerely,
>     Lars Marowsky-Brée <lmb at suse.de>
>
> --
> High Availability & Clustering
> SUSE Labs, Research and Development
> SUSE LINUX Products GmbH - A Novell Business
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>
(This didn't go to the list the first time; here it is...)
Lars,

Thanks for suggesting I turn the debug setting up; there is now a lot
more information. Unfortunately my newbie eyes still don't detect anything
that points to the solution; the WARN line is curious, but I don't know
what it means. The new, fleshed-out logs follow below. Can anyone see
anything suspicious in them?
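
(For reference, the extra output comes from turning the debug level up in
ha.cf -- roughly these two lines, with debugfile pointing where it already
did:

debug 1
debugfile /var/heartbeat/ha-debug
)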

After your suggestion I studied the "Heartbeat IPaddr resource agent" doc
and tried adding the IPaddr:: prefix in the haresources file. Your guess
that it is probably a red herring seems correct, because I also ran this
test with haresources revised as shown:

ns1 IPaddr::201.225.35.18/28/eth0 drbddisk::r0
Filesystem::/dev/drbd/0::/home::reiserfs samba
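
(If it helps to rule out the resource syntax itself, my understanding is
that each haresources entry maps onto a script under /etc/ha.d/resource.d/
or /etc/init.d/, so the IPaddr part can presumably be exercised by hand --
a sketch, assuming the stock heartbeat 1.2.x layout:

  # path assumed from a standard heartbeat 1.2.x install
  /etc/ha.d/resource.d/IPaddr 201.225.35.18/28/eth0 status
  /etc/ha.d/resource.d/IPaddr 201.225.35.18/28/eth0 start
)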

And, yes, the configuration is all on one line in haresources (it must be
getting word-wrapped in the email), and the config files on both nodes have
matching md5sum results.
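
(The check was just the one you suggested, run the same way on each node:

  md5sum /etc/ha.d/ha.cf /etc/ha.d/haresources
)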

log from node 1 (ns1):
Mar 10 12:15:58 [heartbeat] info: **************************
Mar 10 12:15:58 [heartbeat] info: Configuration validated. Starting
heartbeat 1.2.3
Mar 10 12:15:58 [heartbeat] debug: HA configuration OK.  Heartbeat starting.
Mar 10 12:15:58 [heartbeat] info: heartbeat: version 1.2.3
Mar 10 12:15:58 [heartbeat] info: Heartbeat generation: 19
Mar 10 12:15:58 [heartbeat] debug: opening bcast eth2 (UDP/IP broadcast)
Mar 10 12:15:58 [heartbeat] debug: SO_BINDTODEVICE(r) set for device eth2
Mar 10 12:15:58 [heartbeat] info: UDP Broadcast heartbeat started on port
694 (694) interface eth2
Mar 10 12:15:58 [heartbeat] debug: bcast channel eth2 now open...
Mar 10 12:15:58 [heartbeat] debug: opening ping 191.250.1.1 (ping membership)
Mar 10 12:15:58 [heartbeat] info: ping heartbeat started.
Mar 10 12:15:58 [heartbeat] debug: ping channel 191.250.1.1 now open...
Mar 10 12:15:58 [heartbeat] debug: FIFO process pid: 26629
Mar 10 12:15:58 [heartbeat] debug: write process pid: 26630
Mar 10 12:15:58 [heartbeat] debug: read child process pid: 26631
Mar 10 12:15:58 [heartbeat] debug: write process pid: 26632
Mar 10 12:15:58 [heartbeat] info: pid 26629 locked in memory.
Mar 10 12:15:58 [heartbeat] debug: Limiting CPU: 6 CPU seconds every 60000
milliseconds
Mar 10 12:15:58 [heartbeat] info: pid 26630 locked in memory.
Mar 10 12:15:58 [heartbeat] debug: Limiting CPU: 24 CPU seconds every
60000 milliseconds
Mar 10 12:15:58 [heartbeat] info: pid 26631 locked in memory.
Mar 10 12:15:58 [heartbeat] debug: Limiting CPU: 6 CPU seconds every 60000
milliseconds
Mar 10 12:15:58 [heartbeat] info: pid 26632 locked in memory.
Mar 10 12:15:58 [heartbeat] debug: Limiting CPU: 24 CPU seconds every
60000 milliseconds
Mar 10 12:15:58 [heartbeat] debug: read child process pid: 26633
Mar 10 12:15:58 [heartbeat] debug: Limiting CPU: 30 CPU seconds every
60000 milliseconds
Mar 10 12:15:58 [heartbeat] info: pid 26610 locked in memory.
Mar 10 12:15:58 [heartbeat] debug: Waiting for child processes to start
Mar 10 12:15:58 [heartbeat] info: Local status now set to: 'up'
Mar 10 12:15:58 [heartbeat] debug: All your child process are belong to us
Mar 10 12:15:58 [heartbeat] debug: Starting local status message @ 2000 ms
intervals
Mar 10 12:15:58 [heartbeat] info: Link ns1:eth2 up.
Mar 10 12:15:58 [heartbeat] debug: CreateInitialFilter: ip-request
Mar 10 12:15:58 [heartbeat] debug: CreateInitialFilter: ip-request-resp
Mar 10 12:15:58 [heartbeat] debug: CreateInitialFilter: status
Mar 10 12:15:58 [heartbeat] debug: CreateInitialFilter: ask_resources
Mar 10 12:15:58 [heartbeat] debug: CreateInitialFilter: hb_takeover
Mar 10 12:15:59 [heartbeat] info: pid 26633 locked in memory.
Mar 10 12:15:59 [heartbeat] debug: Limiting CPU: 6 CPU seconds every 60000
milliseconds
Mar 10 12:15:59 [heartbeat] info: Link 191.250.1.1:191.250.1.1 up.
Mar 10 12:15:59 [heartbeat] info: Status update for node 191.250.1.1:
status ping
Mar 10 12:15:59 [heartbeat] debug: Status seqno: 0 msgtime: 1110474958
Mar 10 12:16:07 [heartbeat] info: Link ns2:eth2 up.
Mar 10 12:16:07 [heartbeat] info: Status update for node ns2: status up
Mar 10 12:16:07 [heartbeat] debug: Status seqno: 1 msgtime: 1110474967
Mar 10 12:16:07 [heartbeat] debug: StartNextRemoteRscReq() - calling hook
Mar 10 12:16:07 [heartbeat] debug: notify_world: invoking harc: OLD
status: up
Mar 10 12:16:07 [heartbeat] debug: Process [status] started pid 26650
Mar 10 12:16:07 [heartbeat] debug: Starting notify process [status]
Mar 10 12:16:07 [heartbeat] debug: Comm_now_up(): updating status to active
Mar 10 12:16:07 [heartbeat] info: Local status now set to: 'active'
Mar 10 12:16:07 [heartbeat] debug: Sending local starting msg:
resourcestate = 0
Mar 10 12:16:07 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 0
Mar 10 12:16:07 [heartbeat] info: Starting child client
"/usr/lib/heartbeat/ipfail" (65,65)
Mar 10 12:16:07 [heartbeat] debug: notify_world: setting SIGCHLD Handler
to SIG_DFL
Mar 10 12:16:07 [heartbeat] debug: notify_world: Running harc status
Mar 10 12:16:07 [heartbeat] info: Starting "/usr/lib/heartbeat/ipfail" as
uid 65  gid 65 (pid 26651)
Mar 10 12:16:07 [heartbeat] WARN: Exiting status process 26650 returned rc 1.
Mar 10 12:16:07 [heartbeat] debug: RscMgmtProc 'status' exited code 1
Mar 10 12:16:07 [heartbeat] debug: APIregistration_dispatch() {
Mar 10 12:16:07 [heartbeat] debug: process_registerevent() {
Mar 10 12:16:07 [heartbeat] debug: client->gsource = 0x809a720
Mar 10 12:16:07 [heartbeat] debug: }/*process_registerevent*/;
Mar 10 12:16:07 [heartbeat] debug: }/*APIregistration_dispatch*/;
Mar 10 12:16:07 [heartbeat] debug: Checking client authorization for
client ipfail (65:65)
Mar 10 12:16:07 [heartbeat] debug: Signing on API client 26651 (ipfail)
Mar 10 12:16:07 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 1
Mar 10 12:16:08 [heartbeat] info: Status update for node ns2: status active
Mar 10 12:16:08 [heartbeat] debug: Status seqno: 3 msgtime: 1110474968
Mar 10 12:16:08 [heartbeat] debug: StartNextRemoteRscReq() - calling hook
Mar 10 12:16:08 [heartbeat] debug: notify_world: invoking harc: OLD
status: active
Mar 10 12:16:08 [heartbeat] debug: Process [status] started pid 26652
Mar 10 12:16:08 [heartbeat] debug: Starting notify process [status]
Mar 10 12:16:08 [heartbeat] info: AnnounceTakeover(local 0, foreign 1,
reason 'HB_R_BOTHSTARTING' (0))
Mar 10 12:16:08 [heartbeat] debug: process_resources: other now unstable
Mar 10 12:16:08 [heartbeat] debug: Sending hold resources msg: none,
stable=0 # <none>
Mar 10 12:16:08 [heartbeat] info: AnnounceTakeover(local 0, foreign 1,
reason 'T_RESOURCES' (0))
Mar 10 12:16:08 [heartbeat] info: STATE 1 => 3
Mar 10 12:16:08 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 3
Mar 10 12:16:08 [heartbeat] info: STATE 3 => 2
Mar 10 12:16:08 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 2
Mar 10 12:16:08 [heartbeat] debug: notify_world: setting SIGCHLD Handler
to SIG_DFL
Mar 10 12:16:08 [heartbeat] debug: notify_world: Running harc status
Mar 10 12:16:08 [heartbeat] WARN: Exiting status process 26652 returned rc 1.
Mar 10 12:16:08 [heartbeat] debug: RscMgmtProc 'status' exited code 1
Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 0, foreign 1,
reason 'T_RESOURCES' (0))
Mar 10 12:16:19 [heartbeat] info: remote resource transition completed.
Mar 10 12:16:19 [heartbeat] debug: Sending hold resources msg: none,
stable=0 # <none>
Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 0, foreign 1,
reason 'T_RESOURCES' (0))
Mar 10 12:16:19 [heartbeat] info: STATE 2 => 3
Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 3
Mar 10 12:16:19 [heartbeat] debug: Calling PerformAutoFailback()
Mar 10 12:16:19 [heartbeat] info: other_holds_resources: 1
Mar 10 12:16:19 [heartbeat] info: remote resource transition completed.
Mar 10 12:16:19 [heartbeat] debug: Process [req_our_resources(ask)]
started pid 26653
Mar 10 12:16:19 [heartbeat] debug: Sending hold resources msg: local,
stable=1 # <none>
Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
reason 'T_RESOURCES(us)' (0))
Mar 10 12:16:19 [heartbeat] info: Initial resource acquisition complete
(T_RESOURCES(us))
Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 3
Mar 10 12:16:19 [heartbeat] debug: Calling PerformAutoFailback()
Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
reason 'T_RESOURCES(them)' (1))
Mar 10 12:16:19 [heartbeat] info: STATE 3 => 4
Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
Mar 10 12:16:19 [heartbeat] info: other_holds_resources: 1
Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
Mar 10 12:16:19 [heartbeat] debug:
req_our_resources(/usr/lib/heartbeat/ResourceManager listkeys ns1)
Mar 10 12:16:19 [heartbeat] debug: req_our_resources(): running
[/usr/lib/heartbeat/req_resource IPaddr::201.225.35.18/28/eth0]
Mar 10 12:16:19 [heartbeat] info: other_holds_resources: 1
Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
Mar 10 12:16:19 [heartbeat] debug: StartNextRemoteRscReq(): child count 1
Mar 10 12:16:19 [heartbeat] info: 1 local resources from
[/usr/lib/heartbeat/ResourceManager listkeys ns1]
Mar 10 12:16:19 [heartbeat] info: Local Resource acquisition completed.
Mar 10 12:16:19 [heartbeat] debug: Sending hold resources msg: local,
stable=1 # req_our_resources()
Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
reason 'T_RESOURCES(us)' (1))
Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
Mar 10 12:16:19 [heartbeat] info: Exiting req_our_resources(ask) process
26653 returned rc 0.
Mar 10 12:16:19 [heartbeat] debug: RscMgmtProc 'req_our_resources(ask)'
exited code 0
Mar 10 12:16:19 [heartbeat] debug: StartNextRemoteRscReq() - calling hook
Mar 10 12:16:19 [heartbeat] debug: notify_world: invoking harc: OLD
status: active
Mar 10 12:16:19 [heartbeat] debug: Process [ip-request-resp] started pid
26685
Mar 10 12:16:19 [heartbeat] debug: Starting notify process [ip-request-resp]
Mar 10 12:16:19 [heartbeat] debug: notify_world: setting SIGCHLD Handler
to SIG_DFL
Mar 10 12:16:19 [heartbeat] debug: notify_world: Running harc ip-request-resp
Mar 10 12:16:19 [heartbeat] WARN: Exiting ip-request-resp process 26685
returned rc 1.
Mar 10 12:16:19 [heartbeat] debug: RscMgmtProc 'ip-request-resp' exited
code 1
Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
reason 'ip-request-resp' (1))
Mar 10 12:23:59 [heartbeat] debug: Process 26610 processing SIGTERM
Mar 10 12:23:59 [heartbeat] debug: hb_initiate_shutdown() called.
Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
Mar 10 12:23:59 [heartbeat] info: hb_giveup_resources(): current status:
active
Mar 10 12:23:59 [heartbeat] debug: Sending hold resources msg: none,
stable=0 # shutdown
Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 5
Mar 10 12:23:59 [heartbeat] info: Heartbeat shutdown in progress. (26610)
Mar 10 12:23:59 [heartbeat] debug: Process [hb_giveup_resources] started
pid 26760
Mar 10 12:23:59 [heartbeat] info: Giving up all HA resources.
Mar 10 12:23:59 [heartbeat] ERROR: /usr/lib/heartbeat/ResourceManager
givegroup IPaddr::201.225.35.18/28/eth0 returned 256
Mar 10 12:23:59 [heartbeat] info: killing /usr/lib/heartbeat/ipfail
process group 26651 with signal 15
Mar 10 12:23:59 [heartbeat] info: All HA resources relinquished.
Mar 10 12:23:59 [heartbeat] debug: Sending T_SHUTDONE.
Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 5
Mar 10 12:23:59 [heartbeat] debug: Received T_SHUTDONE from us.
Mar 10 12:23:59 [heartbeat] debug: Calling hb_mcp_final_shutdown in a second.
Mar 10 12:23:59 [heartbeat] debug: RscMgmtProc 'hb_giveup_resources'
exited code 0
Mar 10 12:23:59 [heartbeat] debug: hb_mcp_final_shutdown() phase 0
Mar 10 12:23:59 [heartbeat] info: killing /usr/lib/heartbeat/ipfail
process group 26651 with signal 15
Mar 10 12:23:59 [heartbeat] WARN: 1 lost packet(s) for [ns2] [243:245]
Mar 10 12:23:59 [heartbeat] info: other_holds_resources: 3
Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 5
Mar 10 12:23:59 [heartbeat] info: No pkts missing from ns2!
Mar 10 12:23:59 [heartbeat] debug: standby message ignored during shutdown
Mar 10 12:23:59 [heartbeat] info: other_holds_resources: 3
Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 5
Mar 10 12:24:00 [heartbeat] debug: Signing client 26651 off
Mar 10 12:24:00 [heartbeat] debug: G_remove_client(pid=26651,
reason='signoff' gsource=0x809a720) {
Mar 10 12:24:00 [heartbeat] debug: api_remove_client_int: removing pid
[26651] reason: signoff
Mar 10 12:24:00 [heartbeat] debug: }/*G_remove_client;*/
Mar 10 12:24:00 [heartbeat] debug: hb_mcp_final_shutdown() phase 1
Mar 10 12:24:00 [heartbeat] info: killing HBREAD process 26631 with signal 15
Mar 10 12:24:00 [heartbeat] info: killing HBWRITE process 26632 with
signal 15
Mar 10 12:24:00 [heartbeat] info: killing HBREAD process 26633 with signal 15
Mar 10 12:24:00 [heartbeat] info: killing HBFIFO process 26629 with signal 15
Mar 10 12:24:00 [heartbeat] info: killing HBWRITE process 26630 with
signal 15
Mar 10 12:24:00 [heartbeat] debug: Process 26629 processing SIGTERM
Mar 10 12:24:00 [heartbeat] debug: Exiting from pid 26629 [rc=15]
Mar 10 12:24:00 [heartbeat] info: Core process 26629 exited. 5 remaining
Mar 10 12:24:00 [heartbeat] debug: Process 26630 processing SIGTERM
Mar 10 12:24:00 [heartbeat] debug: Exiting from pid 26630 [rc=15]
Mar 10 12:24:00 [heartbeat] info: Core process 26630 exited. 4 remaining
Mar 10 12:24:00 [heartbeat] debug: Process 26631 processing SIGTERM
Mar 10 12:24:00 [heartbeat] debug: Exiting from pid 26631 [rc=15]
Mar 10 12:24:00 [heartbeat] info: Core process 26631 exited. 3 remaining
Mar 10 12:24:00 [heartbeat] debug: Process 26632 processing SIGTERM
Mar 10 12:24:00 [heartbeat] debug: Exiting from pid 26632 [rc=15]
Mar 10 12:24:00 [heartbeat] info: Core process 26632 exited. 2 remaining
Mar 10 12:24:00 [heartbeat] debug: Process 26633 processing SIGTERM
Mar 10 12:24:00 [heartbeat] debug: Exiting from pid 26633 [rc=15]
Mar 10 12:24:00 [heartbeat] info: Core process 26633 exited. 1 remaining
Mar 10 12:24:00 [heartbeat] info: Heartbeat shutdown complete.
Mar 10 12:24:00 [heartbeat] debug: Exiting from pid 26610 [rc=0]

log from node 2 (ns2):
Mar 10 12:16:06 [heartbeat] info: **************************
Mar 10 12:16:06 [heartbeat] info: Configuration validated. Starting
heartbeat 1.2.3
Mar 10 12:16:06 [heartbeat] debug: HA configuration OK.  Heartbeat starting.
Mar 10 12:16:06 [heartbeat] info: heartbeat: version 1.2.3
Mar 10 12:16:07 [heartbeat] info: Heartbeat generation: 10
Mar 10 12:16:07 [heartbeat] debug: opening bcast eth2 (UDP/IP broadcast)
Mar 10 12:16:07 [heartbeat] debug: SO_BINDTODEVICE(r) set for device eth2
Mar 10 12:16:07 [heartbeat] info: UDP Broadcast heartbeat started on port
694 (694) interface eth2
Mar 10 12:16:07 [heartbeat] debug: bcast channel eth2 now open...
Mar 10 12:16:07 [heartbeat] debug: opening ping 191.250.1.1 (ping membership)
Mar 10 12:16:07 [heartbeat] info: ping heartbeat started.
Mar 10 12:16:07 [heartbeat] debug: ping channel 191.250.1.1 now open...
Mar 10 12:16:07 [heartbeat] debug: FIFO process pid: 17555
Mar 10 12:16:07 [heartbeat] debug: write process pid: 17556
Mar 10 12:16:07 [heartbeat] debug: read child process pid: 17557
Mar 10 12:16:07 [heartbeat] debug: write process pid: 17558
Mar 10 12:16:07 [heartbeat] debug: read child process pid: 17559
Mar 10 12:16:07 [heartbeat] debug: Limiting CPU: 30 CPU seconds every
60000 milliseconds
Mar 10 12:16:07 [heartbeat] info: pid 17555 locked in memory.
Mar 10 12:16:07 [heartbeat] info: pid 17556 locked in memory.
Mar 10 12:16:07 [heartbeat] info: pid 17557 locked in memory.
Mar 10 12:16:07 [heartbeat] info: pid 17558 locked in memory.
Mar 10 12:16:07 [heartbeat] info: pid 17520 locked in memory.
Mar 10 12:16:07 [heartbeat] debug: Waiting for child processes to start
Mar 10 12:16:07 [heartbeat] info: Local status now set to: 'up'
Mar 10 12:16:07 [heartbeat] debug: All your child process are belong to us
Mar 10 12:16:07 [heartbeat] debug: Limiting CPU: 6 CPU seconds every 60000
milliseconds
Mar 10 12:16:07 [heartbeat] debug: Limiting CPU: 24 CPU seconds every
60000 milliseconds
Mar 10 12:16:07 [heartbeat] debug: Limiting CPU: 6 CPU seconds every 60000
milliseconds
Mar 10 12:16:07 [heartbeat] debug: Limiting CPU: 24 CPU seconds every
60000 milliseconds
Mar 10 12:16:07 [heartbeat] debug: Starting local status message @ 2000 ms
intervals
Mar 10 12:16:07 [heartbeat] info: Link ns2:eth2 up.
Mar 10 12:16:07 [heartbeat] debug: CreateInitialFilter: ip-request
Mar 10 12:16:07 [heartbeat] debug: CreateInitialFilter: ip-request-resp
Mar 10 12:16:07 [heartbeat] debug: CreateInitialFilter: status
Mar 10 12:16:07 [heartbeat] debug: CreateInitialFilter: ask_resources
Mar 10 12:16:07 [heartbeat] debug: CreateInitialFilter: hb_takeover
Mar 10 12:16:07 [heartbeat] info: Link ns1:eth2 up.
Mar 10 12:16:07 [heartbeat] info: Status update for node ns1: status active
Mar 10 12:16:07 [heartbeat] debug: Status seqno: 7 msgtime: 1110474967
Mar 10 12:16:07 [heartbeat] debug: StartNextRemoteRscReq() - calling hook
Mar 10 12:16:07 [heartbeat] debug: notify_world: invoking harc: OLD
status: up
Mar 10 12:16:07 [heartbeat] debug: Process [status] started pid 17560
Mar 10 12:16:07 [heartbeat] debug: Starting notify process [status]
Mar 10 12:16:07 [heartbeat] debug: notify_world: setting SIGCHLD Handler
to SIG_DFL
Mar 10 12:16:07 [heartbeat] debug: notify_world: Running harc status
Mar 10 12:16:07 [heartbeat] WARN: Exiting status process 17560 returned rc 1.
Mar 10 12:16:07 [heartbeat] debug: RscMgmtProc 'status' exited code 1
Mar 10 12:16:08 [heartbeat] info: pid 17559 locked in memory.
Mar 10 12:16:08 [heartbeat] debug: Limiting CPU: 6 CPU seconds every 60000
milliseconds
Mar 10 12:16:08 [heartbeat] info: Link 191.250.1.1:191.250.1.1 up.
Mar 10 12:16:08 [heartbeat] info: Status update for node 191.250.1.1:
status ping
Mar 10 12:16:08 [heartbeat] debug: Status seqno: 0 msgtime: 1110474967
Mar 10 12:16:08 [heartbeat] debug: Comm_now_up(): updating status to active
Mar 10 12:16:08 [heartbeat] info: Local status now set to: 'active'
Mar 10 12:16:08 [heartbeat] debug: Sending local starting msg:
resourcestate = 0
Mar 10 12:16:08 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 0
Mar 10 12:16:08 [heartbeat] info: Starting child client
"/usr/lib/heartbeat/ipfail" (65,65)
Mar 10 12:16:08 [heartbeat] info: AnnounceTakeover(local 0, foreign 1,
reason 'T_RESOURCES' (0))
Mar 10 12:16:08 [heartbeat] debug: process_resources(2):  other now unstable
Mar 10 12:16:08 [heartbeat] info: other_holds_resources: 0
Mar 10 12:16:08 [heartbeat] info: STATE 1 => 3
Mar 10 12:16:08 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 3
Mar 10 12:16:08 [heartbeat] info: Starting "/usr/lib/heartbeat/ipfail" as
uid 65  gid 65 (pid 17561)
Mar 10 12:16:08 [heartbeat] debug: APIregistration_dispatch() {
Mar 10 12:16:08 [heartbeat] debug: process_registerevent() {
Mar 10 12:16:08 [heartbeat] debug: client->gsource = 0x809aaf0
Mar 10 12:16:08 [heartbeat] debug: }/*process_registerevent*/;
Mar 10 12:16:08 [heartbeat] debug: }/*APIregistration_dispatch*/;
Mar 10 12:16:08 [heartbeat] debug: Checking client authorization for
client ipfail (65:65)
Mar 10 12:16:08 [heartbeat] debug: Signing on API client 17561 (ipfail)
Mar 10 12:16:08 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 3
Mar 10 12:16:19 [heartbeat] debug: Process [req_our_resources(ask)]
started pid 17562
Mar 10 12:16:19 [heartbeat] info: local resource transition completed.
Mar 10 12:16:19 [heartbeat] debug: Sending hold resources msg: local,
stable=1 # <none>
Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
reason 'T_RESOURCES(us)' (0))
Mar 10 12:16:19 [heartbeat] info: Initial resource acquisition complete
(T_RESOURCES(us))
Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
                - Last output repeated twice -
Mar 10 12:16:19 [heartbeat] debug: process_resources(2):  other now unstable
Mar 10 12:16:19 [heartbeat] info: other_holds_resources: 0
Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
Mar 10 12:16:19 [heartbeat] info: remote resource transition completed.
Mar 10 12:16:19 [heartbeat] debug: Sending hold resources msg: local,
stable=1 # <none>
Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
reason 'T_RESOURCES(us)' (1))
Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
Mar 10 12:16:19 [heartbeat] debug: Calling PerformAutoFailback()
Mar 10 12:16:19 [heartbeat] info: other_holds_resources: 1
Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
Mar 10 12:16:19 [heartbeat] debug:
req_our_resources(/usr/lib/heartbeat/ResourceManager listkeys ns2)
Mar 10 12:16:19 [heartbeat] info: No local resources
[/usr/lib/heartbeat/ResourceManager listkeys ns2] to acquire.
Mar 10 12:16:19 [heartbeat] debug: Sending hold resources msg: local,
stable=1 # req_our_resources()
Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
reason 'T_RESOURCES(us)' (1))
Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
Mar 10 12:16:19 [heartbeat] info: Exiting req_our_resources(ask) process
17562 returned rc 0.
Mar 10 12:16:19 [heartbeat] debug: RscMgmtProc 'req_our_resources(ask)'
exited code 0
Mar 10 12:16:19 [heartbeat] info: other_holds_resources: 1
Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
Mar 10 12:23:59 [heartbeat] debug: process_resources(2):  other now unstable
Mar 10 12:23:59 [heartbeat] info: other_holds_resources: 0
Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
Mar 10 12:23:59 [heartbeat] debug: process_resources(4):  other now stable
- T_SHUTDONE
Mar 10 12:23:59 [heartbeat] debug: Process [go_standby] started pid 17584
Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 3, standby running(ms): 0, resourcestate: 4
Mar 10 12:23:59 [heartbeat] info: Received shutdown notice from 'ns1'.
Mar 10 12:23:59 [heartbeat] info: Resources being acquired from ns1.
Mar 10 12:23:59 [heartbeat] debug: StartNextRemoteRscReq(): child count 1
Mar 10 12:23:59 [heartbeat] debug: takeover_from_node: other now stable
Mar 10 12:23:59 [heartbeat] debug: Process [req_our_resources(ask)]
started pid 17585
Mar 10 12:23:59 [heartbeat] info: acquire local HA resources (standby).
Mar 10 12:23:59 [heartbeat] info: go_standby: who: 2 resource set: local
Mar 10 12:23:59 [heartbeat] info: go_standby: (query/action):
(ourkeys/takegroup)
Mar 10 12:23:59 [heartbeat] debug:
req_our_resources(/usr/lib/heartbeat/ResourceManager listkeys ns2)
Mar 10 12:23:59 [heartbeat] info: local HA resource acquisition completed
(standby).
Mar 10 12:23:59 [heartbeat] debug: Sending standby [done] msg
Mar 10 12:23:59 [heartbeat] debug: Received standby message done from ns2
in state 3
Mar 10 12:23:59 [heartbeat] info: Standby resource acquisition done [all].
Mar 10 12:23:59 [heartbeat] debug: Sending hold resources msg: all,
stable=1 # <none>
Mar 10 12:23:59 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
reason 'T_RESOURCES(us)' (1))
Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 2, other_is_stable: 1, takeover_in_progress: 1,
going_standby: 0, standby running(ms): 0, resourcestate: 4
Mar 10 12:23:59 [heartbeat] info: New standby state: 0
Mar 10 12:23:59 [heartbeat] info: Retransmitting pkt 244
Mar 10 12:23:59 [heartbeat] info: Exiting go_standby process 17584
returned rc 0.
Mar 10 12:23:59 [heartbeat] debug: RscMgmtProc 'go_standby' exited code 0
Mar 10 12:23:59 [heartbeat] debug: StartNextRemoteRscReq(): child count 1
Mar 10 12:23:59 [heartbeat] info: No local resources
[/usr/lib/heartbeat/ResourceManager listkeys ns2] to acquire.
Mar 10 12:23:59 [heartbeat] debug: Sending hold resources msg: all,
stable=1 # req_our_resources()
Mar 10 12:23:59 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
reason 'T_RESOURCES(us)' (1))
Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 1,
going_standby: 0, standby running(ms): 0, resourcestate: 4
Mar 10 12:23:59 [heartbeat] info: Exiting req_our_resources(ask) process
17585 returned rc 0.
Mar 10 12:23:59 [heartbeat] debug: RscMgmtProc 'req_our_resources(ask)'
exited code 0
Mar 10 12:23:59 [heartbeat] debug: StartNextRemoteRscReq() - calling hook
Mar 10 12:23:59 [heartbeat] debug: notify_world: invoking harc: OLD
status: active
Mar 10 12:23:59 [heartbeat] debug: Process [status] started pid 17600
Mar 10 12:23:59 [heartbeat] debug: Starting notify process [status]
Mar 10 12:23:59 [heartbeat] debug: notify_world: setting SIGCHLD Handler
to SIG_DFL
Mar 10 12:23:59 [heartbeat] debug: notify_world: Running harc status
Mar 10 12:23:59 [heartbeat] WARN: Exiting status process 17600 returned rc 1.
Mar 10 12:23:59 [heartbeat] debug: RscMgmtProc 'status' exited code 1
Mar 10 12:24:10 [heartbeat] debug: Process 17520 processing SIGTERM
Mar 10 12:24:10 [heartbeat] debug: hb_initiate_shutdown() called.
Mar 10 12:24:10 [heartbeat] debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 1,
going_standby: 0, standby running(ms): 0, resourcestate: 4
Mar 10 12:24:10 [heartbeat] WARN: Shutdown delayed until current resource
activity finishes.
Mar 10 12:24:32 [heartbeat] WARN: node ns1: is dead
Mar 10 12:24:32 [heartbeat] info: Link ns1:eth2 dead.
Mar 10 12:26:07 [heartbeat] info: Daily informational memory statistics
Mar 10 12:26:07 [heartbeat] info: MSG stats: 310/1234 ms age 1050
[pid17520/MST_CONTROL]
Mar 10 12:26:07 [heartbeat] info: ha_malloc stats: 7530/30624
517142/240839 [pid17520/MST_CONTROL]
Mar 10 12:26:07 [heartbeat] info: RealMalloc stats: 521014 total malloc
bytes. pid [17520/MST_CONTROL]
Mar 10 12:26:07 [heartbeat] info: Current arena value: 702104
Mar 10 12:26:07 [heartbeat] info: MSG stats: 0/3 ms age 127860
[pid17555/HBFIFO]
Mar 10 12:26:07 [heartbeat] info: ha_malloc stats: 0/50  252/0
[pid17555/HBFIFO]
Mar 10 12:26:07 [heartbeat] info: RealMalloc stats: 1468 total malloc
bytes. pid [17555/HBFIFO]
Mar 10 12:26:07 [heartbeat] info: Current arena value: 132760
Mar 10 12:26:07 [heartbeat] info: MSG stats: 0/0 ms age 584370980
[pid17556/HBWRITE]
Mar 10 12:26:07 [heartbeat] info: ha_malloc stats: 0/0  0/0
[pid17556/HBWRITE]
Mar 10 12:26:07 [heartbeat] info: RealMalloc stats: 0 total malloc bytes.
pid [17556/HBWRITE]
Mar 10 12:26:07 [heartbeat] info: Current arena value: 0
Mar 10 12:26:07 [heartbeat] info: MSG stats: 0/0 ms age 584370980
[pid17557/HBREAD]
Mar 10 12:26:07 [heartbeat] info: ha_malloc stats: 0/1128  14/0
[pid17557/HBREAD]
Mar 10 12:26:07 [heartbeat] info: RealMalloc stats: 270 total malloc
bytes. pid [17557/HBREAD]
Mar 10 12:26:07 [heartbeat] info: Current arena value: 132760
Mar 10 12:26:07 [heartbeat] info: MSG stats: 0/615 ms age 1050
[pid17558/HBWRITE]
Mar 10 12:26:07 [heartbeat] info: ha_malloc stats: 0/13292  588/0
[pid17558/HBWRITE]
Mar 10 12:26:07 [heartbeat] info: RealMalloc stats: 3116 total malloc
bytes. pid [17558/HBWRITE]
Mar 10 12:26:07 [heartbeat] info: Current arena value: 132760
Mar 10 12:26:07 [heartbeat] info: MSG stats: 0/302 ms age 1050
[pid17559/HBREAD]
Mar 10 12:26:07 [heartbeat] info: ha_malloc stats: 0/6040  252/0
[pid17559/HBREAD]
Mar 10 12:26:07 [heartbeat] info: RealMalloc stats: 1468 total malloc
bytes. pid [17559/HBREAD]
Mar 10 12:26:07 [heartbeat] info: Current arena value: 132760
Mar 10 12:26:07 [heartbeat] info: These are nothing to worry about.



