[Linux-HA] Newbie's resources not flying

J. B. Schatz jschatz at linux-nexus.com
Wed Mar 16 06:16:42 MST 2005


>> On 2005-03-09T12:19:25, "J. B. Schatz" <jschatz at linux-nexus.com> wrote:
>>
>>> All observations are **greatly** appreciated. TIA.
>>>
>>> ha.cf file:
>>> debugfile /var/heartbeat/ha-debug
>>> logfacility local7
>>> keepalive 2
>>> deadtime 30
>>> warntime 10
>>> initdead 120
>>> udpport 694
>>> #baud 19200
>>> #serial /dev/ttyS0
>>> bcast eth2
>>> ping 191.250.1.1#ruteador
>>> auto_failback off
>>> node ns1
>>> node ns2
>>> respawn cluster /usr/lib/heartbeat/ipfail
>>
>> Well, that one looks well enough.
>>
>>>
>>> haresources file:
>>> ns1 201.225.35.18/28/eth0 drbddisk::r0
>>> Filesystem::/dev/drbd/0::/home::reiserfs samba
>>
>> I assume that in practice they are both on one line?
>>
>> Now, with the IPaddr syntax you used, you might have to specify it as
>> IPaddr::201.225.35.18/28/eth0 (I _think_ the heartbeat heuristic only
>> auto-completes with the IPaddr prefix for pure dot-quad notation, but
>> this might be a RedHerring).
>>
>> I also assume the file is exactly the same on both nodes? ie, "md5sum
>> /etc/ha.d/haresources" reports the same on both sides...? ha.cf is too?
>>
>> The debugfile might have more information. Maybe you can spot what's
>> going on if you set debug 1 in ha.cf.
>>
>>
>> Sincerely,
>>     Lars Marowsky-Brée <lmb at suse.de>
>>
>> --
>> High Availability & Clustering
>> SUSE Labs, Research and Development
>> SUSE LINUX Products GmbH - A Novell Business
>>
>> _______________________________________________
>> Linux-HA mailing list
>> Linux-HA at lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>
> Didn't send to the list; here it is...
> Lars,
>
> Thanks for suggesting to set the debug setting higher, now there is a lot
> more info. Unfortunately my newbie eyes still do not detect anything that
> might indicate the solution; the WARN line is curious but I don't know
> what
> it means. The new, fleshed out, logs follow below. Please anyone see
> anything suspicious in them?
>
> I studied the "Heartbeat IPaddr resource agent" doc after your suggestion
> to
> try adding IPaddr:: to the haresources file. Your guess that it's probably
> a red herring seems correct, because I also ran the test this time with
> haresources revised as shown:
>
> ns1 IPaddr::201.225.35.18/28/eth0 drbddisk::r0
> Filesystem::/dev/drbd/0::/home::reiserfs samba
>
> And, yes, the config is all on one line in haresources (it must be getting
> word-wrapped in the email) and the config files of both nodes have
> matching md5sum results.
>
> log from node1 (ns1):
> Mar 10 12:15:58 [heartbeat] info: **************************
> Mar 10 12:15:58 [heartbeat] info: Configuration validated. Starting
> heartbeat 1.2.3
> Mar 10 12:15:58 [heartbeat] debug: HA configuration OK.  Heartbeat
> starting.
> Mar 10 12:15:58 [heartbeat] info: heartbeat: version 1.2.3
> Mar 10 12:15:58 [heartbeat] info: Heartbeat generation: 19
> Mar 10 12:15:58 [heartbeat] debug: opening bcast eth2 (UDP/IP broadcast)
> Mar 10 12:15:58 [heartbeat] debug: SO_BINDTODEVICE(r) set for device eth2
> Mar 10 12:15:58 [heartbeat] info: UDP Broadcast heartbeat started on port
> 694 (694) interface eth2
> Mar 10 12:15:58 [heartbeat] debug: bcast channel eth2 now open...
> Mar 10 12:15:58 [heartbeat] debug: opening ping 191.250.1.1 (ping
> membership)
> Mar 10 12:15:58 [heartbeat] info: ping heartbeat started.
> Mar 10 12:15:58 [heartbeat] debug: ping channel 191.250.1.1 now open...
> Mar 10 12:15:58 [heartbeat] debug: FIFO process pid: 26629
> Mar 10 12:15:58 [heartbeat] debug: write process pid: 26630
> Mar 10 12:15:58 [heartbeat] debug: read child process pid: 26631
> Mar 10 12:15:58 [heartbeat] debug: write process pid: 26632
> Mar 10 12:15:58 [heartbeat] info: pid 26629 locked in memory.
> Mar 10 12:15:58 [heartbeat] debug: Limiting CPU: 6 CPU seconds every 60000
> milliseconds
> Mar 10 12:15:58 [heartbeat] info: pid 26630 locked in memory.
> Mar 10 12:15:58 [heartbeat] debug: Limiting CPU: 24 CPU seconds every
> 60000 milliseconds
> Mar 10 12:15:58 [heartbeat] info: pid 26631 locked in memory.
> Mar 10 12:15:58 [heartbeat] debug: Limiting CPU: 6 CPU seconds every 60000
> milliseconds
> Mar 10 12:15:58 [heartbeat] info: pid 26632 locked in memory.
> Mar 10 12:15:58 [heartbeat] debug: Limiting CPU: 24 CPU seconds every
> 60000 milliseconds
> Mar 10 12:15:58 [heartbeat] debug: read child process pid: 26633
> Mar 10 12:15:58 [heartbeat] debug: Limiting CPU: 30 CPU seconds every
> 60000 milliseconds
> Mar 10 12:15:58 [heartbeat] info: pid 26610 locked in memory.
> Mar 10 12:15:58 [heartbeat] debug: Waiting for child processes to start
> Mar 10 12:15:58 [heartbeat] info: Local status now set to: 'up'
> Mar 10 12:15:58 [heartbeat] debug: All your child process are belong to us
> Mar 10 12:15:58 [heartbeat] debug: Starting local status message @ 2000 ms
> intervals
> Mar 10 12:15:58 [heartbeat] info: Link ns1:eth2 up.
> Mar 10 12:15:58 [heartbeat] debug: CreateInitialFilter: ip-request
> Mar 10 12:15:58 [heartbeat] debug: CreateInitialFilter: ip-request-resp
> Mar 10 12:15:58 [heartbeat] debug: CreateInitialFilter: status
> Mar 10 12:15:58 [heartbeat] debug: CreateInitialFilter: ask_resources
> Mar 10 12:15:58 [heartbeat] debug: CreateInitialFilter: hb_takeover
> Mar 10 12:15:59 [heartbeat] info: pid 26633 locked in memory.
> Mar 10 12:15:59 [heartbeat] debug: Limiting CPU: 6 CPU seconds every 60000
> milliseconds
> Mar 10 12:15:59 [heartbeat] info: Link 191.250.1.1:191.250.1.1 up.
> Mar 10 12:15:59 [heartbeat] info: Status update for node 191.250.1.1:
> status ping
> Mar 10 12:15:59 [heartbeat] debug: Status seqno: 0 msgtime: 1110474958
> Mar 10 12:16:07 [heartbeat] info: Link ns2:eth2 up.
> Mar 10 12:16:07 [heartbeat] info: Status update for node ns2: status up
> Mar 10 12:16:07 [heartbeat] debug: Status seqno: 1 msgtime: 1110474967
> Mar 10 12:16:07 [heartbeat] debug: StartNextRemoteRscReq() - calling hook
> Mar 10 12:16:07 [heartbeat] debug: notify_world: invoking harc: OLD
> status: up
> Mar 10 12:16:07 [heartbeat] debug: Process [status] started pid 26650
> Mar 10 12:16:07 [heartbeat] debug: Starting notify process [status]
> Mar 10 12:16:07 [heartbeat] debug: Comm_now_up(): updating status to
> active
> Mar 10 12:16:07 [heartbeat] info: Local status now set to: 'active'
> Mar 10 12:16:07 [heartbeat] debug: Sending local starting msg:
> resourcestate = 0
> Mar 10 12:16:07 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 0
> Mar 10 12:16:07 [heartbeat] info: Starting child client
> "/usr/lib/heartbeat/ipfail" (65,65)
> Mar 10 12:16:07 [heartbeat] debug: notify_world: setting SIGCHLD Handler
> to SIG_DFL
> Mar 10 12:16:07 [heartbeat] debug: notify_world: Running harc status
> Mar 10 12:16:07 [heartbeat] info: Starting "/usr/lib/heartbeat/ipfail" as
> uid 65  gid 65 (pid 26651)
> Mar 10 12:16:07 [heartbeat] WARN: Exiting status process 26650 returned rc
> 1.
> Mar 10 12:16:07 [heartbeat] debug: RscMgmtProc 'status' exited code 1
> Mar 10 12:16:07 [heartbeat] debug: APIregistration_dispatch() {
> Mar 10 12:16:07 [heartbeat] debug: process_registerevent() {
> Mar 10 12:16:07 [heartbeat] debug: client->gsource = 0x809a720
> Mar 10 12:16:07 [heartbeat] debug: }/*process_registerevent*/;
> Mar 10 12:16:07 [heartbeat] debug: }/*APIregistration_dispatch*/;
> Mar 10 12:16:07 [heartbeat] debug: Checking client authorization for
> client ipfail (65:65)
> Mar 10 12:16:07 [heartbeat] debug: Signing on API client 26651 (ipfail)
> Mar 10 12:16:07 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 1
> Mar 10 12:16:08 [heartbeat] info: Status update for node ns2: status
> active
> Mar 10 12:16:08 [heartbeat] debug: Status seqno: 3 msgtime: 1110474968
> Mar 10 12:16:08 [heartbeat] debug: StartNextRemoteRscReq() - calling hook
> Mar 10 12:16:08 [heartbeat] debug: notify_world: invoking harc: OLD
> status: active
> Mar 10 12:16:08 [heartbeat] debug: Process [status] started pid 26652
> Mar 10 12:16:08 [heartbeat] debug: Starting notify process [status]
> Mar 10 12:16:08 [heartbeat] info: AnnounceTakeover(local 0, foreign 1,
> reason 'HB_R_BOTHSTARTING' (0))
> Mar 10 12:16:08 [heartbeat] debug: process_resources: other now unstable
> Mar 10 12:16:08 [heartbeat] debug: Sending hold resources msg: none,
> stable=0 # <none>
> Mar 10 12:16:08 [heartbeat] info: AnnounceTakeover(local 0, foreign 1,
> reason 'T_RESOURCES' (0))
> Mar 10 12:16:08 [heartbeat] info: STATE 1 => 3
> Mar 10 12:16:08 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 3
> Mar 10 12:16:08 [heartbeat] info: STATE 3 => 2
> Mar 10 12:16:08 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 2
> Mar 10 12:16:08 [heartbeat] debug: notify_world: setting SIGCHLD Handler
> to SIG_DFL
> Mar 10 12:16:08 [heartbeat] debug: notify_world: Running harc status
> Mar 10 12:16:08 [heartbeat] WARN: Exiting status process 26652 returned rc
> 1.
> Mar 10 12:16:08 [heartbeat] debug: RscMgmtProc 'status' exited code 1
> Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 0, foreign 1,
> reason 'T_RESOURCES' (0))
> Mar 10 12:16:19 [heartbeat] info: remote resource transition completed.
> Mar 10 12:16:19 [heartbeat] debug: Sending hold resources msg: none,
> stable=0 # <none>
> Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 0, foreign 1,
> reason 'T_RESOURCES' (0))
> Mar 10 12:16:19 [heartbeat] info: STATE 2 => 3
> Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 3
> Mar 10 12:16:19 [heartbeat] debug: Calling PerformAutoFailback()
> Mar 10 12:16:19 [heartbeat] info: other_holds_resources: 1
> Mar 10 12:16:19 [heartbeat] info: remote resource transition completed.
> Mar 10 12:16:19 [heartbeat] debug: Process [req_our_resources(ask)]
> started pid 26653
> Mar 10 12:16:19 [heartbeat] debug: Sending hold resources msg: local,
> stable=1 # <none>
> Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
> reason 'T_RESOURCES(us)' (0))
> Mar 10 12:16:19 [heartbeat] info: Initial resource acquisition complete
> (T_RESOURCES(us))
> Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 3
> Mar 10 12:16:19 [heartbeat] debug: Calling PerformAutoFailback()
> Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
> reason 'T_RESOURCES(them)' (1))
> Mar 10 12:16:19 [heartbeat] info: STATE 3 => 4
> Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> Mar 10 12:16:19 [heartbeat] info: other_holds_resources: 1
> Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> Mar 10 12:16:19 [heartbeat] debug:
> req_our_resources(/usr/lib/heartbeat/ResourceManager listkeys ns1)
> Mar 10 12:16:19 [heartbeat] debug: req_our_resources(): running
> [/usr/lib/heartbeat/req_resource IPaddr::201.225.35.18/28/eth0]
> Mar 10 12:16:19 [heartbeat] info: other_holds_resources: 1
> Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> Mar 10 12:16:19 [heartbeat] debug: StartNextRemoteRscReq(): child count 1
> Mar 10 12:16:19 [heartbeat] info: 1 local resources from
> [/usr/lib/heartbeat/ResourceManager listkeys ns1]
> Mar 10 12:16:19 [heartbeat] info: Local Resource acquisition completed.
> Mar 10 12:16:19 [heartbeat] debug: Sending hold resources msg: local,
> stable=1 # req_our_resources()
> Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
> reason 'T_RESOURCES(us)' (1))
> Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> Mar 10 12:16:19 [heartbeat] info: Exiting req_our_resources(ask) process
> 26653 returned rc 0.
> Mar 10 12:16:19 [heartbeat] debug: RscMgmtProc 'req_our_resources(ask)'
> exited code 0
> Mar 10 12:16:19 [heartbeat] debug: StartNextRemoteRscReq() - calling hook
> Mar 10 12:16:19 [heartbeat] debug: notify_world: invoking harc: OLD
> status: active
> Mar 10 12:16:19 [heartbeat] debug: Process [ip-request-resp] started pid
> 26685
> Mar 10 12:16:19 [heartbeat] debug: Starting notify process
> [ip-request-resp]
> Mar 10 12:16:19 [heartbeat] debug: notify_world: setting SIGCHLD Handler
> to SIG_DFL
> Mar 10 12:16:19 [heartbeat] debug: notify_world: Running harc
> ip-request-resp
> Mar 10 12:16:19 [heartbeat] WARN: Exiting ip-request-resp process 26685
> returned rc 1.
> Mar 10 12:16:19 [heartbeat] debug: RscMgmtProc 'ip-request-resp' exited
> code 1
> Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
> reason 'ip-request-resp' (1))
> Mar 10 12:23:59 [heartbeat] debug: Process 26610 processing SIGTERM
> Mar 10 12:23:59 [heartbeat] debug: hb_initiate_shutdown() called.
> Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> Mar 10 12:23:59 [heartbeat] info: hb_giveup_resources(): current status:
> active
> Mar 10 12:23:59 [heartbeat] debug: Sending hold resources msg: none,
> stable=0 # shutdown
> Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 5
> Mar 10 12:23:59 [heartbeat] info: Heartbeat shutdown in progress. (26610)
> Mar 10 12:23:59 [heartbeat] debug: Process [hb_giveup_resources] started
> pid 26760
> Mar 10 12:23:59 [heartbeat] info: Giving up all HA resources.
> Mar 10 12:23:59 [heartbeat] ERROR: /usr/lib/heartbeat/ResourceManager
> givegroup IPaddr::201.225.35.18/28/eth0 returned 256
> Mar 10 12:23:59 [heartbeat] info: killing /usr/lib/heartbeat/ipfail
> process group 26651 with signal 15
> Mar 10 12:23:59 [heartbeat] info: All HA resources relinquished.
> Mar 10 12:23:59 [heartbeat] debug: Sending T_SHUTDONE.
> Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 5
> Mar 10 12:23:59 [heartbeat] debug: Received T_SHUTDONE from us.
> Mar 10 12:23:59 [heartbeat] debug: Calling hb_mcp_final_shutdown in a
> second.
> Mar 10 12:23:59 [heartbeat] debug: RscMgmtProc 'hb_giveup_resources'
> exited code 0
> Mar 10 12:23:59 [heartbeat] debug: hb_mcp_final_shutdown() phase 0
> Mar 10 12:23:59 [heartbeat] info: killing /usr/lib/heartbeat/ipfail
> process group 26651 with signal 15
> Mar 10 12:23:59 [heartbeat] WARN: 1 lost packet(s) for [ns2] [243:245]
> Mar 10 12:23:59 [heartbeat] info: other_holds_resources: 3
> Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 5
> Mar 10 12:23:59 [heartbeat] info: No pkts missing from ns2!
> Mar 10 12:23:59 [heartbeat] debug: standby message ignored during shutdown
> Mar 10 12:23:59 [heartbeat] info: other_holds_resources: 3
> Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 5
> Mar 10 12:24:00 [heartbeat] debug: Signing client 26651 off
> Mar 10 12:24:00 [heartbeat] debug: G_remove_client(pid=26651,
> reason='signoff' gsource=0x809a720) {
> Mar 10 12:24:00 [heartbeat] debug: api_remove_client_int: removing pid
> [26651] reason: signoff
> Mar 10 12:24:00 [heartbeat] debug: }/*G_remove_client;*/
> Mar 10 12:24:00 [heartbeat] debug: hb_mcp_final_shutdown() phase 1
> Mar 10 12:24:00 [heartbeat] info: killing HBREAD process 26631 with signal
> 15
> Mar 10 12:24:00 [heartbeat] info: killing HBWRITE process 26632 with
> signal 15
> Mar 10 12:24:00 [heartbeat] info: killing HBREAD process 26633 with signal
> 15
> Mar 10 12:24:00 [heartbeat] info: killing HBFIFO process 26629 with signal
> 15
> Mar 10 12:24:00 [heartbeat] info: killing HBWRITE process 26630 with
> signal 15
> Mar 10 12:24:00 [heartbeat] debug: Process 26629 processing SIGTERM
> Mar 10 12:24:00 [heartbeat] debug: Exiting from pid 26629 [rc=15]
> Mar 10 12:24:00 [heartbeat] info: Core process 26629 exited. 5 remaining
> Mar 10 12:24:00 [heartbeat] debug: Process 26630 processing SIGTERM
> Mar 10 12:24:00 [heartbeat] debug: Exiting from pid 26630 [rc=15]
> Mar 10 12:24:00 [heartbeat] info: Core process 26630 exited. 4 remaining
> Mar 10 12:24:00 [heartbeat] debug: Process 26631 processing SIGTERM
> Mar 10 12:24:00 [heartbeat] debug: Exiting from pid 26631 [rc=15]
> Mar 10 12:24:00 [heartbeat] info: Core process 26631 exited. 3 remaining
> Mar 10 12:24:00 [heartbeat] debug: Process 26632 processing SIGTERM
> Mar 10 12:24:00 [heartbeat] debug: Exiting from pid 26632 [rc=15]
> Mar 10 12:24:00 [heartbeat] info: Core process 26632 exited. 2 remaining
> Mar 10 12:24:00 [heartbeat] debug: Process 26633 processing SIGTERM
> Mar 10 12:24:00 [heartbeat] debug: Exiting from pid 26633 [rc=15]
> Mar 10 12:24:00 [heartbeat] info: Core process 26633 exited. 1 remaining
> Mar 10 12:24:00 [heartbeat] info: Heartbeat shutdown complete.
> Mar 10 12:24:00 [heartbeat] debug: Exiting from pid 26610 [rc=0]
>
> log from node 2 (ns2):
> Mar 10 12:16:06 [heartbeat] info: **************************
> Mar 10 12:16:06 [heartbeat] info: Configuration validated. Starting
> heartbeat 1.2.3
> Mar 10 12:16:06 [heartbeat] debug: HA configuration OK.  Heartbeat
> starting.
> Mar 10 12:16:06 [heartbeat] info: heartbeat: version 1.2.3
> Mar 10 12:16:07 [heartbeat] info: Heartbeat generation: 10
> Mar 10 12:16:07 [heartbeat] debug: opening bcast eth2 (UDP/IP broadcast)
> Mar 10 12:16:07 [heartbeat] debug: SO_BINDTODEVICE(r) set for device eth2
> Mar 10 12:16:07 [heartbeat] info: UDP Broadcast heartbeat started on port
> 694 (694) interface eth2
> Mar 10 12:16:07 [heartbeat] debug: bcast channel eth2 now open...
> Mar 10 12:16:07 [heartbeat] debug: opening ping 191.250.1.1 (ping
> membership)
> Mar 10 12:16:07 [heartbeat] info: ping heartbeat started.
> Mar 10 12:16:07 [heartbeat] debug: ping channel 191.250.1.1 now open...
> Mar 10 12:16:07 [heartbeat] debug: FIFO process pid: 17555
> Mar 10 12:16:07 [heartbeat] debug: write process pid: 17556
> Mar 10 12:16:07 [heartbeat] debug: read child process pid: 17557
> Mar 10 12:16:07 [heartbeat] debug: write process pid: 17558
> Mar 10 12:16:07 [heartbeat] debug: read child process pid: 17559
> Mar 10 12:16:07 [heartbeat] debug: Limiting CPU: 30 CPU seconds every
> 60000 milliseconds
> Mar 10 12:16:07 [heartbeat] info: pid 17555 locked in memory.
> Mar 10 12:16:07 [heartbeat] info: pid 17556 locked in memory.
> Mar 10 12:16:07 [heartbeat] info: pid 17557 locked in memory.
> Mar 10 12:16:07 [heartbeat] info: pid 17558 locked in memory.
> Mar 10 12:16:07 [heartbeat] info: pid 17520 locked in memory.
> Mar 10 12:16:07 [heartbeat] debug: Waiting for child processes to start
> Mar 10 12:16:07 [heartbeat] info: Local status now set to: 'up'
> Mar 10 12:16:07 [heartbeat] debug: All your child process are belong to us
> Mar 10 12:16:07 [heartbeat] debug: Limiting CPU: 6 CPU seconds every 60000
> milliseconds
> Mar 10 12:16:07 [heartbeat] debug: Limiting CPU: 24 CPU seconds every
> 60000 milliseconds
> Mar 10 12:16:07 [heartbeat] debug: Limiting CPU: 6 CPU seconds every 60000
> milliseconds
> Mar 10 12:16:07 [heartbeat] debug: Limiting CPU: 24 CPU seconds every
> 60000 milliseconds
> Mar 10 12:16:07 [heartbeat] debug: Starting local status message @ 2000 ms
> intervals
> Mar 10 12:16:07 [heartbeat] info: Link ns2:eth2 up.
> Mar 10 12:16:07 [heartbeat] debug: CreateInitialFilter: ip-request
> Mar 10 12:16:07 [heartbeat] debug: CreateInitialFilter: ip-request-resp
> Mar 10 12:16:07 [heartbeat] debug: CreateInitialFilter: status
> Mar 10 12:16:07 [heartbeat] debug: CreateInitialFilter: ask_resources
> Mar 10 12:16:07 [heartbeat] debug: CreateInitialFilter: hb_takeover
> Mar 10 12:16:07 [heartbeat] info: Link ns1:eth2 up.
> Mar 10 12:16:07 [heartbeat] info: Status update for node ns1: status
> active
> Mar 10 12:16:07 [heartbeat] debug: Status seqno: 7 msgtime: 1110474967
> Mar 10 12:16:07 [heartbeat] debug: StartNextRemoteRscReq() - calling hook
> Mar 10 12:16:07 [heartbeat] debug: notify_world: invoking harc: OLD
> status: up
> Mar 10 12:16:07 [heartbeat] debug: Process [status] started pid 17560
> Mar 10 12:16:07 [heartbeat] debug: Starting notify process [status]
> Mar 10 12:16:07 [heartbeat] debug: notify_world: setting SIGCHLD Handler
> to SIG_DFL
> Mar 10 12:16:07 [heartbeat] debug: notify_world: Running harc status
> Mar 10 12:16:07 [heartbeat] WARN: Exiting status process 17560 returned rc
> 1.
> Mar 10 12:16:07 [heartbeat] debug: RscMgmtProc 'status' exited code 1
> Mar 10 12:16:08 [heartbeat] info: pid 17559 locked in memory.
> Mar 10 12:16:08 [heartbeat] debug: Limiting CPU: 6 CPU seconds every 60000
> milliseconds
> Mar 10 12:16:08 [heartbeat] info: Link 191.250.1.1:191.250.1.1 up.
> Mar 10 12:16:08 [heartbeat] info: Status update for node 191.250.1.1:
> status ping
> Mar 10 12:16:08 [heartbeat] debug: Status seqno: 0 msgtime: 1110474967
> Mar 10 12:16:08 [heartbeat] debug: Comm_now_up(): updating status to
> active
> Mar 10 12:16:08 [heartbeat] info: Local status now set to: 'active'
> Mar 10 12:16:08 [heartbeat] debug: Sending local starting msg:
> resourcestate = 0
> Mar 10 12:16:08 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 0
> Mar 10 12:16:08 [heartbeat] info: Starting child client
> "/usr/lib/heartbeat/ipfail" (65,65)
> Mar 10 12:16:08 [heartbeat] info: AnnounceTakeover(local 0, foreign 1,
> reason 'T_RESOURCES' (0))
> Mar 10 12:16:08 [heartbeat] debug: process_resources(2):  other now
> unstable
> Mar 10 12:16:08 [heartbeat] info: other_holds_resources: 0
> Mar 10 12:16:08 [heartbeat] info: STATE 1 => 3
> Mar 10 12:16:08 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 3
> Mar 10 12:16:08 [heartbeat] info: Starting "/usr/lib/heartbeat/ipfail" as
> uid 65  gid 65 (pid 17561)
> Mar 10 12:16:08 [heartbeat] debug: APIregistration_dispatch() {
> Mar 10 12:16:08 [heartbeat] debug: process_registerevent() {
> Mar 10 12:16:08 [heartbeat] debug: client->gsource = 0x809aaf0
> Mar 10 12:16:08 [heartbeat] debug: }/*process_registerevent*/;
> Mar 10 12:16:08 [heartbeat] debug: }/*APIregistration_dispatch*/;
> Mar 10 12:16:08 [heartbeat] debug: Checking client authorization for
> client ipfail (65:65)
> Mar 10 12:16:08 [heartbeat] debug: Signing on API client 17561 (ipfail)
> Mar 10 12:16:08 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 3
> Mar 10 12:16:19 [heartbeat] debug: Process [req_our_resources(ask)]
> started pid 17562
> Mar 10 12:16:19 [heartbeat] info: local resource transition completed.
> Mar 10 12:16:19 [heartbeat] debug: Sending hold resources msg: local,
> stable=1 # <none>
> Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
> reason 'T_RESOURCES(us)' (0))
> Mar 10 12:16:19 [heartbeat] info: Initial resource acquisition complete
> (T_RESOURCES(us))
> Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
>                 - Last output repeated twice -
> Mar 10 12:16:19 [heartbeat] debug: process_resources(2):  other now
> unstable
> Mar 10 12:16:19 [heartbeat] info: other_holds_resources: 0
> Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> Mar 10 12:16:19 [heartbeat] info: remote resource transition completed.
> Mar 10 12:16:19 [heartbeat] debug: Sending hold resources msg: local,
> stable=1 # <none>
> Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
> reason 'T_RESOURCES(us)' (1))
> Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> Mar 10 12:16:19 [heartbeat] debug: Calling PerformAutoFailback()
> Mar 10 12:16:19 [heartbeat] info: other_holds_resources: 1
> Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> Mar 10 12:16:19 [heartbeat] debug:
> req_our_resources(/usr/lib/heartbeat/ResourceManager listkeys ns2)
> Mar 10 12:16:19 [heartbeat] info: No local resources
> [/usr/lib/heartbeat/ResourceManager listkeys ns2] to acquire.
> Mar 10 12:16:19 [heartbeat] debug: Sending hold resources msg: local,
> stable=1 # req_our_resources()
> Mar 10 12:16:19 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
> reason 'T_RESOURCES(us)' (1))
> Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> Mar 10 12:16:19 [heartbeat] info: Exiting req_our_resources(ask) process
> 17562 returned rc 0.
> Mar 10 12:16:19 [heartbeat] debug: RscMgmtProc 'req_our_resources(ask)'
> exited code 0
> Mar 10 12:16:19 [heartbeat] info: other_holds_resources: 1
> Mar 10 12:16:19 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> Mar 10 12:23:59 [heartbeat] debug: process_resources(2):  other now
> unstable
> Mar 10 12:23:59 [heartbeat] info: other_holds_resources: 0
> Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> Mar 10 12:23:59 [heartbeat] debug: process_resources(4):  other now stable
> - T_SHUTDONE
> Mar 10 12:23:59 [heartbeat] debug: Process [go_standby] started pid 17584
> Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
> going_standby: 3, standby running(ms): 0, resourcestate: 4
> Mar 10 12:23:59 [heartbeat] info: Received shutdown notice from 'ns1'.
> Mar 10 12:23:59 [heartbeat] info: Resources being acquired from ns1.
> Mar 10 12:23:59 [heartbeat] debug: StartNextRemoteRscReq(): child count 1
> Mar 10 12:23:59 [heartbeat] debug: takeover_from_node: other now stable
> Mar 10 12:23:59 [heartbeat] debug: Process [req_our_resources(ask)]
> started pid 17585
> Mar 10 12:23:59 [heartbeat] info: acquire local HA resources (standby).
> Mar 10 12:23:59 [heartbeat] info: go_standby: who: 2 resource set: local
> Mar 10 12:23:59 [heartbeat] info: go_standby: (query/action):
> (ourkeys/takegroup)
> Mar 10 12:23:59 [heartbeat] debug:
> req_our_resources(/usr/lib/heartbeat/ResourceManager listkeys ns2)
> Mar 10 12:23:59 [heartbeat] info: local HA resource acquisition completed
> (standby).
> Mar 10 12:23:59 [heartbeat] debug: Sending standby [done] msg
> Mar 10 12:23:59 [heartbeat] debug: Received standby message done from ns2
> in state 3
> Mar 10 12:23:59 [heartbeat] info: Standby resource acquisition done [all].
> Mar 10 12:23:59 [heartbeat] debug: Sending hold resources msg: all,
> stable=1 # <none>
> Mar 10 12:23:59 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
> reason 'T_RESOURCES(us)' (1))
> Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 2, other_is_stable: 1, takeover_in_progress: 1,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> Mar 10 12:23:59 [heartbeat] info: New standby state: 0
> Mar 10 12:23:59 [heartbeat] info: Retransmitting pkt 244
> Mar 10 12:23:59 [heartbeat] info: Exiting go_standby process 17584
> returned rc 0.
> Mar 10 12:23:59 [heartbeat] debug: RscMgmtProc 'go_standby' exited code 0
> Mar 10 12:23:59 [heartbeat] debug: StartNextRemoteRscReq(): child count 1
> Mar 10 12:23:59 [heartbeat] info: No local resources
> [/usr/lib/heartbeat/ResourceManager listkeys ns2] to acquire.
> Mar 10 12:23:59 [heartbeat] debug: Sending hold resources msg: all,
> stable=1 # req_our_resources()
> Mar 10 12:23:59 [heartbeat] info: AnnounceTakeover(local 1, foreign 1,
> reason 'T_RESOURCES(us)' (1))
> Mar 10 12:23:59 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 1,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> Mar 10 12:23:59 [heartbeat] info: Exiting req_our_resources(ask) process
> 17585 returned rc 0.
> Mar 10 12:23:59 [heartbeat] debug: RscMgmtProc 'req_our_resources(ask)'
> exited code 0
> Mar 10 12:23:59 [heartbeat] debug: StartNextRemoteRscReq() - calling hook
> Mar 10 12:23:59 [heartbeat] debug: notify_world: invoking harc: OLD
> status: active
> Mar 10 12:23:59 [heartbeat] debug: Process [status] started pid 17600
> Mar 10 12:23:59 [heartbeat] debug: Starting notify process [status]
> Mar 10 12:23:59 [heartbeat] debug: notify_world: setting SIGCHLD Handler
> to SIG_DFL
> Mar 10 12:23:59 [heartbeat] debug: notify_world: Running harc status
> Mar 10 12:23:59 [heartbeat] WARN: Exiting status process 17600 returned rc
> 1.
> Mar 10 12:23:59 [heartbeat] debug: RscMgmtProc 'status' exited code 1
> Mar 10 12:24:10 [heartbeat] debug: Process 17520 processing SIGTERM
> Mar 10 12:24:10 [heartbeat] debug: hb_initiate_shutdown() called.
> Mar 10 12:24:10 [heartbeat] debug: hb_rsc_isstable:
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 1,
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> Mar 10 12:24:10 [heartbeat] WARN: Shutdown delayed until current resource
> activity finishes.
> Mar 10 12:24:32 [heartbeat] WARN: node ns1: is dead
> Mar 10 12:24:32 [heartbeat] info: Link ns1:eth2 dead.
> Mar 10 12:26:07 [heartbeat] info: Daily informational memory statistics
> Mar 10 12:26:07 [heartbeat] info: MSG stats: 310/1234 ms age 1050
> [pid17520/MST_CONTROL]
> Mar 10 12:26:07 [heartbeat] info: ha_malloc stats: 7530/30624
> 517142/240839 [pid17520/MST_CONTROL]
> Mar 10 12:26:07 [heartbeat] info: RealMalloc stats: 521014 total malloc
> bytes. pid [17520/MST_CONTROL]
> Mar 10 12:26:07 [heartbeat] info: Current arena value: 702104
> Mar 10 12:26:07 [heartbeat] info: MSG stats: 0/3 ms age 127860
> [pid17555/HBFIFO]
> Mar 10 12:26:07 [heartbeat] info: ha_malloc stats: 0/50  252/0
> [pid17555/HBFIFO]
> Mar 10 12:26:07 [heartbeat] info: RealMalloc stats: 1468 total malloc
> bytes. pid [17555/HBFIFO]
> Mar 10 12:26:07 [heartbeat] info: Current arena value: 132760
> Mar 10 12:26:07 [heartbeat] info: MSG stats: 0/0 ms age 584370980
> [pid17556/HBWRITE]
> Mar 10 12:26:07 [heartbeat] info: ha_malloc stats: 0/0  0/0
> [pid17556/HBWRITE]
> Mar 10 12:26:07 [heartbeat] info: RealMalloc stats: 0 total malloc bytes.
> pid [17556/HBWRITE]
> Mar 10 12:26:07 [heartbeat] info: Current arena value: 0
> Mar 10 12:26:07 [heartbeat] info: MSG stats: 0/0 ms age 584370980
> [pid17557/HBREAD]
> Mar 10 12:26:07 [heartbeat] info: ha_malloc stats: 0/1128  14/0
> [pid17557/HBREAD]
> Mar 10 12:26:07 [heartbeat] info: RealMalloc stats: 270 total malloc
> bytes. pid [17557/HBREAD]
> Mar 10 12:26:07 [heartbeat] info: Current arena value: 132760
> Mar 10 12:26:07 [heartbeat] info: MSG stats: 0/615 ms age 1050
> [pid17558/HBWRITE]
> Mar 10 12:26:07 [heartbeat] info: ha_malloc stats: 0/13292  588/0
> [pid17558/HBWRITE]
> Mar 10 12:26:07 [heartbeat] info: RealMalloc stats: 3116 total malloc
> bytes. pid [17558/HBWRITE]
> Mar 10 12:26:07 [heartbeat] info: Current arena value: 132760
> Mar 10 12:26:07 [heartbeat] info: MSG stats: 0/302 ms age 1050
> [pid17559/HBREAD]
> Mar 10 12:26:07 [heartbeat] info: ha_malloc stats: 0/6040  252/0
> [pid17559/HBREAD]
> Mar 10 12:26:07 [heartbeat] info: RealMalloc stats: 1468 total malloc
> bytes. pid [17559/HBREAD]
> Mar 10 12:26:07 [heartbeat] info: Current arena value: 132760
> Mar 10 12:26:07 [heartbeat] info: These are nothing to worry about.
>

Hello all,

Perhaps everyone feels as mystified as I do?

If the log files seem not to contain a clue, then can anyone please
suggest a different way to troubleshoot the problem I'm having with my
haresources not flying?

Running DRBD alone works, as I've mentioned. But one curious thing I've
noticed is that running the init script "/etc/init.d/drbd start" (on this
Gentoo Linux-based system) starts each disk in the secondary/secondary
state. That is, it's necessary to subsequently run "drbdsetup /dev/drbd/0
primary" because the init script does not ask which disk should be
primary. As I understand it, this is abnormal, correct?

Notwithstanding the DRBD situation, the IP address comes first in the
haresources config and this resource is not being activated, so I'm not
sure whether DRBD is really the issue. At least not yet.

Again, thanks for any and all observations.



More information about the Linux-HA mailing list