[Linux-HA] ResourceManager NOT start haresources
谢林江 (Xie Linjiang)
linjiangxie at 126.com
Tue Mar 6 05:05:16 MST 2007
Hi all,
I ran into a problem after installing heartbeat 2.0.4 on Red Hat 8.0 (2.4.18-14 i386) and heartbeat 2.0.8 on Fedora Core 4 (2.6.11-1.1369smp x86_64). I tried both the tar.gz and the RPM installation. There were no errors during installation.
The problem is as follows:
Once a node acquires its resources, it should start them left to right. Instead, ResourceManager starts only some of them; the rest are never started. Sometimes heartbeat stays stuck at "Running /etc/ha.d/resource.d/IPaddr 10.10.21.11 start". I observed all of this in "/var/log/ha-log", "/var/log/ha-debug", and the output of "ps ax". However, when I stop the heartbeat service, heartbeat stops the resources right to left, including the ones that were never started. In other words, stop works but start fails.
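The expected ordering described above can be sketched as follows. This is a minimal illustration in Python, not heartbeat's own code: the helper names are hypothetical, the input is the haresources line attached to this message, and it assumes a single line with no backslash continuation and no `::` arguments.

```python
# Hypothetical helpers: they only illustrate the expected ordering and are
# not part of heartbeat.

def parse_haresources_line(line):
    """Split one haresources line into (preferred node, list of resources)."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected a node name followed by at least one resource")
    return fields[0], fields[1:]


def start_order(resources):
    """Resources are expected to start in the order listed (left to right)."""
    return list(resources)


def stop_order(resources):
    """Resources are expected to stop in reverse order (right to left)."""
    return list(reversed(resources))


# The haresources line attached to this message.
node, resources = parse_haresources_line(
    "SN01 10.10.21.154 bwfs bwfs-monitord gmond bwfssn"
)
print("node: ", node)
print("start:", " -> ".join(start_order(resources)))
print("stop: ", " -> ".join(stop_order(resources)))
```

With this line, the start sequence would begin with the IP address and end with bwfssn, and the stop sequence would be the reverse.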
I have two nodes:
SN01 IP=10.10.21.83 OS=FedoraCore4 2.6.11-1.1369smp x86_64
SN02 IP=10.10.21.164 OS=FedoraCore4 2.6.11-1.1369smp x86_64
Node SN02's configuration files and ha-debug log are attached.
The problem occurs with both heartbeat 2.0.4 on Red Hat 8.0 and heartbeat 2.0.8 on Fedora Core 4.
Maybe there is a configuration error, or some files are missing. Suggestions are welcome.
Thanks.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ha.cf
Type: application/octet-stream
Size: 184 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20070306/bae4e47b/ha.obj
-------------- next part --------------
heartbeat[2267]: 2007/03/06_15:00:13 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[2267]: 2007/03/06_15:00:13 info: **************************
heartbeat[2267]: 2007/03/06_15:00:13 info: Configuration validated. Starting heartbeat 2.0.8
heartbeat[2268]: 2007/03/06_15:00:13 info: heartbeat: version 2.0.8
heartbeat[2268]: 2007/03/06_15:00:14 info: Heartbeat generation: 2
heartbeat[2268]: 2007/03/06_15:00:14 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[2268]: 2007/03/06_15:00:14 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[2268]: 2007/03/06_15:00:14 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
heartbeat[2268]: 2007/03/06_15:00:14 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
heartbeat[2268]: 2007/03/06_15:00:14 info: glib: ucast: bound send socket to device: eth0
heartbeat[2268]: 2007/03/06_15:00:14 info: glib: ucast: bound receive socket to device: eth0
heartbeat[2268]: 2007/03/06_15:00:14 info: glib: ucast: started on port 694 interface eth0 to 10.10.21.83
heartbeat[2268]: 2007/03/06_15:00:14 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[2268]: 2007/03/06_15:00:14 info: Local status now set to: 'up'
heartbeat[2268]: 2007/03/06_15:00:16 info: Link sn01:eth0 up.
heartbeat[2268]: 2007/03/06_15:00:16 info: Status update for node sn01: status active
heartbeat[2275]: 2007/03/06_15:00:16 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[2275]: 2007/03/06_15:00:16 info: Running /etc/ha.d/rc.d/status status
heartbeat[2268]: 2007/03/06_15:00:16 info: Comm_now_up(): updating status to active
heartbeat[2268]: 2007/03/06_15:00:16 info: Local status now set to: 'active'
heartbeat[2268]: 2007/03/06_15:00:17 info: remote resource transition completed.
heartbeat[2268]: 2007/03/06_15:00:17 info: remote resource transition completed.
heartbeat[2268]: 2007/03/06_15:00:17 info: Local Resource acquisition completed. (none)
heartbeat[2268]: 2007/03/06_15:00:17 info: sn01 wants to go standby [foreign]
heartbeat[2268]: 2007/03/06_15:00:18 info: standby: acquire [foreign] resources from sn01
heartbeat[2286]: 2007/03/06_15:00:18 info: acquire local HA resources (standby).
heartbeat[2286]: 2007/03/06_15:00:18 info: local HA resource acquisition completed (standby).
heartbeat[2268]: 2007/03/06_15:00:18 info: Standby resource acquisition done [foreign].
heartbeat[2268]: 2007/03/06_15:00:18 info: Initial resource acquisition complete (auto_failback)
heartbeat[2268]: 2007/03/06_15:00:18 info: remote resource transition completed.
heartbeat[2268]: 2007/03/06_15:06:54 info: Received shutdown notice from 'sn01'.
heartbeat[2268]: 2007/03/06_15:06:54 info: Resources being acquired from sn01.
heartbeat[2268]: 2007/03/06_15:06:54 debug: StartNextRemoteRscReq(): child count 1
heartbeat[2327]: 2007/03/06_15:06:54 info: acquire local HA resources (standby).
heartbeat[2327]: 2007/03/06_15:06:54 info: local HA resource acquisition completed (standby).
heartbeat[2268]: 2007/03/06_15:06:54 info: Standby resource acquisition done [foreign].
heartbeat[2268]: 2007/03/06_15:06:54 debug: StartNextRemoteRscReq(): child count 1
heartbeat[2328]: 2007/03/06_15:06:54 info: No local resources [/usr/lib64/heartbeat/ResourceManager listkeys sn02] to acquire.
heartbeat[2347]: 2007/03/06_15:06:54 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[2347]: 2007/03/06_15:06:54 info: Running /etc/ha.d/rc.d/status status
mach_down[2357]: 2007/03/06_15:06:54 info: Taking over resource group 10.10.21.154
ResourceManager[2377]: 2007/03/06_15:06:54 info: Acquiring resource group: sn01 10.10.21.154 bwfs bwfs-monitord gmond bwfssn
IPaddr[2401]: 2007/03/06_15:06:54 INFO: Resource is stopped
ResourceManager[2377]: 2007/03/06_15:06:54 info: Running /etc/ha.d/resource.d/IPaddr 10.10.21.154 start
ResourceManager[2377]: 2007/03/06_15:06:54 debug: Starting /etc/ha.d/resource.d/IPaddr 10.10.21.154 start
IPaddr[2455]: 2007/03/06_15:06:54 INFO: Using calculated nic for 10.10.21.154: eth0
IPaddr[2455]: 2007/03/06_15:06:54 DEBUG: Using calculated netmask for 10.10.21.154: 255.255.255.0
IPaddr[2455]: 2007/03/06_15:06:54 DEBUG: Using calculated broadcast for 10.10.21.154: 10.10.21.255
IPaddr[2455]: 2007/03/06_15:06:54 INFO: eval /sbin/ifconfig eth0:0 10.10.21.154 netmask 255.255.255.0 broadcast 10.10.21.255
IPaddr[2455]: 2007/03/06_15:06:54 DEBUG: Sending Gratuitous Arp for 10.10.21.154 on eth0:0 [eth0]
IPaddr[2446]: 2007/03/06_15:06:54 INFO: Success
INFO: Success
ResourceManager[2377]: 2007/03/06_15:06:54 debug: /etc/ha.d/resource.d/IPaddr 10.10.21.154 start done. RC=0
ResourceManager[2377]: 2007/03/06_15:06:54 info: Running /etc/init.d/bwfs start
ResourceManager[2377]: 2007/03/06_15:06:54 debug: Starting /etc/init.d/bwfs start
Starting BWFS node register: [ OK ]
ResourceManager[2377]: 2007/03/06_15:06:55 debug: /etc/init.d/bwfs start done. RC=0
ResourceManager[2377]: 2007/03/06_15:06:55 info: Running /etc/init.d/bwfs-monitord start
ResourceManager[2377]: 2007/03/06_15:06:55 debug: Starting /etc/init.d/bwfs-monitord start
Starting bwfs-monitord: [ OK ]
ResourceManager[2377]: 2007/03/06_15:06:56 debug: /etc/init.d/bwfs-monitord start done. RC=0
ResourceManager[2377]: 2007/03/06_15:06:56 info: Running /etc/init.d/gmond start
ResourceManager[2377]: 2007/03/06_15:06:56 debug: Starting /etc/init.d/gmond start
Starting GANGLIA gmond: [ OK ]
ResourceManager[2377]: 2007/03/06_15:06:57 debug: /etc/init.d/gmond start done. RC=0
mach_down[2357]: 2007/03/06_15:07:01 info: /usr/lib64/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[2357]: 2007/03/06_15:07:01 info: mach_down takeover complete for node sn01.
heartbeat[2268]: 2007/03/06_15:07:01 info: mach_down takeover complete.
heartbeat[2268]: 2007/03/06_15:07:05 WARN: node sn01: is dead
heartbeat[2268]: 2007/03/06_15:07:05 info: Dead node sn01 gave up resources.
heartbeat[2268]: 2007/03/06_15:07:05 info: Link sn01:eth0 dead.
heartbeat[2268]: 2007/03/06_15:08:29 info: Heartbeat shutdown in progress. (2268)
heartbeat[2738]: 2007/03/06_15:08:29 info: Giving up all HA resources.
ResourceManager[2748]: 2007/03/06_15:08:29 info: Releasing resource group: sn01 10.10.21.154 bwfs bwfs-monitord gmond bwfssn
ResourceManager[2748]: 2007/03/06_15:08:29 info: Running /etc/init.d/bwfssn stop
ResourceManager[2748]: 2007/03/06_15:08:29 debug: Starting /etc/init.d/bwfssn stop
Stopping BWFS SN service: ResourceManager[2748]: 2007/03/06_15:08:30 debug: /etc/init.d/bwfssn stop done. RC=0
ResourceManager[2748]: 2007/03/06_15:08:30 info: Running /etc/init.d/gmond stop
ResourceManager[2748]: 2007/03/06_15:08:30 debug: Starting /etc/init.d/gmond stop
Shutting down GANGLIA gmond: [ OK ]
ResourceManager[2748]: 2007/03/06_15:08:30 debug: /etc/init.d/gmond stop done. RC=0
ResourceManager[2748]: 2007/03/06_15:08:30 info: Running /etc/init.d/bwfs-monitord stop
ResourceManager[2748]: 2007/03/06_15:08:30 debug: Starting /etc/init.d/bwfs-monitord stop
Shutting down bwfs-monitord: [ OK ]
ResourceManager[2748]: 2007/03/06_15:08:30 debug: /etc/init.d/bwfs-monitord stop done. RC=0
ResourceManager[2748]: 2007/03/06_15:08:30 info: Running /etc/init.d/bwfs stop
ResourceManager[2748]: 2007/03/06_15:08:30 debug: Starting /etc/init.d/bwfs stop
Shutting event_deal.pl: event_deal.pl: no process killed
[FAILED]
ResourceManager[2748]: 2007/03/06_15:08:30 debug: /etc/init.d/bwfs stop done. RC=0
ResourceManager[2748]: 2007/03/06_15:08:30 info: Running /etc/ha.d/resource.d/IPaddr 10.10.21.154 stop
ResourceManager[2748]: 2007/03/06_15:08:30 debug: Starting /etc/ha.d/resource.d/IPaddr 10.10.21.154 stop
SIOCDELRT: No such process
IPaddr[2898]: 2007/03/06_15:08:31 INFO: /sbin/ifconfig eth0:0 10.10.21.154 down
IPaddr[2889]: 2007/03/06_15:08:31 INFO: Success
INFO: Success
ResourceManager[2748]: 2007/03/06_15:08:31 debug: /etc/ha.d/resource.d/IPaddr 10.10.21.154 stop done. RC=0
heartbeat[2738]: 2007/03/06_15:08:31 info: All HA resources relinquished.
heartbeat[2268]: 2007/03/06_15:08:33 info: killing HBFIFO process 2271 with signal 15
heartbeat[2268]: 2007/03/06_15:08:33 info: killing HBWRITE process 2272 with signal 15
heartbeat[2268]: 2007/03/06_15:08:33 info: killing HBREAD process 2273 with signal 15
heartbeat[2268]: 2007/03/06_15:08:33 info: Core process 2271 exited. 3 remaining
heartbeat[2268]: 2007/03/06_15:08:33 info: Core process 2273 exited. 2 remaining
heartbeat[2268]: 2007/03/06_15:08:33 info: Core process 2272 exited. 1 remaining
heartbeat[2268]: 2007/03/06_15:08:33 info: sn02 Heartbeat shutdown complete.
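One way to see which resources were skipped is to compare the group named in the "Acquiring resource group" line with the "Running ... start" lines that follow it. The sketch below is a hypothetical helper in Python, not a heartbeat tool. It assumes the log format shown above and treats a resource as started if its name appears either as the script name or as a script argument (which is how the bare IP address shows up under IPaddr in this log).

```python
import re

# Patterns follow the ResourceManager lines as they appear in the log above.
ACQUIRE_RE = re.compile(r"Acquiring resource group: (\S+) (.+)$")
RUN_START_RE = re.compile(r"info: Running (\S+)((?: \S+)*) start$")


def unstarted_resources(log_lines):
    """Return resources listed in the last acquired group that were never started."""
    listed = []
    seen = set()
    for raw in log_lines:
        line = raw.rstrip()
        match = ACQUIRE_RE.search(line)
        if match:
            # group(1) is the node name, group(2) the resource list.
            listed = match.group(2).split()
            seen = set()
            continue
        match = RUN_START_RE.search(line)
        if match:
            # Record the script's base name and any arguments it was given.
            seen.add(match.group(1).rsplit("/", 1)[-1])
            seen.update(match.group(2).split())
    return [name for name in listed if name not in seen]


# Lines copied from the start sequence in the log above.
sample = [
    "ResourceManager[2377]: 2007/03/06_15:06:54 info: Acquiring resource group: sn01 10.10.21.154 bwfs bwfs-monitord gmond bwfssn",
    "ResourceManager[2377]: 2007/03/06_15:06:54 info: Running /etc/ha.d/resource.d/IPaddr 10.10.21.154 start",
    "ResourceManager[2377]: 2007/03/06_15:06:54 info: Running /etc/init.d/bwfs start",
    "ResourceManager[2377]: 2007/03/06_15:06:55 info: Running /etc/init.d/bwfs-monitord start",
    "ResourceManager[2377]: 2007/03/06_15:06:56 info: Running /etc/init.d/gmond start",
]
print(unstarted_resources(sample))
```

For the start sequence in this log, the helper reports bwfssn. That matches the log itself: no "Running /etc/init.d/bwfssn start" line appears between the gmond start and the "mach_down takeover complete" message, although bwfssn is stopped during shutdown.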
-------------- next part --------------
SN01 10.10.21.154 bwfs bwfs-monitord gmond bwfssn
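Since missing files are one suspected cause, a quick check is whether every resource named in the haresources line resolves to an executable script. The sketch below is a hypothetical helper in Python and rests on assumptions: it looks only in /etc/ha.d/resource.d and /etc/init.d (the two directories that appear in the log), and it maps a bare IP address to the IPaddr script, as the log shows ResourceManager doing.

```python
import os
import re

# Assumption: only the two directories seen in the log are searched.
SEARCH_DIRS = ("/etc/ha.d/resource.d", "/etc/init.d")
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def script_name(resource):
    """Map a haresources entry to the script name it is expected to run."""
    if IPV4_RE.match(resource):
        return "IPaddr"  # a bare IP address is handled by IPaddr in the log
    return resource.split("::", 1)[0]  # drop any "::" arguments


def find_script(resource, search_dirs=SEARCH_DIRS):
    """Return the first executable script found for the resource, or None."""
    name = script_name(resource)
    for directory in search_dirs:
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


# Resources from the haresources line above (node name omitted).
resources = "10.10.21.154 bwfs bwfs-monitord gmond bwfssn".split()
for resource in resources:
    print(resource, "->", find_script(resource) or "NOT FOUND")
```

Any resource reported as NOT FOUND on the node would be worth checking first. A script that exists but never returns from "start" would need a different check, since this only tests for presence and the executable bit.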