[Linux-HA] sticky resource status "transition" with hb 2.0.2

Joachim Banzhaf joachimbanzhaf at compuserve.de
Wed Oct 26 14:32:41 MDT 2005


Hi Andrew,

On Wednesday, 26 October 2005 at 20:22, Andrew Beekhof wrote:
> On 10/26/05, joachimbanzhaf at compuserve.de <joachimbanzhaf at compuserve.de> wrote:

> > Is it a known bug that starting heartbeat in an environment like mine
> > never leaves resource status "transition" (which means e.g. you cannot
> > stop heartbeat)? If not, I'm happy to provide more details.
>
> That is definitely not a good thing.  Can you attach logs and the
> contents of the CIB (either the cib.xml file or output from cibadmin
> -Ql)

Sure, but I don't use the CIB right now. At this point it is a 1.x
compatibility setup, because I wanted to start from a working setup I know
and understand before switching to the new 2.x functionality. Logs and
config are attached.

> If it's at all possible, you might want to try the latest from CVS too.

If I find the time, I will.

> > I use heartbeat rpm 2.0.2 with a minimal SuSE Pro 9.3 (with current YOU
> > updates).
> > I have set up just one node so far.
> > haresources has two lines for two nodes, each starting one IP address.
> > Both IP addresses get started by heartbeat just fine.
> >
> > I have set up various 0.x up to 1.2.3 heartbeat clusters, even sent some

Oops, it seems this got lost somehow:
...patches, but this is my first attempt at 2.x.

regards

Joachim Banzhaf
-------------- next part --------------
debugfile /var/log/ha-debug
logfile	/var/log/ha-log
logfacility	local0
keepalive 2
deadtime 10
warntime 5
initdead 20
#udpport  694
baud	  115200
serial	/dev/ttyS0	# Linux
bcast eth1 eth2	# Linux
ucast eth0 192.168.111.101
ucast eth0 192.168.111.102
auto_failback off
#stonith baytech /etc/ha.d/conf/stonith.baytech
#stonith_host *     baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3  rps10 /dev/ttyS1 kathy 0 
#stonith_host kathy rps10 /dev/ttyS1 ken3 0 
#	wish to load the module with the parameter "nowayout=0" or
#	compile it without CONFIG_WATCHDOG_NOWAYOUT set. Otherwise even
#watchdog /dev/watchdog
node	jobc1
node	jobc2
#respawn hacluster /usr/lib/heartbeat/ipfail
#hopfudge 1
#deadping 30
#hbgenmethod time
#realtime off
#debug 1
#		  ipfail 	(uid=HA_CCMUSER)
#		  ccm 	 	(uid=HA_CCMUSER)
#		  ping		(gid=HA_APIGROUP)
#		  cl_status	(gid=HA_APIGROUP)
apiauth ipfail uid=hacluster
apiauth ccm uid=hacluster
apiauth cms uid=hacluster
apiauth ping gid=haclient uid=root
apiauth default gid=haclient
msgfmt  netstring
#	daemon (the default is /etc/logd.cf)
use_logd yes
#conn_logd_time 60
compression	bz2
compression_threshold 2

-------------- next part --------------
debugfile /var/log/ha-debug
logfile	/var/log/ha-log
logfacility	local7
entity logd
useapphbd no
sendqlen 256 
recvqlen 256

-------------- next part --------------
logd[6744]: 2005/10/26_22:13:38 info: logd started with /etc/logd.cf.
logd[6744]: 2005/10/26_22:13:38 WARN: Core dumps could be lost if multiple dumps occur
logd[6744]: 2005/10/26_22:13:38 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
logd[6745]: 2005/10/26_22:13:38 info: G_main_add_SignalHandler: Added signal handler for signal 15
logd[6744]: 2005/10/26_22:13:38 info: G_main_add_SignalHandler: Added signal handler for signal 15
heartbeat[6820]: 2005/10/26_22:13:38 info: Enabling logging daemon 
heartbeat[6820]: 2005/10/26_22:13:38 info: logfile and debug file are those specifiedin logd config file (default /etc/logd.cf)
heartbeat[6820]: 2005/10/26_22:13:38 WARN: Core dumps could be lost if multiple dumps occur
heartbeat[6820]: 2005/10/26_22:13:38 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
heartbeat[6820]: 2005/10/26_22:13:38 info: **************************
heartbeat[6820]: 2005/10/26_22:13:38 info: Configuration validated. Starting heartbeat 2.0.2
heartbeat[6821]: 2005/10/26_22:13:38 info: heartbeat: version 2.0.2
heartbeat[6821]: 2005/10/26_22:13:38 info: Heartbeat generation: 11
heartbeat[6821]: 2005/10/26_22:13:38 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: Starting serial heartbeat on tty /dev/ttyS0 (115200 baud)
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth2
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound send socket to device: eth0
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound receive socket to device: eth0
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: started on port 694 interface eth0 to 192.168.111.101
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound send socket to device: eth0
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound receive socket to device: eth0
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: started on port 694 interface eth0 to 192.168.111.102
heartbeat[6821]: 2005/10/26_22:13:39 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[6821]: 2005/10/26_22:13:39 info: pid 6821 locked in memory.
heartbeat[6821]: 2005/10/26_22:13:39 info: Local status now set to: 'up'
heartbeat[6828]: 2005/10/26_22:13:39 info: pid 6828 locked in memory.
heartbeat[6824]: 2005/10/26_22:13:40 info: pid 6824 locked in memory.
heartbeat[6825]: 2005/10/26_22:13:40 info: pid 6825 locked in memory.
heartbeat[6826]: 2005/10/26_22:13:40 info: pid 6826 locked in memory.
heartbeat[6827]: 2005/10/26_22:13:40 info: pid 6827 locked in memory.
heartbeat[6821]: 2005/10/26_22:13:40 info: Link jobc1:eth1 up.
heartbeat[6829]: 2005/10/26_22:13:40 info: pid 6829 locked in memory.
heartbeat[6830]: 2005/10/26_22:13:40 info: pid 6830 locked in memory.
heartbeat[6821]: 2005/10/26_22:13:40 info: Link jobc1:eth2 up.
heartbeat[6831]: 2005/10/26_22:13:40 info: pid 6831 locked in memory.
heartbeat[6832]: 2005/10/26_22:13:40 info: pid 6832 locked in memory.
heartbeat[6833]: 2005/10/26_22:13:40 info: pid 6833 locked in memory.
heartbeat[6834]: 2005/10/26_22:13:40 info: pid 6834 locked in memory.
heartbeat[6821]: 2005/10/26_22:13:59 WARN: node jobc2: is dead
heartbeat[6821]: 2005/10/26_22:13:59 info: Local status now set to: 'active'
heartbeat[6821]: 2005/10/26_22:13:59 WARN: No STONITH device configured.
heartbeat[6821]: 2005/10/26_22:13:59 WARN: Shared disks are not protected.
heartbeat[6821]: 2005/10/26_22:13:59 info: Resources being acquired from jobc2.
heartbeat[6850]: 2005/10/26_22:13:59 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[6850][6854]: 2005/10/26_22:13:59 info: Running /etc/ha.d/rc.d/status status
mach_down[6858][6881]: 2005/10/26_22:13:59 info: Taking over resource group 192.168.111.202
ResourceManager[6899][6907]: 2005/10/26_22:13:59 info: Acquiring resource group: jobc2 192.168.111.202
ResourceManager[6899][6952]: 2005/10/26_22:13:59 info: Running /etc/ha.d/resource.d/IPaddr 192.168.111.202 start
ResourceManager[6899][6953]: 2005/10/26_22:13:59 debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.111.202 start
heartbeat[6821]: 2005/10/26_22:13:59 debug: StartNextRemoteRscReq(): child count 2
heartbeat[6851]: 2005/10/26_22:13:59 info: Local Resource acquisition completed.
heartbeat[6821]: 2005/10/26_22:13:59 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat[6821]: 2005/10/26_22:13:59 debug: StartNextRemoteRscReq(): child count 1
ls: /var/run/heartbeat/rsctmp/IPaddr/eth0:*: No such file or directory
IPaddr[6954][7009]: 2005/10/26_22:14:00 info: /sbin/ifconfig eth0:0 192.168.111.202 netmask 255.255.255.0	broadcast 192.168.111.255
IPaddr[6954][7014]: 2005/10/26_22:14:00 info: Sending Gratuitous Arp for 192.168.111.202 on eth0:0 [eth0]
IPaddr[6954][7015]: 2005/10/26_22:14:00 /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.111.202 eth0 192.168.111.202 auto 192.168.111.202 ffffffffffff
ResourceManager[6899][7019]: 2005/10/26_22:14:00 debug: /etc/ha.d/resource.d/IPaddr 192.168.111.202 start done. RC=0
mach_down[6858][7020]: 2005/10/26_22:14:00 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[6858][7023]: 2005/10/26_22:14:00 info: mach_down takeover complete for node jobc2.
heartbeat[7024]: 2005/10/26_22:14:00 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[7024][7027]: 2005/10/26_22:14:00 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[7024][7030]: 2005/10/26_22:14:00 received ip-request-resp 192.168.111.201 OK yes
send_arp[7018]: 2005/10/26_22:14:00 info: Enable using logging daemon
ResourceManager[7031][7041]: 2005/10/26_22:14:00 info: Acquiring resource group: jobc1 192.168.111.201
ResourceManager[7031][7082]: 2005/10/26_22:14:00 info: Running /etc/ha.d/resource.d/IPaddr 192.168.111.201 start
ResourceManager[7031][7083]: 2005/10/26_22:14:00 debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.111.201 start
IPaddr[7084][7132]: 2005/10/26_22:14:00 info: /sbin/ifconfig eth0:1 192.168.111.201 netmask 255.255.255.0	broadcast 192.168.111.255
IPaddr[7084][7137]: 2005/10/26_22:14:00 info: Sending Gratuitous Arp for 192.168.111.201 on eth0:1 [eth0]
IPaddr[7084][7138]: 2005/10/26_22:14:00 /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.111.201 eth0 192.168.111.201 auto 192.168.111.201 ffffffffffff
send_arp[7141]: 2005/10/26_22:14:00 info: Enable using logging daemon
ResourceManager[7031][7142]: 2005/10/26_22:14:00 debug: /etc/ha.d/resource.d/IPaddr 192.168.111.201 start done. RC=0
heartbeat[6821]: 2005/10/26_22:14:10 info: Local Resource acquisition completed. (none)
heartbeat[6821]: 2005/10/26_22:14:10 info: local resource transition completed.
heartbeat[6821]: 2005/10/26_22:14:28 debug: SO_PEERCRED returned [7168, (0:0)]
heartbeat[6821]: 2005/10/26_22:14:28 debug: Verifying authentication: cred.uid=0 cred.gid=90
heartbeat[6821]: 2005/10/26_22:14:28 debug: Verifying authentication: uidptr=0x8160fb0 gidptr=0x0
heartbeat[6821]: 2005/10/26_22:14:28 debug: SO_PEERCRED returned [7168, (0:0)]
heartbeat[6821]: 2005/10/26_22:14:28 debug: Verifying authentication: cred.uid=0 cred.gid=90
heartbeat[6821]: 2005/10/26_22:14:28 debug: Verifying authentication: uidptr=0x0 gidptr=0x815edf0
heartbeat[6821]: 2005/10/26_22:14:28 debug: SO_PEERCRED returned [7168, (0:0)]
heartbeat[6821]: 2005/10/26_22:14:28 debug: Verifying authentication: cred.uid=0 cred.gid=90
heartbeat[6821]: 2005/10/26_22:14:28 debug: Verifying authentication: uidptr=0x0 gidptr=0x81482a8
-------------- next part --------------
logd[6744]: 2005/10/26_22:13:38 info: logd started with /etc/logd.cf.
logd[6744]: 2005/10/26_22:13:38 WARN: Core dumps could be lost if multiple dumps occur
logd[6744]: 2005/10/26_22:13:38 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
logd[6745]: 2005/10/26_22:13:38 info: G_main_add_SignalHandler: Added signal handler for signal 15
logd[6744]: 2005/10/26_22:13:38 info: G_main_add_SignalHandler: Added signal handler for signal 15
heartbeat[6820]: 2005/10/26_22:13:38 info: Enabling logging daemon 
heartbeat[6820]: 2005/10/26_22:13:38 info: logfile and debug file are those specifiedin logd config file (default /etc/logd.cf)
heartbeat[6820]: 2005/10/26_22:13:38 WARN: Core dumps could be lost if multiple dumps occur
heartbeat[6820]: 2005/10/26_22:13:38 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
heartbeat[6820]: 2005/10/26_22:13:38 info: **************************
heartbeat[6820]: 2005/10/26_22:13:38 info: Configuration validated. Starting heartbeat 2.0.2
heartbeat[6821]: 2005/10/26_22:13:38 info: heartbeat: version 2.0.2
heartbeat[6821]: 2005/10/26_22:13:38 info: Heartbeat generation: 11
heartbeat[6821]: 2005/10/26_22:13:38 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: Starting serial heartbeat on tty /dev/ttyS0 (115200 baud)
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth2
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound send socket to device: eth0
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound receive socket to device: eth0
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: started on port 694 interface eth0 to 192.168.111.101
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound send socket to device: eth0
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound receive socket to device: eth0
heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: started on port 694 interface eth0 to 192.168.111.102
heartbeat[6821]: 2005/10/26_22:13:39 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[6821]: 2005/10/26_22:13:39 info: pid 6821 locked in memory.
heartbeat[6821]: 2005/10/26_22:13:39 info: Local status now set to: 'up'
heartbeat[6828]: 2005/10/26_22:13:39 info: pid 6828 locked in memory.
heartbeat[6824]: 2005/10/26_22:13:40 info: pid 6824 locked in memory.
heartbeat[6825]: 2005/10/26_22:13:40 info: pid 6825 locked in memory.
heartbeat[6826]: 2005/10/26_22:13:40 info: pid 6826 locked in memory.
heartbeat[6827]: 2005/10/26_22:13:40 info: pid 6827 locked in memory.
heartbeat[6821]: 2005/10/26_22:13:40 info: Link jobc1:eth1 up.
heartbeat[6829]: 2005/10/26_22:13:40 info: pid 6829 locked in memory.
heartbeat[6830]: 2005/10/26_22:13:40 info: pid 6830 locked in memory.
heartbeat[6821]: 2005/10/26_22:13:40 info: Link jobc1:eth2 up.
heartbeat[6831]: 2005/10/26_22:13:40 info: pid 6831 locked in memory.
heartbeat[6832]: 2005/10/26_22:13:40 info: pid 6832 locked in memory.
heartbeat[6833]: 2005/10/26_22:13:40 info: pid 6833 locked in memory.
heartbeat[6834]: 2005/10/26_22:13:40 info: pid 6834 locked in memory.
heartbeat[6821]: 2005/10/26_22:13:59 WARN: node jobc2: is dead
heartbeat[6821]: 2005/10/26_22:13:59 info: Local status now set to: 'active'
heartbeat[6821]: 2005/10/26_22:13:59 WARN: No STONITH device configured.
heartbeat[6821]: 2005/10/26_22:13:59 WARN: Shared disks are not protected.
heartbeat[6821]: 2005/10/26_22:13:59 info: Resources being acquired from jobc2.
harc[6850][6854]: 2005/10/26_22:13:59 info: Running /etc/ha.d/rc.d/status status
mach_down[6858][6881]: 2005/10/26_22:13:59 info: Taking over resource group 192.168.111.202
ResourceManager[6899][6907]: 2005/10/26_22:13:59 info: Acquiring resource group: jobc2 192.168.111.202
ResourceManager[6899][6952]: 2005/10/26_22:13:59 info: Running /etc/ha.d/resource.d/IPaddr 192.168.111.202 start
heartbeat[6851]: 2005/10/26_22:13:59 info: Local Resource acquisition completed.
heartbeat[6821]: 2005/10/26_22:13:59 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr[6954][7009]: 2005/10/26_22:14:00 info: /sbin/ifconfig eth0:0 192.168.111.202 netmask 255.255.255.0	broadcast 192.168.111.255
IPaddr[6954][7014]: 2005/10/26_22:14:00 info: Sending Gratuitous Arp for 192.168.111.202 on eth0:0 [eth0]
IPaddr[6954][7015]: 2005/10/26_22:14:00 /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.111.202 eth0 192.168.111.202 auto 192.168.111.202 ffffffffffff
mach_down[6858][7020]: 2005/10/26_22:14:00 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[6858][7023]: 2005/10/26_22:14:00 info: mach_down takeover complete for node jobc2.
harc[7024][7027]: 2005/10/26_22:14:00 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[7024][7030]: 2005/10/26_22:14:00 received ip-request-resp 192.168.111.201 OK yes
send_arp[7018]: 2005/10/26_22:14:00 info: Enable using logging daemon
ResourceManager[7031][7041]: 2005/10/26_22:14:00 info: Acquiring resource group: jobc1 192.168.111.201
ResourceManager[7031][7082]: 2005/10/26_22:14:00 info: Running /etc/ha.d/resource.d/IPaddr 192.168.111.201 start
IPaddr[7084][7132]: 2005/10/26_22:14:00 info: /sbin/ifconfig eth0:1 192.168.111.201 netmask 255.255.255.0	broadcast 192.168.111.255
IPaddr[7084][7137]: 2005/10/26_22:14:00 info: Sending Gratuitous Arp for 192.168.111.201 on eth0:1 [eth0]
IPaddr[7084][7138]: 2005/10/26_22:14:00 /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.111.201 eth0 192.168.111.201 auto 192.168.111.201 ffffffffffff
send_arp[7141]: 2005/10/26_22:14:00 info: Enable using logging daemon
heartbeat[6821]: 2005/10/26_22:14:10 info: Local Resource acquisition completed. (none)
heartbeat[6821]: 2005/10/26_22:14:10 info: local resource transition completed.
-------------- next part --------------
# jobc1 192.168.111.201 drbddisk::service1 Filesystem::/dev/drbd1::/ha/service1::reiserfs
jobc1 192.168.111.201
# jobc2 192.168.111.202 drbddisk::service2 Filesystem::/dev/drbd2::/ha/service2::reiserfs
jobc2 192.168.111.202
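
For anyone who wants to check which node is the preferred owner of which resource group, here is a minimal Python sketch that reads haresources-style lines like the ones above. The function name parse_haresources is made up for illustration and is not part of heartbeat. It assumes one resource group per line and does not handle backslash line continuations.

```python
# Minimal sketch, not part of heartbeat: map each node to the resources
# listed for it in haresources-style text.
# Assumption: one resource group per line, no backslash continuations.
def parse_haresources(text):
    groups = {}
    for raw in text.splitlines():
        # Drop comments (everything after '#') and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # First field is the preferred node, the rest are its resources.
        node, *resources = line.split()
        groups.setdefault(node, []).extend(resources)
    return groups


SAMPLE = """\
# jobc1 192.168.111.201 drbddisk::service1 Filesystem::/dev/drbd1::/ha/service1::reiserfs
jobc1 192.168.111.201
# jobc2 192.168.111.202 drbddisk::service2 Filesystem::/dev/drbd2::/ha/service2::reiserfs
jobc2 192.168.111.202
"""

groups = parse_haresources(SAMPLE)
for node, resources in groups.items():
    print(node, resources)
```

With the file above, each node ends up with exactly one IP address, which matches the two "Acquiring resource group" lines in the logs.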

