[Linux-HA] help setting up drbd-8.0.0 and heartbeat-2.0.8
Kirby Bakken
kirbyb at us.ibm.com
Wed Mar 7 12:06:36 MST 2007
I've got drbd working. I can manually do drbdadm secondary all on machine
'1', and drbdadm primary all on machine '2', then mount /dev/drbd0 on
/opt/mcp/shared on machine '2', and all is well. I've even successfully
started nfs and nfs exported/mounted /opt/mcp/shared onto another machine.
Both machines have dual ethernet ports; the eth1 ports of both are
connected to each other with a crossover cable. I have a 'third/virtual'
IP address to use for NFS access.
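For reference, the manual switchover described above can be written down as an ordered list of commands. This is only a dry-run sketch: it prints the steps and executes nothing. The function name and the host labels are made up for illustration; the commands, device, and mount point are the ones named above.

```python
# Dry-run sketch of the manual switchover: nothing is executed,
# the commands are only listed in the order they were run by hand.

def switchover_plan(old_primary, new_primary,
                    device="/dev/drbd0", mountpoint="/opt/mcp/shared"):
    """Return (host, command) pairs for a manual DRBD role swap."""
    # Assumption: the device is not mounted on the old primary. If it were,
    # it would have to be unmounted there before it could become secondary.
    return [
        (old_primary, "drbdadm secondary all"),
        (new_primary, "drbdadm primary all"),
        (new_primary, f"mount {device} {mountpoint}"),
    ]

if __name__ == "__main__":
    for host, command in switchover_plan("machine1", "machine2"):
        print(f"{host}: {command}")
```

Keeping the steps as data makes the order explicit, which is the same order heartbeat would later have to reproduce through haresources.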
I have two RHEL4 (2.6.9-42.0.3.ELsmp) x86_64 (dual Opteron) PCs. I have
compiled/installed drbd and heartbeat. My drbd.conf, ha.cf, and
haresources files are listed below. I run BasicSanityCheck, and get
errors... I'm also confused about having to add an 'hacluster' userid
and 'haclient' group... None of the howtos indicate this... Is this
'new' for 2.0.8?
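One way to see whether the 'hacluster' userid and 'haclient' group exist, and whether heartbeat's directories belong to them, is a small check like the sketch below. The account and group names are the ones mentioned above; the two directories are the ones the permission errors further down refer to; the function names are hypothetical.

```python
import grp
import os
import pwd

def find_wrong_owner(root, uid, gid):
    """Return every path at or below root that is not owned by uid:gid."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    wrong = []
    for path in paths:
        st = os.lstat(path)  # lstat: report a symlink itself, not its target
        if (st.st_uid, st.st_gid) != (uid, gid):
            wrong.append(path)
    return wrong

def report(user="hacluster", group="haclient",
           roots=("/var/lib/heartbeat", "/var/run/heartbeat")):
    """Print what is missing or mis-owned; return the number of problems."""
    try:
        uid = pwd.getpwnam(user).pw_uid
        gid = grp.getgrnam(group).gr_gid
    except KeyError as exc:
        print("missing account or group:", exc)
        return 1
    problems = 0
    for root in roots:
        if not os.path.isdir(root):
            print("no such directory:", root)
            problems += 1
            continue
        for path in find_wrong_owner(root, uid, gid):
            print("not owned by %s:%s: %s" % (user, group, path))
            problems += 1
    return problems

if __name__ == "__main__":
    report()
```

The script only reads ownership and changes nothing, so it can be run before and after a chown to compare the results.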
Can anyone help me tweak my setup, and/or diagnose my problem? I would be
eternally grateful! At least until the end of the day... :-)
-------- start drbd.conf:
global {
  usage-count yes;
}
common {
  syncer { rate 10M; }
}
resource r0 {
  protocol C;
  handlers {
    pri-on-incon-degr "echo O > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo O > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo O > /proc/sysrq-trigger ; halt -f";
  }
  startup {
    degr-wfc-timeout 120; # 2 minutes.
  }
  disk {
    on-io-error detach;
  }
  net {
    after-sb-0pri discard-younger-primary;
    after-sb-1pri consensus;
    after-sb-2pri disconnect;
  }
  syncer {
    rate 10M;
    al-extents 257;
  }
  on fspbro213.rchland.ibm.com {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 9.5.111.127:7788;
    meta-disk internal;
  }
  on fspbro214.rchland.ibm.com {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 9.5.111.128:7788;
    meta-disk internal;
  }
}
------end drbd.conf
and here's ha.cf:
-------- start ha.cf
logfacility local0
#
keepalive 2
#
deadtime 30
#
bcast eth0 eth1
#
auto_failback on
#
node fspbro213.rchland.ibm.com
node fspbro214.rchland.ibm.com
--------end ha.cf
and haresources:
-------- start haresources
fspbro214.rchland.ibm.com IPaddr::9.5.111.125/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/opt/mcp/shared::ext3 nfs
-------- end haresources
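To make the structure of the haresources entry above easier to read, here is a minimal sketch that splits it into the preferred node and the resource scripts with their '::'-separated arguments. It assumes the entry is a single logical line (the listing above may simply have been wrapped by the mail client); the function name is made up for illustration.

```python
def parse_haresources_entry(entry):
    """Split one logical haresources entry into (node, [(script, [args])])."""
    fields = entry.split()
    node, resources = fields[0], fields[1:]
    parsed = []
    for resource in resources:
        # "Filesystem::/dev/drbd0::/opt/mcp/shared::ext3" -> script + 3 args
        script, *args = resource.split("::")
        parsed.append((script, args))
    return node, parsed

# The entry from the listing above, joined into one logical line.
ENTRY = (
    "fspbro214.rchland.ibm.com IPaddr::9.5.111.125/24/eth0 drbddisk::r0 "
    "Filesystem::/dev/drbd0::/opt/mcp/shared::ext3 nfs"
)

if __name__ == "__main__":
    node, resources = parse_haresources_entry(ENTRY)
    print("preferred node:", node)
    for script, args in resources:
        print(f"  {script}: {args}")
```

Heartbeat starts the resources on a line from left to right, so the printed order (IP address, drbddisk, Filesystem, nfs) is also the start order.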
I then ran BasicSanityCheck. I found numerous errors regarding uid/gid
hacluster/haclient, so I created a 'haclient' group and an 'hacluster'
userid that belonged to group haclient, then ran BasicSanityCheck again.
This time I got lots of errors about not having perms in
/var/lib/heartbeat and /var/run/heartbeat, so I chown'ed all the
files/dirs in those two dir trees to hacluster:haclient, and re-ran
BasicSanityCheck. Now I get this:
CRM tests passed.
heartbeat[31787]: 2007/03/07_12:24:07 ERROR: api_process_registration_msg:
cannot add client(32069)
heartbeat[31787]: 2007/03/07_12:24:07 ERROR: api_process_registration_msg:
cannot add client(32070)
heartbeat[31787]: 2007/03/07_12:24:07 ERROR: api_process_registration_msg:
cannot add client(32071)
heartbeat[31787]: 2007/03/07_12:24:07 ERROR: api_process_registration_msg:
cannot add client(32075)
heartbeat[31787]: 2007/03/07_12:24:07 ERROR: api_process_registration_msg:
cannot add client(32079)
heartbeat[31787]: 2007/03/07_12:24:07 ERROR: api_process_registration_msg:
cannot add client(32084)
heartbeat[31787]: 2007/03/07_12:24:07 ERROR: api_process_registration_msg:
cannot add client(32089)
heartbeat[447]: 2007/03/07_12:25:05 ERROR: api_process_registration_msg:
cannot add client(737)
heartbeat[447]: 2007/03/07_12:25:05 ERROR: api_process_registration_msg:
cannot add client(738)
heartbeat[1010]: 2007/03/07_12:25:28 ERROR: Client
/usr/lib64/heartbeat/mgmtd -v exited with return code 2.
heartbeat[1010]: 2007/03/07_12:25:28 ERROR: Respawning client
"/usr/lib64/heartbeat/mgmtd -v":
heartbeat[1010]: 2007/03/07_12:25:29 ERROR: Client
/usr/lib64/heartbeat/mgmtd -v exited with return code 2.
heartbeat[1010]: 2007/03/07_12:25:29 ERROR: Respawning client
"/usr/lib64/heartbeat/mgmtd -v":
heartbeat[1010]: 2007/03/07_12:25:30 ERROR: Client
/usr/lib64/heartbeat/mgmtd -v exited with return code 2.
heartbeat[1010]: 2007/03/07_12:25:30 ERROR: Respawning client
"/usr/lib64/heartbeat/mgmtd -v":
heartbeat[1010]: 2007/03/07_12:25:31 ERROR: Client
/usr/lib64/heartbeat/mgmtd -v exited with return code 2.
heartbeat[1010]: 2007/03/07_12:25:31 ERROR: Respawning client
"/usr/lib64/heartbeat/mgmtd -v":
heartbeat[1010]: 2007/03/07_12:25:32 ERROR: Client
/usr/lib64/heartbeat/mgmtd -v exited with return code 2.
heartbeat[1010]: 2007/03/07_12:25:32 ERROR: Respawning client
"/usr/lib64/heartbeat/mgmtd -v":
heartbeat[1010]: 2007/03/07_12:25:33 ERROR: Client
/usr/lib64/heartbeat/mgmtd -v exited with return code 2.
heartbeat[1010]: 2007/03/07_12:25:33 ERROR: Respawning client
"/usr/lib64/heartbeat/mgmtd -v":
heartbeat[1010]: 2007/03/07_12:25:35 ERROR: Client
/usr/lib64/heartbeat/mgmtd -v exited with return code 2.
heartbeat[1010]: 2007/03/07_12:25:35 ERROR: Respawning client
"/usr/lib64/heartbeat/mgmtd -v":
heartbeat[1010]: 2007/03/07_12:25:36 ERROR: Client
/usr/lib64/heartbeat/mgmtd -v exited with return code 2.
heartbeat[1010]: 2007/03/07_12:25:36 ERROR: Respawning client
"/usr/lib64/heartbeat/mgmtd -v":
heartbeat[1010]: 2007/03/07_12:25:37 ERROR: Client
/usr/lib64/heartbeat/mgmtd -v exited with return code 2.
heartbeat[1010]: 2007/03/07_12:25:37 ERROR: Respawning client
"/usr/lib64/heartbeat/mgmtd -v":
heartbeat[1010]: 2007/03/07_12:25:38 ERROR: Client
/usr/lib64/heartbeat/mgmtd -v exited with return code 2.
heartbeat[1010]: 2007/03/07_12:25:38 ERROR: Respawning client
"/usr/lib64/heartbeat/mgmtd -v":
heartbeat[1010]: 2007/03/07_12:25:39 ERROR: Client
/usr/lib64/heartbeat/mgmtd -v exited with return code 2.
heartbeat[1010]: 2007/03/07_12:25:39 ERROR: Client
/usr/lib64/heartbeat/mgmtd -v "respawning too fast"
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:28 ERROR: Client /usr/lib64/heartbeat/mgmtd -v exited
with return code 2.
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:28 ERROR: Respawning client "/usr/lib64/heartbeat/mgmtd
-v":
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:29 ERROR: Client /usr/lib64/heartbeat/mgmtd -v exited
with return code 2.
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:29 ERROR: Respawning client "/usr/lib64/heartbeat/mgmtd
-v":
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:30 ERROR: Client /usr/lib64/heartbeat/mgmtd -v exited
with return code 2.
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:30 ERROR: Respawning client "/usr/lib64/heartbeat/mgmtd
-v":
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:31 ERROR: Client /usr/lib64/heartbeat/mgmtd -v exited
with return code 2.
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:31 ERROR: Respawning client "/usr/lib64/heartbeat/mgmtd
-v":
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:32 ERROR: Client /usr/lib64/heartbeat/mgmtd -v exited
with return code 2.
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:32 ERROR: Respawning client "/usr/lib64/heartbeat/mgmtd
-v":
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:33 ERROR: Client /usr/lib64/heartbeat/mgmtd -v exited
with return code 2.
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:33 ERROR: Respawning client "/usr/lib64/heartbeat/mgmtd
-v":
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:35 ERROR: Client /usr/lib64/heartbeat/mgmtd -v exited
with return code 2.
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:35 ERROR: Respawning client "/usr/lib64/heartbeat/mgmtd
-v":
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:36 ERROR: Client /usr/lib64/heartbeat/mgmtd -v exited
with return code 2.
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:36 ERROR: Respawning client "/usr/lib64/heartbeat/mgmtd
-v":
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:37 ERROR: Client /usr/lib64/heartbeat/mgmtd -v exited
with return code 2.
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:37 ERROR: Respawning client "/usr/lib64/heartbeat/mgmtd
-v":
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:38 ERROR: Client /usr/lib64/heartbeat/mgmtd -v exited
with return code 2.
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:38 ERROR: Respawning client "/usr/lib64/heartbeat/mgmtd
-v":
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:39 ERROR: Client /usr/lib64/heartbeat/mgmtd -v exited
with return code 2.
Mar 07 12:26:10 fspbro214.rchland.ibm.com CTS: BadNews: heartbeat[1010]:
2007/03/07_12:25:39 ERROR: Client /usr/lib64/heartbeat/mgmtd -v
"respawning too fast"
OOPS! Looks like we had some errors come up.
4 errors. Log file is stored in /tmp/linux-ha.testlog
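Since the output above repeats the same few messages many times, a small summarizer makes it easier to see how many distinct errors there really are. The sketch below is only an illustration: the sample lines are copied from the output above with the mail-client line wraps joined, the function name is hypothetical, and client pids are replaced by N so that repeats collapse into one message.

```python
import re
from collections import Counter

LINE_RE = re.compile(r"^heartbeat\[(?P<pid>\d+)\]: (?P<stamp>\S+) ERROR: (?P<msg>.*)$")

def summarize(lines):
    """Count distinct heartbeat ERROR messages, ignoring client pids."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a heartbeat ERROR line (e.g. a CTS BadNews echo)
        message = re.sub(r"\(\d+\)", "(N)", match.group("msg"))
        counts[message] += 1
    return counts

SAMPLE = [
    "heartbeat[31787]: 2007/03/07_12:24:07 ERROR: api_process_registration_msg: cannot add client(32069)",
    "heartbeat[31787]: 2007/03/07_12:24:07 ERROR: api_process_registration_msg: cannot add client(32070)",
    "heartbeat[1010]: 2007/03/07_12:25:28 ERROR: Client /usr/lib64/heartbeat/mgmtd -v exited with return code 2.",
    'heartbeat[1010]: 2007/03/07_12:25:28 ERROR: Respawning client "/usr/lib64/heartbeat/mgmtd -v":',
    "heartbeat[1010]: 2007/03/07_12:25:29 ERROR: Client /usr/lib64/heartbeat/mgmtd -v exited with return code 2.",
]

if __name__ == "__main__":
    for message, count in summarize(SAMPLE).most_common():
        print(f"{count:3d}  {message}")
```

To run it against the real log, the wrapped lines in /tmp/linux-ha.testlog would first have to be one message per line, which is an assumption about that file's layout.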
=======================
Kirby Bakken
ESW Build Architect
Rochester, MN
email: kirbyb at us.ibm.com
ezpage:kirbyb
507-253-4549 / Tie: 553-4549
Fax: 507-253-3495
......one more straw can't possibly matter....