[Linux-HA] Node reboots when starting heartbeat
Alexander Födisch
foedisch at eva.mpg.de
Tue Apr 29 04:12:51 MDT 2008
Hi all,
I'm new to this mailing list and I have a problem with my three-node cluster.
When I start heartbeat on one of the three nodes, that node always reboots. In the log I found the error "register_with_ha:
get_uuid_by_name() failed". I can't remember making any changes. I also restored all files in /var/lib/heartbeat/crm/ from backup,
without success... :(
~# tail /var/log/ha-debug
[...]
lrmd[19753]: 2008/04/29_10:59:59 WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
lrmd[19753]: 2008/04/29_10:59:59 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
lrmd[19753]: 2008/04/29_10:59:59 info: G_main_add_SignalHandler: Added signal handler for signal 10
cib[19752]: 2008/04/29_10:59:59 info: log_data_element: readCibXmlFile: [on-disk] <status/>
lrmd[19753]: 2008/04/29_10:59:59 info: G_main_add_SignalHandler: Added signal handler for signal 12
cib[19752]: 2008/04/29_10:59:59 info: log_data_element: readCibXmlFile: [on-disk] </cib>
lrmd[19753]: 2008/04/29_10:59:59 info: Started.
mgmtd[19757]: 2008/04/29_10:59:59 WARN: Core dumps could be lost if multiple dumps occur.
mgmtd[19757]: 2008/04/29_10:59:59 WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
mgmtd[19757]: 2008/04/29_10:59:59 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
mgmtd[19757]: 2008/04/29_10:59:59 info: G_main_add_SignalHandler: Added signal handler for signal 10
mgmtd[19757]: 2008/04/29_10:59:59 info: G_main_add_SignalHandler: Added signal handler for signal 12
attrd[19755]: 2008/04/29_10:59:59 WARN: get_uuid: Could not calculate UUID for zentgpfsn01
-> I think this is the cause:
attrd[19755]: 2008/04/29_10:59:59 ERROR: register_with_ha: get_uuid_by_name() failed
attrd[19755]: 2008/04/29_10:59:59 ERROR: main: HA Signon failed
attrd[19755]: 2008/04/29_10:59:59 ERROR: main: Aborting startup
heartbeat[19737]: 2008/04/29_10:59:59 WARN: Managed /usr/lib64/heartbeat/attrd process 19755 exited with return code 100.
mgmtd[19757]: 2008/04/29_10:59:59 info: init_crm
mgmtd[19757]: 2008/04/29_10:59:59 info: login to cib: 0, ret:-10
cib[19752]: 2008/04/29_10:59:59 notice: readCibXmlFile: Enabling DTD validation on the existing (sane) configuration
cib[19752]: 2008/04/29_10:59:59 info: startCib: CIB Initialization completed successfully
cib[19752]: 2008/04/29_10:59:59 info: cib_register_ha: Signing in with Heartbeat
ccm[19751]: 2008/04/29_10:59:59 ERROR: llm_add: adding same node(zentgpfsn01) twice(?)
ccm[19751]: 2008/04/29_10:59:59 ERROR: set_llm_from_heartbeat: adding node zentgpfsn01 to llm failed
ccm[19751]: 2008/04/29_10:59:59 ERROR: Initialization failed. Exit
heartbeat[19737]: 2008/04/29_10:59:59 WARN: Managed /usr/lib64/heartbeat/ccm process 19751 exited with return code 1.
cib[19752]: 2008/04/29_10:59:59 info: cib_register_ha: FSA Hostname: zentgpfsn01
-> and game over :)
heartbeat[19737]: 2008/04/29_10:59:59 EMERG: Rebooting system. Reason: /usr/lib64/heartbeat/ccm
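For anyone reading along: the excerpt is noisy, so here is a rough sketch of how the ERROR and EMERG lines could be pulled out and grouped by the daemon that logged them. It is Python. The function name severe_by_daemon and the regular expression are my own illustration and not part of heartbeat. The sketch assumes every log line has the "name[pid]: date level: message" shape seen above; wrapped continuation lines are simply skipped.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the excerpt above, e.g.
#   attrd[19755]: 2008/04/29_10:59:59 ERROR: main: Aborting startup
LINE_RE = re.compile(
    r"^(?P<daemon>[\w-]+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<stamp>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Za-z]+):\s+(?P<msg>.*)$"
)

# Only the two severe levels that actually appear in the excerpt.
SEVERE = {"ERROR", "EMERG"}


def severe_by_daemon(lines):
    """Return {daemon: [messages]} for ERROR/EMERG lines, in log order."""
    found = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") in SEVERE:
            found[match.group("daemon")].append(match.group("msg"))
    return dict(found)


# Usage (not executed here), with the log path from the tail command above:
#   with open("/var/log/ha-debug", errors="replace") as fh:
#       for daemon, msgs in severe_by_daemon(fh).items():
#           print(daemon, *msgs, sep="\n    ")
```

On the excerpt above this leaves only the attrd, ccm and heartbeat entries, which makes the order of the failures easier to see.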
Does anybody have an idea?
Thanks!
--
Mit besten Grüßen / Best Regards
Alexander Födisch
Max Planck Institute for Evolutionary Anthropology
-Central IT Department-
Deutscher Platz 6
D-04103 Leipzig
Phone: +49 (0)341 3550-168
+49 (0)341 3550-154
Fax: +49 (0)341 3550-119
Email: foedisch at eva.mpg.de