[Linux-HA] Node reboots when starting heartbeat
Alexander Födisch
foedisch at eva.mpg.de
Wed Apr 30 01:12:10 MDT 2008
It looks very strange. The UUID of node1 is set correctly (it is the same as the one in cib.xml).
Is this ID stored in /var/lib/heartbeat/hb_uuid?
zentgpfsn01:~ # crm_uuid
07ca44ca-1bf5-4f12-8680-21f86c2e6bca
zentgpfsn01:~ # grep zentgpfsn01 /var/lib/heartbeat/crm/cib.xml
<node uname="zentgpfsn01" type="normal" id="07ca44ca-1bf5-4f12-8680-21f86c2e6bca">
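The comparison above (crm_uuid output versus the id attribute in cib.xml) could also be scripted. This is only a minimal sketch: the function name node_ids is hypothetical, and the sample document is a cut-down stand-in for a real cib.xml that contains nothing but the node line shown above.

```python
import xml.etree.ElementTree as ET

def node_ids(cib_xml):
    """Map each node uname to the list of ids it appears with in a CIB document."""
    ids = {}
    for node in ET.fromstring(cib_xml).iter("node"):
        ids.setdefault(node.get("uname"), []).append(node.get("id"))
    return ids

# Cut-down sample (assumption): only the <node> element from the grep output.
sample_cib = """<cib><configuration><nodes>
<node uname="zentgpfsn01" type="normal" id="07ca44ca-1bf5-4f12-8680-21f86c2e6bca"/>
</nodes></configuration></cib>"""

expected = "07ca44ca-1bf5-4f12-8680-21f86c2e6bca"  # value printed by crm_uuid
ids = node_ids(sample_cib)
print(ids["zentgpfsn01"] == [expected])
```

A node name that maps to more than one id, or to an id different from the crm_uuid output, would point to the kind of mismatch discussed in this thread.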
The correct UUIDs are also set in /var/lib/heartbeat/hostcache:
zentgpfsn01:~ # less /var/lib/heartbeat/hostcache
zentgpfsn01 07ca44ca-1bf5-4f12-8680-21f86c2e6bca 100
zentgpfsn02 f44cbb3e-fa3c-4f93-b433-0c9eb4bb5cba 100
zentgpfsn03 7aa4698a-a17a-4c5b-8cfe-f7226a21aee8 100
When I start heartbeat on node1, /var/lib/heartbeat/hostcache looks like the following:
zentgpfsn01:~ # less /var/lib/heartbeat/hostcache
zentgpfsn01 07ca44ca-1bf5-4f12-8680-21f86c2e6bca 100
zentgpfsn02 f44cbb3e-fa3c-4f93-b433-0c9eb4bb5cba 100
zentgpfsn03 7aa4698a-a17a-4c5b-8cfe-f7226a21aee8 100
zentgpfsn01 00000000-0000-0000-0000-000000000000 100
It seems that node1 cannot find its own UUID and starts without one. That may be the reason for the entries in the log file and the reboot:
ccm[19837]: 2008/04/30_09:02:05 ERROR: llm_add: adding same node(zentgpfsn01) twice(?)
ccm[19837]: 2008/04/30_09:02:05 ERROR: set_llm_from_heartbeat: adding node zentgpfsn01 to llm failed
ccm[19837]: 2008/04/30_09:02:05 ERROR: Initialization failed. Exit
heartbeat[19822]: 2008/04/30_09:02:05 WARN: Managed /usr/lib64/heartbeat/ccm process 19837 exited with return code 1.
heartbeat[19822]: 2008/04/30_09:02:05 EMERG: Rebooting system. Reason: /usr/lib64/heartbeat/ccm
But when the system is up again, the same UUID is still set:
zentgpfsn01:~ # crm_uuid
07ca44ca-1bf5-4f12-8680-21f86c2e6bca
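The duplicate hostcache entry shown earlier can be spotted mechanically. The following is a rough sketch, not a heartbeat tool: it assumes each hostcache line is whitespace-separated with the node name first and the UUID second (as in the listings above), and the helper name check_hostcache is hypothetical.

```python
from collections import defaultdict

NIL_UUID = "00000000-0000-0000-0000-000000000000"

def check_hostcache(text):
    """Return (nodes listed with more than one UUID, nodes listed with the nil UUID)."""
    uuids = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        uuids[fields[0]].append(fields[1])
    duplicated = sorted(n for n, u in uuids.items() if len(set(u)) > 1)
    nil = sorted(n for n, u in uuids.items() if NIL_UUID in u)
    return duplicated, nil

# hostcache contents after starting heartbeat on node1, copied from above
hostcache = """\
zentgpfsn01 07ca44ca-1bf5-4f12-8680-21f86c2e6bca 100
zentgpfsn02 f44cbb3e-fa3c-4f93-b433-0c9eb4bb5cba 100
zentgpfsn03 7aa4698a-a17a-4c5b-8cfe-f7226a21aee8 100
zentgpfsn01 00000000-0000-0000-0000-000000000000 100
"""

print(check_hostcache(hostcache))
# prints (['zentgpfsn01'], ['zentgpfsn01'])
```

With the file contents from before the start (three lines, one per node), both lists come back empty.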
Does anybody have an idea? I'm a bit helpless...
Dominik Klein wrote:
>> here is the cause:
>>
>>> ccm[19751]: 2008/04/29_10:59:59 ERROR: llm_add: adding same node(zentgpfsn01) twice(?)
>>> ccm[19751]: 2008/04/29_10:59:59 ERROR: set_llm_from_heartbeat: adding node zentgpfsn01 to llm failed
>>> ccm[19751]: 2008/04/29_10:59:59 ERROR: Initialization failed. Exit
>>> heartbeat[19737]: 2008/04/29_10:59:59 WARN: Managed /usr/lib64/heartbeat/ccm process 19751 exited with return code 1.
>>
>> you don't have more than one machine with the same name by any chance?
>
> I saw something like this when I recently re-installed my testcluster
> and used an old (backup) configuration file. The uuid changed and so
> every node was there twice which ended in quite a mess.
>
> Regards
> Dominik
--
Best Regards
Alexander Födisch
Max Planck Institute for Evolutionary Anthropology
-Central IT Department-
Deutscher Platz 6
D-04103 Leipzig
Phone: +49 (0)341 3550-168
+49 (0)341 3550-154
Fax: +49 (0)341 3550-119
Email: foedisch at eva.mpg.de