[Linux-HA] Node reboots when starting heartbeat

Alexander Födisch foedisch at eva.mpg.de
Wed Apr 30 06:07:07 MDT 2008


Hi,

Yeah, I solved it. It was a stupid mistake: in /etc/ha.d/ha.cf, one node was defined a second time, buried in the middle of the comments.
Because all nodes are already defined at the top of ha.cf, that node (zentgpfsn01) existed twice in the cluster.
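
For anyone hitting the same thing, here is a minimal sketch of a check for this mistake. It assumes that "node" directives in ha.cf list one or more host names separated by whitespace and that everything after a "#" is a comment. The function name and the sample file contents are made up for illustration; my real ha.cf is not reproduced here.

```python
from collections import Counter


def duplicate_nodes(ha_cf_text):
    """Return the node names that are declared more than once."""
    counts = Counter()
    for raw in ha_cf_text.splitlines():
        # Assumption: '#' starts a comment that runs to the end of the line.
        line = raw.split("#", 1)[0].strip()
        fields = line.split()
        if fields and fields[0] == "node":
            counts.update(fields[1:])
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical ha.cf excerpt: all nodes at the top, one repeated among comments.
sample = """\
node zentgpfsn01 zentgpfsn02 zentgpfsn03
# some comment
# more comments
node zentgpfsn01
# even more comments
"""

print(duplicate_nodes(sample))
```

To check a real file, read /etc/ha.d/ha.cf and pass its contents to duplicate_nodes(); an empty list means no node name is declared twice.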

And I spent hours looking for a wrong UUID... :)
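
In hindsight, the visible symptom was the extra hostcache line with the all-zero UUID quoted further down. A rough sketch of a check for that follows. It assumes the hostcache layout seen in that listing (name, UUID, a third column) and nothing more about the file format; the function name is invented for this example.

```python
NIL_UUID = "00000000-0000-0000-0000-000000000000"


def suspicious_hostcache_entries(text):
    """Return (name, uuid) pairs that repeat an earlier name or use the all-zero UUID."""
    seen = set()
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, uuid = fields[0], fields[1]
        if name in seen or uuid == NIL_UUID:
            flagged.append((name, uuid))
        seen.add(name)
    return flagged


# The hostcache contents as quoted in this thread after starting heartbeat.
hostcache = """\
zentgpfsn01\t07ca44ca-1bf5-4f12-8680-21f86c2e6bca\t100
zentgpfsn02\tf44cbb3e-fa3c-4f93-b433-0c9eb4bb5cba\t100
zentgpfsn03\t7aa4698a-a17a-4c5b-8cfe-f7226a21aee8\t100
zentgpfsn01\t00000000-0000-0000-0000-000000000000\t100
"""

print(suspicious_hostcache_entries(hostcache))
```

Such a flagged line only points at the symptom; in my case the cause was the duplicate definition in ha.cf, not the UUID itself.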


Thanks for your help, guys!

Cheers


Junko IKEDA wrote:
>> It looks very strange. The UUID of node1 is set correctly (the same as the
>> one set in cib.xml).
>> Is this ID stored in /var/lib/heartbeat/hb_uuid?
>>
>>
>> zentgpfsn01:~ # crm_uuid
>> 07ca44ca-1bf5-4f12-8680-21f86c2e6bca
>>
>> zentgpfsn01:~ # grep zentgpfsn01 /var/lib/heartbeat/crm/cib.xml
>>       <node uname="zentgpfsn01" type="normal" id="07ca44ca-1bf5-4f12-8680-21f86c2e6bca">
>>
>>
>>
>> In /var/lib/heartbeat/hostcache the right UUIDs are also set:
>>
>> zentgpfsn01:~ # less /var/lib/heartbeat/hostcache
>> zentgpfsn01     07ca44ca-1bf5-4f12-8680-21f86c2e6bca    100
>> zentgpfsn02     f44cbb3e-fa3c-4f93-b433-0c9eb4bb5cba    100
>> zentgpfsn03     7aa4698a-a17a-4c5b-8cfe-f7226a21aee8    100
>>
>>
>> When I start heartbeat on node1, /var/lib/heartbeat/hostcache looks like
>> this:
> 
> Hi,
> Could you try removing "/var/lib/heartbeat/hostcache" before starting
> Heartbeat, as Andrew suggests?
> It might be needed on all nodes.
> I think I encountered a similar error when I tried to replace some nodes.
> At that time, either the hb_delnode command or removing hostcache was effective.
> 
> Thanks,
> Junko
> 
>> zentgpfsn01:~ # less /var/lib/heartbeat/hostcache
>> zentgpfsn01	07ca44ca-1bf5-4f12-8680-21f86c2e6bca	100
>> zentgpfsn02	f44cbb3e-fa3c-4f93-b433-0c9eb4bb5cba	100
>> zentgpfsn03	7aa4698a-a17a-4c5b-8cfe-f7226a21aee8	100
>> zentgpfsn01	00000000-0000-0000-0000-000000000000	100
>>
>>
>> It seems that node1 cannot find its own UUID and starts without one. That may
>> be the reason for the entries in the logfile and the reboot:
>>
>> ccm[19837]: 2008/04/30_09:02:05 ERROR: llm_add: adding same node(zentgpfsn01) twice(?)
>> ccm[19837]: 2008/04/30_09:02:05 ERROR: set_llm_from_heartbeat: adding node zentgpfsn01 to llm failed
>> ccm[19837]: 2008/04/30_09:02:05 ERROR: Initialization failed. Exit
>> heartbeat[19822]: 2008/04/30_09:02:05 WARN: Managed /usr/lib64/heartbeat/ccm process 19837 exited with return code 1.
>> heartbeat[19822]: 2008/04/30_09:02:05 EMERG: Rebooting system.  Reason: /usr/lib64/heartbeat/ccm
>>
>>
>>
>> But when the system is up again, the same UUID is still set:
>>
>> zentgpfsn01:~ # crm_uuid
>> 07ca44ca-1bf5-4f12-8680-21f86c2e6bca
>>
>>
>>
>> Does anybody have an idea? I'm a bit helpless...
>>
>>
>> Dominik Klein wrote:
>>>> here is the cause:
>>>>
>>>>>  ccm[19751]: 2008/04/29_10:59:59 ERROR: llm_add: adding same node(zentgpfsn01) twice(?)
>>>>>  ccm[19751]: 2008/04/29_10:59:59 ERROR: set_llm_from_heartbeat: adding node zentgpfsn01 to llm failed
>>>>>  ccm[19751]: 2008/04/29_10:59:59 ERROR: Initialization failed. Exit
>>>>>  heartbeat[19737]: 2008/04/29_10:59:59 WARN: Managed /usr/lib64/heartbeat/ccm process 19751 exited with return code 1.
>>>> you don't have more than one machine with the same name by any chance?
>>> I saw something like this when I recently re-installed my test cluster
>>> and used an old (backup) configuration file. The UUID changed, so
>>> every node was there twice, which ended in quite a mess.
>>>
>>> Regards
>>> Dominik
>>> _______________________________________________
>>> Linux-HA mailing list
>>> Linux-HA at lists.linux-ha.org
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
>>>
>>>
>> --
>> Best Regards
>>
>> Alexander Födisch
>>
>> Max Planck Institute for Evolutionary Anthropology
>> -Central IT Department-
>> Deutscher Platz 6
>> D-04103 Leipzig
>>
>> Phone:  +49 (0)341 3550-168
>>      	+49 (0)341 3550-154
>> Fax:    +49 (0)341 3550-119
>> Email:  foedisch at eva.mpg.de
> 
> 

