[Linux-HA] stopping heartbeat on passive node starts all resources already started on active node

Guochun Shi gshi at ncsa.uiuc.edu
Thu Oct 13 10:43:40 MDT 2005


At 05:30 PM 10/13/2005 +0200, Alberto wrote:
>Hi,
>
>I have a two-node setup running the heartbeat 2.0.2 RPM for RHEL 3.
>
>1) If I stop heartbeat on the passive node, the active one tries to start the resources it already owns again. Why?

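From the log, it looks like you are using v1-style (crm disabled) resource management.

A heartbeat resource agent is required to be idempotent on start: starting a resource that is already running must behave the same as starting it once. So the repeated start you see should be harmless for a correctly written agent. See: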
http://wiki.linux-ha.org/HeartbeatResourceAgent
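
As a rough illustration of that requirement, here is a minimal sketch of a v1-style resource script whose start and stop can be called repeatedly without harm. Everything in it is an assumption for illustration: the state file stands in for a real daemon, and the names STATE_FILE, is_running, resource_start, resource_stop, resource_status and resource_main are hypothetical, not taken from heartbeat or from the IEL script in the log.

```shell
#!/bin/sh
# Sketch only: a state file stands in for the real service.
# STATE_FILE and all function names are hypothetical placeholders.
STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/example-resource.running}"

# True if the (simulated) service is currently up.
is_running() {
    [ -f "$STATE_FILE" ]
}

# Idempotent start: a second call changes nothing and still succeeds.
resource_start() {
    if is_running; then
        echo "already running"
        return 0
    fi
    # A real script would launch the daemon here.
    : > "$STATE_FILE"
    echo "started"
}

# Idempotent stop: stopping a stopped resource also succeeds.
resource_stop() {
    if ! is_running; then
        echo "already stopped"
        return 0
    fi
    # A real script would shut the daemon down here.
    rm -f "$STATE_FILE"
    echo "stopped"
}

# Report the current state.
resource_status() {
    if is_running; then
        echo "running"
    else
        echo "stopped"
    fi
}

# Dispatch on the action name passed as the first argument.
resource_main() {
    case "$1" in
        start)  resource_start ;;
        stop)   resource_stop ;;
        status) resource_status ;;
        *)      echo "usage: {start|stop|status}" >&2; return 1 ;;
    esac
}

# Only dispatch when the script is called with an action.
if [ "$#" -gt 0 ]; then
    resource_main "$@"
    exit $?
fi
```

With a script shaped like this, a second "start" issued while the resource is already up simply reports that fact and returns success, so the repeated acquisition seen in the log would not disturb the running service.
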


>2) Also, if there is a split-brain and both nodes have all resources up, then when they sync again both nodes shut down heartbeat and the resources, and the resources are started again on one node. Is this the expected behavior? Shouldn't the resources just be stopped on one node and left up on the other?

We should prevent split-brain from happening in the first place. If it does happen and the resource is a shared-disk type of resource, it is a disaster: the data is corrupted and you cannot recover from that. Otherwise, restarting both nodes is the better way to recover, because, for example, shutting down the resources on only one node may fail.

-Guochun


>node2# /etc/init.d/heartbeat stop
>
>
>Oct 13 17:20:07 node1 heartbeat: [9050]: info: acquire local HA resources (standby).
>Oct 13 17:20:07 node1 heartbeat: [31940]: info: Received shutdown notice from 'node2'.
>Oct 13 17:20:07 node1 heartbeat: [31940]: info: Resources being acquired from node2.
>Oct 13 17:20:07 node1 ResourceManager[9070]: info: Acquiring resource group: node1 IEL IPaddr::10.64.110.70/24/eth0
>Oct 13 17:20:08 node1 modprobe: modprobe: Can't locate module char-major-203
>Oct 13 17:20:11 node1 last message repeated 15 times
>Oct 13 17:20:15 node1 heartbeat: [9052]: info: Local Resource acquisition completed.
>Oct 13 17:20:15 node1 ResourceManager[9070]: info: Running /etc/ha.d/resource.d/IEL  start
>Oct 13 17:20:19 node1 heartbeat: [31940]: WARN: node node2: is dead
>Oct 13 17:20:19 node1 heartbeat: [31940]: info: Dead node node2 gave up resources.
>Oct 13 17:20:19 node1 ipfail: [31947]: info: Status update: Node node2 now has status dead
>Oct 13 17:20:19 node1 heartbeat: [31940]: info: Link node2:eth0 dead.
>Oct 13 17:20:19 node1 ipfail: [31947]: info: NS: We are still alive!
>Oct 13 17:20:20 node1 modprobe: modprobe: Can't locate module char-major-203
>Oct 13 17:20:20 node1 last message repeated 3 times
>Oct 13 17:20:20 node1 ipfail: [31947]: info: Link Status update: Link node2/eth0 now has status dead
>Oct 13 17:20:20 node1 ipfail: [31947]: info: Asking other side for ping node count.
>Oct 13 17:20:20 node1 ipfail: [31947]: info: Checking remote count of ping nodes.
>Oct 13 17:20:29 node1 modprobe: modprobe: Can't locate module char-major-203
>Oct 13 17:20:29 node1 last message repeated 3 times
>Oct 13 17:20:37 node1 heartbeat: [9050]: info: local HA resource acquisition completed (standby).
>Oct 13 17:20:37 node1 heartbeat: [31940]: ERROR: Ignored standby message 'done' from node1 in state 0
>Oct 13 17:20:37 node1 harc[9553]: info: Running /etc/ha.d/rc.d/status status
>Oct 13 17:20:37 node1 mach_down[9563]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
>Oct 13 17:20:37 node1 heartbeat: [31940]: info: mach_down takeover complete.
>Oct 13 17:20:37 node1 mach_down[9563]: info: mach_down takeover complete for node node2.
>Oct 13 17:20:37 node1 harc[9590]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
>Oct 13 17:20:37 node1 ip-request-resp[9590]: received ip-request-resp IEL OK yes
>Oct 13 17:20:37 node1 ResourceManager[9605]: info: Acquiring resource group: node1 IEL IPaddr::10.64.110.70/24/eth0
>Oct 13 17:20:37 node1 modprobe: modprobe: Can't locate module char-major-203
>Oct 13 17:20:41 node1 last message repeated 7 times
>Oct 13 17:20:45 node1 ResourceManager[9605]: info: Running /etc/ha.d/resource.d/IEL  start
>Oct 13 17:20:50 node1 modprobe: modprobe: Can't locate module char-major-203
>Oct 13 17:20:58 node1 modprobe: modprobe: Can't locate module char-major-203
>