[Linux-HA] about the timing of the old node takes part in a
membership again
Junko IKEDA
ikedaj at intellilink.co.jp
Fri Apr 4 02:34:04 MDT 2008
Hi,
I am running a split-brain test with the following steps:
(1) start Heartbeat (node-a/node-b)
(2) run Dummy resource on node-a
(3) bring down the interconnect LAN -> split brain
(4) stop Heartbeat (only node-b)
This part is a little tricky: I modified the Dummy RA on node-b so that it
runs "sleep 10" when monitor_0 (= probe) is called.
(5) start Heartbeat (only node-b; Heartbeat keeps running on node-a, and
Dummy is still running on node-a)
(6) The split brain is still in effect, so Dummy will start on node-b even
though it is already running on node-a.
(7) Dummy on node-b sleeps for 10 seconds in the probe before starting. I
restored the interconnect LAN at exactly that moment.
There are two results:
(case-1) ... hb_report_1
Heartbeat recovers from the split brain successfully.
Dummy ends up running on one node only.
(case-2) ... hb_report_2
Heartbeat cannot recover from the split brain.
Neither node adds the other one to its membership.
This case is rare and hard to reproduce, but it does happen.
See attached hb_report_2/node-b/ha-log.
Heartbeat noticed that the interconnect LAN came back up during the split brain:
heartbeat[9216]: 2008/04/04_16:27:22 info: Link node-b:eth2 up.
heartbeat[9216]: 2008/04/04_16:27:23 info: Link node-a:eth2 up.
but it did not consider its partner a member...
The instance ID is wrong again.
crmd[9229]: 2008/04/04_16:27:23 info: ccm_event_detail: NEW MEMBERSHIP:
trans=2, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
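To make the membership counters in a line like the one above easier to compare between the two reports, here is a minimal Python sketch that pulls them out. The helper name parse_membership is hypothetical, the field names are taken only from the excerpt above, and reading nodes=1 as "this node sees only itself" is my interpretation, not something the log states.

```python
import re

# Matches the key=value counters of a crmd "ccm_event_detail" line.
# Field names (trans, nodes, new, lost) come from the log excerpt above;
# the *_idx fields are deliberately not matched.
FIELD_RE = re.compile(r"\b(trans|nodes|new|lost)=(\d+)")


def parse_membership(line):
    """Return the membership counters of one ccm_event_detail line as ints.

    Returns None for lines that are not ccm_event_detail lines.
    """
    if "ccm_event_detail" not in line:
        return None
    return {key: int(value) for key, value in FIELD_RE.findall(line)}


# The log line quoted above, joined back into a single line.
line = (
    "crmd[9229]: 2008/04/04_16:27:23 info: ccm_event_detail: NEW MEMBERSHIP: "
    "trans=2, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3"
)
membership = parse_membership(line)
print(membership)
```

Running the same helper over the ha-log of both nodes in hb_report_1 and hb_report_2 should show whether the node count ever goes back to 2 after the link comes up.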
This is just my intuition, but if the "Link ... up" and "Status update"
messages are written to ha-log together,
the split brain seems to be recovered successfully:
info: Link node-b:eth2 up.
info: Status update for node node-a: status active
Is this a timing problem?
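One way to check this intuition against both reports is to scan ha-log for each "Link ... up" message and see whether a "status active" update was logged close to it. The following Python sketch does that. It assumes every heartbeat line has the same "heartbeat[PID]: DATE info: MESSAGE" layout as the lines quoted in this mail; the function name link_up_with_status and the one-second window are hypothetical choices.

```python
import re
from datetime import datetime

# Assumed layout, based on the heartbeat lines quoted in this mail:
#   heartbeat[PID]: YYYY/MM/DD_HH:MM:SS info: MESSAGE
LINE_RE = re.compile(r"^heartbeat\[\d+\]: (\S+) info: (.*)$")
LINK_UP_RE = re.compile(r"^Link (\S+):(\S+) up\.$")
STATUS_RE = re.compile(r"^Status update for node (\S+): status active$")


def link_up_with_status(lines, window_seconds=1):
    """Pair each "Link ... up" message with a flag that is True when a
    "status active" update (for any node) was logged within window_seconds."""
    links, statuses = [], []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y/%m/%d_%H:%M:%S")
        message = match.group(2)
        if LINK_UP_RE.match(message):
            links.append((stamp, message))
        elif STATUS_RE.match(message):
            statuses.append(stamp)
    return [
        (message,
         any(abs((s - stamp).total_seconds()) <= window_seconds for s in statuses))
        for stamp, message in links
    ]


# The two heartbeat lines from hb_report_2/node-b/ha-log quoted in this mail.
case2_lines = [
    "heartbeat[9216]: 2008/04/04_16:27:22 info: Link node-b:eth2 up.",
    "heartbeat[9216]: 2008/04/04_16:27:23 info: Link node-a:eth2 up.",
]
print(link_up_with_status(case2_lines))

# Hypothetical extra line (not from any real log) to show the other outcome.
hypothetical_lines = case2_lines + [
    "heartbeat[9216]: 2008/04/04_16:27:22 info: Status update for node node-a: status active",
]
print(link_up_with_status(hypothetical_lines))
```

If the flag is True in hb_report_1 and False in hb_report_2, that would support the timing theory; it would not prove it.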
changeset:
Heartbeat c03bb0093041
Pacemaker e2fd067da044
Best Regards,
Junko Ikeda
NTT DATA INTELLILINK CORPORATION
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Dummy
Type: application/octet-stream
Size: 4902 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20080404/b6c439c2/Dummy-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb_report_1.tar.gz
Type: application/octet-stream
Size: 95376 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20080404/b6c439c2/hb_report_1.tar-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb_report_2.tar.gz
Type: application/octet-stream
Size: 84578 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20080404/b6c439c2/hb_report_2.tar-0001.obj