[Linux-HA] about the timing of the old node takes part in a membership again

Junko IKEDA ikedaj at intellilink.co.jp
Fri Apr 4 02:34:04 MDT 2008


Hi,

I am running a split-brain test as follows:

(1) start Heartbeat (node-a/node-b)
(2) run Dummy resource on node-a
(3) take down the interconnect LAN -> split brain
(4) stop Heartbeat (on node-b only)

This part is a little tricky: I modified the Dummy RA on node-b so that it
runs "sleep 10" when monitor_0 (= probe) is called.

(5) start Heartbeat (on node-b only; Heartbeat keeps running on node-a, and
Dummy is still running on node-a)
(6) The split brain is still ongoing, so Dummy would start on node-b even
though it is already running on node-a.
(7) Dummy on node-b sleeps for 10 seconds before starting (this is the
modified RA)... I restored the interconnect LAN at exactly that moment.

There are two results:
(case-1) ... hb_report_1
Heartbeat recovers from the split brain successfully.
Dummy ends up running on one node only.

(case-2) ... hb_report_2
Heartbeat cannot recover from the split brain.
Neither node adds the other one to its membership.
This case is rare and hard to reproduce, but it does happen.

See attached hb_report_2/node-b/ha-log.
Heartbeat noticed that the interconnect LAN came back up during the split
brain.

heartbeat[9216]: 2008/04/04_16:27:22 info: Link node-b:eth2 up.
heartbeat[9216]: 2008/04/04_16:27:23 info: Link node-a:eth2 up.

but it did not consider its partner a member...
The instance ID is wrong again.
crmd[9229]: 2008/04/04_16:27:23 info: ccm_event_detail: NEW MEMBERSHIP:
trans=2, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
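
As an aside, the counters in a membership line like the one above can be
pulled out with a few lines of Python. This is only a sketch: the function
name parse_membership is made up for illustration, and the sample line is
the excerpt above joined onto a single line.

```python
import re

# Sample copied from the ha-log excerpt above, joined onto one line.
LINE = (
    "crmd[9229]: 2008/04/04_16:27:23 info: ccm_event_detail: NEW MEMBERSHIP: "
    "trans=2, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3"
)


def parse_membership(line):
    """Return the key=value counters of a ccm_event_detail line as ints.

    Returns None for lines that are not ccm_event_detail messages.
    (Hypothetical helper, not part of Heartbeat or Pacemaker.)
    """
    if "ccm_event_detail" not in line:
        return None
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


info = parse_membership(LINE)
# nodes=1 means this node sees only itself in the membership.
print("nodes in membership:", info["nodes"])
```

In a two-node cluster, a membership that still reports nodes=1 after both
links are up is the symptom described here.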


This is just my intuition, but if the "Link ... up" and "Status update"
messages are written to ha-log together, the split brain seems to be
recovered successfully.

info: Link node-b:eth2 up.
info: Status update for node node-a: status active

Is this some kind of timing problem?
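
To check this guess against a ha-log, a script along the following lines
could be used. It is only a sketch under assumptions: the function name and
the one-second window are invented for illustration, the "Status update"
line is assumed to carry the same "process[pid]: timestamp" prefix as the
"Link" lines, and the timestamps in the second sample are made up.

```python
import re
from datetime import datetime, timedelta

STAMP = re.compile(r"(\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})")
LINK_UP = re.compile(r"info: Link (\S+):(\S+) up\.")
STATUS = re.compile(r"info: Status update for node (\S+): status active")


def link_up_followed_by_status(lines, window_seconds=1):
    """Return True if a 'Status update ... status active' line appears
    within window_seconds after some 'Link ... up.' line.

    Hypothetical helper; the window size is an assumption.
    """
    window = timedelta(seconds=window_seconds)
    link_times = []
    for line in lines:
        stamp = STAMP.search(line)
        if not stamp:
            continue  # skip lines without a ha-log style timestamp
        when = datetime.strptime(stamp.group(1), "%Y/%m/%d_%H:%M:%S")
        if LINK_UP.search(line):
            link_times.append(when)
        elif STATUS.search(line):
            if any(timedelta(0) <= when - t <= window for t in link_times):
                return True
    return False


# Lines quoted from hb_report_2/node-b/ha-log in this mail.
CASE_2 = [
    "heartbeat[9216]: 2008/04/04_16:27:22 info: Link node-b:eth2 up.",
    "heartbeat[9216]: 2008/04/04_16:27:23 info: Link node-a:eth2 up.",
]

# Illustrative only: prefix and timestamps are invented for this sample.
HYPOTHETICAL_OK = [
    "heartbeat[9216]: 2008/04/04_16:27:22 info: Link node-b:eth2 up.",
    "heartbeat[9216]: 2008/04/04_16:27:22 info: Status update for node node-a: status active",
]

print(link_up_followed_by_status(CASE_2))
print(link_up_followed_by_status(HYPOTHETICAL_OK))
```

This only tests whether the two messages appear close together in the log;
it says nothing about why the membership is or is not rebuilt.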

changeset:
Heartbeat	c03bb0093041
Pacemaker	e2fd067da044


Best Regards,
Junko Ikeda

NTT DATA INTELLILINK CORPORATION

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Dummy
Type: application/octet-stream
Size: 4902 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20080404/b6c439c2/Dummy-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb_report_1.tar.gz
Type: application/octet-stream
Size: 95376 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20080404/b6c439c2/hb_report_1.tar-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb_report_2.tar.gz
Type: application/octet-stream
Size: 84578 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20080404/b6c439c2/hb_report_2.tar-0001.obj

