drbd question

Alan Robertson alanr@suse.com
Tue, 04 Apr 2000 21:12:43 -0600


"Stephen C. Tweedie" wrote:
> 
> Alan Robertson wrote:
> 
> > The continuous memory scheme is simple, and quite reliable in
> > combination with quorum.
> 
> Is it?  If I have a cluster reboot in which all my nodes die and
> come back up, and I have quorum, and I have a single surviving
> drbd mirror whose history is sequential and current up to the
> point at which that mirror died, I still don't know for sure whether
> or not there are any nodes in a lost partition which may have more
> recent history.

It is reliable in that it *won't* provide bad data.  In the proposal I
described before, the machine would not come up automatically in the
circumstances I believe you have described, although I confess I am
not quite sure exactly what you were describing.  If you would
describe the sequence of operations in more detail, then I would know
for sure.  I believe that the mirrors would enter the sit_and_cry() mode
of operation given the circumstances I think you've described.  At that
point, a human being could examine logs and the like to tell which
machine to make the source of the data.  [Or implement the memory
technique described below]
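
To make the intended behavior concrete, here is a rough sketch of the
startup decision as I understand it.  Only sit_and_cry() comes from the
discussion above; the function name and the two flags (has_quorum,
has_continuous_memory) are made up for illustration and are not
anything drbd implements.

```python
from enum import Enum


class Action(Enum):
    COME_UP = "come_up"          # serve data automatically
    SIT_AND_CRY = "sit_and_cry"  # refuse to serve; wait for a human


def startup_action(has_quorum: bool, has_continuous_memory: bool) -> Action:
    """Hypothetical startup decision for a mirror.

    has_quorum: this partition currently holds a majority of votes.
    has_continuous_memory: the partition has stayed up without
        interruption, so it knows which mirror holds current data.
        After a complete cluster crash this is assumed to be False.
    """
    if has_quorum and has_continuous_memory:
        return Action.COME_UP
    # Either condition missing means we cannot rule out newer data
    # elsewhere, so never serve automatically.
    return Action.SIT_AND_CRY
```

In the all-nodes-died case the continuous memory is gone, so even with
quorum the sketch lands in sit_and_cry() rather than serving possibly
stale data.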
 
> > If you have the voting members of a cluster retain memory of the last
> > known source(s) of data for the cluster, then you can also solve the
> > cold-reboot-after-complete-crash (CRACC) problem.
> >
> Exactly, that is precisely what I have been saying!  Good, I'm
> glad we agree on this.

[I haven't intended to disagree on this.  I think there must have been
some communication error.]

Hopefully this (CRACC) is a rare occurrence :-)  A complete shutdown for
some administrative reason is not too surprising, but hopefully a
complete and catastrophic failure resulting in no partition having
quorum happens only rarely.  What the memory protects against is
requiring manual intervention in this case.  If one started out without
this feature [memory], I wouldn't see it as a disaster.  Not to say that
it isn't desirable, but that it isn't (IMHO) highest priority.

I suppose you should also remember what size the cluster was at the time
the memory was recorded, and require manual intervention if the cluster size
changed before coming back up.  If there were 9 machines before, you
*really* want to make sure you get the votes of 5 machines that were
voting members of the cluster just before you lost quorum (or
consciousness).  If you changed the cluster size to 5 before coming up,
then a former-minority vote of 3 could give you stale data.  At this
point, human judgement (for better or worse) seems needed.
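
A minimal sketch of that check, assuming each voting member persists
the membership it last saw.  The names ClusterMemory and
may_auto_recover are hypothetical, and node identities are just
strings here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterMemory:
    # Voting members recorded just before quorum was lost (assumed to
    # be kept on stable storage by each member).
    voting_members: frozenset


def may_auto_recover(memory, configured_members, nodes_present):
    """Return True only if automatic recovery looks safe.

    Manual intervention is required (False) when the configured
    membership differs from the remembered one, or when fewer than a
    strict majority of the remembered voters are present.
    """
    if frozenset(configured_members) != memory.voting_members:
        return False  # cluster changed since the memory was recorded
    former_voters_present = memory.voting_members & frozenset(nodes_present)
    return 2 * len(former_voters_present) > len(memory.voting_members)
```

With 9 remembered voters, 5 of them present passes and 4 does not.
Reconfiguring down to 5 members and showing up with 3 of the old ones
is rejected, which is the stale-data case described above.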

	-- Alan Robertson
	   alanr@suse.com