banibrata dutta bdutta at hotmail.com
Sun Apr 9 22:30:45 MDT 2000

hi horms,

could you please elaborate on what you mean by letting the nodes
have knowledge of the resources held by other nodes? i suppose you
mean knowledge of the resources the other nodes held just before
THIS node lost contact with all of them?

if this is done, then what do we gain? are you saying that if
node B was running an ACT httpd service and node A was running
an ACT mail service (with B being SBY for the mail service, and A
being SBY for the httpd service), and then nodes A and B lose
communication, they keep running their ACT services and make sure
that the SBY services never come into effect?
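to make sure i understand the proposal, here is a minimal sketch of
the policy as i read it, assuming each node keeps a snapshot of which
node owned each service just before contact was lost. the names
(Node, on_contact_lost, last_known_owners) are hypothetical and not
taken from any existing HA package:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    active: set[str]   # services this node runs as ACT
    standby: set[str]  # services this node would take over as SBY


def on_contact_lost(node: Node, last_known_owners: dict[str, str]) -> set[str]:
    """Return the services this node should run after losing contact
    with every peer.

    last_known_owners is a hypothetical snapshot mapping each service
    to the node that held it just before contact was lost.
    """
    keep = set()
    for service in node.active | node.standby:
        if last_known_owners.get(service) == node.name:
            # we held this service before the failure: keep running it
            keep.add(service)
        # otherwise a peer held it (or the owner is unknown): do not
        # start the SBY copy, since the peer may still be running it
    return keep


owners = {"httpd": "B", "mail": "A"}
a = Node("A", active={"mail"}, standby={"httpd"})
b = Node("B", active={"httpd"}, standby={"mail"})
print(on_contact_lost(a, owners))  # {'mail'}
print(on_contact_lost(b, owners))  # {'httpd'}
```

with this policy no service ends up running on both nodes, but a
service whose owner has really died also stays down until someone
intervenes, which is exactly the trade-off i am worried about below.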

although i can't quite put my finger on why i have a mental block
about this solution, i feel that in some cases it might compromise
the HA philosophy. of course, in such critical failure cases (if we
can call them that), operator assistance might be mandatory, a fully
software-based recovery might not be possible, and this whole
arrangement would be quite acceptable. i am not so sure, though...
when i think as a developer, i am fairly convinced, but when i try
to think as a customer, i am not really convinced.
what do you say?


>On Fri, Apr 07, 2000 at 10:46:56PM -0700, banibrata dutta wrote:
> >
> > Can we be so sure? I have seen operators commit the "human error"
> > of tripping over (slightly, without even noticing it) both Ethernet
> > cables, which were hanging a little loose, in a production
> > environment... leaving our HA systems ACTIVE/ACTIVE, without the
> > fix I talked about in my last mail.
>We can't be sure, but it is fair to say that there are other modes
>of failure that are much more likely. This is most definitely
>a problem that needs to be solved. I agree with what you said
>in another email that it may be better to have both nodes
>inactive in the case of such a failure than active, though I believe
>that by giving nodes knowledge of, or the ability to query, which
>nodes hold which resources, it should be possible to largely eliminate
>the problem.
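the "ability to query" variant could look roughly like the sketch
below. it assumes some independent way to ask the peer which services
it currently holds (for example a second link), modelled here by a
hypothetical query_peer callable that returns None when the peer
cannot be reached; when the answer is unknown it falls back to not
taking anything over, in the spirit of preferring inactive to
ACTIVE/ACTIVE:

```python
from __future__ import annotations

from typing import Callable, Optional


def services_to_take_over(
    standby: set[str],
    query_peer: Callable[[], Optional[set[str]]],
) -> set[str]:
    """Decide which SBY services to start after the heartbeat is lost.

    query_peer is hypothetical: it returns the set of services the
    peer reports holding, or None if the peer cannot be reached.
    """
    held_by_peer = query_peer()
    if held_by_peer is None:
        # peer state unknown: start nothing, so the worst case is a
        # service that is down rather than one running on both nodes
        return set()
    # take over only the SBY services the peer says it is not running
    return standby - held_by_peer


print(services_to_take_over({"mail"}, lambda: {"httpd"}))          # {'mail'}
print(services_to_take_over({"mail"}, lambda: {"httpd", "mail"}))  # set()
print(services_to_take_over({"mail"}, lambda: None))               # set()
```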

