[Linux-HA] A clone starts again
Andrew Beekhof
beekhof at gmail.com
Mon Oct 1 06:43:50 MDT 2007
On 9/30/07, YAMAUCHI HIDEO <renayama19661014 at ybb.ne.jp> wrote:
> > On 9/25/07, YAMAUCHI HIDEO <renayama19661014 at ybb.ne.jp> wrote:
> > > Hi.
> > >
> > > > > you also neglected to mention that you're running a custom
> > > > > version of heartbeat
> > > > > what patches have you applied and are you planning on sharing them?
> > > >
> > > > oh dear, i seem to have made a mistake here :-(
> > > > please accept my apologies
> > > OK.
> > >
> > > But one doubt remains.
> > > Although the failcount does not go up after the resource failure on
> > > the first node, why does the clone stop?
> > > From your answer, the clone on the first node should not stop.
> > >
> > > In addition, I tried specifying an rsc_order, but the situation
> > > was the same.
> > > # <rsc_order id="order_oracle_www" from="clone0" action="start"
> > > type="after" to="ipaddr"/>
> > >
> > > Won't there be a problem starting the clone when I specify this rsc_order?
> >
> > the clone will only be able to start once the ipaddr has started.
> > if this is what you want, then the above constraint is correct
>
> I set the rsc_order.
>
> But the clone started on the first node after the IPaddr resource hit a
> monitor error on all nodes.
> If I understand correctly, shouldn't this rsc_order stop the clone on the
> first node?
It should, but you're probably hitting a bug that I fixed a couple of
weeks ago while I was writing some documentation for the various
constraints.
The latest interim build includes the fix.
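To make the expected behaviour concrete, here is a toy model in Python of what an "after" ordering constraint is meant to do: clone0 may start only once ipaddr is running, and it should be stopped again once ipaddr can no longer run anywhere. This is a hypothetical sketch, not the CRM policy engine. The `Cluster` class and its methods are invented for illustration, and the sketch ignores fail-counts and on_fail settings entirely.

```python
# Toy model (hypothetical; not the CRM policy engine) of an ordering
# constraint like <rsc_order from="clone0" type="after" to="ipaddr"/>:
# clone0 may start only once ipaddr is running, and it should be stopped
# again if ipaddr can no longer run anywhere.
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # node name -> names of the resources currently running on that node
    running: dict = field(default_factory=dict)
    # (first, then) pairs: "then" starts only after "first" is active
    orders: list = field(default_factory=list)

    def is_active(self, rsc):
        """True if rsc runs on at least one node."""
        return any(rsc in rscs for rscs in self.running.values())

    def can_start(self, rsc):
        """True if every resource that rsc is ordered after is active."""
        return all(self.is_active(first)
                   for first, then in self.orders if then == rsc)

    def enforce_orders(self):
        """Stop every dependent whose prerequisite is no longer active."""
        changed = True
        while changed:
            changed = False
            for first, then in self.orders:
                if not self.is_active(first) and self.is_active(then):
                    for rscs in self.running.values():
                        rscs.discard(then)
                    changed = True


cluster = Cluster(
    running={"rh44-1": {"ipaddr", "clone0"}, "rh44-2": {"clone0"}},
    orders=[("ipaddr", "clone0")],
)
# ipaddr fails its monitor on every node and cannot be recovered.
for rscs in cluster.running.values():
    rscs.discard("ipaddr")
cluster.enforce_orders()
print(cluster.running)  # {'rh44-1': set(), 'rh44-2': set()}
print(cluster.can_start("clone0"))  # False
```

In this model the ordering alone is enough to stop and block the clone. The real cluster also weighs fail-counts and on_fail settings, which this sketch does not attempt to reproduce.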
>
> --------------------------------------------------------------------
> Node: rh44-1 (4be91ff0-c0da-46a1-b913-acf19d8fcfb6): online
> Node: rh44-2 (2f14e3d7-aa37-4e95-9634-cd44ced9bea7): online
>
> Clone Set: clone0
> clone0-diskcheck:0 (heartbeat::ocf:Dummy): Stopped
> clone0-diskcheck:1 (heartbeat::ocf:Dummy): Started rh44-1
> --------------------------------------------------------------------
>
> The fail-count on all nodes goes up to 1.
> If the rsc_order is in effect and the fail-count of ipaddr goes up, why
> does the clone start?
>
> I attached a file just to make sure.
>
> > > If this behaviour is the current specification, can I request an
> > > improvement?
> > > # Because I never want to use stonith.
> >
> > why don't you like stonith?
> >
> > it's actually the only reliable way to be sure that an unresponsive
> > node is really dead (and that it is now safe to start using your data
> > on another node)
>
> I am not dismissing stonith.
> I want to use stonith as a last resort.
> (For example, I want to wait for an operator to intervene instead of
> using stonith.)
>
> Would it be difficult to change this so that the clone resource is not
> started again with this configuration (stonith = false, on_fail = fence)?
> If the change is difficult, I will accept it as the current specification.
It should already work in the latest interim build.