[Linux-HA] heartbeat configuration help (serial, pingd, score_attribute)
Charlie O
linux.cxo at gmail.com
Wed Sep 6 10:17:13 MDT 2006
Thanks Andrew. Sorry for the lapse in response.
I got pulled away onto some other projects. I will get you some more
information when I get a chance to re-investigate with those patches.
Charlie
On 8/30/06, Andrew Beekhof <beekhof at gmail.com> wrote:
>
> On 8/28/06, Charlie O <linux.cxo at gmail.com> wrote:
> > Andrew, John,
> >
> > I did reply with all the logs, etc. But after bzip'ing the logs (only a
> > few minutes of debug level 1) together with the ha.cf file and cib.xml,
> > the message was about 200k (only 100k is allowed unless the moderator
> > allows it, which hasn't been granted yet).
>
> If you can get the cluster into a state where:
> - both nodes are online
> - both nodes can see each other
> - only one node can see the ping node
> - the resource isn't running on the machine that can see the ping node
>
> then run "cibadmin -Q" and attach the results.
> That will be enough to tell me roughly where the problem lies.
>
> >
> > HOWEVER, I did install your patch to util.c and crmd.c, as I appeared to
> > have that OFFLINE symptom.
> > I made the patch, recompiled, and the OFFLINE and pingd stopping issues
> > go away! HOWEVER, now failover doesn't work (while the serial cable is
> > attached) unless I shut heartbeat down fully on one node.
> >
> > Until (if) the other message I posted gets past the moderator, I
> > attached a simple gif image of what I am trying to accomplish.
> >
> > Thank you.
> >
> > Charlie
> >
> >
> > On 8/25/06, Charlie O <linux.cxo at gmail.com> wrote:
> > >
> > >
> > > Andrew, John,
> > >
> > > A recap of what I WANT to do...
> > >
> > > I want two servers to use heartbeat to control/failover a single IP
> > > address (89.1.1.234 in my example). I want the two nodes ("one" and
> > > "two") to be connected via a serial connection to determine each
> > > other's health and use an auxiliary ping node (89.1.1.253) for
> > > resource control (in addition to the other's health). So, if a network
> > > issue happens (bad switch, bad ethernet cable, bad whatever), both
> > > nodes can stay up and running and talking to each other but will
> > > figure out which node should own the IP address they share.
> > > I have just about figured things out. I can pull the plug on one
> > > ethernet connection ("two" for this discussion) and everything fails
> > > over to node "one". However, the GUI and crm_mon then show node "two"
> > > as OFFLINE and the pingd-child:1 as stopped. I can ultimately get
> > > things started on "two" again by hitting the "cleanup resource" button
> > > in the GUI (with the pingd clone highlighted). Then all is as expected
> > > again.
> > > What I would like in this case is for HA to not stop pingd on "two"
> > > and to not put node "two" OFFLINE, OR, once the network connection
> > > comes back online (i.e. plugging the ethernet cable back in), for node
> > > "two" to wake up automatically, rejoin, and restart the pingd child
> > > without any manual intervention.
> > >
> > > Attached are the ha.cf (same on both systems), the cib.xml file, and
> > > the ha-log and ha-debug from both nodes, as well as a GIF image of the
> > > schematics. I have upgraded and am now running this on a RHEL 4 u4 OS
> > > and HA 2.0.7.
> > >
> > > Any guidance would be appreciated. This may actually be fixed in that
> > > "OFFLINE" thread I just read about (with your newest patch)?
> > >
> > > Thanks,
> > >
> > > Charlie
> > >
> > > On 8/24/06, Andrew Beekhof <beekhof at gmail.com> wrote:
> > > >
> > > > On 8/24/06, Charlie O <linux.cxo at gmail.com> wrote:
> > > > > > Thanks, John, for the pointer.
> > > > > > I have done many iterations and pulled out enough hair. I truly do
> > > > > > believe I understand the concepts, but there does appear to be some
> > > > > > discrepancy in the documentation from one page to the next.
> > > > > > At any rate, I believe I have gotten further in my efforts.
> > > > > > I can now use pingd's score attribute and have the resource fail
> > > > > > over if I pull the ethernet cable and keep the serial cable going.
> > > > > > My next problem is that the instance of pingd stops and will not
> > > > > > restart on the one node until I refresh, despite setting
> > > > > > restart_type to "restart". And, if that doesn't restart, the node
> > > > > > still appears down to the cluster. Some fencing is going on, but I
> > > > > > am not sure where it is (note, my default/resource_stickiness are
> > > > > > now set to 0/0).
> > > > >
> > > > > Any further thoughts/ideas?
> > > >
> > > > If you attach your complete logs I can take a look.
> > > >
> > > > > I can't really imagine why your nodes are being fenced nor why pingd
> > > > > would stop let alone fail to restart... the logs should help though.
> > > >
> > > > >
> > > > > Thanks again,
> > > > >
> > > > > Charlie
> > > > >
> > > > > > On 8/16/06, John R Mocho <jmocho at royaldc.com> wrote:
> > > > > >
> > > > > >
> > > > > > > I hope you don't mind me making a suggestion, but you might want
> > > > > > > to look at http://www.linux-ha.com/v2/faq/forced_failover
> > > > > > >
> > > > > > > This is where I found the pot of gold I was searching for with
> > > > > > > regard to scores and stickiness.
> > > > > >
> > > > > >
> > > > > > On Wed, 16 Aug 2006, Charlie O wrote:
> > > > > >
> > > > > > > Date: Wed, 16 Aug 2006 16:51:35 -0400
> > > > > > > From: Charlie O <linux.cxo at gmail.com>
> > > > > > > > Reply-To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
> > > > > > > > To: linux-ha at lists.linux-ha.org
> > > > > > > > Subject: [Linux-HA] heartbeat configuration help (serial, pingd, score_attribute)
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > > Please excuse this possibly simple configuration question, but
> > > > > > > > I was hoping to get some help in setting up Heartbeat.
> > > > > > > I am currently running v2.0.6 on a RHEL 4 pair of systems.
> > > > > > > The compiling went fine and it appears stable.
> > > > > > >
> > > > > > > > What I want to do is have two nodes monitor each other and
> > > > > > > > fail over an IP address to the better node.
> > > > > > > > I was able to do this simply enough, but when trying to add
> > > > > > > > some functionality I am falling a bit short on my knowledge.
> > > > > > >
> > > > > > > > In addition to checking the health of each node via an Ethernet
> > > > > > > > connection (mcast or ucast), I have installed a serial cable.
> > > > > > > > So, if I pull the ethernet cable from the system, the nodes
> > > > > > > > notice the disconnect, but failover doesn't happen (as the
> > > > > > > > serial link between them is still stable). If I pull the serial
> > > > > > > > cable AND ethernet cable or shut down heartbeat on one node,
> > > > > > > > the failover will occur (well, both systems will take over the
> > > > > > > > resource).
> > > > > > >
> > > > > > > > What I have been trying to do is set up pingd to ping an
> > > > > > > > additional IP address and have that be a deciding factor. I
> > > > > > > > have set up the clone and pingd seems to do as expected;
> > > > > > > > however, if I pull the ethernet cable on one of the nodes, the
> > > > > > > > failover still does not take place. I have tried many
> > > > > > > > possibilities of setting resource stickiness, but still no
> > > > > > > > luck.
> > > > > > >
> > > > > > > > Attached is the ha.cf file (identical on both nodes) as well as
> > > > > > > > the most recent cib.xml file.
> > > > > > >
> > > > > > > Can someone tell me
> > > > > > >
> > > > > > > 1) Is what I am proposing possible
> > > > > > > 2) Where I might be going wrong
> > > > > > >
> > > > > > > Many thanks,
> > > > > > >
> > > > > > > Charlie O'Brien
> > > > > > >
> > > > > > _______________________________________________
> > > > > > Linux-HA mailing list
> > > > > > Linux-HA at lists.linux-ha.org
> > > > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > > > See also: http://linux-ha.org/ReportingProblems
> > > > > >
>
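
The placement behaviour asked for in the thread (both nodes stay up over the serial link, and the shared IP goes to whichever node can still reach the ping node) can be sketched as a toy model. This is only an illustration under stated assumptions, not Heartbeat's actual placement code: the names `Node`, `pingd_score`, and `choose_owner` are hypothetical, the multiplier of 100 is an arbitrary example value, and the scoring rule (reachable ping nodes times a multiplier, plus stickiness for the current owner) is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    online: bool          # peer still hears this node (serial or ethernet)
    ping_nodes_seen: int  # how many ping nodes this node can reach


def pingd_score(node, multiplier=100):
    """Assumed scoring rule: reachable ping nodes times a multiplier."""
    return node.ping_nodes_seen * multiplier


def choose_owner(nodes, current=None, stickiness=0, multiplier=100):
    """Return the name of the node that should hold the IP, or None."""
    candidates = [n for n in nodes if n.online]
    if not candidates:
        return None

    def rank(n):
        score = pingd_score(n, multiplier)
        if n.name == current:
            score += stickiness  # bonus for staying where it already runs
        # On equal scores, prefer the current owner to avoid a needless move.
        return (score, n.name == current)

    return max(candidates, key=rank).name


# Ethernet pulled on "two": both nodes online via serial, only "one" pings.
cluster = [Node("one", True, 1), Node("two", True, 0)]
print(choose_owner(cluster, current="two"))
```

In this model a stickiness larger than the connectivity score keeps the resource on the node that lost its network, which is one reason to keep stickiness low (0/0 in the thread) while testing pingd-based failover.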