[Linux-HA] heartbeat configuration help (serial, pingd, score_attribute)

Andrew Beekhof beekhof at gmail.com
Wed Sep 6 13:12:18 MDT 2006


On 9/6/06, Charlie O <linux.cxo at gmail.com> wrote:
> Thanks Andrew.  Sorry for the lapse in response.
> I got pulled away onto some other projects.  I will get you some more
> information when I get a chance to re-investigate with those patches.

I think the answer has been found...

The other person seeing this had some old pingd values left over in the nodes
section of the CIB. Removing them made everything behave as expected.
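A quick way to check for such leftovers is sketched below. It is not an official heartbeat tool. It parses the XML printed by `cibadmin -Q` (saved to a file, or piped in with `-` as the argument) and lists every `nvpair` named `pingd` found under the `<nodes>` section, ignoring the `<status>` section. It assumes the heartbeat 2.0.x CIB layout, where per-node attributes sit as `nvpair` elements somewhere beneath `<node>` inside `<nodes>`. It also assumes that pingd publishes its value under the attribute name `pingd`. The script name `find_stale_pingd.py` and the `ATTR_NAME` setting are hypothetical. The script only reports what it finds and does not modify the CIB.

```python
#!/usr/bin/env python3
"""Report pingd values stored in the <nodes> section of a CIB dump.

Hypothetical usage:
    cibadmin -Q > cib.xml && python3 find_stale_pingd.py cib.xml
    cibadmin -Q | python3 find_stale_pingd.py -
"""
import sys
import xml.etree.ElementTree as ET

# Assumed attribute name; change it if pingd was started with a different one.
ATTR_NAME = "pingd"


def stale_pingd_values(cib_xml, attr_name=ATTR_NAME):
    """Return (node name, nvpair id, value) for each matching nvpair under <nodes>.

    Accepts str or bytes. Only <node> elements inside <nodes> are searched;
    the <status> section uses <node_state> elements, so it is not reported.
    """
    root = ET.fromstring(cib_xml)
    found = []
    for nodes in root.iter("nodes"):
        for node in nodes.iter("node"):
            name = node.get("uname") or node.get("id", "?")
            # nvpairs may be wrapped in instance_attributes/attributes.
            for nv in node.iter("nvpair"):
                if nv.get("name") == attr_name:
                    found.append((name, nv.get("id"), nv.get("value")))
    return found


def main(argv):
    if len(argv) != 2:
        print(f"usage: {argv[0]} CIB_XML_FILE   ('-' reads stdin)")
        return
    if argv[1] == "-":
        data = sys.stdin.buffer.read()
    else:
        with open(argv[1], "rb") as fh:
            data = fh.read()
    hits = stale_pingd_values(data)
    for name, nv_id, value in hits:
        print(f"{name}: nvpair id={nv_id} value={value}")
    if not hits:
        print(f"no '{ATTR_NAME}' values found in the nodes section")


if __name__ == "__main__":
    main(sys.argv)
```

Any entry reported here is a value stored in the configuration, rather than one pingd is currently maintaining. That makes it a candidate for the kind of leftover described above.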

>
> Charlie
>
>
>
> On 8/30/06, Andrew Beekhof <beekhof at gmail.com> wrote:
> >
> > On 8/28/06, Charlie O <linux.cxo at gmail.com> wrote:
> > > Andrew, John,
> > >
> > > I did reply with all the logs, etc.  But after bzip'ing the logs (only a
> > > few minutes of debug level 1, as well as the ha.cf file and cib.xml), the
> > > message was about 200k (only 100k is allowed until/if the moderator
> > > allows it - which hasn't been granted yet).
> >
> > If you can get the cluster into a state where:
> > - both nodes are online
> > - both nodes can see each other
> > - only one node can see the ping node
> > - the resource isn't running on the machine that can see the ping node
> >
> > then run "cibadmin -Q" and attach the results.
> > That will be enough to tell me roughly where the problem lies.
> >
> > >
> > > HOWEVER, I did install your patch to util.c and crmd.c, as I appeared to
> > > have that OFFLINE symptom.
> > > I applied the patch, recompiled, and the OFFLINE & pingd stopping issues
> > > go away!  HOWEVER, now failover doesn't work (while the serial cable is
> > > attached) unless I shut heartbeat down fully on one node.
> > >
> > > Until (if) the other message I posted gets past the moderator, I have
> > > attached a simple GIF image of what I am trying to accomplish.
> > >
> > > Thank you.
> > >
> > > Charlie
> > >
> > >
> > > On 8/25/06, Charlie O <linux.cxo at gmail.com> wrote:
> > > >
> > > >
> > > > Andrew, John,
> > > >
> > > > A recap of what I WANT to do...
> > > >
> > > > I want two servers to use heartbeat to control/fail over a single IP
> > > > address (89.1.1.234 in my example).  I want the two nodes ("one" and
> > > > "two") to be connected via a serial connection to determine each
> > > > other's health, and to use an auxiliary ping node (89.1.1.253) for
> > > > resource control (in addition to each other's health).  So, if a
> > > > network issue happens (bad switch, bad Ethernet cable, bad whatever),
> > > > both nodes can stay up and running and talking to each other, but will
> > > > figure out which node should own the IP address they share.
> > > > I have just about figured things out.  I can pull the plug on one
> > > > Ethernet connection ("two" for this discussion) and everything fails
> > > > over to node "one".  However, the GUI and crm_mon then show node "two"
> > > > as OFFLINE and the pingd-child:1 as stopped.  I can ultimately get
> > > > things started on "two" again by hitting the "cleanup resource" button
> > > > in the GUI (when highlighting the pingd clone).  Then all is as
> > > > expected again.
> > > > What I would like in this case is for HA not to stop pingd on "two"
> > > > and not to put node "two" OFFLINE, OR, once the network connection
> > > > comes back online (i.e. when the Ethernet cable is plugged back in),
> > > > for node "two" to wake up automatically, rejoin, and restart the pingd
> > > > child without any manual intervention.
> > > >
> > > > Attached are the ha.cf (same on both systems), the cib.xml file, and
> > > > the ha-log and ha-debug from both nodes, as well as a GIF image of the
> > > > schematics.  I have upgraded and am now running this on RHEL 4 U4 and
> > > > HA 2.0.7.
> > > >
> > > > Any guidance would be appreciated.  Could this actually be fixed by
> > > > your newest patch from that "OFFLINE" thread I just read about?
> > > >
> > > > Thanks,
> > > >
> > > > Charlie
> > > >
> > > > On 8/24/06, Andrew Beekhof <beekhof at gmail.com> wrote:
> > > > >
> > > > > On 8/24/06, Charlie O <linux.cxo at gmail.com> wrote:
> > > > > > Thanks, John, for the pointer.
> > > > > > I have done many iterations and pulled out enough hair.  I truly do
> > > > > > believe I understand the concepts, but there does appear to be some
> > > > > > discrepancy in the documentation from one page to the next.
> > > > > > At any rate, I believe I have gotten further in my efforts.
> > > > > > I can now use pingd's score attribute and have the resource fail
> > > > > > over if I pull the Ethernet cable and keep the serial cable going.
> > > > > > My next problem is that the instance of pingd stops and will not
> > > > > > restart on the one node until I refresh, despite setting
> > > > > > restart_type to "restart".  And, if it doesn't restart, the node
> > > > > > still appears down to the cluster.  Some fencing is going on, but I
> > > > > > am not sure where it is (note, my default/resource_stickiness are
> > > > > > now set to 0/0).
> > > > > >
> > > > > > Any further thoughts/ideas?
> > > > >
> > > > > If you attach your complete logs, I can take a look.
> > > > >
> > > > > I can't really imagine why your nodes are being fenced, nor why pingd
> > > > > would stop, let alone fail to restart... the logs should help, though.
> > > > >
> > > > > >
> > > > > > Thanks again,
> > > > > >
> > > > > > Charlie
> > > > > >
> > > > > > On 8/16/06, John R Mocho <jmocho at royaldc.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > I hope you don't mind me making a suggestion, but you might want to
> > > > > > > look at http://www.linux-ha.com/v2/faq/forced_failover
> > > > > > >
> > > > > > > This is where I found the pot of gold I was searching for with
> > > > > > > regard to scores and stickiness.
> > > > > > >
> > > > > > >
> > > > > > > On Wed, 16 Aug 2006, Charlie O wrote:
> > > > > > >
> > > > > > > > Date: Wed, 16 Aug 2006 16:51:35 -0400
> > > > > > > > From: Charlie O <linux.cxo at gmail.com>
> > > > > > > > Reply-To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
> > > > > > > > To: linux-ha at lists.linux-ha.org
> > > > > > > > Subject: [Linux-HA] heartbeat configuration help (serial, pingd,
> > > > > > > >     score_attribute)
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > Please excuse this possibly simple configuration question, but I
> > > > > > > > was hoping to get some help in setting up Heartbeat.
> > > > > > > > I am currently running v2.0.6 on a pair of RHEL 4 systems.
> > > > > > > > The compile went fine and it appears stable.
> > > > > > > >
> > > > > > > > What I want to do is have two nodes monitor each other and fail
> > > > > > > > over an IP address to the better node.
> > > > > > > > I was able to do this simply enough, but when trying to add some
> > > > > > > > functionality I am falling a bit short on my knowledge.
> > > > > > > >
> > > > > > > > In addition to checking the health of each node via an Ethernet
> > > > > > > > connection (mcast or ucast), I have installed a serial cable.
> > > > > > > > So, if I pull the Ethernet cable from the system, the nodes
> > > > > > > > notice the disconnect, but failover doesn't happen (as the
> > > > > > > > serial link between them is still stable).  If I pull the serial
> > > > > > > > cable AND the Ethernet cable, or shut down heartbeat on one node,
> > > > > > > > failover will occur (well, both systems will take over the
> > > > > > > > resource).
> > > > > > > >
> > > > > > > > What I have been trying to do is set up pingd to ping an
> > > > > > > > additional IP address and have that be a deciding factor.  I
> > > > > > > > have set up the clone and pingd seems to behave as expected;
> > > > > > > > however, if I pull the Ethernet cable on one of the nodes, the
> > > > > > > > failover still does not take place.  I have tried many resource
> > > > > > > > stickiness settings, but still no luck.
> > > > > > > >
> > > > > > > > Attached is the ha.cf file (identical on both nodes) as well as
> > > > > > > > the most recent cib.xml file.
> > > > > > > >
> > > > > > > > Can someone tell me:
> > > > > > > >
> > > > > > > > 1) Whether what I am proposing is possible
> > > > > > > >
> > > > > > > > Many thanks,
> > > > > > > >
> > > > > > > > Charlie O'Brien
> > > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Linux-HA mailing list
> > > > > > > Linux-HA at lists.linux-ha.org
> > > > > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > > > > See also: http://linux-ha.org/ReportingProblems
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> >
>
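The placement Charlie is aiming for is that only the node that can still reach the ping node 89.1.1.253 should own 89.1.1.234. A deliberately simplified model can illustrate this together with the scores-and-stickiness discussion. It is not the CRM's allocation code, and it rests on several assumptions:

* pingd publishes `multiplier * reachable ping nodes` as a node attribute.
* A location rule using `score_attribute` turns that attribute into the node's score for the resource.
* The node currently running the resource also receives the stickiness value.
* The highest score wins, and a tie keeps the resource where it is.

The multiplier of 100, the `Node` class, and `choose_node` are hypothetical.

```python
"""Toy model of pingd-driven placement for one resource.

Assumptions (a simplification, not heartbeat's real allocation logic):
  * each node's pingd attribute = multiplier * reachable ping nodes;
  * a score_attribute rule uses that attribute as the node's score;
  * the node currently running the resource also gets the stickiness;
  * the highest score wins; on a tie the resource stays where it is.
"""
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    reachable_ping_nodes: int


def choose_node(nodes, current, multiplier=100, stickiness=0):
    """Return (winning node name, {node name: score})."""
    scores = {}
    for n in nodes:
        score = multiplier * n.reachable_ping_nodes  # the pingd value
        if n.name == current:
            score += stickiness
        scores[n.name] = score
    best = max(scores.values())
    if scores.get(current) == best:
        return current, scores  # assumed tie-break: do not move
    winner = next(name for name, s in scores.items() if s == best)
    return winner, scores


if __name__ == "__main__":
    # Only "one" can still reach the ping node; the IP is running on "two".
    cluster = [Node("one", 1), Node("two", 0)]
    print(choose_node(cluster, current="two"))
    # -> ('one', {'one': 100, 'two': 0})
    print(choose_node(cluster, current="two", stickiness=200))
    # -> ('two', {'one': 100, 'two': 200})
```

With stickiness at 0, as in Charlie's 0/0 setting, the node that still reaches the ping node wins in this model. Once stickiness exceeds the difference in pingd values, the resource stays put. In the same model, a pingd value that no longer reflects actual connectivity would skew the comparison in exactly the same way.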

