[Linux-HA] Failover affecting other services on HA hosts

David Lee t.d.lee at durham.ac.uk
Thu Oct 11 03:20:04 MDT 2007

On Wed, 10 Oct 2007, Robin Bowes wrote:

> We have a simple heartbeat setup failing over between two hosts running
> CentOS.
> Essentially, we have two mysql server instances. maindb runs on db0 and
> leafdb runs on db1. The system is configured so that if db0 fails, db1
> takes over maindb and if db1 fails, db0 takes over leafdb.
> haresources looks like this:
> db0 IPaddr2:: Filesystem::-Lsan0::/mnt/san0::ext3 mysql-main
> db1 IPaddr2:: IPaddr2::
> Filesystem::-Lsan1::/mnt/san1::ext3 mysql-leaf
> On the same machines, we also have instances of dnscache and tinydns
> running on different IP addresses. dnscache runs on on
> machine db0 and on db1.
> What we are seeing is that the DNS service stops working after a
> failover until the dnscache/tinydns services are restarted.
> I have no idea why this might be - any ideas?
> [...]

By default the 'bind' DNS daemon seems (please correct me if I'm wrong!)
to find its local interfaces at start-up and listen explicitly and only on
those.  If other things happen, such as heartbeat adding and removing
interfaces, then that doesn't get picked up by 'bind'.  (I think this
behaviour of bind is a deliberate feature, not a bug.)  So although
'heartbeat' might migrate that public IP address for you onto a new
machine, 'bind' (by default) won't listen on it.

There is a 'bind' option called 'interface-interval' which tells it to
re-scan for changes in interfaces every 'n' minutes.  If heartbeat adjusts
the interfaces (e.g.  importing the 'public' IP address) then 'bind'
should pick it up within that interval.

It might be worth investigating that.
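
As a rough sketch of the 'interface-interval' setting mentioned above (this
is for bind's named.conf only; it does not apply to dnscache/tinydns), the
option goes in the 'options' block.  The value of 1 minute is just an
illustrative choice, not a recommendation:

```
options {
    // Re-scan the network interface list every minute so that an
    // address added by heartbeat gets picked up (the default is 60).
    interface-interval 1;
};
```

A shorter interval means the migrated address is noticed sooner, at the cost
of more frequent scans.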

> Regardless, I can think of a couple of options we can implement to
> prevent this scenario:
> 1. Have heartbeat restart dnscache and tinydns after a failover. How can
> I do this?
> 2. Add a new resource failover over the dnscache/tinydns services.
> 3. Move dnscache and tinydns off the HA hosts onto a couple of "normal"
> boxes.

Another aspect to consider is this:  Suppose your clients list two public
DNS addresses in /etc/resolv.conf; then host one address on each machine
in your pair, active/active, and let heartbeat handle both addresses.
Perhaps include failback, so that when normal service resumes it reverts
to active/active.
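
A minimal sketch of that active/active idea, assuming heartbeat's
haresources style of configuration.  The addresses 192.0.2.53 and
192.0.2.54, the /24 prefix and the interface eth0 are hypothetical
placeholders (not taken from your setup); in practice the IPaddr2 entries
would probably be merged into your existing db0 and db1 lines:

```
# /etc/ha.d/haresources (must be identical on both nodes)
# Each node normally owns one of the two DNS addresses; if a node
# fails, its partner takes over that address as well.
db0 IPaddr2::192.0.2.53/24/eth0
db1 IPaddr2::192.0.2.54/24/eth0

# /etc/ha.d/ha.cf (relevant line only)
# Hand each address back to its preferred node once that node returns.
auto_failback on
```

With 'auto_failback on', the pair should drift back to one address per
machine after a failed node recovers.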

If one is master and the other secondary (as distinct from two
secondaries), then presumably some sort of resource would be needed to
restart with the revised named.conf configuration (which would need to be
maintained and available across both machines).

When you get it working, it might be worth writing it up as an example for
the heartbeat wiki or other documentation.

Hope that helps.


:  David Lee                                I.T. Service          :
:  Senior Systems Programmer                Computer Centre       :
:  UNIX Team Leader                         Durham University     :
:                                           South Road            :
:  http://www.dur.ac.uk/t.d.lee/            Durham DH1 3LE        :
:  Phone: +44 191 334 2752                  U.K.                  :
