[Linux-HA] Re: Failover affecting other services on HA hosts

Robin Bowes robin-lists at robinbowes.com
Thu Oct 11 06:58:22 MDT 2007


David Lee wrote:
> On Wed, 10 Oct 2007, Robin Bowes wrote:
> 
>> We have a simple heartbeat setup failing over between two hosts running
>> CentOS.
>>
>> Essentially, we have two mysql server instances. maindb runs on db0 and
>> leafdb runs on db1. The system is configured so that if db0 fails, db1
>> takes over maindb and if db1 fails, db0 takes over leafdb.
>>
>> haresources looks like this:
>>
>> db0 IPaddr2::172.28.28.10/32 Filesystem::-Lsan0::/mnt/san0::ext3 mysql-main
>> db1 IPaddr2::172.28.28.9/32 IPaddr2::172.28.28.11/32
>> Filesystem::-Lsan1::/mnt/san1::ext3 mysql-leaf
>>
>> On the same machines, we also have instances of dnscache and tinydns
>> running on different IP addresses. dnscache runs on 172.28.28.6 on
>> machine db0 and 172.28.28.5 on db1.
>>
>> What we are seeing is that the DNS service stops working after a
>> failover until the dnscache/tinydns services are restarted.
>>
>> I have no idea why this might be - any ideas?
>> [...]
> 
> By default the 'bind' DNS daemon seems (please correct me if I'm wrong!)
> to find its local interfaces at start-up and listen explicitly and only on
> those.  If other things happen, such as heartbeat adding and removing
> interfaces, then that doesn't get picked up by 'bind'.  (I think this
> behaviour of bind is a deliberate feature, not a bug.)  So although
> 'heartbeat' might migrate that public IP address for you onto a new
> machine, 'bind' (by default) won't listen on it.
> 
> There is a 'bind' option called 'interface-interval' which tells it to
> re-scan for changes in interfaces every 'n' minutes.  If heartbeat adjusts
> the interfaces (e.g.  importing the 'public' IP address) then 'bind'
> should pick it up within that interval.
> 
> It might be worth investigating that.

David,

Thanks for the reply.

We're actually using djb's dnscache and tinydns.

dnscache is on a different IP/interface to the one used for the mysql
failover. tinydns is on 127.0.0.1

So I'm guessing that something must happen when heartbeat re-jigs the
interfaces during failover that breaks the interface on which dnscache
is running.
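If that's the case, restarting the daemons after a failover (option 1
from my original mail) ought to work around it. Something like this
untested sketch might do, dropped in /etc/ha.d/resource.d/ and listed
as the last resource on the node's haresources line (the script name
and the /service/* paths are my assumptions about a typical daemontools
layout, so adjust to taste):

```shell
#!/bin/sh
# dns-restart: hypothetical heartbeat resource script that pokes the
# daemontools-supervised dnscache/tinydns after a failover so they
# re-bind their sockets on the current set of interfaces.

SVC=${SVC:-svc}   # daemontools control program; overridable for testing

dns_restart() {
    case "$1" in
        start|restart)
            # svc -t sends SIGTERM; supervise then restarts the daemon,
            # which re-binds its listening sockets on whatever
            # interfaces exist post-failover.
            "$SVC" -t /service/dnscache /service/tinydns
            ;;
        stop)
            : # nothing to do: DNS keeps running on the surviving node
            ;;
        *)
            echo "Usage: $0 {start|stop|restart}" >&2
            return 1
            ;;
    esac
}

# Dispatch only when heartbeat actually passes an action argument.
if [ $# -gt 0 ]; then
    dns_restart "$@"
fi
```

Since svc -t just TERMs the daemons and lets supervise bring them back,
there's no pidfile or init-script state to worry about.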

> 
> 
>> Regardless, I can think of a few options we can implement to
>> prevent this scenario:
>>
>> 1. Have heartbeat restart dnscache and tinydns after a failover. How can
>> I do this?
>>
>> 2. Add a new resource to fail over the dnscache/tinydns services.
>>
>> 3. Move dnscache and tinydns off the HA hosts onto a couple of "normal"
>> boxes.
> 
> Another aspect to consider is this:  Suppose you have two public IP
> addresses (/etc/resolv.conf on your clients); then host one address on
> each machine in your pair, active/active, and let heartbeat handle both
> addresses.  Perhaps include failback, so that when normal service resumes
> it reverts to active/active.
> 
> If one is master and the other secondary (as distinct from two
> secondaries), then presumably some sort of resource would be needed to
> restart with the revised named.conf configuration (which would need to be
> maintained and available across both machines).

We are considering something like that, i.e. having the interfaces on
which dnscache runs managed by heartbeat.
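Roughly like this, I think (sketch only; addresses are the dnscache
ones from above, and `dns-restart` is a hypothetical script in
/etc/ha.d/resource.d/ that bounces dnscache/tinydns on 'start'), as
extra resource-group lines alongside the existing db0/db1 entries:

```
db0 IPaddr2::172.28.28.6/32 dns-restart
db1 IPaddr2::172.28.28.5/32 dns-restart
```

Clients would then list both 172.28.28.5 and 172.28.28.6 in
/etc/resolv.conf, so whichever node survives still answers queries,
which is essentially the active/active arrangement you describe.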

> When you get it working, it might be worth writing it up as an example for
> the heartbeat wiki or other documentation.

Sure - I'll see what I can do.

R.
