[Linux-HA] arp not resolving to backup node after takeover

Steven A. Sullam steve at stevesullam.com
Tue Feb 27 12:06:34 MST 2007


Hi,

I just thought I would throw this out there and see what comes back.

I am using heartbeat with two nodes running Fedora. Everything seems
to be working the way it should.

When I shut down heartbeat on the primary node, the backup node
immediately adds the IP address of the main node and broadcasts ARPs
telling LAN nodes to find the IP address at the new MAC address. Yet
when I access my network from the internet, my router sends me to the
dead main node when it should be sending me to the backup node.
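
For reference, the kind of gratuitous ARP the backup node broadcasts can be sketched roughly as below. This is only an illustration of the frame layout (standard ARP over Ethernet, with sender and target IP both set to the taken-over address and a broadcast target MAC, as in the send_arp arguments in the log further down). It is not heartbeat's actual send_arp code, and the function name and the MAC address are made up for the example.

```python
import socket
import struct

BROADCAST_MAC = bytes.fromhex("ffffffffffff")


def build_gratuitous_arp(sender_mac: bytes, ip: str) -> bytes:
    """Return an Ethernet frame carrying a gratuitous ARP request for `ip`.

    Hypothetical helper for illustration; it only builds the bytes and
    does not send anything on the wire.
    """
    ip_bytes = socket.inet_aton(ip)
    # Ethernet header: destination MAC, source MAC, EtherType 0x0806 (ARP).
    eth_header = struct.pack("!6s6sH", BROADCAST_MAC, sender_mac, 0x0806)
    # ARP payload: htype=1 (Ethernet), ptype=0x0800 (IPv4),
    # hlen=6, plen=4, oper=1 (request).
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1, 0x0800, 6, 4, 1,
        sender_mac, ip_bytes,     # sender hardware / protocol address
        BROADCAST_MAC, ip_bytes,  # target hardware / protocol address
    )
    return eth_header + arp_payload


if __name__ == "__main__":
    mac = bytes.fromhex("020000000001")  # made-up MAC for the backup node
    frame = build_gratuitous_arp(mac, "192.168.1.10")
    print(len(frame), frame.hex())
```

Any device on the LAN that honors such a frame should replace its cached MAC for 192.168.1.10 with the sender's MAC. Whether my router does so is exactly what I cannot tell.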

How can I see whether the ARP table has been updated on my consumer-grade
Linksys router? I am running a program called linksysmon, but it only
shows IP addresses coming in and out.
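
For comparison, this is roughly how the same check can be done on an ordinary Linux machine on the LAN, where the kernel exposes its ARP cache as text in /proc/net/arp. It is only a sketch: the function name is made up, and it shows a LAN host's view of the address, not the router's, so it does not answer the question for the Linksys itself.

```python
from typing import Optional


def mac_for_ip(arp_table_text: str, ip: str) -> Optional[str]:
    """Return the MAC address listed for `ip` in /proc/net/arp-style text.

    Hypothetical helper for illustration. The first line of the table is a
    column header; each following row starts with the IP address and has
    the hardware address in the fourth column.
    """
    for line in arp_table_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 4 and fields[0] == ip:
            return fields[3].lower()
    return None


if __name__ == "__main__":
    # Read the local ARP cache and look up the cluster IP address.
    with open("/proc/net/arp") as arp_file:
        print(mac_for_ip(arp_file.read(), "192.168.1.10"))
```

Running this on a LAN host before and after the takeover should show the MAC for 192.168.1.10 changing from the main node's to the backup node's, assuming that host has recently talked to the address.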

Here is my setup:

backup node
192.168.1.11 ----------\
172.16.1.2              \
    ^                    \
    |                     +---- lan interface -- router -- wan interface
    v                    /
172.16.1.1              /
192.168.1.10 ----------/
main node

Here is an excerpt from the log on the backup node when I shut down
heartbeat on the main node. There are entries from both nodes because I
am using remote logging. wolf01 is the backup node.

 
 11:24:41 wolf00 atd: atd shutdown failed 
 11:24:54 wolf01 heartbeat: [7112]: info: Received shutdown notice from 'mydomain.com'.
 11:24:54 wolf01 heartbeat: [7112]: info: Resources being acquired from mydomain.com.
 11:24:55 wolf01 heartbeat: [7495]: info: acquire local HA resources (standby).
 11:24:55 wolf01 heartbeat: [7496]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys wolf01] to acquire.
 11:24:55 wolf01 heartbeat: [7495]: info: local HA resource acquisition completed (standby).
 11:24:55 wolf01 heartbeat: [7112]: info: Standby resource acquisition done [foreign].
 11:24:55 wolf01 harc[7515]: [7518]: info: Running /etc/ha.d/rc.d/status status
 11:24:55 wolf01 mach_down[7521]: [7536]: info: Taking over resource group 192.168.1.10/24/eth0/192.168.1.255
 11:24:55 wolf01 ResourceManager[7537]: [7545]: info: Acquiring resource group: mydomain.com 192.168.1.10/24/eth0/192.168.1.255 atd
 11:24:55 wolf01 ResourceManager[7537]: [7583]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.10/24/eth0/192.168.1.255 start
 11:24:56 wolf01 IPaddr[7585]: [7633]: info: /sbin/ifconfig eth0:0 192.168.1.10  netmask 255.255.255.0    broadcast 192.168.1.255
 11:24:56 wolf01 avahi-daemon[2145]: Registering new address record for 192.168.1.10 on eth0.
 11:24:56 wolf01 IPaddr[7585]: [7638]: info: Sending Gratuitous Arp for 192.168.1.10 on eth0:0 [eth0]
 11:24:56 wolf01 IPaddr[7585]: [7639]: /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.1.10 eth0 192.168.1.10 auto 192.168.1.10 ffffffffffff
 11:24:56 wolf01 send_arp: [7642]: info: Enable using logging daemon
 11:24:56 wolf01 ResourceManager[7537]: [7664]: info: Running /etc/init.d/atd  start
 11:24:56 wolf01 mach_down[7521]: [7674]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
 11:24:56 wolf01 heartbeat: [7112]: info: mach_down takeover complete.
 11:24:56 wolf01 mach_down[7521]: [7677]: info: mach_down takeover complete for node mydomain.com.
 11:24:58 wolf00 last message repeated 10 times 
 11:24:58 wolf00 logd: [26666]: info: logd_term_write_action: received SIGTERM 
 11:24:58 wolf00 logd: [26666]: info: ha_logd: Exiting write process 
 11:24:58 wolf00 logd: [29041]: info: Waiting for pid=26665 to exit 
 11:24:59 wolf00 logd: [29041]: info: Pid 26665 exited 
 11:25:27 wolf01 ipfail: [7136]: info: Status update: Node mydomain.com now has status dead
 11:25:27 wolf01 heartbeat: [7112]: WARN: node mydomain.com: is dead
 11:25:27 wolf01 heartbeat: [7112]: info: Dead node mydomain.com gave up resources.
 11:25:27 wolf01 heartbeat: [7112]: info: Link mydomain.com:eth2 dead.
 11:25:27 wolf01 ipfail: [7136]: info: NS: We are still alive!
 11:25:28 wolf01 ipfail: [7136]: info: Link Status update: Link mydomain.com/eth2 now has status dead
 11:25:28 wolf01 ipfail: [7136]: info: Asking other side for ping node count.
 11:25:28 wolf01 ipfail: [7136]: info: Checking remote count of ping nodes.


Thanks much in advance!!



