[Linux-HA] Linux HA - DRBD - NFS

Karl Kloppenborg karl at crucialp.com
Mon Oct 4 17:44:53 MDT 2010


Hi Linux HA users,

We have set up a two-node cluster running DRBD, Heartbeat and NFS.

Every now and again NFS on the cluster suddenly crashes, and people can no longer read from or write to the shared folders. When it gets into this state, running rpcinfo -p against the floating IP shows that the NFS daemons are no longer accessible over RPC, as shown below:

----------------WHEN EVERYTHING IS FINE---------------------------
[someServer]# rpcinfo -p X.X.X.X (Float IP)
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    744  status
    100024    1   tcp    747  status
    100011    1   udp    808  rquotad
    100011    2   udp    808  rquotad
    100011    1   tcp    811  rquotad
    100011    2   tcp    811  rquotad
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100021    1   udp  58872  nlockmgr
    100021    3   udp  58872  nlockmgr
    100021    4   udp  58872  nlockmgr
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100021    1   tcp  36993  nlockmgr
    100021    3   tcp  36993  nlockmgr
    100021    4   tcp  36993  nlockmgr
    100005    1   udp    847  mountd
    100005    1   tcp    850  mountd
    100005    2   udp    847  mountd
    100005    2   tcp    850  mountd
    100005    3   udp    847  mountd
    100005    3   tcp    850  mountd


------------------------------------------------------------------------------------------

Then when it fails:
------------------------------------------------------------------------------------------
[SomeServer]# rpcinfo -p X.X.X.X (Float IP)

   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    694  status
    100024    1   tcp    697  status

------------------------------------------------------------------------------------------
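To make the difference between the two listings easy to spot (for example from a cron job or monitoring check), a small parser can report which expected RPC programs are missing. This is a minimal sketch, not part of Heartbeat or nfs-utils: the function names are hypothetical, and the expected set of programs is an assumption copied from the "everything is fine" listing above.

```python
from __future__ import annotations

# Assumption: the programs registered when "everything is fine" in this post.
EXPECTED_PROGRAMS = {"portmapper", "status", "rquotad", "nfs", "nlockmgr", "mountd"}


def registered_programs(rpcinfo_output: str) -> set[str]:
    """Collect the service names from `rpcinfo -p` output."""
    names = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows: program vers proto port service. The header row and rows
        # without a service name (only 4 fields) are skipped.
        if len(fields) == 5 and fields[0].isdigit():
            names.add(fields[4])
    return names


def missing_programs(rpcinfo_output: str, expected: set[str] = EXPECTED_PROGRAMS) -> set[str]:
    """Return the expected services that are not currently registered."""
    return set(expected) - registered_programs(rpcinfo_output)
```

Fed the failing listing above, missing_programs() reports rquotad, nfs, nlockmgr and mountd as missing. One way to capture live output would be subprocess.run(["rpcinfo", "-p", host], capture_output=True, text=True).stdout, where host is the floating IP.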

I have tried restarting the daemons and every other process I can think of, but unless I reboot, it doesn't come back up...

My system setup is as follows:

2x 64-bit CentOS 5 servers, bare-bones standard install without Dial-up Networking support.

DRBD83 with the kernel module

---------------------ha.cf------------------
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 5
initdead 20
bcast eth1
udpport 694
auto_failback on
node storage1.clusterfarm.net.au
node storage2.clusterfarm.net.au
---------------------------------------------
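The timing directives above can be sanity-checked mechanically. This is a minimal sketch with hypothetical helper names, assuming (a) the rule of thumb from Heartbeat's sample ha.cf that initdead should be at least twice deadtime, (b) that keepalive must be shorter than deadtime, and (c) that times are in seconds unless suffixed with "ms".

```python
from __future__ import annotations


def seconds(value: str) -> float:
    """Convert an ha.cf time value to seconds (assumes an optional "ms" suffix)."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000
    return float(value)


def parse_ha_cf(text: str) -> dict[str, list[str]]:
    """Map each directive to every value it was given (e.g. repeated "node" lines)."""
    directives: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(parts[0], []).append(value)
    return directives


def timing_warnings(directives: dict[str, list[str]]) -> list[str]:
    """Check keepalive < deadtime and, if set, initdead >= 2 * deadtime."""
    warnings = []
    keepalive = seconds(directives["keepalive"][0])
    deadtime = seconds(directives["deadtime"][0])
    if keepalive >= deadtime:
        warnings.append("keepalive should be smaller than deadtime")
    if "initdead" in directives:
        initdead = seconds(directives["initdead"][0])
        if initdead < 2 * deadtime:
            warnings.append("initdead should be at least twice deadtime")
    return warnings
```

For the ha.cf in this post (keepalive 2, deadtime 5, initdead 20) it reports no warnings. It only checks these relative relationships, not whether a 5-second deadtime suits the network in question.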

-------------HARESOURCES-----------------
storage1.clusterfarm.net.au IPaddr::[FLOATIP]/24/eth1 drbddisk::repdata Filesystem::/dev/drbd0::/storage::ext3  portmap   nfslock  nfs rpcidmapd
-----------------------------------------------------
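In a haresources line, Heartbeat (v1-style configuration) starts the resources from left to right on takeover and stops them in the reverse order, and each resource's arguments are separated by "::". The sketch below, with hypothetical helper names, simply spells out that order for the line above.

```python
from __future__ import annotations


def parse_haresources(line: str) -> tuple[str, list[tuple[str, list[str]]]]:
    """Split a haresources line into (preferred node, [(agent, args), ...])."""
    node, *resources = line.split()
    parsed = []
    for resource in resources:
        # "Filesystem::/dev/drbd0::/storage::ext3" -> ("Filesystem", ["/dev/drbd0", "/storage", "ext3"])
        agent, *args = resource.split("::")
        parsed.append((agent, args))
    return node, parsed


def start_stop_order(line: str) -> tuple[list[str], list[str]]:
    """Return (start order, stop order): left to right, then the reverse."""
    _, resources = parse_haresources(line)
    start = [agent for agent, _ in resources]
    return start, start[::-1]
```

For this configuration the stop order on failover is rpcidmapd, nfs, nfslock, portmap, Filesystem, drbddisk, IPaddr, so NFS and its helper daemons are shut down before /storage is unmounted and the DRBD device is released.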


----------DRBD CONF---------------
global { usage-count yes; }
common { syncer { rate 100M; al-extents 257; } }
resource repdata {
        protocol C;
        handlers { pri-on-incon-degr "halt -f"; }
        disk { on-io-error detach; }
        startup { degr-wfc-timeout 60; wfc-timeout 60; }

        on storage1.clusterfarm.net.au {
                address X.X.X.X:7788;
                device /dev/drbd0;
                disk /dev/sda6;
                meta-disk internal;
        }

        on storage2.clusterfarm.net.au {
                address X.X.X.X:7788;
                device /dev/drbd0;
                disk /dev/sda6;
                meta-disk internal;
        }
}
------------------------------------------------------


If anyone can shed some light on this issue, I would be MOST appreciative!


Thank you in advance,

Karl Kloppenborg
Head of Development
Phone: 1300 884 839 (AU Only - Business Hours)
Website: AU http://www.crucial.com.au | US http://www.crucialp.com