[Linux-HA] Can't get NFS locks to survive failovers

Dave Dykstra dwdha at drdykstra.us
Thu Jul 21 12:53:21 MDT 2005


A few weeks ago Alan asked me to revise the http://linux-ha/HaNFS web page.
I haven't done much on it yet, mainly because I'm struggling with how to
present all the information about migrating locks from one machine to
another: I haven't had the same experience that the web page describes.

First of all, every time I try to fail over while a lock is being held, the
active server refuses to unmount the file system and forces a self-induced
reboot.  The page alludes to that as something that sometimes happens, but
it happens every time for me.  More significantly, although I've tried to
follow the instructions on the page, I've never seen a lock successfully
migrated from the active server to the standby server after the latter
takes over.  It recently occurred to me that I've probably tried only
clients that use NFS-over-TCP.  I've got one client running NFS-over-UDP;
maybe trying that one would make the difference, although one would hope
not.  I'll try that when I get a chance during off hours when nobody else
is using the cluster.

If anybody has any other suggestions I'd appreciate hearing them.
I'm using the NFS kernel server on Debian Linux with kernel 2.6.11.10
on the HA servers, and have tried both the same Linux kernel and Solaris
9 as clients.  If any of you want to try some experiments, below is the
program I wrote to test the locking.  I run it on one client with a long
delay, confirm that a second client fails to obtain the lock on the same
file, do a failover, and then try the lock again on the second client.  The
second client is then always able to obtain the lock, which means the first
client's lock did not survive the failover.

- Dave

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <errno.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    int fd;
    struct flock fl;
    int delay = 0;

    if ((argc != 2) && (argc != 3)) {
        fprintf(stderr,"Usage: fcntlsetlk filename [delay]\n");
        return(2);
    }
    if (argc == 3)
        delay = atoi(argv[2]);
    fd = open(argv[1], O_RDWR, 0666);
    if (fd == -1) {
        perror(argv[1]);
        return(1);
    }
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 2147483647;
    fl.l_pid = 0;

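    /* F_SETLK does not block: it fails at once if another process holds a
       conflicting lock. */
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        perror("fcntl(F_SETLK)");
        return(1);
    }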
    sleep(delay);
    return(0);
}
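
For the "second client" step it may be handy to check whether a lock is
held without actually taking it.  The following is only a sketch of how
that could be done with F_GETLK; probe_wrlock is a made-up helper name and
is not part of the program above, and I have not tried it against an HA NFS
pair.  It returns 1 if a conflicting lock is reported, 0 if a write lock
could be placed, and -1 on error.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original test program).
 * Asks the kernel whether a whole-file write lock could be placed on
 * 'path' without placing it.  If a conflicting lock exists and
 * holder_pid is non-NULL, *holder_pid receives the l_pid value that
 * F_GETLK reports for the holder.
 */
int probe_wrlock(const char *path, pid_t *holder_pid)
{
    struct flock fl;
    int fd;

    fd = open(path, O_RDWR);
    if (fd == -1) {
        perror(path);
        return -1;
    }

    fl.l_type = F_WRLCK;      /* the lock we would like to place */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means "through end of file" */
    fl.l_pid = 0;

    /* F_GETLK only tests; it never acquires the lock. */
    if (fcntl(fd, F_GETLK, &fl) == -1) {
        perror("fcntl(F_GETLK)");
        close(fd);
        return -1;
    }
    close(fd);

    if (fl.l_type == F_UNLCK)
        return 0;             /* nothing conflicts */

    if (holder_pid != NULL)
        *holder_pid = fl.l_pid;
    return 1;                 /* someone else holds a conflicting lock */
}
```

Calling probe_wrlock(argv[1], &pid) from a small main() on the second
client before and after the failover should show 1 and then 0 if the lock
is being lost the way I describe above.  Over NFS the reported pid, if
any, belongs to a process on whichever client holds the lock, so it is
only useful as a hint.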



