[Bug 1274678] Re: Unable to unmount nfsv3 when server is inaccessible

John Gilmore 1274678 at bugs.launchpad.net
Wed Jun 17 21:04:39 UTC 2015


This error occurs on an NFS client machine when one of its NFS servers
stops responding (e.g. is powered off). The umount command provides a -l
(lazy) option that is supposed to disconnect the mount point from the
system so that no future commands that access the file system will hang
due to the unresponsive NFS server. This is supposed to work even when
the NFS server is not responding. The problem is that a sub-library used
by the umount command is doing a readlink() on the filesystem, which
causes a hang before umount can actually unmount the filesystem.

The umount program uses a helper program called /sbin/umount.nfs (which
is a symlink to /sbin/mount.nfs, and both are part of the nfs-common
package), and that's where the bug lies. When you do:

  umount -l /images

and /images is an NFS mount, umount invokes:

  /sbin/umount.nfs /images -l

and /sbin/umount.nfs does the readlink, which can easily be verified by
running the umount command under strace -f. Here is a GDB backtrace of
/sbin/umount.nfs when it hangs:

(gdb) run /images -l
Starting program: /sbin/umount.nfs /images -l
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

Breakpoint 1, readlink () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 readlink () at ../sysdeps/unix/syscall-template.S:81
#1 0xb7768a0a in ?? () from /lib/i386-linux-gnu/libmount.so.1
#2 0xb7756073 in mnt_resolve_path () from /lib/i386-linux-gnu/libmount.so.1
#3 0xb7762887 in ?? () from /lib/i386-linux-gnu/libmount.so.1
#4 0xb77666f0 in mnt_context_prepare_umount ()
   from /lib/i386-linux-gnu/libmount.so.1
#5 0x0804afc1 in ?? ()
#6 0xb75b9a83 in __libc_start_main (main=0x804ad10, argc=3, argv=0xbfcc2e34,
    init=0x8057b40, fini=0x8057bb0, rtld_fini=0xb77cc180 <_dl_fini>,
    stack_end=0xbfcc2e2c) at libc-start.c:287
#7 0x0804b4fc in ?? ()

The actual readlink() call seems to occur in libmount, in an unnamed
function called from mnt_resolve_path. If, under GDB, I make readlink()
return -1 instead of performing the system call, the rest of the program
succeeds in lazily unmounting the hung filesystem.

I don't know if the proper fix is for /sbin/umount.nfs to avoid calling
mnt_resolve_path, or to pass it a parameter that says, "Don't touch that
filesystem while trying to resolve the path!!!". I will leave that to
the maintainers. All I know is that it hangs forever if you let the
readlink system call occur, but it does the job it's supposed to do if
you breakpoint at the readlink and do "return (int)-1" and "continue" in
the debugger.
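
For illustration only, here is a minimal sketch of what the lazy unmount
comes down to once userspace path resolution is skipped: a direct call
to the umount2() system call with MNT_DETACH, made from Python through
ctypes, without going through /sbin/umount.nfs or libmount. The helper
name lazy_umount is made up for this example. The assumption that the
kernel then detaches the mount without waiting on the dead server rests
only on the GDB experiment described above, and the call needs root.

```python
import ctypes
import os

# MNT_DETACH as defined in <sys/mount.h> on Linux (lazy unmount).
MNT_DETACH = 2

# Use the C library already loaded into the interpreter; use_errno=True
# makes errno available after a failed call.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.umount2.argtypes = [ctypes.c_char_p, ctypes.c_int]
_libc.umount2.restype = ctypes.c_int


def lazy_umount(mount_point):
    """Hypothetical helper: detach mount_point via umount2(MNT_DETACH).

    The path is handed to the kernel exactly as given; this code does no
    readlink() or other userspace canonicalization on it.
    """
    if _libc.umount2(os.fsencode(mount_point), MNT_DETACH) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), mount_point)
```

Run as root, lazy_umount("/images") would correspond to the
"umount -l /images" case above; on failure it raises OSError carrying
the errno reported by the kernel.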

** Summary changed:

- Unable to unmount nfsv3 when server is inaccessible 
+ Unable to lazy unmount nfsv3 when server is inaccessible

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/1274678

Title:
  Unable to lazy unmount nfsv3 when server is inaccessible

Status in nfs-utils package in Ubuntu:
  Confirmed

Bug description:
  After having upgraded to Ubuntu 13.10 (3.11.0-15-generic) I am unable
  to unmount nfsv3 shares.

  Setting: My NAS is only running a few hours per day. I have a script
  mounting the NAS shares when the NAS becomes available on the network.
  The script also tries to unmount the shares when the NAS shuts down.
  This configuration was running without any problems under Ubuntu
  12.10. After upgrading to 13.10 the unmounting is not working anymore
  and the computer does not even shut down completely. I used to mount
  "hard" without any problems; now I have tried "soft", but without
  success. Unmounting works perfectly as long as the NAS is accessible.

  
  georg@InaUbuntu:~$ mount
  ...
  192.168.1.10:/media on /mnt/NAS/media type nfs (rw,soft,intr,tcp,actimeo=3,addr=192.168.1.10)
  192.168.1.10:/backup on /mnt/NAS/backup type nfs (rw,soft,intr,tcp,actimeo=3,addr=192.168.1.10)

  
  === NAS is turned off ===

  georg@InaUbuntu:~$ lsof /mnt/NAS/backup

  "stalled"

  georg@InaUbuntu:~$ fuser -v /mnt/NAS/backup

  "stalled"

  georg@InaUbuntu:~$ sudo umount -l /mnt/NAS/backup
  [sudo] password for georg:

  "stalled"

  georg@InaUbuntu:~$ sudo umount -f /mnt/NAS/backup

  "stalled"

  georg@InaUbuntu:~$ sudo umount -l -f /mnt/NAS/backup

  "stalled"

  /var/log/syslog and dmesg remain "clean"; no NFS-related messages.

  
  georg@InaUbuntu:~$ nfsstat -m
  /mnt/NAS/backup from 192.168.1.10:/backup
   Flags:	rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,acregmax=3,acdirmin=3,acdirmax=3,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.10,mountvers=3,mountport=3244,mountproto=tcp,local_lock=none,addr=192.168.1.10

  /mnt/NAS/media from 192.168.1.10:/media
   Flags:	rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,acregmax=3,acdirmin=3,acdirmax=3,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.10,mountvers=3,mountport=3244,mountproto=tcp,local_lock=none,addr=192.168.1.10

  
  Any comments and help are highly appreciated. I cannot use NFSv4 as it
  is not supported by the NAS.

  Best regards,

  Georg



