[3.11.y.z extended stable] Patch "vfs: allow umount to handle mountpoints without revalidating them" has been added to staging queue
NeilBrown
neilb at suse.de
Thu Aug 7 22:59:16 UTC 2014
On Thu, 7 Aug 2014 14:32:17 +0100 Luis Henriques
<luis.henriques at canonical.com> wrote:
> This is a note to let you know that I have just added a patch titled
>
> vfs: allow umount to handle mountpoints without revalidating them
>
> to the linux-3.11.y-queue branch of the 3.11.y.z extended stable tree
> which can be found at:
>
> http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.11.y-queue
>
> If you, or anyone else, feels it should not be added to this tree, please
> reply to this email.
>
> For more information about the 3.11.y.z tree, see
> https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable
>
> Thanks.
> -Luis
>
> ------
>
> From 41ee9e50729fa43da90a04691dec3cabbb3171fe Mon Sep 17 00:00:00 2001
> From: Jeff Layton <jlayton at redhat.com>
> Date: Fri, 26 Jul 2013 06:23:25 -0400
> Subject: vfs: allow umount to handle mountpoints without revalidating them
>
> commit 8033426e6bdb2690d302872ac1e1fadaec1a5581 upstream.
>
> Christopher reported a regression where he was unable to unmount a NFS
> filesystem where the root had gone stale. The problem is that
> d_revalidate handles the root of the filesystem differently from other
> dentries, but d_weak_revalidate does not. We could simply fix this by
> making d_weak_revalidate return success on IS_ROOT dentries, but there
> are cases where we do want to revalidate the root of the fs.
>
> A umount is really a special case. We generally aren't interested in
> anything but the dentry and vfsmount that's attached at that point. If
> the inode turns out to be stale we just don't care since the intent is
> to stop using it anyway.
>
> Try to handle this situation better by treating umount as a special
> case in the lookup code. Have it resolve the parent using normal
> means, and then do a lookup of the final dentry without revalidating
> it. In most cases, the final lookup will come out of the dcache, but
> the case where there's a trailing symlink or !LAST_NORM entry on the
> end complicates things a bit.
>
> Cc: Neil Brown <neilb at suse.de>
> Reported-by: Christopher T Vogan <cvogan at us.ibm.com>
> Signed-off-by: Jeff Layton <jlayton at redhat.com>
> Signed-off-by: Al Viro <viro at zeniv.linux.org.uk>
> Cc: Chris Dunlop <chris at onthe.net.au>
> Signed-off-by: Luis Henriques <luis.henriques at canonical.com>
> ---
> fs/namei.c | 182 ++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/namespace.c | 2 +-
> include/linux/namei.h | 1 +
> 3 files changed, 184 insertions(+), 1 deletion(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 2a2d0236f82a..a10bd2f8b66b 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2185,6 +2185,188 @@ user_path_parent(int dfd, const char __user *path, struct nameidata *nd,
> return s;
> }
>
> +/**
> + * umount_lookup_last - look up last component for umount
> + * @nd: pathwalk nameidata - currently pointing at parent directory of "last"
> + * @path: pointer to container for result
> + *
> + * This is a special lookup_last function just for umount. In this case, we
> + * need to resolve the path without doing any revalidation.
> + *
> + * The nameidata should be the result of doing a LOOKUP_PARENT pathwalk. Since
> + * mountpoints are always pinned in the dcache, their ancestors are too. Thus,
> + * in almost all cases, this lookup will be served out of the dcache. The only
> + * cases where it won't are if nd->last refers to a symlink or the path is
> + * bogus and it doesn't exist.
> + *
> + * Returns:
> + * -error: if there was an error during lookup. This includes -ENOENT if the
> + * lookup found a negative dentry. The nd->path reference will also be
> + * put in this case.
> + *
> + * 0: if we successfully resolved nd->path and found it not to be a
> + * symlink that needs to be followed. "path" will also be populated.
> + * The nd->path reference will also be put.
> + *
> + * 1: if we successfully resolved nd->last and found it to be a symlink
> + * that needs to be followed. "path" will be populated with the path
> + * to the link, and nd->path will *not* be put.
> + */
> +static int
> +umount_lookup_last(struct nameidata *nd, struct path *path)
> +{
> + int error = 0;
> + struct dentry *dentry;
> + struct dentry *dir = nd->path.dentry;
> +
> + if (unlikely(nd->flags & LOOKUP_RCU)) {
> + WARN_ON_ONCE(1);
> + error = -ECHILD;
> + goto error_check;
> + }
> +
> + nd->flags &= ~LOOKUP_PARENT;
> +
> + if (unlikely(nd->last_type != LAST_NORM)) {
> + error = handle_dots(nd, nd->last_type);
> + if (!error)
> + dentry = dget(nd->path.dentry);
> + goto error_check;
> + }
> +
> + mutex_lock(&dir->d_inode->i_mutex);
> + dentry = d_lookup(dir, &nd->last);
> + if (!dentry) {
> + /*
> + * No cached dentry. Mounted dentries are pinned in the cache,
> + * so that means that this dentry is probably a symlink or the
> + * path doesn't actually point to a mounted dentry.
> + */
> + dentry = d_alloc(dir, &nd->last);
> + if (!dentry) {
> + error = -ENOMEM;
> + } else {
> + dentry = lookup_real(dir->d_inode, dentry, nd->flags);
> + if (IS_ERR(dentry))
> + error = PTR_ERR(dentry);
> + }
> + }
> + mutex_unlock(&dir->d_inode->i_mutex);
> +
> +error_check:
> + if (!error) {
> + if (!dentry->d_inode) {
> + error = -ENOENT;
> + dput(dentry);
> + } else {
> + path->dentry = dentry;
> + path->mnt = mntget(nd->path.mnt);
> + if (should_follow_link(dentry->d_inode,
> + nd->flags & LOOKUP_FOLLOW))
> + return 1;
> + follow_mount(path);
> + }

The code above is buggy. It is fixed by

  commit 295dc39d941dc2ae53d5c170365af4c9d5c16212
  Author: Vasily Averin <vvs at parallels.com>
  Date: Mon Jul 21 12:30:23 2014 +0400

      fs: umount on symlink leaks mnt count

so be sure to include that patch too.

NeilBrown

> + }
> + terminate_walk(nd);
> + return error;
> +}
> +
> +/**
> + * path_umountat - look up a path to be umounted
> + * @dfd: directory file descriptor to start walk from
> + * @name: full pathname to walk
> + * @flags: lookup flags
> + * @nd: pathwalk nameidata
> + *
> + * Look up the given name, but don't attempt to revalidate the last component.
> + * Returns 0 and "path" will be valid on success; returns error otherwise.
> + */
> +static int
> +path_umountat(int dfd, const char *name, struct path *path, unsigned int flags)
> +{
> + struct file *base = NULL;
> + struct nameidata nd;
> + int err;
> +
> + err = path_init(dfd, name, flags | LOOKUP_PARENT, &nd, &base);
> + if (unlikely(err))
> + return err;
> +
> + current->total_link_count = 0;
> + err = link_path_walk(name, &nd);
> + if (err)
> + goto out;
> +
> + /* If we're in rcuwalk, drop out of it to handle last component */
> + if (nd.flags & LOOKUP_RCU) {
> + err = unlazy_walk(&nd, NULL);
> + if (err) {
> + terminate_walk(&nd);
> + goto out;
> + }
> + }
> +
> + err = umount_lookup_last(&nd, path);
> + while (err > 0) {
> + void *cookie;
> + struct path link = *path;
> + err = may_follow_link(&link, &nd);
> + if (unlikely(err))
> + break;
> + nd.flags |= LOOKUP_PARENT;
> + err = follow_link(&link, &nd, &cookie);
> + if (err)
> + break;
> + err = umount_lookup_last(&nd, path);
> + put_link(&nd, &link, cookie);
> + }
> +out:
> + if (base)
> + fput(base);
> +
> + if (nd.root.mnt && !(nd.flags & LOOKUP_ROOT))
> + path_put(&nd.root);
> +
> + return err;
> +}
> +
> +/**
> + * user_path_umountat - lookup a path from userland in order to umount it
> + * @dfd: directory file descriptor
> + * @name: pathname from userland
> + * @flags: lookup flags
> + * @path: pointer to container to hold result
> + *
> + * A umount is a special case for path walking. We're not actually interested
> + * in the inode in this situation, and ESTALE errors can be a problem. We
> + * simply want to track down the dentry and vfsmount attached at the mountpoint
> + * and avoid revalidating the last component.
> + *
> + * Returns 0 and populates "path" on success.
> + */
> +int
> +user_path_umountat(int dfd, const char __user *name, unsigned int flags,
> + struct path *path)
> +{
> + struct filename *s = getname(name);
> + int error;
> +
> + if (IS_ERR(s))
> + return PTR_ERR(s);
> +
> + error = path_umountat(dfd, s->name, path, flags | LOOKUP_RCU);
> + if (unlikely(error == -ECHILD))
> + error = path_umountat(dfd, s->name, path, flags);
> + if (unlikely(error == -ESTALE))
> + error = path_umountat(dfd, s->name, path, flags | LOOKUP_REVAL);
> +
> + if (likely(!error))
> + audit_inode(s, path->dentry, 0);
> +
> + putname(s);
> + return error;
> +}
> +
> /*
> * It's inline, so penalty for filesystems that don't use sticky bit is
> * minimal.
> diff --git a/fs/namespace.c b/fs/namespace.c
> index a45ba4f267fe..ad8ea9bc2518 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1318,7 +1318,7 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
> if (!(flags & UMOUNT_NOFOLLOW))
> lookup_flags |= LOOKUP_FOLLOW;
>
> - retval = user_path_at(AT_FDCWD, name, lookup_flags, &path);
> + retval = user_path_umountat(AT_FDCWD, name, lookup_flags, &path);
> if (retval)
> goto out;
> mnt = real_mount(path.mnt);
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 5a5ff57ceed4..cd09751c71a0 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -58,6 +58,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
>
> extern int user_path_at(int, const char __user *, unsigned, struct path *);
> extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);
> +extern int user_path_umountat(int, const char __user *, unsigned int, struct path *);
>
> #define user_path(name, path) user_path_at(AT_FDCWD, name, LOOKUP_FOLLOW, path)
> #define user_lpath(name, path) user_path_at(AT_FDCWD, name, 0, path)
> --
> 1.9.1