NACK: [SRU][UNSTABLE/FOCAL/BIONIC][PATCH v2] UBUNTU: SAUCE: shiftfs: fix dentry revalidation

Christian Brauner christian.brauner at ubuntu.com
Tue May 19 09:15:50 UTC 2020


On Tue, May 19, 2020 at 10:52:10AM +0200, Stefan Bader wrote:
> On 19.05.20 08:02, Christian Brauner wrote:
> > On Tue, May 19, 2020 at 07:43:21AM +0200, Stefan Bader wrote:
> >> On 18.05.20 20:18, Christian Brauner wrote:
> >>> BugLink: https://bugs.launchpad.net/bugs/1879196
> >>>
> >>> While fixing [1] we introduced a regression and triggered [2] whereby
> >>> the underlay and overlay go out of sync because they disagree about the
> >>> dcache. Such situation easily arise when the underlay is modified
> >>> directly. This means the overlay needs to revalidate the dentry in the
> >>> dcache to catchup with the underlay.
> >>>
> >>> Note, in general it's not advisable to directly modify the underlay
> >>> while a shiftfs mount is on top. In some way this means we need to keep
> >>> two caches in sync and it's hard enough to keep a single cache happy.
> >>> But shiftfs' use-case is inherently prone to be used for exactly that.
> >>> So this is something we have to navigate carefully and honestly we have
> >>> no full model upstream that does the same. Overlayfs has the copy-up
> >>> behavior which let's it get around most of the issues but we don't have
> >>> it and ecryptfs is broken in such scenarios which we verified quite a
> >>> while back.
> >>> In any case, I built a kernel with this patch and re-ran all regressions
> >>> that are related to this that we have so far (cf.  [1], [2], and [3]).
> >>> None of them were reproducible with this patch here. So we still fix the
> >>> ESTALE issue but also keep underlay and overlay in sync.
> >>
> >> This test kernel should be provided to the bug reporter, so it can be confirmed
> >> to fix the issue there as well.
> > 
> > See
> > https://drive.google.com/open?id=19iTwaFSYNS95_I-gD_rvFoV9cMAfy6io
> > Elmo is also Cced here. I don't have any other place I can upload this,
> > I think.
> 
> Since it is James, you could use private-fileshare.c.c, otherwise people.c.c for
> providing test kernels. The reason I would like to see the reporter (did not
> look close enough to see its James initially, but even better) confirm the fix
> is that while you did regression test against the known issues, this might
> introduce something new. And there some unknown use pattern not directly aimed
> at something adds value.

It might be worth gathering all the regression tests I've posted into
the launchpad bugs into an out of Ubuntu-kernel-tree test-suite
somewhere but I frankly lack the time to do that at the moment. But I'll
keep it in mind.

The LXD test-suite is pretty good at detecting these regressions
usually but all bugs here are use-cases that are very rare. The first
bug was re-opening a deleted file through /proc/<pid>/fd/<nr>, the
second was unlinking a device file and then stat()ing the fd. In both cases
they take specific kernel paths. For the re-opening case it's the
LOOKUP_JUMPED path that hits d_weak_revalidate and for device files the
pinning works differently in the vfs as it does go through the device
layer (in convoluted ways).
We had Elmo's issue originally in one of the first test version of
shiftfs so we introduced agressive revalidation which then led to
scenarios where we caused the vfs to think a file had gone ESTALE. This
specifically was true for temporary files. That ESTALE is what prevented
re-openeing through /proc/<pid>/fd/<nr>.
The original change back from this agressive style or revalidation was
made under the assumption that we can essentially simply trust the
underlay's cache but that presupposes that the underlay isn't modified
directly but that squares with directory sharing.

So the new version is aggressively revalidating but handles the
d_weak_revalidate case which is hit for LOOKUP_JUMPED (and thus for
going through proc magic links) by not invalidating a dentry simply
because it's i_nlink count is 0. Where i_nlink count == 0 means that it
was deleted while we still held it open which is a fine thing to do and
backed by posix fs semantics of course. The problem with the original
version was - if my analysis is correct - that d_weak_revalidate() for
shiftfs returned 0 when i_nlink was 0. But that obviously doesn't matter
since someone can still hold a reference to the inode.

Christian



More information about the kernel-team mailing list