[PATCH 1/2][SRU][Disco] UBUNTU: SAUCE: shiftfs: add O_DIRECT support
Christian Brauner
christian.brauner at ubuntu.com
Tue Aug 13 11:48:17 UTC 2019
On Mon, Aug 12, 2019 at 09:45:19AM -0500, Seth Forshee wrote:
> On Mon, Aug 12, 2019 at 02:31:38PM +0200, Stefan Bader wrote:
> > On 26.07.19 14:59, Seth Forshee wrote:
> > > On Fri, Jul 26, 2019 at 10:29:38AM +0200, Christian Brauner wrote:
> > >> On Thu, Jul 25, 2019 at 03:26:01PM -0500, Seth Forshee wrote:
> > >>> On Mon, Jul 22, 2019 at 05:05:08PM +0200, Christian Brauner wrote:
> > >>>> On Fri, Jul 19, 2019 at 03:02:21PM -0500, Seth Forshee wrote:
> > >>>>> On Fri, Jul 19, 2019 at 05:50:46PM +0200, Christian Brauner wrote:
> > >>>>>> BugLink: https://bugs.launchpad.net/bugs/1837223
> > >>>>>>
> > >>>>>> This enabled O_DIRECT support for shiftfs if the underlay supports it.
> > >>>>>>
> > >>>>>> Currently shiftfs does not handle O_DIRECT if the underlay supports it.
> > >>>>>> This is blocking dqlite - an essential part of LXD - from profiting from
> > >>>>>> the performance benefits of O_DIRECT on suitable filesystems when used
> > >>>>>> with async io such as aio or io_uring.
> > >>>>>> Overlayfs cannot support this directly since the upper filesystem in
> > >>>>>> overlay can be any filesystem. So if the upper filesystem does not
> > >>>>>> support O_DIRECT but the lower filesystem does you're out of luck.
> > >>>>>> Shiftfs does not suffer from the same problem since there is not concept
> > >>>>>> of an upper filesystem in the same way that overlayfs has it.
> > >>>>>> Essentially, shiftfs is a transparent shim relaying everything to the
> > >>>>>> underlay while overlayfs' upper layer is not (completely).
> > >>>>>
> > >>>>> I get concerned any time shiftfs just copies up some non-trivial data
> > >>>>> from the lower filesystem, that shiftfs is going to get bypassed and
> > >>>>> some important metadata will not get propoerly updated in shiftfs. The
> > >>>>> question that immediately comes to mind in this case is whether i_size
> > >>>>> in the shiftfs indoe gets updated following an O_DIRECT write, and I
> > >>>>> suspect tha there are other similar cases to worry about. Have you
> > >>>>
> > >>>> I'm not following, what are "other similar cases to worry about" and are
> > >>>> those already exposed?
> > >>>
> > >>> I just mean anything else which might change in the underlay as a result
> > >>> of I/O, e.g. timestamps.
> > >>>
> > >>>> Re: i_size for O_DIRECT
> > >>>> I have not seen prior art for filesystems that use iomap for O_DIRECT
> > >>>> that would fiddle with i_size so I didn't want to get into that game.
> > >>>> That is to say, shiftfs only supports O_DIRECT if the underlying
> > >>>> filesystem is iomap based. This was the same approach that overlayfs
> > >>>> wanted to use.
> > >>>
> > >>> Are you saying that an O_DIRECT write that extends a file does not
> > >>> result in i_size being updated? That doesn't sound right ...
> > >>
> > >> For example, when overlayfs intended to introduce support for this
> > >> (which they didn't because of their version of upper and lower) they did
> > >> not mess with i_size. That's one of the reasons I didn't. Do you prefer
> > >> to copy up attributes to make sure we're not missing anything?
> > >
> > > I think that something like this:
> > >
> > > fd = open(path, O_RDWR|O_DIRECT);
> > > lseek(fd, 0, SEEK_END);
> > > write(fd, data, size);
> > > fstat(fd, &stat);
> > > printf("%ld\n", (long)stat.st_size);
> > >
> > > Should print out the correct size (and similarly correct data for other
> > > attributes). Maybe it will with this patch, I'm just not certain that it
> > > will, so really I'm just asking whether you've tested this sort of
> > > thing. If it doesn't work, then it seems like we probably do need to
> > > copy up the attributes.
> > >
> > > Seth
> > >
> > With the long discussion seemingly stopping unresolved I am not sure this
> > patchset is ready for SRU or not.
>
> I think we are still waiting for Christian to get back with some test
> results here, so these patches should wait.
The test outlined above shows that the correct file size is reported.
Christian
More information about the kernel-team
mailing list