NACK/Cmnt: [SRU][kinetic:master][focal:hwe-5.13-next/hwe-5.15-next][PATCH] UBUNTU: SAUCE: overlayfs: fix incorrect mnt_id of files opened from map_files

Alexander Mikhalitsyn alexander.mikhalitsyn at virtuozzo.com
Thu Aug 4 08:05:09 UTC 2022


On Thu, 4 Aug 2022 09:53:35 +0200
Stefan Bader <stefan.bader at canonical.com> wrote:

> On 03.08.22 18:52, Alexander Mikhalitsyn wrote:
> > On Wed, 3 Aug 2022 16:48:17 +0200
> > Stefan Bader <stefan.bader at canonical.com> wrote:
> > 
> >> On 03.08.22 14:19, Alexander Mikhalitsyn wrote:
> >>> From: Alexander Mikhalitsyn <alexander at mihalicyn.com>
> >>>
> >>> BugLink: https://bugs.launchpad.net/bugs/1967924
> >>> BugLink: https://bugs.launchpad.net/bugs/1857257
> >>>
> >>> The hack was introduced in ("UBUNTU: SAUCE: overlayfs: allow with
> >>> shiftfs as underlay") and it broke checkpoint/restore of docker
> >>> contains:
> >>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857257
> >>>
> >>> The following script can be used to trigger the issue:
> >>>     #!/bin/bash
> >>>
> >>>     cat > test.py << EOF
> >>>     import sys
> >>>
> >>>     f = open("/proc/self/maps")
> >>>
> >>>     for l in f.readlines():
> >>>       if "python" not in l:
> >>>         continue
> >>>       print(l)
> >>>       s = l.split()
> >>>       start, end = s[0].split("-")
> >>>       fname = s[-1]
> >>>       print(start, end, fname)
> >>>       break
> >>>     else:
> >>>       sys.exit(1)
> >>>
> >>>     test_file1 = open(fname)
> >>>     test_file2 = open("/proc/self/map_files/%s-%s" % (start, end))
> >>>
> >>>     fdinfo1 = open("/proc/self/fdinfo/%d" % test_file1.fileno()).read()
> >>>     fdinfo2 = open("/proc/self/fdinfo/%d" % test_file2.fileno()).read()
> >>>
> >>>     if fdinfo1 != fdinfo2:
> >>>       print("FAIL")
> >>>       print(test_file1)
> >>>       print(fdinfo1)
> >>>       print(test_file2)
> >>>       print(fdinfo2)
> >>>       sys.exit(1)
> >>>     print("PASS")
> >>>     EOF
> >>>     sudo docker run -it --privileged --rm -v `pwd`:/mnt python python /mnt/test.py
> >>>
> >>> Thanks to Andrei Vagin for the reproducer and investigation of this problem.
> >>>
> >>> Cc: Andrei Vagin <avagin at gmail.com>
> >>> Cc: Adrian Reber <areber at redhat.com>
> >>> Cc: Christian Brauner <christian.brauner at ubuntu.com>
> >>> Cc: Stefan Bader <stefan.bader at canonical.com>
> >>> Cc: Kleber Sacilotto de Souza <kleber.souza at canonical.com>
> >>>
> >>> Fixes: d24b8a5 ("UBUNTU: SAUCE: overlayfs: allow with shiftfs as underlay")
> >>> Signed-off-by: Alexander Mikhalitsyn <alexander at mihalicyn.com>
> >>> Acked-by: Seth Forshee <seth.forshee at canonical.com>
> >>> Acked-by: Stefan Bader <stefan.bader at canonical.com>
> >>> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski at canonical.com>
> >>> Signed-off-by: Kelsey Skunberg <kelsey.skunberg at canonical.com>
> >>> [small refactoring to add the dependency of AUFS for vma->vm_prfile]
> >>> Signed-off-by: Andrea Righi <andrea.righi at canonical.com>
> >>> Signed-off-by: Paolo Pisati <paolo.pisati at canonical.com>
> >>>
> >>> UBUNTU: SAUCE: overlayfs: prevent dereferencing struct file in ovl_vm_prfile_set()
> >>>
> >>> BugLink: https://bugs.launchpad.net/bugs/1973620
> >>>
> >>> With the following commit we re-introduced a SAUCE patch that has been
> >>> dropped starting with 5.13:
> >>>
> >>>    37e9bac9203b ("UBUNTU: SAUCE: overlayfs: fix incorrect mnt_id of files opened from map_files")
> >>>
> >>> However the forward-ported patch introduced a potential NULL pointer
> >>> dereference bug:
> >>>
> >>> BUG: kernel NULL pointer dereference, address: 0000000000000008
> >>> [  447.039738] #PF: supervisor read access in kernel mode
> >>> [  447.040369] #PF: error_code(0x0000) - not-present page
> >>> [  447.041002] PGD 0 P4D 0
> >>> [  447.041325] Oops: 0000 [#1] SMP NOPTI
> >>> [  447.041798] CPU: 0 PID: 73766 Comm: sudo Not tainted 5.15.0-28-generic #29~20.04.1-Ubuntu
> >>> [  447.042800] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Ubuntu-1.8.2-1ubuntu1+esm1 04/01/2014
> >>> [  447.043979] RIP: 0010:aa_file_perm+0x3a/0x470
> >>> [  447.044565] Code: 54 53 48 83 ec 68 48 89 7d 80 89 4d 8c 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 63 05 01 0a 19 01 48 03 82 c0 00 00 00 <4c> 8b 68 08 f6 46 40 02 0f 85 d0 00 00 00 41 f6 45 40 02 0f 85 c5
> >>> [  447.046837] RSP: 0018:ffffaefe80a4bca8 EFLAGS: 00010246
> >>> [  447.047481] RAX: 0000000000000000 RBX: ffff96e4038abd01 RCX: 0000000000000004
> >>> [  447.048351] RDX: ffff96e4038abd00 RSI: ffff96e401215eb8 RDI: ffffffff9c22a2ac
> >>> [  447.049241] RBP: ffffaefe80a4bd38 R08: 0000000000000000 R09: 0000000000000000
> >>> [  447.050121] R10: 0000000000000000 R11: 0000000000000000 R12: ffff96e401215eb8
> >>> [  447.051040] R13: ffff96e4038abd00 R14: ffffffff9c22a2ac R15: 0000000000000004
> >>> [  447.051942] FS:  00007eff3c0f8c80(0000) GS:ffff96e45e400000(0000) knlGS:0000000000000000
> >>> [  447.052981] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [  447.053696] CR2: 0000000000000008 CR3: 0000000002be2000 CR4: 00000000003506f0
> >>> [  447.054571] Call Trace:
> >>> [  447.054883]  <TASK>
> >>> [  447.055154]  ? unlock_page_memcg+0x2f/0x40
> >>> [  447.055668]  ? page_remove_rmap+0x4b/0x320
> >>> [  447.056180]  common_file_perm+0x72/0x170
> >>> [  447.056669]  apparmor_file_permission+0x1c/0x20
> >>> [  447.057237]  security_file_permission+0x30/0x1a0
> >>> [  447.057898]  rw_verify_area+0x35/0x60
> >>> [  447.058392]  vfs_read+0x6d/0x1a0
> >>> [  447.058842]  ksys_read+0xb1/0xe0
> >>> [  447.059276]  __x64_sys_read+0x1a/0x20
> >>> [  447.059732]  do_syscall_64+0x5c/0xc0
> >>> [  447.060183]  ? __set_current_blocked+0x3b/0x60
> >>> [  447.060738]  ? exit_to_user_mode_prepare+0x3d/0x1c0
> >>> [  447.061434]  ? syscall_exit_to_user_mode+0x27/0x50
> >>> [  447.062099]  ? do_syscall_64+0x69/0xc0
> >>> [  447.062603]  ? irqentry_exit_to_user_mode+0x9/0x20
> >>> [  447.063210]  ? irqentry_exit+0x19/0x30
> >>> [  447.063678]  ? exc_page_fault+0x89/0x160
> >>> [  447.064165]  ? asm_exc_page_fault+0x8/0x30
> >>> [  447.064675]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>> [  447.065298] RIP: 0033:0x7eff3c2cb002
> >>>
> >>> This panic happens only when AUFS is enabled (that is required to
> >>> "activates" this feature).
> >>>
> >>> This bug happens because we don't need to decrement anymore the refcount
> >>> for the previous vm_file value in ovl_vm_prfile_set(). So make sure to
> >>> drop the offending fput() to prevent the kernel panic above.
> >>>
> >>> Signed-off-by: Andrea Righi <andrea.righi at canonical.com>
> >>> Acked-by: Kleber Sacilotto de Souza <kleber.souza at canonical.com>
> >>> Acked-by: Stefan Bader <stefan.bader at canonical.com>
> >>> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza at canonical.com>
> >>>
> >>> UBUNTU: SAUCE: overlayfs: remove CONFIG_AUFS_FS dependency
> >>>
> >>> BugLink: https://bugs.launchpad.net/bugs/1967924
> >>> BugLink: https://bugs.launchpad.net/bugs/1857257
> >>>
> >>> BugLink: https://bugs.launchpad.net/bugs/1967924
> >>>
> >>> Right now we have a fix:
> >>> b07bc17b ("UBUNTU: SAUCE: overlayfs: fix incorrect mnt_id of files opened from map_files")
> >>> in master branch of Jammy kernel, but only formaly. Because these kernels compiled without
> >>> CONFIG_AUFS_FS set, so this fix just disabled. There is no need to make it dependent on
> >>> CONFIG_AUFS_FS option, because in all cases we have mm/prfile.c compiled-in.
> >>>
> >>> Cc: Andrei Vagin <avagin at gmail.com>
> >>> Cc: Adrian Reber <areber at redhat.com>
> >>> Cc: Christian Brauner <christian at brauner.io>
> >>> Cc: Stefan Bader <stefan.bader at canonical.com>
> >>> Cc: Kleber Sacilotto de Souza <kleber.souza at canonical.com>
> >>> Cc: Andrea Righi <andrea.righi at canonical.com>
> >>>
> >>> Fixes: b07bc17b ("UBUNTU: SAUCE: overlayfs: fix incorrect mnt_id of files opened from map_files")
> >>>
> >>> Fixups for ("UBUNTU: SAUCE: overlayfs: fix incorrect mnt_id of files opened from map_files"):
> >>> - ("UBUNTU: SAUCE: overlayfs: remove CONFIG_AUFS_FS dependency")
> >>> - ("UBUNTU: SAUCE: overlayfs: prevent dereferencing struct file in ovl_vm_prfile_set()")
> >>> was squashed. It should make ubuntu kernel maintainers work a little bit easier :-)
> >>>
> >>> I believe that we need to make something similar for jammy/linux too.
> >>>
> >>> Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
> >>
> >> For one this commit message seems to get out of bounds and includes multiple
> >> signed-off-by areas completely. I am getting lost there. Second hwe-5.13 is EOL
> > 
> > Got it, let's omit hwe-5.13.
> > 
> > Understand your point, it's because it's a squash of a few fixups. From my point of view
> > we can just remove all commit descriptions from fixup patches and keep only the original one.
> > 
> >> and neither should hwe-5.15 get patched if something should be done directly to
> >> jammy. And lastly master/-next are branch names or part of it and are irrelevant
> >> for submissions. This should be kinetic/jammy nothing else.
> > 
> > Yep, I understand from your and Andrea explanations that hwe-5.15 branch is inherited from the jammy. But for some reason at this point of time hwe-5.15 branch doesn't contain
> > ("UBUNTU: SAUCE: overlayfs: fix incorrect mnt_id of files opened from map_files")
> > patch at all. But jammy kernel (which is the parent for this branch AFAIU) has this patch!
> > 
> > I'm not sure how to handle it properly. All that I want is to be fully clear about which patches should be applied to which kernels to prevent any possible degradations or problems.
> > 
> > Sorry for inconvenience and wrong patch headers :)
> 
> No worries. Just things need to be arranged in a way that is sensible when 
> looking at it tomorrow. I think all the pieces required technically are there 
> and Andrea did volunteer to try getting this into the formal form we need. That 
> would include all parts missing in jammy so hwe-5.15 gets fixed up naturally. IT 
> could be a little time depending on other tasks.

I understand you guys, kernel maintainance is really hard work. And keeping patches description and metadata in a good fit safes from many problems with rebasing.

>That 
> would include all parts missing in jammy so hwe-5.15 gets fixed up naturally.

Great!

Regards,
Alex

> 
> Thanks,
> Stefan
> > 
> > Regards,
> > Alex
> > 
> >>
> >> -Stefan
> >>
> >> PS: Not sure all on cc: really are interested in seeing this. And some never
> >> will because the are no longer working where they used to work.
> >>
> >>> ---
> >>>    fs/overlayfs/file.c | 30 ++++++++++++++++++++++++++++++
> >>>    1 file changed, 30 insertions(+)
> >>>
> >>> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> >>> index d0c96ca0202e..7024771dcae8 100644
> >>> --- a/fs/overlayfs/file.c
> >>> +++ b/fs/overlayfs/file.c
> >>> @@ -490,6 +490,32 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
> >>>    	return ret;
> >>>    }
> >>>    
> >>> +/*
> >>> + * In map_files_get_link() (fs/proc/base.c)
> >>> + * we need to determine correct path from overlayfs.
> >>> + * But real_mount(realfile->f_path.mnt) may be not
> >>> + * equal to real_mount(file->f_path.mnt). In such case
> >>> + * fdinfo of the same file which was opened from
> >>> + * /proc/<pid>/map_files/... and "usual" path
> >>> + * will show different mnt_id.
> >>> + *
> >>> + * We solve issue like in aufs by using additional
> >>> + * field on struct vm_area_struct called "vm_prfile"
> >>> + * which is used only for fdinfo/"printing" needs.
> >>> + *
> >>> + * See also mm/prfile.c
> >>> + */
> >>> +static void ovl_vm_prfile_set(struct vm_area_struct *vma,
> >>> +			      struct file *file)
> >>> +{
> >>> +	get_file(file);
> >>> +	vma->vm_prfile = file;
> >>> +#ifndef CONFIG_MMU
> >>> +	get_file(file);
> >>> +	vma->vm_region->vm_prfile = file;
> >>> +#endif
> >>> +}
> >>> +
> >>>    static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
> >>>    {
> >>>    	struct file *realfile = file->private_data;
> >>> @@ -507,6 +533,10 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
> >>>    	old_cred = ovl_override_creds(file_inode(file)->i_sb);
> >>>    	ret = call_mmap(vma->vm_file, vma);
> >>>    	revert_creds(old_cred);
> >>> +
> >>> +	if (!ret)
> >>> +		ovl_vm_prfile_set(vma, file);
> >>> +
> >>>    	ovl_file_accessed(file);
> >>>    
> >>>    	return ret;
> >>
> > 
> 




More information about the kernel-team mailing list