[Bug 1979885] Re: /etc.nfs.conf fails for nfsv4 server / blkmapd dumps core
Andreas Hasenack
1979885 at bugs.launchpad.net
Thu Oct 20 18:32:22 UTC 2022
** Description changed:
[ Impact ]
Under certain conditions, blkmapd can crash due to calling free() on a
pointer that wasn't malloc()ed. The reproducer went as far as isolating
it to having LVM Logical Volumes on SCSI disks, but the code flaw is
clear.
The struct bl_serial *serial structure is allocated via
bl_create_scsi_string(), which does a malloc() for it, but the code later
on was doing a free() on the data element of this structure and only
then on the structure itself. That first free() is incorrect, as the
data element was never malloc()ed separately.
This was first brought up by lixiaokeng via
https://www.spinics.net/lists/linux-nfs/msg87598.html, but not
acknowledged back then. The patch selected for this SRU is slightly
simpler and more suited for an SRU.
[ Test Plan ]
Create a VM for the Ubuntu release under test. What's important is that
this VM has a SCSI device, not VIRTIO. You can add one after the VM is
created, as it must not be the root disk because we will use it as an
LVM volume group, i.e., all data on it will be erased.
You may have to install the kernel extra modules package for the SCSI
device to appear:
sudo apt install linux-modules-extra-$(uname -r)
After a reboot, locate the SCSI device. In this example, we will use
/dev/sda.
Wipe any existing partition tables from it:
sudo sgdisk -Z /dev/sda
Create an LVM group and volume:
sudo pvcreate /dev/sda
sudo vgcreate vg0 /dev/sda
sudo lvcreate -ntest -L100M vg0
Install nfs-kernel-server:
sudo apt install nfs-kernel-server
The status of the nfs-blkmap service should already show a failure:
systemctl status nfs-blkmap.service
...
Oct 20 18:12:12 j-blkmapd-crash systemd[1]: nfs-blkmap.service: Main process exited, code=dumped, status=6/ABRT
Oct 20 18:12:12 j-blkmapd-crash systemd[1]: nfs-blkmap.service: Failed with result 'core-dump'.
To confirm, run it interactively:
$ sudo blkmapd -f
blkmapd: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
double free or corruption (out)
Aborted
-
- With the fixed packages, it should be running after install. It can also be tried out interactively again just to be sure:
+ With the fixed packages, it should be running after install. It can also
+ be tried out interactively again just to be sure:
sudo systemctl stop nfs-blkmap
sudo blkmapd -f
blkmapd: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
The failure to open the blocklayout file is not a problem in this case,
and is unrelated to the bug this SRU is fixing.
+ [ Where problems could occur ]
+ Restarting an NFS server can be tricky: connected clients might experience a "blip" in the service, or even hang in the worst case. Also, depending on the NFS version being served (3 or 4), multiple services are involved, and the restart can expose a bug in the order in which these services are stopped and come back online.
- [ Where problems could occur ]
-
- * Think about what the upload changes in the software. Imagine the change is
- wrong or breaks something else: how would this show up?
-
- * It is assumed that any SRU candidate patch is well-tested before
- upload and has a low overall risk of regression, but it's important
- to make the effort to think about what ''could'' happen in the
- event of a regression.
-
- * This must '''never''' be "None" or "Low", or entirely an argument as to why
- your upload is low risk.
-
- * This both shows the SRU team that the risks have been considered,
- and provides guidance to testers in regression-testing the SRU.
[ Other Info ]
-
- * Anything else you think is useful to include
- * Anticipate questions from users, SRU, +1 maintenance, security teams and the Technical Board
- * and address these questions in advance
+ I didn't continue the investigation about other scenarios where this could be happening, or why it did not happen with a VIRTIO device, as the SCSI case was enough to reproduce the problem and show where the bug was.
[Original Description]
When using the 22.04 /etc/nfs.conf, an NFSv4 server fails to operate.
It kind of works, but some clients fail and try NFSv3 ports.
symptoms:
on boot:
× nfs-blkmap.service - pNFS block layout mapping daemon
Loaded: loaded (/lib/systemd/system/nfs-blkmap.service; enabled; vendor preset: enabled)
Active: failed (Result: core-dump) since Sat 2022-06-25 07:14:34 PDT; 27min ago
journalctl --catalog --pager-end --unit=nfs-blkmap.service
Jun 25 07:14:34 c68z blkmapd[2386154]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
on systemctl restart nfs-server.service:
○ rpc-svcgssd.service - RPC security service for NFS server
Loaded: loaded (/lib/systemd/system/rpc-svcgssd.service; static)
Active: inactive (dead) since Fri 2022-06-24 19:07:31 PDT; 12h ago
after boot it was:
● rpc-svcgssd.service - RPC security service for NFS server
Loaded: loaded (/lib/systemd/system/rpc-svcgssd.service; static)
Active: active (running) since Sat 2022-06-25 08:27:27 PDT; 2min 7s ago
Some clients try to access port 111, which is not used by NFSv4 on the
network.
ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-image-5.15.0-40-generic 5.15.0-40.43
ProcVersionSignature: Ubuntu 5.15.0-40.43-generic 5.15.35
Uname: Linux 5.15.0-40-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl icp
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D2', '/dev/snd/pcmC0D10p', '/dev/snd/pcmC0D9p', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: unknown
Date: Sat Jun 25 08:37:48 2022
HibernationDevice: RESUME=none
MachineType: Apple Inc. Macmini8,1
ProcEnviron:
SHELL=/bin/bash
LANG=en_US.UTF-8
TERM=screen
PATH=(custom, no user)
ProcFB: 0 i915drmfb
ProcKernelCmdLine: root=ZFS=rpool/ROOT/ubuntu_mc4at7 ro initrd=EFI\hostname\initrd.img
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
linux-restricted-modules-5.15.0-40-generic N/A
linux-backports-modules-5.15.0-40-generic N/A
linux-firmware 20220329.git681281e4-0ubuntu3.2
RfKill:
0: hci0: Bluetooth
Soft blocked: no
Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/24/2022
dmi.bios.release: 0.1
dmi.bios.vendor: Apple Inc.
dmi.bios.version: 1731.120.10.0.0 (iBridge: 19.16.15071.0.0,0)
dmi.board.name: Mac-7BA5B2DFE22DDD8C
dmi.board.vendor: Apple Inc.
dmi.board.version: Macmini8,1
dmi.chassis.type: 9
dmi.chassis.vendor: Apple Inc.
dmi.chassis.version: Mac-7BA5B2DFE22DDD8C
dmi.modalias: dmi:bvnAppleInc.:bvr1731.120.10.0.0(iBridge19.16.15071.0.0,0):bd04/24/2022:br0.1:svnAppleInc.:pnMacmini8,1:pvr1.0:rvnAppleInc.:rnMac-7BA5B2DFE22DDD8C:rvrMacmini8,1:cvnAppleInc.:ct9:cvrMac-7BA5B2DFE22DDD8C:sku:
dmi.product.family: Mac mini
dmi.product.name: Macmini8,1
dmi.product.version: 1.0
dmi.sys.vendor: Apple Inc.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/1979885
Title:
/etc.nfs.conf fails for nfsv4 server / blkmapd dumps core
Status in nfs-utils package in Ubuntu:
In Progress
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1979885/+subscriptions