[Bug 1979885] Re: /etc.nfs.conf fails for nfsv4 server / blkmapd dumps core
Andreas Hasenack
1979885 at bugs.launchpad.net
Wed Nov 16 19:22:28 UTC 2022
Jammy verification
Reproducing the bug:
root@j-nfs-blkmapd-crash:~# apt-cache policy nfs-kernel-server
nfs-kernel-server:
Installed: 1:2.6.1-1ubuntu1.1
Candidate: 1:2.6.1-1ubuntu1.1
Version table:
*** 1:2.6.1-1ubuntu1.1 500
500 http://br.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
100 /var/lib/dpkg/status
Service has crashed already:
root@j-nfs-blkmapd-crash:~# systemctl status nfs-blkmap.service
× nfs-blkmap.service - pNFS block layout mapping daemon
Loaded: loaded (/lib/systemd/system/nfs-blkmap.service; enabled; vendor preset: enabled)
Active: failed (Result: core-dump) since Wed 2022-11-16 19:20:14 UTC; 23s ago
Main PID: 1778 (code=dumped, signal=ABRT)
CPU: 3ms
Nov 16 19:20:13 j-nfs-blkmapd-crash systemd[1]: Starting pNFS block layout mapping daemon...
Nov 16 19:20:14 j-nfs-blkmapd-crash systemd[1]: Started pNFS block layout mapping daemon.
Nov 16 19:20:14 j-nfs-blkmapd-crash systemd[1]: nfs-blkmap.service: Main process exited, code=dumped, status=6/ABRT
Nov 16 19:20:14 j-nfs-blkmapd-crash systemd[1]: nfs-blkmap.service: Failed with result 'core-dump'.
Updating to the package in proposed:
root@j-nfs-blkmapd-crash:~# apt-cache policy nfs-kernel-server
nfs-kernel-server:
Installed: 1:2.6.1-1ubuntu1.2
Candidate: 1:2.6.1-1ubuntu1.2
Version table:
*** 1:2.6.1-1ubuntu1.2 500
500 http://br.archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
100 /var/lib/dpkg/status
Service is running already:
root@j-nfs-blkmapd-crash:~# systemctl status nfs-blkmap.service
● nfs-blkmap.service - pNFS block layout mapping daemon
Loaded: loaded (/lib/systemd/system/nfs-blkmap.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2022-11-16 19:21:31 UTC; 22s ago
Main PID: 2960 (blkmapd)
Tasks: 1 (limit: 1082)
Memory: 304.0K
CPU: 2ms
CGroup: /system.slice/nfs-blkmap.service
└─2960 /usr/sbin/blkmapd
Nov 16 19:21:31 j-nfs-blkmapd-crash systemd[1]: Starting pNFS block layout mapping daemon...
Nov 16 19:21:31 j-nfs-blkmapd-crash systemd[1]: Started pNFS block layout mapping daemon.
Stopping and starting without forking to verify again:
root@j-nfs-blkmapd-crash:~# systemctl stop nfs-blkmap.service
root@j-nfs-blkmapd-crash:~# blkmapd -f
blkmapd: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
Service runs without crashing.
Jammy verification succeeded.
** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/1979885
Title:
/etc.nfs.conf fails for nfsv4 server / blkmapd dumps core
Status in nfs-utils package in Ubuntu:
Fix Released
Status in nfs-utils source package in Jammy:
Fix Committed
Status in nfs-utils source package in Kinetic:
Fix Committed
Status in nfs-utils package in Debian:
Confirmed
Bug description:
[ Impact ]
Under certain conditions, blkmapd can crash due to calling free() on a
pointer that wasn't malloc()ed. The reproducer went as far as
isolating it to having LVM Logical Volumes on SCSI disks, but the code
flaw is clear.
The struct bl_serial *serial structure is allocated via
bl_create_scsi_string() which does a malloc for it, but the code later
on was doing a free() on the data element of this structure and only
then on the structure itself. That first free() is incorrect, as the
data element was never malloc()ed separately.
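The allocation pattern described above can be illustrated with a
minimal sketch. It is not the nfs-utils source: the names serial_blob,
serial_blob_create() and serial_blob_destroy() are hypothetical, and
the layout (a single malloc() holding the header followed by the
payload, with data pointing into that same block) is an assumption
based on the description that the data element was never allocated on
its own.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the structure described in the report. */
struct serial_blob {
    size_t len;
    char *data;   /* assumed to point into the same allocation as the header */
};

/* One malloc() covers both the header and the payload bytes. */
static struct serial_blob *serial_blob_create(const char *bytes, size_t len)
{
    struct serial_blob *s = malloc(sizeof(*s) + len);
    if (s == NULL)
        return NULL;
    s->len = len;
    s->data = (char *)(s + 1);   /* payload starts right after the header */
    if (len > 0)
        memcpy(s->data, bytes, len);
    return s;
}

/*
 * Correct release: exactly one free() for the one malloc().
 * Calling free(s->data) first would hand free() a pointer that malloc()
 * never returned. That is undefined behavior, and glibc may abort with
 * a message such as "double free or corruption".
 */
static void serial_blob_destroy(struct serial_blob *s)
{
    free(s);
}
```

In this sketch a caller pairs each serial_blob_create() with a single
serial_blob_destroy() and never frees the data member directly.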
This was first brought up by lixiaokeng via
https://www.spinics.net/lists/linux-nfs/msg87598.html, but not
acknowledged back then. The patch selected for this SRU is slightly
simpler and more suited for an SRU.
[ Test Plan ]
Create a VM for the ubuntu release under test. What's important is
that this VM has a SCSI device, not VIRTIO. You can add one after the
VM is created, as it must not be the root disk because we will use it
as an LVM volume group, i.e., all data on it will be erased.
You may have to install the kernel extra modules package for the scsi
device to appear:
sudo apt install linux-modules-extra-$(uname -r)
After a reboot, locate the scsi device. In this example, we will use
/dev/sda.
Wipe any existing partition table on it:
sudo sgdisk -Z /dev/sda
Create an LVM volume group and logical volume:
sudo pvcreate /dev/sda
sudo vgcreate vg0 /dev/sda
sudo lvcreate -ntest -L100M vg0
Install nfs-kernel-server:
sudo apt install nfs-kernel-server
The status of the nfs-blkmap service should already show a failure:
systemctl status nfs-blkmap.service
...
Oct 20 18:12:12 j-blkmapd-crash systemd[1]: nfs-blkmap.service: Main process exited, code=dumped, status=6/ABRT
Oct 20 18:12:12 j-blkmapd-crash systemd[1]: nfs-blkmap.service: Failed with result 'core-dump'.
To confirm, run it interactively:
$ sudo blkmapd -f
blkmapd: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
double free or corruption (out)
Aborted
With the fixed packages, it should be running after install. It can
also be tried out interactively again just to be sure:
sudo systemctl stop nfs-blkmap
sudo blkmapd -f
blkmapd: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
The failure to open the blocklayout file is not a problem in this
case, and is unrelated to the bug this SRU is fixing.
[ Where problems could occur ]
Restarting an NFS server can be tricky: connected clients might
experience a "blip" in the service, or even hang in the worst case.
Also, depending on the NFS version being served (3 or 4), multiple
services are involved, and the restart can expose a bug in the order
in which these services are stopped and come back online.
In terms of the patch and code, it's C code dealing with pointers and
memory allocation. Things can easily go wrong here, and since this is
a daemon, memory leaks can have bigger consequences.
[ Other Info ]
I didn't investigate other scenarios where this could be happening,
or why it did not happen with a VIRTIO device, as the SCSI case was
enough to reproduce the problem and show where the bug was.
The previous SRU for nfs-utils
(https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1977745) was
stopped by phasing because it detected the crash from this bug during
the restart of blkmapd
(https://errors.ubuntu.com/?release=Ubuntu%2022.04&package=nfs-utils&period=week&version=1%3A2.6.1-1ubuntu1.1).
[Original Description]
When using the 22.04 /etc/nfs.conf, an nfsv4 server fails to operate.
It kind of works, but some clients fail and try nfsv3 ports.
symptoms:
on boot:
× nfs-blkmap.service - pNFS block layout mapping daemon
Loaded: loaded (/lib/systemd/system/nfs-blkmap.service; enabled; vendor preset: enabled)
Active: failed (Result: core-dump) since Sat 2022-06-25 07:14:34 PDT; 27min ago
journalctl --catalog --pager-end --unit=nfs-blkmap.service
Jun 25 07:14:34 c68z blkmapd[2386154]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
on systemctl restart nfs-server.service:
○ rpc-svcgssd.service - RPC security service for NFS server
Loaded: loaded (/lib/systemd/system/rpc-svcgssd.service; static)
Active: inactive (dead) since Fri 2022-06-24 19:07:31 PDT; 12h ago
after boot it was:
● rpc-svcgssd.service - RPC security service for NFS server
Loaded: loaded (/lib/systemd/system/rpc-svcgssd.service; static)
Active: active (running) since Sat 2022-06-25 08:27:27 PDT; 2min 7s ago
Some clients try to access port 111, which is not used by nfs4 on the
network.
ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-image-5.15.0-40-generic 5.15.0-40.43
ProcVersionSignature: Ubuntu 5.15.0-40.43-generic 5.15.35
Uname: Linux 5.15.0-40-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl icp
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D2', '/dev/snd/pcmC0D10p', '/dev/snd/pcmC0D9p', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: unknown
Date: Sat Jun 25 08:37:48 2022
HibernationDevice: RESUME=none
MachineType: Apple Inc. Macmini8,1
ProcEnviron:
SHELL=/bin/bash
LANG=en_US.UTF-8
TERM=screen
PATH=(custom, no user)
ProcFB: 0 i915drmfb
ProcKernelCmdLine: root=ZFS=rpool/ROOT/ubuntu_mc4at7 ro initrd=EFI\hostname\initrd.img
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
linux-restricted-modules-5.15.0-40-generic N/A
linux-backports-modules-5.15.0-40-generic N/A
linux-firmware 20220329.git681281e4-0ubuntu3.2
RfKill:
0: hci0: Bluetooth
Soft blocked: no
Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/24/2022
dmi.bios.release: 0.1
dmi.bios.vendor: Apple Inc.
dmi.bios.version: 1731.120.10.0.0 (iBridge: 19.16.15071.0.0,0)
dmi.board.name: Mac-7BA5B2DFE22DDD8C
dmi.board.vendor: Apple Inc.
dmi.board.version: Macmini8,1
dmi.chassis.type: 9
dmi.chassis.vendor: Apple Inc.
dmi.chassis.version: Mac-7BA5B2DFE22DDD8C
dmi.modalias: dmi:bvnAppleInc.:bvr1731.120.10.0.0(iBridge19.16.15071.0.0,0):bd04/24/2022:br0.1:svnAppleInc.:pnMacmini8,1:pvr1.0:rvnAppleInc.:rnMac-7BA5B2DFE22DDD8C:rvrMacmini8,1:cvnAppleInc.:ct9:cvrMac-7BA5B2DFE22DDD8C:sku:
dmi.product.family: Mac mini
dmi.product.name: Macmini8,1
dmi.product.version: 1.0
dmi.sys.vendor: Apple Inc.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1979885/+subscriptions