[Bug 1927745] Re: Non-thread-safe functions used in multi-threaded rpc.gssd
Andreas Hasenack
1927745 at bugs.launchpad.net
Mon May 10 21:08:01 UTC 2021
** Description changed:
[Impact]
rpc-gssd can hang or crash when a Kerberos NFSv4 mount point is accessed by multiple users simultaneously. The problem happens because the daemon uses the strtok() function, which is not thread-safe.
The fix from upstream removes strtok() and uses strsep() instead. These
patches are already applied in focal and later, via merges from debian.
[Test Plan]
As with all race conditions, this test case may take a while to reproduce the problem.
# Create a bionic VM. It seems to help if it's created with multiple cpus/cores. I had more success with 4 cores and 1GiB of RAM.
+ # if using lxd to launch a VM, you can run this before: "lxc config set vm-name limits.cpu=4". Just don't forget to undo it, or set to your normal number of CPUs, after the test
# Login and get its ip, and take note of it:
export IP=$(ip route get default 8.8.8.8 | grep ^8 | awk '{print $7}')
echo $IP
# adjust /etc/hosts:
echo "$IP $(hostname).example.com" | sudo tee -a /etc/hosts
# adjust /etc/resolv.conf:
echo "search example.com" | sudo tee -a /etc/resolv.conf
# verify hostname -f returns the fqdn of the vm, i.e., a name with the .example.com domain:
hostname -f
+
+ # If you still don't get the correct FQDN, try the below, adjusting for your hostname:
+ sudo hostnamectl set-hostname <put-host-here>.example.com
# Run these commands, and when asked:
# - for realm: EXAMPLE.COM
# - for kdc and admin server: use the vm's IP
sudo apt update && sudo apt install nfs-server krb5-kdc krb5-admin-server krb5-user gcc
# create a kerberos realm. When prompted, use any password you want:
sudo krb5_newrealm
# create an nfs service ticket, and store it in the keytab
sudo kadmin.local -q "addprinc -randkey nfs/$(hostname -f)"
sudo kadmin.local -q "ktadd nfs/$(hostname -f)"
# create test directories
sudo mkdir -p /mnt/test_krb5/
sudo mkdir -p /export
sudo touch /export/foo
# adjust nfs config and restart the nfs server:
sudo sed -r -i "s,^NEED_SVCGSSD=.*,NEED_SVCGSSD=\"yes\"," /etc/default/nfs-kernel-server
sudo sed -r -i "s,^NEED_GSSD=.*,NEED_GSSD=\"yes\"," /etc/default/nfs-common
sudo systemctl restart nfs-server
# configure an nfs export:
echo "/export *(sec=krb5,rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -rva
# confirm it's available
sudo showmount -e localhost
# mount it
sudo mount $(hostname -f):/export /mnt/test_krb5/
sudo ls -la /mnt/test_krb5
# download bug attachments
wget -ct0 https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1927745/+attachment/5496166/+files/stat_as.c https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1927745/+attachment/5496167/+files/bz1419280_test_threads
chmod +x bz1419280_test_threads
# build reproducer
gcc stat_as.c -o stat_as
# run test script as root. It may take a few minutes to trigger the bug
sudo ./bz1419280_test_threads
# wait
# Once you get the confirmation:
calling stat on '/mnt/test_krb5/foo' with uids 9995 through 10035
reproduced the bug after 114 iterations
# Check for a "stat_as" D state process:
$ ps axw|grep stat_as
17814 pts/1 D 0:00 ./stat_as /mnt/test_krb5/foo 9995 10035
# To restore functionality, restart rpc-gssd:
sudo systemctl restart rpc-gssd.service
With the updated packages, the script will not detect the bug and will never
stop on its own.
[Where problems could occur]
NFS v4 services are more complex than earlier versions, and are comprised of several services/daemons. It's possible for the restart done after the automatic package upgrade to show up as a regression due to several factors:
- not all needed services were restarted (bug, but not introduced by this change)
- depending on mount options, client mount points may appear as hung and take a while to recover
- configuration errors on the server which were up until now not noticed, and only manifest themselves after a restart
- some sites, due to the lack of configuration options in /etc/default/nfs-*, might have overridden systemd service files and hardcoded other command line options there. If not done properly (i.e., not done in /etc/systemd via overrides), these local changes will be lost after the package upgrade. I know of at least rpc-gssd, which has no command-line options available in /etc/default/nfs-*, and I know of users who have tweaked this service in many different ways to add things like -v or -n to its command line.
[Other Info]
The upstream patches have been applied since February 2017 and have not been changed or reverted. They are also applied in Debian and Fedora, and ubuntu since focal at least.
There is an additional patch, also part of the fix, which duplicates the
string for appropriate logging. The duplicate's memory is also freed.
It may be hard to reproduce this bug in a test environment. I've gotten
to the error in as little as a few seconds, but other times it took
hundreds of attempts. YMMV.
[Original Description]
Fixed in focal and later, due to sync from debian
Bionic affected.
I'll add a proper description in a moment.
RH: https://bugzilla.redhat.com/show_bug.cgi?id=1419280
Debian BTS: https://bugs.debian.org/895381
ML: http://www.spinics.net/lists/linux-nfs/msg62111.html
ML: http://www.spinics.net/lists/linux-nfs/msg62099.html
** Description changed:
[Impact]
rpc-gssd can hang or crash when a Kerberos NFSv4 mount point is accessed by multiple users simultaneously. The problem happens because the daemon uses the strtok() function, which is not thread-safe.
The fix from upstream removes strtok() and uses strsep() instead. These
patches are already applied in focal and later, via merges from debian.
[Test Plan]
As with all race conditions, this test case may take a while to reproduce the problem.
# Create a bionic VM. It seems to help if it's created with multiple cpus/cores. I had more success with 4 cores and 1GiB of RAM.
# if using lxd to launch a VM, you can run this before: "lxc config set vm-name limits.cpu=4". Just don't forget to undo it, or set to your normal number of CPUs, after the test
# Login and get its ip, and take note of it:
export IP=$(ip route get default 8.8.8.8 | grep ^8 | awk '{print $7}')
echo $IP
# adjust /etc/hosts:
echo "$IP $(hostname).example.com" | sudo tee -a /etc/hosts
# adjust /etc/resolv.conf:
echo "search example.com" | sudo tee -a /etc/resolv.conf
# verify hostname -f returns the fqdn of the vm, i.e., a name with the .example.com domain:
hostname -f
- # If you still don't get the correct FQDN, try the below, adjusting for your hostname:
- sudo hostnamectl set-hostname <put-host-here>.example.com
+ # If you still don't get the correct FQDN, try the below, adjusting for your hostname if $(hostname) isn't working properly:
+ sudo hostnamectl set-hostname $(hostname).example.com
# Run these commands, and when asked:
# - for realm: EXAMPLE.COM
# - for kdc and admin server: use the vm's IP
sudo apt update && sudo apt install nfs-server krb5-kdc krb5-admin-server krb5-user gcc
# create a kerberos realm. When prompted, use any password you want:
sudo krb5_newrealm
# create an nfs service ticket, and store it in the keytab
sudo kadmin.local -q "addprinc -randkey nfs/$(hostname -f)"
sudo kadmin.local -q "ktadd nfs/$(hostname -f)"
# create test directories
sudo mkdir -p /mnt/test_krb5/
sudo mkdir -p /export
sudo touch /export/foo
# adjust nfs config and restart the nfs server:
sudo sed -r -i "s,^NEED_SVCGSSD=.*,NEED_SVCGSSD=\"yes\"," /etc/default/nfs-kernel-server
sudo sed -r -i "s,^NEED_GSSD=.*,NEED_GSSD=\"yes\"," /etc/default/nfs-common
sudo systemctl restart nfs-server
# configure an nfs export:
echo "/export *(sec=krb5,rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -rva
# confirm it's available
sudo showmount -e localhost
# mount it
sudo mount $(hostname -f):/export /mnt/test_krb5/
sudo ls -la /mnt/test_krb5
# download bug attachments
wget -ct0 https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1927745/+attachment/5496166/+files/stat_as.c https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1927745/+attachment/5496167/+files/bz1419280_test_threads
chmod +x bz1419280_test_threads
# build reproducer
gcc stat_as.c -o stat_as
# run test script as root. It may take a few minutes to trigger the bug
sudo ./bz1419280_test_threads
# wait
# Once you get the confirmation:
calling stat on '/mnt/test_krb5/foo' with uids 9995 through 10035
reproduced the bug after 114 iterations
# Check for a "stat_as" D state process:
$ ps axw|grep stat_as
17814 pts/1 D 0:00 ./stat_as /mnt/test_krb5/foo 9995 10035
# To restore functionality, restart rpc-gssd:
sudo systemctl restart rpc-gssd.service
With the updated packages, the script will not detect the bug and will never
stop on its own.
[Where problems could occur]
NFS v4 services are more complex than earlier versions, and are comprised of several services/daemons. It's possible for the restart done after the automatic package upgrade to show up as a regression due to several factors:
- not all needed services were restarted (bug, but not introduced by this change)
- depending on mount options, client mount points may appear as hung and take a while to recover
- configuration errors on the server which were up until now not noticed, and only manifest themselves after a restart
- some sites, due to the lack of configuration options in /etc/default/nfs-*, might have overridden systemd service files and hardcoded other command line options there. If not done properly (i.e., not done in /etc/systemd via overrides), these local changes will be lost after the package upgrade. I know of at least rpc-gssd, which has no command-line options available in /etc/default/nfs-*, and I know of users who have tweaked this service in many different ways to add things like -v or -n to its command line.
[Other Info]
The upstream patches have been applied since February 2017 and have not been changed or reverted. They are also applied in Debian and Fedora, and ubuntu since focal at least.
There is an additional patch, also part of the fix, which duplicates the
string for appropriate logging. The duplicate's memory is also freed.
It may be hard to reproduce this bug in a test environment. I've gotten
to the error in as little as a few seconds, but other times it took
hundreds of attempts. YMMV.
[Original Description]
Fixed in focal and later, due to sync from debian
Bionic affected.
I'll add a proper description in a moment.
RH: https://bugzilla.redhat.com/show_bug.cgi?id=1419280
Debian BTS: https://bugs.debian.org/895381
ML: http://www.spinics.net/lists/linux-nfs/msg62111.html
ML: http://www.spinics.net/lists/linux-nfs/msg62099.html
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/1927745
Title:
Non-thread-safe functions used in multi-threaded rpc.gssd
Status in nfs-utils package in Ubuntu:
Fix Released
Status in nfs-utils source package in Bionic:
In Progress
Status in nfs-utils package in Debian:
Fix Released
Status in nfs-utils package in Fedora:
Fix Released
Bug description:
[Impact]
rpc-gssd can hang or crash when a Kerberos NFSv4 mount point is accessed by multiple users simultaneously. The problem happens because the daemon uses the strtok() function, which is not thread-safe.
The fix from upstream removes strtok() and uses strsep() instead.
These patches are already applied in focal and later, via merges from
debian.
[Test Plan]
As with all race conditions, this test case may take a while to reproduce the problem.
# Create a bionic VM. It seems to help if it's created with multiple cpus/cores. I had more success with 4 cores and 1GiB of RAM.
# if using lxd to launch a VM, you can run this before: "lxc config set vm-name limits.cpu=4". Just don't forget to undo it, or set to your normal number of CPUs, after the test
# Login and get its ip, and take note of it:
export IP=$(ip route get default 8.8.8.8 | grep ^8 | awk '{print $7}')
echo $IP
# adjust /etc/hosts:
echo "$IP $(hostname).example.com" | sudo tee -a /etc/hosts
# adjust /etc/resolv.conf:
echo "search example.com" | sudo tee -a /etc/resolv.conf
# verify hostname -f returns the fqdn of the vm, i.e., a name with the .example.com domain:
hostname -f
# If you still don't get the correct FQDN, try the below, adjusting for your hostname if $(hostname) isn't working properly:
sudo hostnamectl set-hostname $(hostname).example.com
# Run these commands, and when asked:
# - for realm: EXAMPLE.COM
# - for kdc and admin server: use the vm's IP
sudo apt update && sudo apt install nfs-server krb5-kdc krb5-admin-server krb5-user gcc
# create a kerberos realm. When prompted, use any password you want:
sudo krb5_newrealm
# create an nfs service ticket, and store it in the keytab
sudo kadmin.local -q "addprinc -randkey nfs/$(hostname -f)"
sudo kadmin.local -q "ktadd nfs/$(hostname -f)"
# create test directories
sudo mkdir -p /mnt/test_krb5/
sudo mkdir -p /export
sudo touch /export/foo
# adjust nfs config and restart the nfs server:
sudo sed -r -i "s,^NEED_SVCGSSD=.*,NEED_SVCGSSD=\"yes\"," /etc/default/nfs-kernel-server
sudo sed -r -i "s,^NEED_GSSD=.*,NEED_GSSD=\"yes\"," /etc/default/nfs-common
sudo systemctl restart nfs-server
# configure an nfs export:
echo "/export *(sec=krb5,rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -rva
# confirm it's available
sudo showmount -e localhost
# mount it
sudo mount $(hostname -f):/export /mnt/test_krb5/
sudo ls -la /mnt/test_krb5
# download bug attachments
wget -ct0 https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1927745/+attachment/5496166/+files/stat_as.c https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1927745/+attachment/5496167/+files/bz1419280_test_threads
chmod +x bz1419280_test_threads
# build reproducer
gcc stat_as.c -o stat_as
# run test script as root. It may take a few minutes to trigger the bug
sudo ./bz1419280_test_threads
# wait
# Once you get the confirmation:
calling stat on '/mnt/test_krb5/foo' with uids 9995 through 10035
reproduced the bug after 114 iterations
# Check for a "stat_as" D state process:
$ ps axw|grep stat_as
17814 pts/1 D 0:00 ./stat_as /mnt/test_krb5/foo 9995 10035
# To restore functionality, restart rpc-gssd:
sudo systemctl restart rpc-gssd.service
With the updated packages, the script will not detect the bug and
will never stop on its own.
[Where problems could occur]
NFS v4 services are more complex than earlier versions, and are comprised of several services/daemons. It's possible for the restart done after the automatic package upgrade to show up as a regression due to several factors:
- not all needed services were restarted (bug, but not introduced by this change)
- depending on mount options, client mount points may appear as hung and take a while to recover
- configuration errors on the server which were up until now not noticed, and only manifest themselves after a restart
- some sites, due to the lack of configuration options in /etc/default/nfs-*, might have overridden systemd service files and hardcoded other command line options there. If not done properly (i.e., not done in /etc/systemd via overrides), these local changes will be lost after the package upgrade. I know of at least rpc-gssd, which has no command-line options available in /etc/default/nfs-*, and I know of users who have tweaked this service in many different ways to add things like -v or -n to its command line.
[Other Info]
The upstream patches have been applied since February 2017 and have not been changed or reverted. They are also applied in Debian and Fedora, and ubuntu since focal at least.
There is an additional patch, also part of the fix, which duplicates the
string for appropriate logging. The duplicate's memory is also freed.
It may be hard to reproduce this bug in a test environment. I've
gotten to the error in as little as a few seconds, but other times it
took hundreds of attempts. YMMV.
[Original Description]
Fixed in focal and later, due to sync from debian
Bionic affected.
I'll add a proper description in a moment.
RH: https://bugzilla.redhat.com/show_bug.cgi?id=1419280
Debian BTS: https://bugs.debian.org/895381
ML: http://www.spinics.net/lists/linux-nfs/msg62111.html
ML: http://www.spinics.net/lists/linux-nfs/msg62099.html
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1927745/+subscriptions