[Bug 1723350] Re: sssd offline on boot, stays offline forever
Andreas Hasenack
andreas at canonical.com
Wed May 13 21:17:56 UTC 2020
** Description changed:
+ [Impact]
+ sssd can switch to an offline mode of operation when it cannot reach the authentication or id backend. It uses several methods to assess the situation, and one of them is monitoring the /etc/resolv.conf file for changes.
+
+ In Ubuntu that file is a symlink to /run/systemd/resolve/stub-resolv.conf,
+ but the target doesn't exist at all times during boot. The symlink is
+ expected to be broken for a while during boot.
+
+ It turns out that sssd's monitoring of /etc/resolv.conf didn't take into
+ consideration that what could change was the *target* of the symlink.
+ It completely ignored that fact and didn't notice when the resolv.conf
+ contents actually changed in this scenario, which resulted in sssd
+ staying in offline mode when it shouldn't.
+
+ There are two fixes being pulled in for this SRU:
+ a) fix the monitoring of the target of the /etc/resolv.conf symlink
+ b) change the fallback polling code to keep trying, instead of giving up right away
+
+ [Test Case]
+ It's recommended to test this in an LXD container or a VM.
+
+ Preparation steps:
+ $ sudo apt install sssd-ldap sssd-tools sssd-dbus slapd ldap-utils dnsmasq
+
+ Become root:
+ $ sudo su -
+
+ Detect your ip:
+ # export interface=$(ip route | grep default | sed -r 's,^default via .* dev ([a-z0-9]+) .*,\1,')
+ # export ip=$(ip addr show dev $interface | grep "inet [0-9]" | awk '{print $2}' | cut -d / -f 1)
+
+ Confirm the $ip variable is correct for your case:
+ # echo $ip
+
+ Create /etc/dnsmasq.d/sssd-test.conf using your real ip:
+ # cat > /etc/dnsmasq.d/sssd-test.conf <<EOF
+ host-record=ldap01.example.com,$ip
+ listen-address=$ip
+ EOF
+
+ Restart dnsmasq:
+ # systemctl restart dnsmasq
+
+
+ a) inotify test
+ Create /etc/sssd/sssd.conf:
+ # cat > /etc/sssd/sssd.conf <<EOF
+ [sssd]
+ config_file_version = 2
+ services = nss, pam, ifp
+ domains = LDAP
+ #debug_level = 6
+
+ [domain/LDAP]
+ id_provider = ldap
+ ldap_uri = ldap://ldap01.example.com
+ cache_credentials = True
+ ldap_search_base = dc=example,dc=com
+ EOF
+
+ # chmod 0600 /etc/sssd/sssd.conf
+
+ # rm /etc/resolv.conf
+ # ln -s /etc/resolv.conf.target /etc/resolv.conf
+
+ Create a good resolv.conf:
+ # echo "nameserver $ip" > /etc/resolv.conf.good
+
+ Confirm /etc/resolv.conf is a broken symlink:
+ # ll /etc/resolv.conf*
+ lrwxrwxrwx 1 root root 23 May 13 20:48 /etc/resolv.conf -> /etc/resolv.conf.target
+ -rw-r--r-- 1 root root 24 May 13 20:48 /etc/resolv.conf.good
+
+ Start sssd:
+ # systemctl restart sssd
+
+ Repeat the sssctl call until it shows the offline mode persistently:
+ # sssctl domain-status LDAP
+ Online status: Offline
+
+ Active servers:
+ LDAP: not connected
+
+ Discovered LDAP servers:
+ - ldap01.example.com
+
+ "Unbreak" the symlink:
+ # cp /etc/resolv.conf.good /etc/resolv.conf.target
+
+ Run sssctl again; it should switch to online almost immediately:
+ # sssctl domain-status LDAP
+ Online status: Online
+
+ Active servers:
+ LDAP: ldap01.example.com
+
+ Discovered LDAP servers:
+ - ldap01.example.com
+
+
+ [Regression Potential]
+
+ * discussion of how regressions are most likely to manifest as a result
+ of this change.
+
+ * It is assumed that any SRU candidate patch is well-tested before
+ upload and has a low overall risk of regression, but it's important
+ to make the effort to think about what ''could'' happen in the
+ event of a regression.
+
+ * This both shows the SRU team that the risks have been considered,
+ and provides guidance to testers in regression-testing the SRU.
+
+ [Other Info]
+
+ * Anything else you think is useful to include
+ * Anticipate questions from users, SRU, +1 maintenance, security teams and the Technical Board
+ * and address these questions in advance
+
+ [Original Description]
+
SSSD 1.15.3-2ubuntu1 on 17.10/artful (previous versions on artful were
also affected) is offline on boot and seems to stay offline forever (I
waited over 20 minutes).
sssd_nss.log:
(Fri Oct 13 09:49:50 2017) [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]
(Fri Oct 13 09:49:51 2017) [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]
(Fri Oct 13 09:49:51 2017) [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]
(Fri Oct 13 09:49:51 2017) [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]
...
SSSD immediately returns to normal operation after restarting it or
after sending SIGUSR2.
A workaround for the problem is creating the file /etc/systemd/system/sssd.service.d/override.conf with the following contents:
[Unit]
Requires=network-online.target
After=network-online.target
** Description changed:
- [Impact]
+ [Impact]
sssd can switch to an offline mode of operation when it cannot reach the authentication or id backend. It uses several methods to assess the situation, and one of them is monitoring the /etc/resolv.conf file for changes.
In Ubuntu that file is a symlink to /run/systemd/resolve/stub-resolv.conf,
but the target doesn't exist at all times during boot. The symlink is
expected to be broken for a while during boot.
It turns out that sssd's monitoring of /etc/resolv.conf didn't take into
consideration that what could change was the *target* of the symlink.
It completely ignored that fact and didn't notice when the resolv.conf
contents actually changed in this scenario, which resulted in sssd
staying in offline mode when it shouldn't.
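The broken-symlink state described above can be reproduced in a scratch directory. What follows is a minimal shell sketch, not sssd's actual inotify code; the helper name link_state, the temporary paths and the 192.0.2.1 placeholder address are made up for illustration. It shows that the link itself stays the same while its target appears, which is why watching only the link misses the update.

```shell
# Illustration only: classify a path as missing, a broken symlink, or resolvable.
# link_state is a hypothetical helper, not part of sssd.
link_state() {
    if [ -L "$1" ] && [ ! -e "$1" ]; then
        echo "broken"      # the link exists but its target does not
    elif [ -e "$1" ]; then
        echo "ok"          # the path resolves to an existing file
    else
        echo "missing"     # nothing at this path at all
    fi
}

workdir=$(mktemp -d)
ln -s "$workdir/resolv.conf.target" "$workdir/resolv.conf"
before=$(link_state "$workdir/resolv.conf")

# Creating the target changes nothing about the link itself.
echo "nameserver 192.0.2.1" > "$workdir/resolv.conf.target"
after=$(link_state "$workdir/resolv.conf")

echo "before: $before, after: $after"
rm -rf "$workdir"
```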
There are two fixes being pulled in for this SRU:
a) fix the monitoring of the target of the /etc/resolv.conf symlink
b) change the fallback polling code to keep trying, instead of giving up right away
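Fix (b) is about not giving up after the first failed check. The real change lives in sssd's C code; the shell function below is only a sketch of the general retry idea, with a hypothetical name (wait_for_file) and made-up parameters for the number of attempts and the interval.

```shell
# Illustration only: poll for a file instead of giving up after one check.
# wait_for_file TARGET MAX_TRIES INTERVAL_SECONDS  (all names are hypothetical)
wait_for_file() {
    target=$1
    tries=$2
    interval=$3
    while [ "$tries" -gt 0 ]; do
        if [ -e "$target" ]; then
            return 0            # the file showed up, stop polling
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1                    # still missing after all attempts
}

workdir=$(mktemp -d)
# Simulate the file appearing late, as the resolv.conf target does during boot.
( sleep 1; echo "nameserver 192.0.2.1" > "$workdir/resolv.conf.target" ) &

if wait_for_file "$workdir/resolv.conf.target" 10 1; then
    result="found"
else
    result="gave up"
fi
echo "$result"
wait
rm -rf "$workdir"
```

A single check at the start would report the file as missing; the loop notices it once the background job has written it.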
[Test Case]
It's recommended to test this in an LXD container or a VM.
Preparation steps:
$ sudo apt install sssd-ldap sssd-tools sssd-dbus slapd ldap-utils dnsmasq
Become root:
$ sudo su -
Detect your ip:
# export interface=$(ip route | grep default | sed -r 's,^default via .* dev ([a-z0-9]+) .*,\1,')
# export ip=$(ip addr show dev $interface | grep "inet [0-9]" | awk '{print $2}' | cut -d / -f 1)
Confirm the $ip variable is correct for your case:
# echo $ip
Create /etc/dnsmasq.d/sssd-test.conf using your real ip:
# cat > /etc/dnsmasq.d/sssd-test.conf <<EOF
host-record=ldap01.example.com,$ip
listen-address=$ip
EOF
Restart dnsmasq:
# systemctl restart dnsmasq
-
a) inotify test
Create /etc/sssd/sssd.conf:
# cat > /etc/sssd/sssd.conf <<EOF
[sssd]
config_file_version = 2
services = nss, pam, ifp
domains = LDAP
#debug_level = 6
[domain/LDAP]
id_provider = ldap
ldap_uri = ldap://ldap01.example.com
cache_credentials = True
ldap_search_base = dc=example,dc=com
EOF
# chmod 0600 /etc/sssd/sssd.conf
# rm /etc/resolv.conf
# ln -s /etc/resolv.conf.target /etc/resolv.conf
Create a good resolv.conf:
# echo "nameserver $ip" > /etc/resolv.conf.good
Confirm /etc/resolv.conf is a broken symlink:
# ll /etc/resolv.conf*
lrwxrwxrwx 1 root root 23 May 13 20:48 /etc/resolv.conf -> /etc/resolv.conf.target
-rw-r--r-- 1 root root 24 May 13 20:48 /etc/resolv.conf.good
Start sssd:
# systemctl restart sssd
Repeat the sssctl call until it shows the offline mode persistently:
# sssctl domain-status LDAP
Online status: Offline
Active servers:
LDAP: not connected
Discovered LDAP servers:
- ldap01.example.com
"Unbreak" the symlink:
# cp /etc/resolv.conf.good /etc/resolv.conf.target
Run sssctl again; it should switch to online almost immediately:
# sssctl domain-status LDAP
Online status: Online
Active servers:
LDAP: ldap01.example.com
Discovered LDAP servers:
- ldap01.example.com
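The "repeat the sssctl call" steps above can be scripted. The sketch below assumes the output format shown in this test case, where the state is on an "Online status:" line; online_status is a hypothetical helper, and sample text is fed in directly so the sketch can be tried without sssd installed.

```shell
# Illustration only: extract the state from `sssctl domain-status` output.
# online_status is a hypothetical helper that reads that output on stdin.
online_status() {
    sed -n 's/^[[:space:]]*Online status: *//p'
}

sample='Online status: Offline

Active servers:
LDAP: not connected'

status=$(printf '%s\n' "$sample" | online_status)
echo "$status"

# On a test machine the same helper could drive a polling loop, for example:
#   until [ "$(sssctl domain-status LDAP | online_status)" = "Online" ]; do
#       sleep 1
#   done
```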
-
- [Regression Potential]
-
- * discussion of how regressions are most likely to manifest as a result
- of this change.
-
- * It is assumed that any SRU candidate patch is well-tested before
- upload and has a low overall risk of regression, but it's important
- to make the effort to think about what ''could'' happen in the
- event of a regression.
-
- * This both shows the SRU team that the risks have been considered,
- and provides guidance to testers in regression-testing the SRU.
+ [Regression Potential]
+ TBD
[Other Info]
-
- * Anything else you think is useful to include
- * Anticipate questions from users, SRU, +1 maintenance, security teams and the Technical Board
- * and address these questions in advance
+ Not at this time.
[Original Description]
SSSD 1.15.3-2ubuntu1 on 17.10/artful (previous versions on artful were
also affected) is offline on boot and seems to stay offline forever (I
waited over 20 minutes).
sssd_nss.log:
(Fri Oct 13 09:49:50 2017) [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]
(Fri Oct 13 09:49:51 2017) [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]
(Fri Oct 13 09:49:51 2017) [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]
(Fri Oct 13 09:49:51 2017) [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]
...
SSSD immediately returns to normal operation after restarting it or
after sending SIGUSR2.
A workaround for the problem is creating the file /etc/systemd/system/sssd.service.d/override.conf with the following contents:
[Unit]
Requires=network-online.target
After=network-online.target
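For convenience, the workaround above can be written as a small script. This is a sketch: the function name write_sssd_override and its root-directory argument are hypothetical, added so the file can be generated in a scratch directory first. The unit settings are the ones quoted above.

```shell
# Illustration only: write the drop-in described above under a given root.
# write_sssd_override ROOT  (hypothetical helper; ROOT would be / on a real system)
write_sssd_override() {
    dropin_dir="${1%/}/etc/systemd/system/sssd.service.d"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/override.conf" <<'EOF'
[Unit]
Requires=network-online.target
After=network-online.target
EOF
}

scratch=$(mktemp -d)
write_sssd_override "$scratch"
cat "$scratch/etc/systemd/system/sssd.service.d/override.conf"
```

On a real system this would be run as root with / as the argument, followed by "systemctl daemon-reload" so that systemd picks up the drop-in.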
--
You received this bug notification because you are a member of Ubuntu
Server, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1723350
Title:
sssd offline on boot, stays offline forever
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/sssd/+bug/1723350/+subscriptions