[Bug 1641875] Re: Scheduled events (e.g. LDAP connection retries) are affected by clock adjustments

Wed Feb 15 16:38:49 UTC 2017

** Patch added: "zesty_sssd_1.15.0-3ubuntu3.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/sssd/+bug/1641875/+attachment/4819666/+files/zesty_sssd_1.15.0-3ubuntu3.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1641875

Title:
  Scheduled events (e.g. LDAP connection retries) are affected by clock
  adjustments

Status in sssd package in Ubuntu:
  New
Status in sssd source package in Trusty:
  New
Status in sssd source package in Xenial:
  New
Status in sssd source package in Yakkety:
  New
Status in sssd source package in Zesty:
  New

Bug description:
  [Impact]

   * SSSD is affected by clock shifts because it's built on libtevent, that doesn't use monotonic clocks. 
   * After an event has been scheduled, if the time drifts to the past SSSD won't recover and the event will have to wait the shifted time to be executed.

  [Test Case]

   * Modify the /etc/hosts file to force a failed resolution and restart sssd.
   * Within the first ~1 minute retry window, change the date (date --set) to a previous time (e.g. 2h).
   * "Tail" the log file. The event will be executed at the original schedule, past those 2h.

  [Regression Potential]

   * None

  [Other Info]

   * Version 1.15 implements a watchdog that detects time shifts and resets itselft but doesn't reschedule the scheduled events.
   * There's a network monitor based on netlink that detects interface changes and restarts the providers, rescheduling the scheduled events.
   * Such restart can be triggered with SIGUSR2. Sending that signal after the watchdog restarts fixes this issue.
   * Upstream bug: https://fedorahosted.org/sssd/ticket/3285 

  [Original Description]

  When SSSD fails to connect to a provider (LDAP, for instance) it
  creates a timed event with tevent_add_timer() in order to retry in ~1
  min. Tevent relies on CLOCK_REALTIME, using absolute epoch time, so
  when the time changes (e.g. NTP sync) the scheduled event is affected.

  Reproducer:

  1. Modify the /etc/hosts file to force a failed resolution and restart
  sssd

  (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'ldap' in DNS
  (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [resolv_gethostbyname_done] (0x0040): querying hosts database failed [5]: Input/output error
  (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not contact DNS servers
  ...
  (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [check_online_callback] (0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]

  2. Within the ~1 minute window, change the date (date --set) to a
  previous time (2 hour in my example). Note: if /etc/resolv.conf or a
  network interface is modified, SSSD providers will restart, but the
  scheduled retry will be kept.

  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [recreate_ares_channel] (0x0100): Initializing new c-ares channel
  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [recreate_ares_channel] (0x0100): Destroying the old c-ares channel
  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'name not resolved'
  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ldap' as 'neutral'
  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
  ...
  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not contact DNS servers
  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'not working'
  ...
  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [check_online_callback] (0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]

  3. Once it gets to the programmed date (2 hours later), it retries:

  (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'name not resolved'
  (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ldap' as 'neutral'
  (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
  ...
  (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not contact DNS servers
  (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'not working'
  ...
  (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [check_online_callback] (0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/sssd/+bug/1641875/+subscriptions