Packaging policy discussion: After=network-online.target

Fri May 14 14:14:27 UTC 2021

Expanding on this, Seth...

On 5/13/21 6:51 PM, Seth Arnold wrote:
> In the last week I've seen four different conversations about how to
> properly start a service only 'after the network is up', and the different
> people had different ideas of what this meant for their service:
>
> - one wanted LAN up and working, nothing fancy
> - one wanted to wait until DNS resolution was working
> - one wanted to wait until an ospf daemon had negotiated routing
>    tables and installed a default route
> - one wanted to wait until ntp had synced (not just started, but
>    actually synced)

I think this is the 'can of worms' I just mentioned in #ubuntu-devel on 
IRC.  Each of these specific cases would need its own network target or 
SystemD target.  We also have the case (from a 2017 bug that Server Team 
"Won't Fix"'d) where someone wants NGINX to start only after DNS works 
and the network is 'up' (and routable).  There are no special targets 
for those 'special cases' at a SystemD level.

Not sure there's a way to actually handle all these cases.

We also have to be careful here: "Network Online" is, by FreeDesktop 
standards: "...the definition of "up" is defined by the network 
management software."

None of those other components mentioned (DNS, OSPF, NTP) are **network 
management** software.  Unless there's a standard for "online" written 
somewhere that isn't "your system has a routable IP assigned and the 
link status of the interface is UP", I don't think we can handle all the 
edge cases.

To be more specific, I think we need to step back from the 
semantics/argument regarding the target, and examine all the individual 
cases from the perspective of "Has the sysadmin of these systems changed 
the system services and configurations from the default such that it's 
an edge case we cannot predict or adapt to for out of the box setups?"

Case #1 in your email is just network-online.target as written.  'Their 
service' can be overridden locally in SystemD and customized to depend 
on that target.

Case #2 would require the application to have some kind of start script 
that checks DNS and does not fail outright when resolution fails.  (Or 
one that exits in a way that makes SystemD retry after a delay: a 
non-zero exit code rather than a SIGKILL, with a restart delay of, say, 
15-30 seconds, while depending on network.target or 
network-online.target.)  NGINX fits this case, because if you use a DNS 
name in its configuration and that name doesn't resolve, NGINX errors 
out at startup.
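
As a rough sketch of the kind of pre-start check Case #2 describes, 
something like the following could work.  This is only an illustration 
under assumptions: the function names, the retry counts, and the 
placeholder hostname "upstream.example.com" are all made up for the 
example and are not part of any package.

```python
import socket
import time


def wait_for_dns(hostname, attempts=5, delay=2.0,
                 resolver=socket.getaddrinfo, sleep=time.sleep):
    """Return True once `hostname` resolves, False after `attempts` failures."""
    for attempt in range(1, attempts + 1):
        try:
            resolver(hostname, None)
            return True
        except socket.gaierror:
            # Resolution failed; pause before the next attempt, if any remain.
            if attempt < attempts:
                sleep(delay)
    return False


def main(argv):
    """Exit status for a pre-start hook: 0 if the name resolves, 1 otherwise."""
    # "upstream.example.com" is a placeholder, not a name from this thread.
    hostname = argv[1] if len(argv) > 1 else "upstream.example.com"
    return 0 if wait_for_dns(hostname) else 1

# A real hook script would end with: import sys; sys.exit(main(sys.argv))
```

A sysadmin could call a script like this from an ExecStartPre= line in 
their own service override, together with Restart=on-failure and a 
RestartSec= in the 15-30 second range, so that a non-zero exit makes 
SystemD try again later.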

Case #3 requires more than LAN up, and like case #2 would need its own 
configuration / script to check that OSPF has populated the routing 
table and installed a default route - there's nothing in SystemD that 
governs this.
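
For Case #3, a site-specific check might look roughly like the sketch 
below.  It assumes a Linux system, only looks at the IPv4 table in 
/proc/net/route, and only tells you that *a* default route exists, not 
that the OSPF daemon is the one that installed it.  The function names 
are hypothetical.

```python
RTF_UP = 0x0001       # route is usable
RTF_GATEWAY = 0x0002  # route goes via a gateway


def has_default_route(route_table):
    """Return True if the /proc/net/route text contains an IPv4 default route."""
    for line in route_table.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 8:
            continue
        destination, flags, mask = fields[1], int(fields[3], 16), fields[7]
        is_default = destination == "00000000" and mask == "00000000"
        if is_default and flags & RTF_UP and flags & RTF_GATEWAY:
            return True
    return False


def main(path="/proc/net/route"):
    """Exit status for a pre-start hook: 0 if a default route exists, else 1."""
    with open(path) as route_file:
        return 0 if has_default_route(route_file.read()) else 1
```

As with the DNS case, this would live in the sysadmin's own override of 
the service, not in the packaged unit.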

Case #4 is like Cases #2 and #3, except that you need a tie-in to NTP, 
which in more modern deployments is handled by `systemd-timesyncd`.  
(Or chrony, if you're like me and run an NTP server - chrony provides 
the granularity I need for the NTP server side of things).
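
A Case #4 check could be sketched along these lines.  It assumes that 
`timedatectl show` is available and reports an NTPSynchronized property 
as "yes" or "no", which is the case on recent SystemD releases; the 
function names are made up for the example.

```python
import subprocess


def parse_ntp_synchronized(output):
    """Interpret the output of the timedatectl query below."""
    return output.strip() == "yes"


def ntp_is_synced(run=subprocess.run):
    """Return True only if timedatectl reports the clock as synchronized."""
    result = run(
        ["timedatectl", "show", "--property=NTPSynchronized", "--value"],
        capture_output=True,
        text=True,
        check=False,
    )
    # A non-zero exit (e.g. timedated unavailable) is treated as "not synced".
    return result.returncode == 0 and parse_ntp_synchronized(result.stdout)


def main():
    """Exit status for a pre-start hook: 0 if synced, 1 otherwise."""
    return 0 if ntp_is_synced() else 1
```

"Started" and "synced" are different states, so a check like this is 
what separates this case from simply ordering a service after the time 
daemon's unit.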

---

Cases #2, #3, and #4 fit the standard of "This is a non-standard 
configuration that you as the sysadmin have implemented.  If you need 
special case handling for these services, that's beyond the scope of the 
'default' packaging/service configuration goals." Using this argument as 
a basis, we could then go back to $SYSADMIN and say "We're not going to 
fix this in the packaging, because your use case goes beyond the goal of 
network-online.target as defined by FreeDesktop:  {Quote Goes Here of 
network-online.target data from 
https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/}. 
Pursuant to this quote, if you want to rely on something other than the 
network management software's definition of 'online', you will need to 
change your services on your end and how they work. This, however, is 
not a general packaging change that we are going to adopt for ${PACKAGE} 
at this time."

That, actually, is what we did with NGINX - back in 2017 it was 
requested that we change the target, and after digging and discussion 
we decided we weren't going to make that change because there is no 
standard definition of what "online" is beyond the fact that "it's 
dependent on the network management software on the system to identify 
what 'online' means".  And that was too vague a definition for us to 
support changing NGINX to network-online.target.  This argument 
continues to hold today.

I have several opinions on this, but my primary opinion is that, for 
now, until there's a 'standard' defined at the SystemD level for what 
"online" should be and how that's reported back, we should reject these 
requests with a message like the one above, stating that "this is a 
non-standard change from the defaults, and if you need 
network-online.target, put it in your service overrides.  If you need 
more than what the network management software that configures your 
interfaces considers 'online', then you will need to customize the 
service further yourself, and that's beyond the scope of the packaging 
done by Ubuntu."

I do agree we need a standard that defines what 'online' means, and 
everyone has to accept that as the definition of SystemD's 
'network-online' state.  But for now, until that standard is defined 
somewhere upstream, we have to accept that there is no standard, and we 
aren't going to adapt to everyone's needs and edge cases once they steer 
away from the 'default' configurations and the network 
stack/configuration requirement of "the interface is up and has an IP" 
that seems to be the 'usual' for network management software (as stated 
in the SystemD documentation).

Thomas
