Packaging policy discussion: After=network-online.target

Dan Streetman ddstreet at canonical.com
Sat May 15 13:52:35 UTC 2021


On Wed, May 12, 2021 at 3:52 AM Christopher James Halse Rogers
<raof at ubuntu.com> wrote:
>
> Hello everyone,
>
> There's an nfs-utils SRU¹ hanging around waiting for a policy decision
> on use of the After=network-online.target systemd unit dependency. I'm
> not an expert here, but it looks like part of my SRU rotation today is
> starting the discussion on this so we can resolve it one way or another!

Goodness this email thread has a lot of different directions.

Just a few observations that might help:

1) what does it actually mean for systemd-networkd to consider
networking 'online'?

To be specific, when systemd-networkd is in use,
'network-online.target' is held back by
'systemd-networkd-wait-online', whose man page describes in detail
exactly what it waits for.

To briefly summarize, it means all systemd-networkd managed interfaces
that are 'required for online' have reached a setup state of
'configured' or 'failed', and at least one managed interface has
reached an operational state of 'degraded' or higher. Any interface
that should not be required should have 'RequiredForOnline=no' in the
[Link] section of its .network file (see man systemd.network). The
'degraded' state of an interface means it has carrier and a valid
link-local address (the next step up is 'routable', which means it has
a routable address configured).
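
As a rough illustration of the summary above, here is a minimal Python
sketch of that decision. It is a toy model, not how
systemd-networkd-wait-online is implemented; the Link class, the
is_online() helper and the shortened list of operational states are
assumptions made for the example.

```python
from dataclasses import dataclass

# Simplified subset of operational states, lowest to highest.
# Assumption: the real list (see networkctl(1)) has more entries.
OPER_ORDER = ["off", "no-carrier", "carrier", "degraded", "routable"]


@dataclass
class Link:
    """Hypothetical stand-in for one systemd-networkd managed interface."""
    name: str
    setup_state: str            # e.g. "configuring", "configured", "failed"
    oper_state: str             # one of OPER_ORDER
    required_for_online: bool = True   # models RequiredForOnline=


def is_online(links):
    """Apply the summarized rule to a list of managed links."""
    # Every 'required for online' link must be done, one way or the other.
    required_done = all(
        link.setup_state in ("configured", "failed")
        for link in links
        if link.required_for_online
    )
    # At least one managed link must be 'degraded' or higher.
    threshold = OPER_ORDER.index("degraded")
    any_usable = any(
        OPER_ORDER.index(link.oper_state) >= threshold for link in links
    )
    return required_done and any_usable


eth0 = Link("eth0", "configured", "routable")
# Second interface with no connection: DHCP never gets an answer.
eth1 = Link("eth1", "configuring", "no-carrier")

print(is_online([eth0, eth1]))   # False: eth1 is required and still configuring
eth1.required_for_online = False  # as with RequiredForOnline=no
print(is_online([eth0, eth1]))   # True
```

The second call shows why marking an unconnected interface as not
required is enough to stop it from holding up 'online'.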

Note that systemd-networkd isn't the only provider of network
management; NetworkManager is another, and it has its own service
implementing (or more accurately, WantedBy) network-online.target,
which is NetworkManager-wait-online.service. That very likely has a
different definition of exactly what it means for the network to be
'online'.

2) what's the downside of something requiring network-online.target?

The only downside is the delay of the service(s) that is/are
configured with After=network-online.target. Any such service will be
delayed at boot until the network manager (whatever it is) decides the
network is "up" (as mentioned above). However, that of course also
delays any services which order themselves after the delayed
service(s).

To the end user, this is typically seen as a 'hang' during boot. The
specific reason is that some services/targets order themselves after
network-online.target and are also ordered before the services that
provide user login. In a default cloud image system, the specific
packages that introduce this problem are cloud-init and open-iscsi.

For example here is the startup plot of a plain hirsute cloud-init vm,
with the only modification being adding a second interface (with no
connection to anything) and adding systemd-networkd config to start
dhcp on the second interface (which of course will delay the network
starting since the dhcp will never get an answer). You can see that
systemd-user-sessions is delayed until after network-online.target,
which 'hangs' the boot:
https://people.canonical.com/~ddstreet/startups/startup-plain.svg

And here is the same vm, with cloud-init and open-iscsi removed. Note
that network-online.target isn't in the units started at boot, so
there is no delay for anything.
https://people.canonical.com/~ddstreet/startups/startup-without-cloud-init-open-iscsi.svg

And again, but with a simple service 'dummy.service' that does nothing
and has Wants=network-online.target and WantedBy=multi-user.target
(this service pulls network-online.target into the units started at
boot). This shows that systemd-user-sessions isn't delayed, and so
login is not delayed and there is no 'hang' during boot, but the
network-online target is delayed, as expected; it just has no impact
on how long boot takes to reach user login.
https://people.canonical.com/~ddstreet/startups/startup-without-cloud-init-open-iscsi-with-network-online.svg

Finally to illustrate the boot ordering problems that open-iscsi
introduces, the dummy.service is changed to want network-online.target
and remote-fs-pre.target, and order itself between those, just as
open-iscsi does (specifically, After=network-online.target and
Before=remote-fs-pre.target):
https://people.canonical.com/~ddstreet/startups/startup-without-cloud-init-open-iscsi-dummy-delay.svg
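
For reference, a unit matching that description might look like the
text embedded in the Python sketch below. This is a reconstruction from
the description above, not the actual file used for the plots; the
Description=, Type= and ExecStart= lines are assumptions. The small
unit_deps() helper is hypothetical and only reads the dependency
directives back out using the standard configparser module, which is
not a full unit file parser (for example, it rejects repeated keys).

```python
import configparser

# Assumed reconstruction of the dummy.service described above.
DUMMY_SERVICE = """\
[Unit]
Description=Dummy unit ordered like open-iscsi
Wants=network-online.target remote-fs-pre.target
After=network-online.target
Before=remote-fs-pre.target

[Service]
Type=oneshot
ExecStart=/bin/true

[Install]
WantedBy=multi-user.target
"""


def unit_deps(text):
    """Return the Wants=, After= and Before= lists from a [Unit] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the case of directive names
    parser.read_string(text)
    unit = parser["Unit"]
    return {
        key: unit.get(key, "").split()
        for key in ("Wants", "After", "Before")
    }


for directive, values in unit_deps(DUMMY_SERVICE).items():
    print(directive, values)
```

Wants= is what pulls both targets into the boot, while After= and
Before= place the service between them.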

Note that this isn't *necessarily* a bug in open-iscsi, as it kind of
makes sense; if user login does in fact require an iscsi-mounted
directory, then systemd-user-sessions should be ordered after
open-iscsi, and of course open-iscsi requires networking to work.
However, the real dependency chain is clearly more subtle than the
current implementation captures; for example, even if there are no
iscsi mounts at all, open-iscsi still adds this boot ordering, which
delays user login until after network-online.

The cloud-init package introduces a similar ordering, but is much more
blunt about it; cloud-init.service includes
After=systemd-networkd-wait-online.service and
Before=systemd-user-sessions.service.

To clarify, *any* package with systemd services/targets might
introduce unit ordering similar to this at boot time, so this isn't
necessarily just a problem added by open-iscsi and cloud-init; I just
used those packages as examples since they're included in the cloud
images by default.

3) So just adding After=network-online.target will cause the delay if
the network doesn't start up?

No, as shown in the second plot above, network-online.target is not
part of the boot units by default, and it will only cause a delay if
some (enabled) service or target actually Wants= it. If no (enabled)
service/target has Wants=network-online.target, then it doesn't matter
how many services order themselves after it with
After=network-online.target; it won't be considered during boot and
there will be no delay (due to networking).

Additionally, by default the 'user login' stage of boot isn't ordered
after network-online, so a delay also requires some service or target
to introduce ordering between network-online and
systemd-user-sessions, similar to what open-iscsi and cloud-init do.
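
The two conditions above (something must Wants= the target, and
something must order user login after it) can be sketched with a toy
dependency graph in Python. This is only a model of the reasoning, not
how systemd computes its boot transaction; the BASE skeleton, the chain
through remote-fs.target, and all helper names are assumptions made for
the example.

```python
import copy

# Assumed default-boot skeleton; the remote-fs chain is illustrative.
BASE = {
    "multi-user.target": {
        "wants": ["systemd-user-sessions.service", "remote-fs.target"],
    },
    "systemd-user-sessions.service": {"after": ["remote-fs.target"]},
    "remote-fs.target": {"after": ["remote-fs-pre.target"]},
}
ONLINE = "network-online.target"
LOGIN = "systemd-user-sessions.service"


def with_service(name, **deps):
    """Return BASE plus one enabled service (WantedBy=multi-user.target)."""
    units = copy.deepcopy(BASE)
    units["multi-user.target"]["wants"].append(name)
    units[name] = deps
    return units


def build_graph(units):
    """Normalize to wants/after sets, folding Before= into After=."""
    graph = {}

    def node(name):
        return graph.setdefault(name, {"wants": set(), "after": set()})

    for name, deps in units.items():
        node(name)["wants"].update(deps.get("wants", ()))
        node(name)["after"].update(deps.get("after", ()))
        for other in deps.get("before", ()):
            node(other)["after"].add(name)
    for name in list(graph):
        for ref in graph[name]["wants"] | graph[name]["after"]:
            node(ref)
    return graph


def boot_set(graph, root="multi-user.target"):
    """Units pulled into boot: only Wants= edges add units."""
    seen, stack = set(), [root]
    while stack:
        unit = stack.pop()
        if unit not in seen:
            seen.add(unit)
            stack.extend(graph[unit]["wants"])
    return seen


def ordered_after(graph, active, unit, target):
    """True if unit transitively waits for target among active units."""
    seen, stack = set(), [unit]
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(d for d in graph[current]["after"] if d in active)
    return False


def analyze(units):
    """Return (network-online in boot?, user login delayed by it?)."""
    graph = build_graph(units)
    active = boot_set(graph)
    in_boot = ONLINE in active
    return in_boot, in_boot and ordered_after(graph, active, LOGIN, ONLINE)


ordering_only = with_service("example.service", after=[ONLINE])
wants_only = with_service("dummy.service", wants=[ONLINE])
like_open_iscsi = with_service(
    "dummy.service",
    wants=[ONLINE, "remote-fs-pre.target"],
    after=[ONLINE],
    before=["remote-fs-pre.target"],
)

print(analyze(ordering_only))    # (False, False)
print(analyze(wants_only))       # (True, False)
print(analyze(like_open_iscsi))  # (True, True)
```

Only the last case, where the service both pulls in the target and
sits on the ordering path to user login, delays login in this model.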

4) Should nfs-utils use network-online.target?

IMHO yes, but it should be careful about its overall ordering of
services. For example it shouldn't introduce a boot ordering
dependency on the network if no NFS mounts are defined.
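
As a loose sketch of the "only if NFS mounts are defined" idea, a check
along these lines could decide whether any network ordering is needed
at all. This is purely illustrative and not how nfs-utils or systemd
handle it; the has_nfs_mounts() helper and the sample fstab text are
made up for the example.

```python
def has_nfs_mounts(fstab_text):
    """Return True if any fstab-style entry uses an NFS filesystem type."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            return True
    return False


# Hypothetical fstab contents, for illustration only.
LOCAL_ONLY = """\
# <device> <mount point> <type> <options> <dump> <pass>
/dev/sda1 / ext4 defaults 0 1
"""
WITH_NFS = LOCAL_ONLY + "server.example:/export /mnt/data nfs defaults 0 0\n"

print(has_nfs_mounts(LOCAL_ONLY))  # False
print(has_nfs_mounts(WITH_NFS))    # True
```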

5) Is this a systemd problem?

IMHO systemd isn't doing anything wrong by providing
network-online; however as some people mentioned in this thread,
network-online.target is about the most blunt instrument you could
think of, and there certainly seems to be a need for systemd to
provide more fine-grained network dependency controls. That's
certainly something that could be proposed/discussed with upstream
systemd. However it may also be something that is only implemented if
using systemd-networkd for networking. I have no idea how
NetworkManager does anything, or even if it properly implements
NetworkManager-wait-online.service in the same way systemd-networkd
does.



>
> I am not an expert in this area, but as I understand it, the tradeoff
> here is:
> 1. Without a dependency on After=network-online.target there is no
> guarantee that the network interface(s) will be usable at the time the
> nfs-utils unit triggers, and nfs-utils will fail if the relevant network
> interface is not usable, or
> 2. With a dependency on After=network-online.target nfs-utils will
> reliably start, but if there are any interfaces which are configured
> but do not come up this will result in the boot hanging until the
> timeout is hit.
>
> In mitigation of (2), there are apparently a number of default packages
> which already have a dependency on After=network-online.target, so boot
> hanging if interfaces are down is the status quo?
>
> The obvious thing to do here would be to follow Debian, but as far as I
> can tell there is not currently a Debian policy about this - the best I
> can find is an ancient draft of a best-practises-guide² suggesting
> that packages SHOULD handle networking dynamically, but if they do not
> MUST have a dependency on After=network-online.target
>
> As far as I understand it, handling networking dynamically requires
> upstream code changes (although maybe fairly simple code changes?).
>
> It seems unlikely that, whatever we decide, we'll immediately do a full
> sweep of the archive and fix everything, so it looks like our choice is
> between:
>
> 1. The long-term goal is to have no After=network-online.target
> dependencies in default boot (stretch goal: in main). Whenever we run
> into a package-fails-if-network-is-not-yet-up bug, we patch the code
> and submit upstream. Over time we audit existing users of
> After=network-online.target and patch them for dynamic networking, as
> time permits.
>
> 2. We don't expect to be able to reach no After=network-online.target
> dependencies in the default boot, so it's not a priority to avoid them.
> Whenever we run into a package-fails-if-network-is-not-yet-up bug, we
> add an After=network-online.target dependency.
>
> Option (1) seems to be the technically superior option (and is
> recommended by systemd upstream³), but appears to require more work. I
> have limited insight into how much work that would be; someone from
> Foundations or Server probably needs to weigh in on that.
>
> Option (2) seems to be formalisation of the status-quo, so would seem
> to be less work.
>
> Let the discussion begin!
>
> ¹: https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1918141
> ²:
> https://github.com/ajtowns/debian-init-policy/blob/master/systemd-best-practices.pad
> ³: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/


