[PATCH 0/1][N/U] Enable lowlatency settings in the generic kernel

Fri Jan 26 19:25:48 UTC 2024

On Fri, 26 Jan 2024 at 16:19, Andrea Righi <andrea.righi at canonical.com> wrote:
>
> BugLink: https://bugs.launchpad.net/bugs/2051342
>
> [Impact]
>
> Ubuntu provides the "lowlatency" kernel: a kernel optimized for
> applications that have special "low latency" requirements.
>
> Currently, this kernel does not carry any UBUNTU SAUCE patches specific
> to its "low latency" requirements; the only difference from the generic
> kernel is a small subset of .config options.
>
> Almost all these options are now configurable either at boot-time or
> even at run-time, with the only exception of CONFIG_HZ (250 in the
> generic kernel vs 1000 in the lowlatency kernel).
>
> Maintaining a separate kernel for a single config option seems overkill
> and has a significant cost in engineering hours, build time, regression
> testing time and resources. Not to mention the risk of the low-latency
> kernel falling behind and not being perfectly in sync with the latest
> generic kernel.
>
> Enabling the low-latency settings in the generic kernel has been
> evaluated before, but it was never finalized due to the potential
> risk of performance regressions in CPU-intensive applications
> (increasing HZ from 250 to 1000 may introduce more kernel jitter in
> number crunching workloads). The outcome of the original proposal
> resulted in a re-classification of the lowlatency kernel as a
> desktop-oriented kernel, enabling additional low latency features (LP:
> #2023007).
>
> As we are approaching the release of the new Ubuntu 24.04, we may want
> to reconsider merging the low-latency settings into the generic kernel.
>
> The following is a detailed analysis of the specific low-latency features:
>
> - CONFIG_NO_HZ_FULL=y: enable access to "Full tickless mode" (shut down
>   the clock tick when possible across all the enabled CPUs if they are
>   either idle or running 1 task - reduce kernel jitter of running tasks
>   due to the periodic clock tick, must be enabled at boot time passing
>   `nohz_full=<cpu_list>`); this can actually help CPU-intensive
>   workloads and it could provide much more benefit than the CONFIG_HZ
>   difference (since it can potentially shut down any kernel jitter on
>   specific CPUs). This one should really be enabled anyway, considering
>   that it is configurable at boot time
>
> - CONFIG_RCU_NOCB_CPU=y: move RCU callbacks from softirq context to
>   kthread context (reduce time spent in softirqs with preemption
>   disabled to improve the overall system responsiveness, at the cost of
>   introducing a potential performance penalty, because RCU callbacks are
>   now processed by kernel threads); this should be enabled as well,
>   since it is configurable at boot time (via the rcu_nocbs=<cpu_list>
>   parameter)
>
>  - CONFIG_RCU_LAZY=y: batch RCU callbacks and then flush them after a
>    timed delay instead of executing them immediately (can provide 5~10%
>    power-savings for idle or lightly-loaded systems, this is extremely
>    useful for laptops / portable devices -
>    https://lore.kernel.org/lkml/20221016162305.2489629-3-joel@joelfernandes.org/);
>    this has the potential to introduce significant performance
>    regressions, but in the Noble kernel we already have a SAUCE patch
>    that allows enabling/disabling this option at boot time (see LP:
>    #2045492), and by default it will be disabled
>    (CONFIG_RCU_LAZY_DEFAULT_OFF=y)
>
>  - CONFIG_HZ=1000: last but not least, the only option that is *only*
>    tunable at compile time. As already mentioned there is a potential
>    risk of regressions for CPU-intensive applications, but they can be
>    mitigated (and maybe even turned into a gain) with NO_HZ_FULL.
>    On the other hand, HZ=1000 can improve system responsiveness, which
>    means most of the desktop and server applications will benefit from
>    this (the largest part of the server workloads is I/O bound, more
>    than CPU-bound, so they can benefit from having a kernel that can
>    react faster at switching tasks), not to mention the benefit for the
>    typical end-user applications (gaming, live conferencing,
>    multimedia, etc.).
>
> With all of that in place we can provide a kernel that has the
> flexibility to be more responsive, more performant and more power
> efficient (therefore more "generic"), simply by tuning run-time and
> boot-time options.
>
> Moreover, once these changes are applied we will be able to deprecate
> the lowlatency kernel, saving engineering time and also reducing power
> consumption (required to build the kernel and do all the testing).
>
> Optionally, we can also provide the optimal "lowlatency" settings as a
> user-space package that would set the proper options in the kernel boot
> command line (GRUB, or similar).
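
For illustration only, a minimal sketch of what such a package might
generate. The helper name and the CPU list are hypothetical; only the
nohz_full= and rcu_nocbs= parameters come from the proposal above, and
the right CPU list is machine-specific.

```shell
# Hypothetical helper: build the boot parameters named above for one CPU list.
# The CPU list (e.g. 1-7) is a placeholder, not a recommendation.
lowlatency_cmdline() {
    printf 'nohz_full=%s rcu_nocbs=%s\n' "$1" "$1"
}

# Assumed usage: append the result to GRUB_CMDLINE_LINUX_DEFAULT in
# /etc/default/grub, then run update-grub and reboot.
lowlatency_cmdline 1-7   # prints: nohz_full=1-7 rcu_nocbs=1-7
```

Keeping CPU 0 out of the list is a common choice, since at least one
CPU has to keep the tick for housekeeping.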

Also, it is no longer clear what the target workload is. We have been
receiving requests for lowlatency kernels in the server space and in the
public cloud space, and our cloud kernels are used a lot for cloud
desktop, cloud gaming and cloud streaming solutions. In other words,
they are very much interactive.

I do wish HZ was not a build-time constant macro, but a runtime
tunable, or at least a variable set once at boot time. That would
indeed let us halve the builds and still allow CPU-bound workloads to
succeed.

From an observability engineering point of view, everyone wants to ssh
into their nodes and expects the cursor to move. IMHO, on the balance of
probabilities it is worth trying this by default for all, including
setting HZ to 1000.
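
Since HZ is fixed at build time, the quickest way to see what a given
kernel was built with is to read its installed config. A rough sketch,
assuming the usual Ubuntu /boot/config-<release> layout; the config_hz
helper is just an illustrative name.

```shell
# config_hz FILE: print the CONFIG_HZ value recorded in a kernel .config file.
config_hz() {
    sed -n 's/^CONFIG_HZ=\([0-9][0-9]*\)$/\1/p' "$1"
}

# Ubuntu kernels install their build config as /boot/config-<release>.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    config_hz "$cfg"
fi

# CPUs in full tickless mode; the file may be missing or empty when
# NO_HZ_FULL is not built in or nohz_full= was not passed at boot.
nohz=/sys/devices/system/cpu/nohz_full
if [ -r "$nohz" ]; then
    cat "$nohz"
fi
```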

>
> [Test case]
>
> There are plenty of benchmarks that can prove the validity of each of
> the settings mentioned above, providing huge benefits in terms of
> system responsiveness.
>
> However, our main goal here is to mitigate as much as possible the risk
> of regression for CPU-intensive applications, so the test case should
> only be focused on this particular aspect, to evaluate the impact of
> this change in the worst case scenario.
>
> Test case (CPU-intensive stress test):
>
>  - stress-ng --matrix $(getconf _NPROCESSORS_ONLN) --timeout 5m --metrics-brief
>
> Metrics:
>
>  - measure the bogo ops printed to stdout (not a great metric for
>    real-world applications, but in this case it can show the impact of
>    the additional kernel jitter introduced by the different CONFIG_HZ)
>
> Results (linux-unstable 6.8.0-2.2, avg of 10 runs of 5min each):
>
>  - CONFIG_HZ=250            : 17415.60 bogo ops/s
>  - CONFIG_HZ=1000           : 14866.05 bogo ops/s
>  - CONFIG_HZ=1000+nohz_full : 18505.52 bogo ops/s
>
> Results confirm the theory about the performance drop of CPU-intensive
> workloads (about -14%), but also confirm the benefit of NO_HZ_FULL
> (about +6%) compared to the current HZ settings.
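
As a sanity check on those percentages, the relative changes can be
recomputed from the three averages quoted above. This is only a small
sketch; pct_change is an illustrative helper name, and the inputs are
the bogo ops/s figures from the results list.

```shell
# pct_change BASE NEW: print the relative change from BASE to NEW, in percent.
pct_change() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

pct_change 17415.60 14866.05   # HZ=1000 vs HZ=250            -> -14.6
pct_change 17415.60 18505.52   # HZ=1000+nohz_full vs HZ=250  -> 6.3
pct_change 14866.05 18505.52   # nohz_full gain at HZ=1000    -> 24.5
```

So at HZ=1000, nohz_full recovers roughly a quarter more throughput
than the same kernel without it in this particular test.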
>
> Let's also keep in mind that this is the worst case scenario and a very
> specific one, where only HPC / scientific applications can be affected,
> and even in this case we can always compensate and actually get a better
> level of performance by exploiting the nohz_full capability.
>
> [Fix]
>
> Enable the .config options mentioned above in the generic kernel (only
> on amd64 and arm64 for now).
>
> [Regression potential]
>
> As already covered, we may experience performance regressions in
> CPU-intensive (number crunching) applications (such as HPC for example),
> but they can be compensated by the NO_HZ_FULL boot-time option.
>
> --
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

-- 
Dimitri

Sent from Ubuntu Pro
https://ubuntu.com/pro