APPLIED: [SRU][Jammy][PATCH 0/1] isolcpus are ignored when using cgroups V2, causing processes to have wrong affinity
Stefan Bader
stefan.bader at canonical.com
Fri Aug 16 13:52:39 UTC 2024
On 14.08.24 05:41, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/2076957
>
> [Impact]
>
> In latency sensitive environments, it is very common to use isolcpus to reserve
> a set of cpus that no other processes are to be placed on, and run just dpdk in
> poll mode.
>
> There is a bug in the jammy kernel, where if cgroups V2 are enabled, after
> several minutes the kernel will place other processes onto these reserved
> isolcpus at random. This disturbs dpdk and introduces latency.
>
> The issue does not occur with cgroups V1, so a workaround is to use cgroups V1
> instead of V2 for the moment.
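
Since the bug depends on which cgroup hierarchy is in use, it can help to confirm the mode before testing. The following is a minimal sketch, assuming /sys/fs/cgroup is the cgroup mount point and that its filesystem type is cgroup2fs on a unified (V2) setup and tmpfs on a V1 or hybrid setup. cgroup_mode is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup hierarchy mode.
# cgroup_mode is a hypothetical helper used only for this illustration.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2 (unified)" ;;
        tmpfs)     echo "v1 (legacy or hybrid)" ;;
        *)         echo "unknown" ;;
    esac
}

# stat -f inspects the filesystem; %T prints its type in human-readable form.
if [ -d /sys/fs/cgroup ]; then
    cgroup_mode "$(stat -fc %T /sys/fs/cgroup)"
fi
```

If this reports V2 on an affected 5.15 kernel, the isolcpus problem described here may apply.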
>
> [Fix]
>
> A full git bisect led me to this commit, which fixes the issue. It
> landed in 6.2-rc1:
>
> commit 7fd4da9c1584be97ffbc40e600a19cb469fd4e78
> Author: Waiman Long <longman at redhat.com>
> Date: Sat Nov 12 17:19:39 2022 -0500
> Subject: cgroup/cpuset: Optimize cpuset_attach() on v2
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7fd4da9c1584be97ffbc40e600a19cb469fd4e78
>
> Only the 5.15 Jammy kernel needs this fix. Focal works correctly as is.
>
> The commit skips calls to cpuset_attach() if the underlying cpusets or memory
> have not changed in a cgroup, and it seems to fix the issue.
>
> [Testcase]
>
> Deploy a bare metal server, ideally with many cores; 56 should be plenty.
> Use Jammy, with the 5.15 GA kernel.
>
> 1) Edit /etc/default/grub and set GRUB_CMDLINE_LINUX_DEFAULT to have
> "isolcpus=4-7,32-35 rcu_nocb_poll rcu_nocbs=4-7,32-35 systemd.unified_cgroup_hierarchy=1"
> 2) sudo reboot
> 3) sudo cat /sys/devices/system/cpu/isolated
> 4-7,32-35
> 4) sudo apt install s-tui stress
> 5) sudo s-tui
> 6) htop
> 7) $ while true; do sudo ps -eLF | head -n 1; for cpu in 4 5 6 7 32 33 34 35; do sudo ps -eLF | grep stress | awk -v a="$cpu" '$9 == a {print;}'; done; sleep 5; done
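
The check in step 7 can also be driven from the isolated list reported in step 3 instead of hardcoded CPU numbers. This is a rough sketch, not part of the original test case: expand_cpulist and filter_isolated are hypothetical helper names, and only plain "a-b,c" cpulists are handled (no stride syntax). It assumes the procps ps, whose psr column is the processor a thread is currently assigned to.

```shell
#!/bin/sh
# expand_cpulist: turn a cpulist such as "4-7,32-35" into one CPU per line.
# Hypothetical helper; only handles single CPUs and simple a-b ranges.
expand_cpulist() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue
        seq "$first" "${last:-$first}"
    done
}

# filter_isolated: read "PSR PID LWP COMM" lines on stdin and print those
# whose PSR is in the space-separated CPU list and whose COMM matches stress.
filter_isolated() {
    awk -v cpus="$1" '
        BEGIN { n = split(cpus, list, " ")
                for (i = 1; i <= n; i++) isolated[list[i]] = 1 }
        ($1 in isolated) && $4 ~ /stress/ { print }
    '
}

# One pass over all threads; prints nothing if no CPUs are isolated.
isolated_file=/sys/devices/system/cpu/isolated
if [ -r "$isolated_file" ]; then
    cpus=$(expand_cpulist "$(cat "$isolated_file")" | tr '\n' ' ')
    ps -eL -o psr= -o pid= -o lwp= -o comm= | filter_isolated "$cpus"
fi
```

Wrapping the final block in a loop with a "sleep 5" gives the same periodic check as step 7. Any line printed is a stress thread currently running on an isolated CPU.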
>
> Set up isolcpus to separate off 4-7 and 32-35, so each NUMA node has a set of
> isolated CPUs.
>
> s-tui is a great frontend for stress, and it starts stress processes. All stress
> processes should initially be on non-isolated CPUs. Confirm this with htop:
> 4-7 and 32-35 should be at 0% while every other cpu is at 100%.
>
> After about 3 minutes, though it sometimes takes up to 10, a stress process or
> the s-tui process will be incorrectly placed onto an isolated cpu, causing that
> cpu's usage to increase in htop. The while loop from step 7, which filters ps
> output by processor, will also likely print the incorrectly placed process.
>
> A test kernel is available in the following ppa:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf391137-test
>
> If you install it, the processes will not be placed onto the isolated cpus.
>
> [Where problems could occur]
>
> The patch changes how cgroups determines when cpuset_attach() should be called.
> cpuset_attach() is currently called very frequently in the 5.15 Jammy kernel,
> but most operations should be NOP due to no changes occurring in cpusets or
> memory in the cgroup the process is attached to. We are changing it to instead
> skip calling cpuset_attach() if there are no changes, which should offer a small
> performance increase, as well as fixing this isolcpus bug.
>
> If a regression were to occur, it would affect cgroups V2 only, and it could
> cause resource limits to be applied incorrectly in the worst case.
>
> Waiman Long (1):
> cgroup/cpuset: Optimize cpuset_attach() on v2
>
> kernel/cgroup/cpuset.c | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
Applied to jammy:linux/master-next. Thanks.
-Stefan