ACK: [SRU Bionic/Focal] netprio_cgroup: Fix unlimited memory leak of v2 cgroups

Colin Ian King colin.king at canonical.com
Wed Nov 25 23:31:56 UTC 2020


On 25/11/2020 23:19, Thadeu Lima de Souza Cascardo wrote:
> From: Zefan Li <lizefan at huawei.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1886859
> 
> If systemd is configured to use hybrid mode which enables the use of
> both cgroup v1 and v2, systemd will create new cgroup on both the default
> root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
> task to the two cgroups. If the task does some network thing then the v2
> cgroup can never be freed after the session exited.
> 
> One of our machines ran into OOM due to this memory leak.
> 
> In the scenario described above when sk_alloc() is called
> cgroup_sk_alloc() thought it's in v2 mode, so it stores
> the cgroup pointer in sk->sk_cgrp_data and increments
> the cgroup refcnt, but then sock_update_netprioidx()
> thought it's in v1 mode, so it stores netprioidx value
> in sk->sk_cgrp_data, so the cgroup refcnt will never be freed.
> 
> Currently we do the mode switch when someone writes to the ifpriomap
> cgroup control file. The easiest fix is to also do the switch when
> a task is attached to a new cgroup.
> 
> Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
> Reported-by: Yang Yingliang <yangyingliang at huawei.com>
> Tested-by: Yang Yingliang <yangyingliang at huawei.com>
> Signed-off-by: Zefan Li <lizefan at huawei.com>
> Acked-by: Tejun Heo <tj at kernel.org>
> Signed-off-by: Jakub Kicinski <kuba at kernel.org>
> (cherry picked from commit 090e28b229af92dc5b40786ca673999d59e73056)
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo at canonical.com>
> ---
>  net/core/netprio_cgroup.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index b9057478d69c..239786608ee4 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -240,6 +240,8 @@ static void net_prio_attach(struct cgroup_taskset *tset)
>  	struct task_struct *p;
>  	struct cgroup_subsys_state *css;
>  
> +	cgroup_sk_alloc_disable();
> +
>  	cgroup_taskset_for_each(p, css, tset) {
>  		void *v = (void *)(unsigned long)css->cgroup->id;
>  
> 

Makes sense; it's and upstream cherry pick, so..

Acked-by: Colin Ian King <colin.king at canonical.com>



More information about the kernel-team mailing list