[SRU B 1/1] UBUNTU: SAUCE: Revert "netprio_cgroup: Fix unlimited memory leak of v2 cgroups"

Thadeu Lima de Souza Cascardo cascardo at canonical.com
Wed Jul 8 21:31:03 UTC 2020


BugLink: https://bugs.launchpad.net/bugs/1886668

This reverts commit 5eebba2159d707ae9533a52839e1ba71754c4426, which is
commit 090e28b229af92dc5b40786ca673999d59e73056 upstream.

There is a crash related to a possible use-after-free of cgroups when
cgroup BPF is used with INET_INGRESS or INET_EGRESS.

[ 696.396993] RIP: 0010:__cgroup_bpf_run_filter_skb+0xbb/0x1e0
[ 696.397005] RSP: 0018:ffff893fdcb83a70 EFLAGS: 00010292
[ 696.397015] RAX: 6d69546e6f697469 RBX: 0000000000000000 RCX: 0000000000000014
[ 696.397028] RDX: 0000000000000000 RSI: ffff893fd0360000 RDI: ffff893fb5154800
[ 696.397041] RBP: ffff893fdcb83ad0 R08: 0000000000000001 R09: 0000000000000000
[ 696.397058] R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000014
[ 696.397075] R13: ffff893fb5154800 R14: 0000000000000020 R15: ffff893fc6ba4d00
[ 696.397091] FS: 0000000000000000(0000) GS:ffff893fdcb80000(0000) knlGS:0000000000000000
[ 696.397107] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 696.397119] CR2: 000000c0001b4000 CR3: 00000006dce0a004 CR4: 00000000003606e0
[ 696.397135] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 696.397152] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 696.397169] Call Trace:
[ 696.397175] <IRQ>
[ 696.397183] sk_filter_trim_cap+0xd0/0x1b0
[ 696.397191] tcp_v4_rcv+0x8b7/0xa80
[ 696.397199] ip_local_deliver_finish+0x66/0x210
[ 696.397208] ip_local_deliver+0x7e/0xe0
[ 696.397215] ? ip_rcv_finish+0x430/0x430
[ 696.397223] ip_rcv_finish+0x129/0x430
[ 696.397230] ip_rcv+0x296/0x360
[ 696.397238] ? inet_del_offload+0x40/0x40
[ 696.397249] __netif_receive_skb_core+0x432/0xb80
[ 696.397261] ? skb_send_sock+0x50/0x50
[ 696.397271] ? tcp4_gro_receive+0x137/0x1a0
[ 696.397280] __netif_receive_skb+0x18/0x60
[ 696.397290] ? __netif_receive_skb+0x18/0x60
[ 696.397300] netif_receive_skb_internal+0x45/0xe0
[ 696.397309] napi_gro_receive+0xc5/0xf0
[ 696.397317] xennet_poll+0x9ca/0xbc0
[ 696.397325] net_rx_action+0x140/0x3a0
[ 696.397334] __do_softirq+0xe4/0x2d4
[ 696.397344] irq_exit+0xc5/0xd0
[ 696.397352] xen_evtchn_do_upcall+0x30/0x50
[ 696.397361] xen_hvm_callback_vector+0x90/0xa0
[ 696.397371] </IRQ>
[ 696.397378] RIP: 0010:native_safe_halt+0x12/0x20
[ 696.397390] RSP: 0018:ffff94c4862cbe80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff0c
[ 696.397405] RAX: ffffffff8efc1800 RBX: 0000000000000006 RCX: 0000000000000000
[ 696.397419] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 696.397435] RBP: ffff94c4862cbe80 R08: 0000000000000002 R09: 0000000000000001
[ 696.397449] R10: 0000000000100000 R11: 0000000000000397 R12: 0000000000000006
[ 696.397462] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 696.397479] ? __sched_text_end+0x1/0x1
[ 696.397489] default_idle+0x20/0x100
[ 696.397499] arch_cpu_idle+0x15/0x20
[ 696.397507] default_idle_call+0x23/0x30
[ 696.397515] do_idle+0x172/0x1f0
[ 696.397522] cpu_startup_entry+0x73/0x80
[ 696.397530] start_secondary+0x1ab/0x200
[ 696.397538] secondary_startup_64+0xa5/0xb0
[ 696.397545] Code: 89 5d b0 49 29 cc 45 01 a7 80 00 00 00 44 89 e1 48 29 c8 48 89 4d a8 49 89 87 d8 00 00 00 89 d2 48 8d 84 d6 38 03 00 00 48 8b 00 <4c> 8b 70 10 4c 8d 68 10 4d 85 f6 0f 84 f6 00 00 00 49 8d 47 30
[ 696.397584] RIP: __cgroup_bpf_run_filter_skb+0xbb/0x1e0 RSP: ffff893fdcb83a70
[ 696.397607] ---[ end trace ec5c84424d511a6f ]---
[ 696.397616] Kernel panic - not syncing: Fatal exception in interrupt
[ 696.397876] Kernel Offset: 0xd600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

This is caused by net_cls and net_prio cgroups disabling cgroup BPF and
causing it to stop refcounting when allocating new sockets. Releasing those
sockets will cause the refcount to go negative, leading to the potential
use-after-free.

This revert does not fully prevent the issue, since it could still
theoretically be triggered by setting net_cls.classid or net_prio.ifpriomap,
but it does prevent it on default system configurations. A combination of
systemd's use of cgroup BPF and extensive cgroup use including net_prio
triggers it. Reports usually involve lxd, libvirt, docker or kubernetes
together with some systemd service that sets IPAddressDeny or
IPAddressAllow.

Although the reverted patch was introduced to avoid some potential memory
leaks, the cure is worse than the disease. We will need to revisit both
issues later on and reapply that patch when we have a real fix for the
crash.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo at canonical.com>
---
 net/core/netprio_cgroup.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 239786608ee4..b9057478d69c 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -240,8 +240,6 @@ static void net_prio_attach(struct cgroup_taskset *tset)
 	struct task_struct *p;
 	struct cgroup_subsys_state *css;
 
-	cgroup_sk_alloc_disable();
-
 	cgroup_taskset_for_each(p, css, tset) {
 		void *v = (void *)(unsigned long)css->cgroup->id;
 
-- 
2.25.1
