[Bug 2018500] Re: tc: crash on malformed reply from kernel
Steven Relf
2018500 at bugs.launchpad.net
Thu Oct 5 11:26:10 UTC 2023
We have deployed the package to our canary hypervisor. We will monitor it
for 5 days and report back.
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/2018500
Title:
tc: crash on malformed reply from kernel
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive zed series:
Fix Committed
Status in openvswitch package in Ubuntu:
Fix Released
Bug description:
Log excerpt:
2023-04-27T12:48:47.133Z|00001|poll_loop(handler27)|DBG|wakeup due to [POLLIN] on fd 52 (NETLINK_GENERIC<->NETLINK_GENERIC) at ../lib/dpif-netlink.c:3195
2023-04-27T12:48:47.133Z|00002|netlink_socket(handler27)|DBG|Dropped 844 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
2023-04-27T12:48:47.133Z|00003|netlink_socket(handler27)|DBG|nl_sock_recv__ (Success): nl(len:222, type=39(ovs_packet), flags=0, seq=0, pid=0,genl(cmd=1,version=1)
2023-04-27T12:48:47.133Z|00004|dpif(handler27)|DBG|Dropped 151 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
2023-04-27T12:48:47.133Z|00005|dpif(handler27)|DBG|system@ovs-system: miss upcall:
recirc_id(0),dp_hash(0),skb_priority(0),in_port(34),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=fa:16:3e:97:8a:8a,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.142.11.28,tip=10.142.11.222,op=1,sha=fa:16:3e:97:8a:8a,tha=00:00:00:00:00:00)
arp,vlan_tci=0x0000,dl_src=fa:16:3e:97:8a:8a,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=10.142.11.28,arp_tpa=10.142.11.222,arp_op=1,arp_sha=fa:16:3e:97:8a:8a,arp_tha=00:00:00:00:00:00
2023-04-27T12:48:47.134Z|00006|util(handler27)|EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size failed in ofpbuf_at_assert()
Stack trace:
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140421195884096) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140421195884096) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140421195884096, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007fb873442476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007fb8734287f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x0000561dac108ea2 in ovs_abort_valist (args=0x7fb65b7b21c0, format=0x561dac1acf60 "%s: assertion %s failed in %s()", err_no=0) at ../lib/util.c:444
#6 vlog_abort_valist (args=0x7fb65b7b21c0, message=0x561dac1acf60 "%s: assertion %s failed in %s()", module_=<optimized out>) at ../lib/vlog.c:1249
#7 vlog_abort (module=<optimized out>, message=message@entry=0x561dac1acf60 "%s: assertion %s failed in %s()") at ../lib/vlog.c:1263
#8 0x0000561dac108eeb in ovs_assert_failure (where=<optimized out>, function=<optimized out>, condition=<optimized out>) at ../lib/util.c:86
#9 0x0000561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194
#10 tc_replace_flower (id=<optimized out>, flower=<optimized out>) at ../lib/tc.c:3223
#11 0x0000561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, match=<optimized out>, actions=<optimized out>, actions_len=<optimized out>,
ufid=<optimized out>, info=<optimized out>, stats=<optimized out>) at ../lib/netdev-offload-tc.c:2096
#12 0x0000561dac117541 in netdev_flow_put (stats=<optimized out>, info=0x7fb65b7ba780, ufid=<optimized out>, act_len=<optimized out>, actions=<optimized out>,
match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257
#13 parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2297
#14 try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2384
#15 dpif_netlink_operate (dpif_=0x561dad0ad550, ops=0x7fb65b7bb820, n_ops=<optimized out>, offload_type=DPIF_OFFLOAD_AUTO) at ../lib/dpif-netlink.c:2455
#16 0x0000561dac080969 in dpif_operate (dpif=0x561dad0ad550, ops=0x7fb65b7bb820, n_ops=2, offload_type=<optimized out>) at ../lib/dpif.c:1372
#17 0x0000561dac031f9d in handle_upcalls (n_upcalls=1, upcalls=0x7fb65b7dd620, udpif=0x561dad0b58d0) at ../ofproto/ofproto-dpif-upcall.c:1662
#18 recv_upcalls (handler=handler@entry=0x561dad0c39a0) at ../ofproto/ofproto-dpif-upcall.c:900
#19 0x0000561dac032994 in udpif_upcall_handler (arg=0x561dad0c39a0) at ../ofproto/ofproto-dpif-upcall.c:800
#20 0x0000561dac0e5363 in ovsthread_wrapper (aux_=<optimized out>) at ../lib/ovs-thread.c:422
#21 0x00007fb873494b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#22 0x00007fb873526a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Details:
(gdb) frame 10
#10 tc_replace_flower (id=<optimized out>, flower=<optimized out>) at ../lib/tc.c:3223
3223 ofpbuf_at_assert(reply, NLMSG_HDRLEN, sizeof *tc);
(gdb) print *reply
$8 = {base = 0x7fb650005d70, data = 0x7fb650005d70, size = 0, allocated = 1024, header = 0x0, msg = 0x0, list_node = {prev = 0xcccccccccccccccc,
next = 0xcccccccccccccccc}, source = OFPBUF_MALLOC}
(gdb) frame 11
#11 0x0000561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, match=<optimized out>, actions=<optimized out>, actions_len=<optimized out>,
ufid=<optimized out>, info=<optimized out>, stats=<optimized out>) at ../lib/netdev-offload-tc.c:2096
2096 err = tc_replace_flower(&id, &flower);
(gdb) print sizeof flower
$9 = 16536
I can instrument the crash with the following patch:
diff --git a/lib/tc.c b/lib/tc.c
index 5c32c6f97..ff56e5c3b 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -242,6 +242,10 @@ tc_transact(struct ofpbuf *request, struct ofpbuf **replyp)
 {
     int error = nl_transact(NETLINK_ROUTE, request, replyp);
     ofpbuf_uninit(request);
+    if(!error && replyp) {
+        /* instrument EAGAIN situation */
+        ofpbuf_reinit(*replyp, (*replyp)->allocated);
+    }
     return error;
 }
The patch reproduces the state you would end up in if the recvmsg call in nl_sock_recv__ (lib/netlink-socket.c [0]), which nl_transact ultimately calls, resulted in EAGAIN: no error is reported, but the reply buffer is empty.
0: https://github.com/openvswitch/ovs/blob/77d82289857f5cdcaaf4be06e17e750edcf0abd3/lib/netlink-socket.c#L712-L725
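As an illustration only (this is not the fix that was released, and none of the names below are OVS APIs), the failing check from frame 9 can be sketched as a length test that reports a short reply instead of aborting. The struct reply_buf type and the reply_at, reply_at_assert and reply_is_well_formed helpers are hypothetical stand-ins; the offset of 16 and size of 20 in the usage note are the values shown in the backtrace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a received reply; NOT the OVS struct ofpbuf. */
struct reply_buf {
    const uint8_t *data;  /* First byte of the received message. */
    size_t size;          /* Number of valid bytes at 'data'. */
};

/* Checked accessor: returns a pointer to 'size' bytes at 'offset', or NULL
 * when the buffer is too short to contain them.  The comparison is written
 * so that 'offset + size' cannot overflow. */
static const void *
reply_at(const struct reply_buf *b, size_t offset, size_t size)
{
    if (offset > b->size || size > b->size - offset) {
        return NULL;
    }
    return b->data + offset;
}

/* Asserting accessor: aborts on a short buffer, which is the behaviour
 * seen in the backtrace when the reply has size 0. */
static const void *
reply_at_assert(const struct reply_buf *b, size_t offset, size_t size)
{
    const void *p = reply_at(b, offset, size);
    assert(p != NULL);
    return p;
}

/* Returns 0 if the reply holds 'hdr_len' bytes of header followed by
 * 'body_len' bytes of body, -1 otherwise.  A caller could map -1 to an
 * error code of its choice rather than crashing. */
static int
reply_is_well_formed(const struct reply_buf *b, size_t hdr_len,
                     size_t body_len)
{
    return reply_at(b, hdr_len, body_len) ? 0 : -1;
}
```

With an empty reply (size 0), reply_is_well_formed(&reply, 16, 20) returns -1, whereas reply_at_assert(&reply, 16, 20) would abort in the same way as the crash above.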
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2018500/+subscriptions