[Bug 2018500] Re: tc: crash on malformed reply from kernel

Steven Relf 2018500 at bugs.launchpad.net
Thu Oct 5 11:26:10 UTC 2023


We have deployed the package to our canary hypervisor. We will monitor it for
5 days and report back.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/2018500

Title:
  tc: crash on malformed reply from kernel

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive zed series:
  Fix Committed
Status in openvswitch package in Ubuntu:
  Fix Released

Bug description:
  Log excerpt:
  2023-04-27T12:48:47.133Z|00001|poll_loop(handler27)|DBG|wakeup due to [POLLIN] on fd 52 (NETLINK_GENERIC<->NETLINK_GENERIC) at ../lib/dpif-netlink.c:3195
  2023-04-27T12:48:47.133Z|00002|netlink_socket(handler27)|DBG|Dropped 844 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
  2023-04-27T12:48:47.133Z|00003|netlink_socket(handler27)|DBG|nl_sock_recv__ (Success): nl(len:222, type=39(ovs_packet), flags=0, seq=0, pid=0,genl(cmd=1,version=1)
  2023-04-27T12:48:47.133Z|00004|dpif(handler27)|DBG|Dropped 151 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
  2023-04-27T12:48:47.133Z|00005|dpif(handler27)|DBG|system@ovs-system: miss upcall:
  recirc_id(0),dp_hash(0),skb_priority(0),in_port(34),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=fa:16:3e:97:8a:8a,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.142.11.28,tip=10.142.11.222,op=1,sha=fa:16:3e:97:8a:8a,tha=00:00:00:00:00:00)
  arp,vlan_tci=0x0000,dl_src=fa:16:3e:97:8a:8a,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=10.142.11.28,arp_tpa=10.142.11.222,arp_op=1,arp_sha=fa:16:3e:97:8a:8a,arp_tha=00:00:00:00:00:00
  2023-04-27T12:48:47.134Z|00006|util(handler27)|EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size failed in ofpbuf_at_assert()

  Stack trace:
  (gdb) bt
  #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140421195884096) at ./nptl/pthread_kill.c:44
  #1  __pthread_kill_internal (signo=6, threadid=140421195884096) at ./nptl/pthread_kill.c:78
  #2  __GI___pthread_kill (threadid=140421195884096, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
  #3  0x00007fb873442476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
  #4  0x00007fb8734287f3 in __GI_abort () at ./stdlib/abort.c:79
  #5  0x0000561dac108ea2 in ovs_abort_valist (args=0x7fb65b7b21c0, format=0x561dac1acf60 "%s: assertion %s failed in %s()", err_no=0) at ../lib/util.c:444
  #6  vlog_abort_valist (args=0x7fb65b7b21c0, message=0x561dac1acf60 "%s: assertion %s failed in %s()", module_=<optimized out>) at ../lib/vlog.c:1249
  #7  vlog_abort (module=<optimized out>, message=message@entry=0x561dac1acf60 "%s: assertion %s failed in %s()") at ../lib/vlog.c:1263
  #8  0x0000561dac108eeb in ovs_assert_failure (where=<optimized out>, function=<optimized out>, condition=<optimized out>) at ../lib/util.c:86
  #9  0x0000561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194
  #10 tc_replace_flower (id=<optimized out>, flower=<optimized out>) at ../lib/tc.c:3223
  #11 0x0000561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, match=<optimized out>, actions=<optimized out>, actions_len=<optimized out>, 
      ufid=<optimized out>, info=<optimized out>, stats=<optimized out>) at ../lib/netdev-offload-tc.c:2096
  #12 0x0000561dac117541 in netdev_flow_put (stats=<optimized out>, info=0x7fb65b7ba780, ufid=<optimized out>, act_len=<optimized out>, actions=<optimized out>, 
      match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257
  #13 parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2297
  #14 try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2384
  #15 dpif_netlink_operate (dpif_=0x561dad0ad550, ops=0x7fb65b7bb820, n_ops=<optimized out>, offload_type=DPIF_OFFLOAD_AUTO) at ../lib/dpif-netlink.c:2455
  #16 0x0000561dac080969 in dpif_operate (dpif=0x561dad0ad550, ops=0x7fb65b7bb820, n_ops=2, offload_type=<optimized out>) at ../lib/dpif.c:1372
  #17 0x0000561dac031f9d in handle_upcalls (n_upcalls=1, upcalls=0x7fb65b7dd620, udpif=0x561dad0b58d0) at ../ofproto/ofproto-dpif-upcall.c:1662
  #18 recv_upcalls (handler=handler@entry=0x561dad0c39a0) at ../ofproto/ofproto-dpif-upcall.c:900
  #19 0x0000561dac032994 in udpif_upcall_handler (arg=0x561dad0c39a0) at ../ofproto/ofproto-dpif-upcall.c:800
  #20 0x0000561dac0e5363 in ovsthread_wrapper (aux_=<optimized out>) at ../lib/ovs-thread.c:422
  #21 0x00007fb873494b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
  #22 0x00007fb873526a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

  Details:
  (gdb) frame 10
  #10 tc_replace_flower (id=<optimized out>, flower=<optimized out>) at ../lib/tc.c:3223
  3223                ofpbuf_at_assert(reply, NLMSG_HDRLEN, sizeof *tc);
  (gdb) print *reply
  $8 = {base = 0x7fb650005d70, data = 0x7fb650005d70, size = 0, allocated = 1024, header = 0x0, msg = 0x0, list_node = {prev = 0xcccccccccccccccc, 
      next = 0xcccccccccccccccc}, source = OFPBUF_MALLOC}
  (gdb) frame 11
  #11 0x0000561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, match=<optimized out>, actions=<optimized out>, actions_len=<optimized out>, 
      ufid=<optimized out>, info=<optimized out>, stats=<optimized out>) at ../lib/netdev-offload-tc.c:2096
  2096        err = tc_replace_flower(&id, &flower);
  (gdb) print sizeof flower
  $9 = 16536
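  Frame 10 shows why the assertion fires. The requested range is offset=16 (NLMSG_HDRLEN) and size=20 (sizeof *tc). However, *reply has size = 0, so the check offset + size <= b->size evaluates 16 + 20 <= 0.

  The sketch below is a simplified, hypothetical model of that check. The names mini_buf, mini_buf_at and mini_buf_at_assert are illustrative only and are not OVS APIs. The standard assert() stands in for ovs_assert(). The sketch contrasts an accessor that aborts with one that returns NULL, which would let a caller reject a short reply instead of crashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct ofpbuf: only the fields the
 * bounds check needs.  This is NOT the real OVS definition. */
struct mini_buf {
    uint8_t *data;     /* Start of valid data. */
    size_t size;       /* Number of valid bytes at 'data'. */
    size_t allocated;  /* Capacity of the backing storage. */
};

/* Values observed in the gdb session (frame 10). */
enum {
    NLMSG_HDRLEN_BYTES = 16,  /* NLMSG_HDRLEN */
    TCMSG_BYTES = 20          /* sizeof(struct tcmsg) */
};

/* Aborting accessor, modelled on the check reported for ofpbuf_at_assert():
 * if the requested bytes are not all present, the process aborts. */
static void *
mini_buf_at_assert(const struct mini_buf *b, size_t offset, size_t size)
{
    assert(offset + size <= b->size);
    return b->data + offset;
}

/* Checked accessor: returns a pointer to 'size' bytes at 'offset', or NULL
 * if the buffer is too short.  The comparison is arranged so that
 * offset + size cannot overflow. */
static void *
mini_buf_at(const struct mini_buf *b, size_t offset, size_t size)
{
    if (offset > b->size || size > b->size - offset) {
        return NULL;
    }
    return b->data + offset;
}
```

  Take the state printed in frame 10 (size 0, allocated 1024). In that state, mini_buf_at(&reply, NLMSG_HDRLEN_BYTES, TCMSG_BYTES) returns NULL, whereas mini_buf_at_assert() would abort. Once size is at least 36, both accessors return data + 16.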

  I can reproduce the crash by instrumenting tc_transact() with the following patch:
  diff --git a/lib/tc.c b/lib/tc.c
  index 5c32c6f97..ff56e5c3b 100644
  --- a/lib/tc.c
  +++ b/lib/tc.c
  @@ -242,6 +242,10 @@ tc_transact(struct ofpbuf *request, struct ofpbuf **replyp)
   {
       int error = nl_transact(NETLINK_ROUTE, request, replyp);
       ofpbuf_uninit(request);
  +    if (!error && replyp) {
  +        /* Simulate the EAGAIN situation: leave an empty reply. */
  +        ofpbuf_reinit(*replyp, (*replyp)->allocated);
  +    }
       return error;
   }
   
  If you look at nl_sock_recv__() in lib/netlink-socket.c [0], which nl_transact() ultimately calls, this is the state the reply buffer would be left in if the recvmsg() call failed with EAGAIN.

  [0]: https://github.com/openvswitch/ovs/blob/77d82289857f5cdcaaf4be06e17e750edcf0abd3/lib/netlink-socket.c#L712-L725
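  The empty-buffer state described for nl_sock_recv__() can be reproduced outside OVS with plain sockets. The sketch below assumes a simplified receive pattern along the lines the report describes:

  1. The buffer is emptied first.
  2. A non-blocking recvmsg() is attempted.
  3. On failure, the error is returned and nothing is written to the buffer.

  The names rx_buf and rx_buf_recv are hypothetical. An AF_UNIX datagram socketpair stands in for the netlink socket. This is not the OVS implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical receive buffer; 'size' counts the valid bytes in 'data'. */
struct rx_buf {
    uint8_t data[1024];
    size_t size;
};

/* Simplified receive pattern (NOT the real nl_sock_recv__()): empty the
 * buffer, then try a non-blocking recvmsg().  On failure the error code is
 * returned and the buffer is left empty. */
static int
rx_buf_recv(int fd, struct rx_buf *buf)
{
    struct iovec iov = { .iov_base = buf->data, .iov_len = sizeof buf->data };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
    ssize_t n;

    assert(fd >= 0);                  /* Caller must pass an open socket. */
    buf->size = 0;                    /* Analogous to clearing an ofpbuf. */

    do {
        n = recvmsg(fd, &msg, MSG_DONTWAIT);
    } while (n < 0 && errno == EINTR);

    if (n < 0) {
        return errno;                 /* E.g. EAGAIN: buf->size stays 0. */
    }
    buf->size = (size_t) n;
    return 0;
}
```

  Calling rx_buf_recv() on a socket with nothing queued returns EAGAIN (or EWOULDBLOCK) and leaves buf->size at 0. The instrumentation patch above forces this same empty state onto a successful tc_transact() reply in order to trigger the assertion seen in frame 10.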





More information about the Ubuntu-openstack-bugs mailing list