[Bug 1916708] [NEW] udpif_revalidator crash in ofpbuf_resize__

Trent Lloyd 1916708 at bugs.launchpad.net
Wed Feb 24 07:35:50 UTC 2021


Public bug reported:

The udpif_revalidator thread crashed in ofpbuf_resize__ on openvswitch
2.9.2-0ubuntu0.18.04.3~cloud0 (on 16.04 from the xenial-queens cloud
archive, backported from the 18.04 release of the same version). Kernel
version was 4.4.0-159-generic.

The issue is suspected to still exist in upstream master as of Feb
2021 (v2.15.0) but is not yet completely understood. Opening this bug to
track future occurrences.

The general issue appears to be that the udpif_revalidator thread tried
to expand a stack-allocated ofpbuf to fit a netlink reply of size 3204,
but the buffer is only 2048 bytes. This intentionally aborts, as we can't
expand memory that lives on the stack.

The crash in ofpbuf_resize__ appears to be due to OVS_NOT_REACHED() being
called because b->source is OFPBUF_STACK. (The line number points at the
default: case, but this appears to be an optimiser quirk; b->source really
is OFPBUF_STACK.) We can't realloc() the buffer memory if it's allocated
on the stack.
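To illustrate the failure mode described above, here is a minimal sketch
in C. It is not the OVS code: struct buf, buf_grow_tail(), buf_put() and
the BUF_* constants are hypothetical stand-ins for ofpbuf and its helpers,
and the sketch returns false at the point where, per the analysis above,
ofpbuf_resize__ would abort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an ofpbuf; not the real struct. */
enum buf_source { BUF_MALLOC, BUF_STACK };

struct buf {
    void *base;             /* Start of the backing memory. */
    size_t size;            /* Bytes currently in use. */
    size_t allocated;       /* Capacity of the backing memory. */
    enum buf_source source; /* Where the backing memory came from. */
};

/* Ensures at least 'extra' bytes of tailroom.  Returns false where the
 * real code would abort: stack memory cannot be passed to realloc(). */
static bool
buf_grow_tail(struct buf *b, size_t extra)
{
    if (b->allocated - b->size >= extra) {
        return true;        /* Already enough room, nothing to do. */
    }

    switch (b->source) {
    case BUF_MALLOC: {
        size_t new_allocated = b->size + extra;
        void *p = realloc(b->base, new_allocated);
        if (!p) {
            return false;
        }
        b->base = p;
        b->allocated = new_allocated;
        return true;
    }
    case BUF_STACK:
    default:
        return false;       /* Cannot grow a stack-backed buffer. */
    }
}

/* Appends 'n' bytes from 'data', growing the buffer if needed. */
static bool
buf_put(struct buf *b, const void *data, size_t n)
{
    if (!buf_grow_tail(b, n)) {
        return false;
    }
    memcpy((char *) b->base + b->size, data, n);
    b->size += n;
    return true;
}
```

With a full 2048-byte stack-backed buffer, appending the remaining 1156
bytes fails in this sketch, while the same append on a malloc()-backed
buffer succeeds and leaves it holding 3204 bytes.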

This buffer is supplied in frame #7, nl_sock_transact_multiple__, during
the call to nl_sock_recv__, where it is passed as buf_txn->reply. In this
specific case it seems transactions[0] was available, so it was used
rather than tmp_txn.
The original source of transactions (it's passed through most of the
function calls) appears to be op_auxdata, allocated on the stack at the
top of the dpif_netlink_operate__ function (dpif-netlink.c:1875).

The size of this particular message was 3204, so 2048 bytes went into the
buffer and the remaining 1156 went into the tail iovec set up inside
nl_sock_recv__, which then tried to expand the ofpbuf to hold them.
Various nl_sock_* functions have comments about the buffer ideally being
the right size for optimal performance (I guess to avoid the
reallocation), but it seems like a possible oversight in the
dpif_netlink_operate__ workflow that the nl_sock_* functions may
ultimately try to expand that buffer and then fail because of the stack
allocation.
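The 2048 + 1156 split described above can be reproduced with a small
self-contained sketch. This is not the OVS implementation: recv_split()
and demo_split() are hypothetical helpers, and an AF_UNIX datagram
socketpair stands in for the netlink socket. The sketch only shows how
recvmsg() with two iovecs fills the caller's buffer first and spills the
rest into a tail buffer.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receives one datagram from 'fd'.  The first 'head_size' bytes land in
 * 'head'; anything beyond that spills into 'tail'.  Returns the total
 * number of bytes received, or -1 on error, and stores the number of
 * spilled bytes in '*tail_len'. */
static ssize_t
recv_split(int fd, void *head, size_t head_size,
           void *tail, size_t tail_size, size_t *tail_len)
{
    struct iovec iov[2];
    struct msghdr msg;
    ssize_t n;

    iov[0].iov_base = head;
    iov[0].iov_len = head_size;
    iov[1].iov_base = tail;
    iov[1].iov_len = tail_size;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;

    n = recvmsg(fd, &msg, 0);
    if (n < 0) {
        return -1;
    }
    *tail_len = (size_t) n > head_size ? (size_t) n - head_size : 0;
    return n;
}

/* Demonstration helper: sends a 'len'-byte datagram through a socketpair
 * and reports how it was split between a 2048-byte head and the tail. */
static int
demo_split(size_t len, size_t *head_len, size_t *tail_len)
{
    static unsigned char out[4096], head[2048], tail[4096];
    int fds[2];
    ssize_t n = -1;

    if (len > sizeof out || socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) < 0) {
        return -1;
    }
    memset(out, 0xab, len);
    if (send(fds[0], out, len, 0) == (ssize_t) len) {
        n = recv_split(fds[1], head, sizeof head,
                       tail, sizeof tail, tail_len);
    }
    close(fds[0]);
    close(fds[1]);
    if (n < 0) {
        return -1;
    }
    *head_len = (size_t) n - *tail_len;
    return 0;
}
```

Calling demo_split(3204, &head_len, &tail_len) reports 2048 bytes in the
head and 1156 in the tail. In the crash above, copying those spilled bytes
back into the caller's buffer is the step that requires the buffer to
grow.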

The relevant source tree can be found here:
git clone -b applied/2.9.2-0ubuntu0.18.04.3 https://git.launchpad.net/ubuntu/+source/openvswitch
https://git.launchpad.net/ubuntu/+source/openvswitch/tree/?h=applied/2.9.2-0ubuntu0.18.04.3

Thread 1 (Thread 0x7f3e0ffff700 (LWP 1539131)):
#0  0x00007f3ed30c8428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f3ed30ca02a in __GI_abort () at abort.c:89
#2  0x00000000004e5035 in ofpbuf_resize__ (b=b@entry=0x7f3e0fffb050, new_headroom=<optimized out>, new_tailroom=new_tailroom@entry=1156) at ../lib/ofpbuf.c:262
#3  0x00000000004e5338 in ofpbuf_prealloc_tailroom (b=b@entry=0x7f3e0fffb050, size=size@entry=1156) at ../lib/ofpbuf.c:291
#4  0x00000000004e54e5 in ofpbuf_put_uninit (size=size@entry=1156, b=b@entry=0x7f3e0fffb050) at ../lib/ofpbuf.c:365
#5  ofpbuf_put (b=b@entry=0x7f3e0fffb050, p=p@entry=0x7f3e0ffcf0a0, size=size@entry=1156) at ../lib/ofpbuf.c:388
#6  0x00000000005392a6 in nl_sock_recv__ (sock=sock@entry=0x7f3e50009150, buf=0x7f3e0fffb050, wait=wait@entry=false) at ../lib/netlink-socket.c:705
#7  0x0000000000539474 in nl_sock_transact_multiple__ (sock=sock@entry=0x7f3e50009150, transactions=transactions@entry=0x7f3e0ffdff20, n=1, done=done@entry=0x7f3e0ffdfe10) at ../lib/netlink-socket.c:824
#8  0x000000000053980a in nl_sock_transact_multiple (sock=0x7f3e50009150, transactions=transactions@entry=0x7f3e0ffdff20, n=n@entry=1) at ../lib/netlink-socket.c:1009
#9  0x000000000053aa1b in nl_sock_transact_multiple (n=1, transactions=0x7f3e0ffdff20, sock=<optimized out>) at ../lib/netlink-socket.c:1765
#10 nl_transact_multiple (protocol=protocol@entry=16, transactions=transactions@entry=0x7f3e0ffdff20, n=n@entry=1) at ../lib/netlink-socket.c:1764
#11 0x0000000000528b01 in dpif_netlink_operate__ (dpif=dpif@entry=0x25a6150, ops=ops@entry=0x7f3e0fffaf28, n_ops=n_ops@entry=1) at ../lib/dpif-netlink.c:1964
#12 0x0000000000529956 in dpif_netlink_operate_chunks (n_ops=1, ops=0x7f3e0fffaf28, dpif=<optimized out>) at ../lib/dpif-netlink.c:2243
#13 dpif_netlink_operate (dpif_=0x25a6150, ops=<optimized out>, n_ops=<optimized out>) at ../lib/dpif-netlink.c:2279
#14 0x00000000004756de in dpif_operate (dpif=0x25a6150, ops=<optimized out>, ops@entry=0x7f3e0fffaf28, n_ops=n_ops@entry=1) at ../lib/dpif.c:1359
#15 0x00000000004758e7 in dpif_flow_get (dpif=<optimized out>, key=<optimized out>, key_len=<optimized out>, ufid=<optimized out>, pmd_id=<optimized out>, buf=buf@entry=0x7f3e0fffb050, flow=<optimized out>) at ../lib/dpif.c:1014
#16 0x000000000043f662 in ukey_create_from_dpif_flow (udpif=0x229cbf0, udpif=0x229cbf0, ukey=<synthetic pointer>, flow=0x7f3e0fffc790) at ../ofproto/ofproto-dpif-upcall.c:1709
#17 ukey_acquire (error=<synthetic pointer>, result=<synthetic pointer>, flow=0x7f3e0fffc790, udpif=0x229cbf0) at ../ofproto/ofproto-dpif-upcall.c:1914
#18 revalidate (revalidator=0x250eaa8) at ../ofproto/ofproto-dpif-upcall.c:2473
#19 0x000000000043f816 in udpif_revalidator (arg=0x250eaa8) at ../ofproto/ofproto-dpif-upcall.c:913
#20 0x00000000004ea4b4 in ovsthread_wrapper (aux_=<optimized out>) at ../lib/ovs-thread.c:348
#21 0x00007f3ed39756ba in start_thread (arg=0x7f3e0ffff700) at pthread_create.c:333
#22 0x00007f3ed319a41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

** Affects: openvswitch (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: sts

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/1916708

Title:
  udpif_revalidator crash in ofpbuf_resize__

Status in openvswitch package in Ubuntu:
  New
