[Bug 1655117] [NEW] kernel BUG at skbuff.h:1486 Insufficient linear data in skb __skb_pull.part.7+0x4/0x6 [openvswitch]

Andrew Crawford acrawford at cctv.org
Mon Jan 9 19:08:53 UTC 2017


Public bug reported:

Since 2016-12-30 EST we have been experiencing repeated crashes of our
OpenStack Icehouse / Trusty Neutron node with a kernel BUG at skbuff.h
line 1486:

1471 /**
1472 * skb_peek - peek at the head of an &sk_buff_head
1473 * @list_: list to peek at
1474 *
1475 * Peek an &sk_buff. Unlike most other operations you _MUST_
1476 * be careful with this one. A peek leaves the buffer on the
1477 * list and someone else may run off with it. You must hold
1478 * the appropriate locks or have a private queue to do this.
1479 *
1480 * Returns %NULL for an empty list or a pointer to the head element.
1481 * The reference count is not incremented and the reference is therefore
1482 * volatile. Use with caution.
1483 */
1484 static inline struct sk_buff *skb_peek(const struct sk_buff_head *list_)
1485 {
1486 struct sk_buff *skb = list_->next;
1487
1488 if (skb == (struct sk_buff *)list_)
1489 skb = NULL;
1490 return skb;
1491 }

This generally results in a full kernel panic on the Neutron node and
breaks connectivity for VMs within the cloud. However, after using
crash-dumptools to collect information on the crashes over the past
three days, we find that in about 2 out of 3 crash instances the kernel
loaded by kexec during the crashdump appears to continue running. In
those cases we see a flap of the Neutron services instead of a full
panic that brings the Neutron server down and necessitates a hard
reboot.

I believe that this is a manifestation of the openvswitch issue
described on 2017-01-08 as:

"OVS can only process L2 packets. But OVS GRE receive handler
can accept IP-GRE packets. When such packet is processed by
OVS datapath it can trigger following assert failure due
to insufficient linear data in skb."

https://patchwork.ozlabs.org/patch/712373/

I have not tested the patch provided above yet.
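
As an illustration of the distinction the quoted description draws, the
following is a minimal userspace sketch of a check that separates GRE
packets carrying an Ethernet (L2) frame from IP-GRE packets. It is not
the code from the linked patch. The helper names gre_protocol_type and
gre_payload_is_l2 are hypothetical; the header layout follows the base
GRE header in RFC 2784, and the EtherType values are the standard ones
from linux/if_ether.h.

```c
#include <stddef.h>
#include <stdint.h>

/* EtherType for Transparent Ethernet Bridging: the GRE payload is a
 * complete Ethernet (L2) frame. */
#define ETH_P_TEB 0x6558
/* EtherType for IPv4: the GRE payload is a bare IP packet (IP-GRE). */
#define ETH_P_IP  0x0800

/* Hypothetical helper: read the 16-bit protocol type from a base GRE
 * header (2 bytes of flags/version, then 2 bytes of protocol type, both
 * in network byte order). Returns 0 if the buffer is too short. */
static uint16_t gre_protocol_type(const uint8_t *hdr, size_t len)
{
    if (len < 4)
        return 0;
    return (uint16_t)((hdr[2] << 8) | hdr[3]);
}

/* Hypothetical filter: accept only GRE packets whose payload is an
 * Ethernet frame, which is what an L2-only datapath can parse. */
static int gre_payload_is_l2(const uint8_t *hdr, size_t len)
{
    return gre_protocol_type(hdr, len) == ETH_P_TEB;
}
```

In this sketch, a header whose protocol type is ETH_P_IP is rejected
before any attempt is made to parse an Ethernet header that is not
there.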

Other information and a few sample dmesg outputs from the crash:
(multiple dumps available)

# lsb_release -rd
Description: Ubuntu 14.04.5 LTS
Release: 14.04

# apt-cache policy openvswitch
N: Unable to locate package openvswitch
root@neutron01:/var/crash# apt-cache policy openvswitch-common
openvswitch-common:
  Installed: 2.0.2-0ubuntu0.14.04.3
  Candidate: 2.0.2-0ubuntu0.14.04.3
  Version table:
 *** 2.0.2-0ubuntu0.14.04.3 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     2.0.1+git20140120-0ubuntu2 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

# apt-cache policy openvswitch-switch
openvswitch-switch:
  Installed: 2.0.2-0ubuntu0.14.04.3
  Candidate: 2.0.2-0ubuntu0.14.04.3
  Version table:
 *** 2.0.2-0ubuntu0.14.04.3 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     2.0.1+git20140120-0ubuntu2 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

# apt-cache policy neutron-plugin-openvswitch-agent
neutron-plugin-openvswitch-agent:
  Installed: 1:2014.1.5-0ubuntu7
  Candidate: 1:2014.1.5-0ubuntu7
  Version table:
 *** 1:2014.1.5-0ubuntu7 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:2014.1.3-0ubuntu1.1 0
        500 http://security.ubuntu.com/ubuntu/ trusty-security/main amd64 Packages
     1:2014.1-0ubuntu1 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

example dmesg:

############## dmesg.201701060019

> [33100.131019] ------------[ cut here ]------------
> [33100.131176] kernel BUG at /build/linux-mi9H1O/linux-3.13.0/include/linux/skbuff.h:1486!
> [33100.131424] invalid opcode: 0000 [#1] SMP
> [33100.131560] Modules linked in: xt_nat xt_conntrack ip6table_filter ip6_tables iptable_filter xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c ipmi_devintf gpio_ich cdc_ether x86_pkg_temp_thermal intel_powerclamp coretemp usbnet kvm_intel mii kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac edac_core lpc_ich wmi ipmi_si bonding shpchp ioatdma lp mac_hid parport ahci libahci sfc igb e1000e mtd dca i2c_algo_bit ptp pps_core megaraid_sas mdio
> [33100.133560] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.13.0-106-generic #153-Ubuntu
> [33100.133800] Hardware name: IBM System x3650 M4 : -[7915AC1]-/00Y8473, BIOS -[VVE136AUS-1.60]- 12/12/2013
> [33100.134096] task: ffff880469da4800 ti: ffff880469dae000 task.ti: ffff880469dae000
> [33100.134325] RIP: 0010:[<ffffffffa02321c9>] [<ffffffffa02321c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
> [33100.134628] RSP: 0018:ffff88046fd03bb0 EFLAGS: 00010297
> [33100.134792] RAX: ffff880035d73866 RBX: ffff880461efb600 RCX: ffff880035d73800
> [33100.135011] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fd03c98
> [33100.135231] RBP: ffff88046fd03bb0 R08: 0000000000000000 R09: ffff880035d73800
> [33100.135451] R10: ffff880461efb600 R11: 0000000000000000 R12: ffff88046fd03c18
> [33100.135671] R13: ffff880866a88a80 R14: ffff88046fd03c18 R15: ffff880461e49480
> [33100.141118] FS: 0000000000000000(0000) GS:ffff88046fd00000(0000) knlGS:0000000000000000
> [33100.152198] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [33100.157796] CR2: 00007fc30157d090 CR3: 0000000001c0e000 CR4: 00000000001407e0
> [33100.163382] Stack:
> [33100.168800] ffff88046fd03be0 ffffffffa022bbc5 ffffffff81cdaf00 ffff880461efb600
> [33100.179942] ffffe8fbefd04890 ffff880866a88a80 ffff88046fd03cc8 ffffffffa022a8c5
> [33100.191068] ffffffff81cdaf00 0000000000000001 ffff880866cb70c4 ffff8804541b6180
> [33100.202184] Call Trace:
> [33100.207553] <IRQ>
> [33100.207617]
> [33100.212849] [<ffffffffa022bbc5>] ovs_flow_extract+0x935/0xb30 [openvswitch]
> [33100.218139] [<ffffffffa022a8c5>] ovs_dp_process_received_packet+0x55/0x120 [openvswitch]
> [33100.228464] [<ffffffffa0230b5a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
> [33100.233727] [<ffffffffa0231ba3>] gre_rcv+0xa3/0xc0 [openvswitch]
> [33100.238898] [<ffffffffa0222745>] gre_cisco_rcv+0x65/0xba [gre]
> [33100.243974] [<ffffffffa02222cd>] gre_rcv+0x5d/0x80 [gre]
> [33100.248938] [<ffffffff81666358>] ip_local_deliver_finish+0xa8/0x210
> [33100.253823] [<ffffffff81666658>] ip_local_deliver+0x48/0x80
> [33100.258547] [<ffffffff81665fdd>] ip_rcv_finish+0x7d/0x350
> [33100.263138] [<ffffffff81666928>] ip_rcv+0x298/0x3d0
> [33100.267636] [<ffffffff8162f566>] __netif_receive_skb_core+0x696/0x870
> [33100.272134] [<ffffffff8162f758>] __netif_receive_skb+0x18/0x60
> [33100.276544] [<ffffffff8163030e>] process_backlog+0xae/0x1a0
> [33100.280999] [<ffffffff8162fb3a>] net_rx_action+0x14a/0x270
> [33100.285447] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310
> [33100.289886] [<ffffffff81070315>] irq_exit+0x105/0x110
> [33100.294224] [<ffffffff81740066>] do_IRQ+0x56/0xc0
> [33100.298433] [<ffffffff817356ed>] common_interrupt+0x6d/0x6d
> [33100.302613] <EOI>
> [33100.302676]
> [33100.306717] [<ffffffff815dc982>] ? cpuidle_enter_state+0x52/0xc0
> [33100.310816] [<ffffffff815dc978>] ? cpuidle_enter_state+0x48/0xc0
> [33100.314828] [<ffffffff815dcacc>] cpuidle_idle_call+0xdc/0x220
> [33100.318732] [<ffffffff8101e44e>] arch_cpu_idle+0xe/0x30
> [33100.322479] [<ffffffff810c2b31>] cpu_startup_entry+0xc1/0x2b0
> [33100.326138] [<ffffffff810427cd>] start_secondary+0x21d/0x2d0
> [33100.329686] Code: a0 e8 8c 86 e3 e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 70 42 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
> [33100.340962] RIP [<ffffffffa02321c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
> [33100.344857] RSP <ffff88046fd03bb0>

############## dmesg.201701080127

[ 911.714512] ------------[ cut here ]------------
[ 911.714670] kernel BUG at /build/linux-mi9H1O/linux-3.13.0/include/linux/skbuff.h:1486!
[ 911.714917] invalid opcode: 0000 [#1] SMP
[ 911.715053] Modules linked in: xt_nat xt_conntrack xt_REDIRECT xt_tcpudp ip6table_filter ip6_tables iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_devintf gpio_ich kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cdc_ether aesni_intel aes_x86_64 lrw gf128mul glue_helper usbnet ablk_helper cryptd sb_edac mii edac_core lpc_ich bonding mac_hid ipmi_si shpchp wmi lp ioatdma parport ahci sfc libahci igb e1000e mtd dca i2c_algo_bit ptp pps_core megaraid_sas mdio
[ 911.717060] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-106-generic #153-Ubuntu
[ 911.717301] Hardware name: IBM System x3650 M4 : -[7915AC1]-/00Y8473, BIOS -[VVE136AUS-1.60]- 12/12/2013
[ 911.717597] task: ffffffff81c15480 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 911.717827] RIP: 0010:[<ffffffffa01c61c9>] [<ffffffffa01c61c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
[ 911.718128] RSP: 0018:ffff88046fc03bb0 EFLAGS: 00010297
[ 911.718291] RAX: ffff880079de52e6 RBX: ffff880463335000 RCX: ffff880079de5280
[ 911.718511] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fc03c98
[ 911.718731] RBP: ffff88046fc03bb0 R08: 0000000000000000 R09: ffff880079de5280
[ 911.718951] R10: ffff880463335000 R11: 0000000000000000 R12: ffff88046fc03c18
[ 911.719171] R13: ffff880468b60c00 R14: ffff88046fc03c18 R15: ffff8804631a0b40
[ 911.724614] FS: 0000000000000000(0000) GS:ffff88046fc00000(0000) knlGS:0000000000000000
[ 911.735614] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 911.741214] CR2: 00007f1898042d70 CR3: 0000000001c0e000 CR4: 00000000001407f0
[ 911.746800] Stack:
[ 911.752201] ffff88046fc03be0 ffffffffa01bfbc5 ffffffff81cdaf00 ffff880463335000
[ 911.763305] ffffe8fbefc04890 ffff880468b60c00 ffff88046fc03cc8 ffffffffa01be8c5
[ 911.774433] ffffffff81cdaf00 0000000000000001 ffff8804675cf9c4 ffff88045941d380
[ 911.785550] Call Trace:
[ 911.790915] <IRQ>
[ 911.790979]
[ 911.796163] [<ffffffffa01bfbc5>] ovs_flow_extract+0x935/0xb30 [openvswitch]
[ 911.801437] [<ffffffffa01be8c5>] ovs_dp_process_received_packet+0x55/0x120 [openvswitch]
[ 911.811769] [<ffffffffa01c4b5a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[ 911.817038] [<ffffffffa01c5ba3>] gre_rcv+0xa3/0xc0 [openvswitch]
[ 911.822211] [<ffffffffa01b6745>] gre_cisco_rcv+0x65/0xba [gre]
[ 911.827280] [<ffffffffa01b62cd>] gre_rcv+0x5d/0x80 [gre]
[ 911.832213] [<ffffffff81666358>] ip_local_deliver_finish+0xa8/0x210
[ 911.837094] [<ffffffff81666658>] ip_local_deliver+0x48/0x80
[ 911.841810] [<ffffffff81665fdd>] ip_rcv_finish+0x7d/0x350
[ 911.846397] [<ffffffff81666928>] ip_rcv+0x298/0x3d0
[ 911.850889] [<ffffffff8162f566>] __netif_receive_skb_core+0x696/0x870
[ 911.855384] [<ffffffff8162f758>] __netif_receive_skb+0x18/0x60
[ 911.859796] [<ffffffff8163030e>] process_backlog+0xae/0x1a0
[ 911.864208] [<ffffffff8162fb3a>] net_rx_action+0x14a/0x270
[ 911.868654] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310
[ 911.873093] [<ffffffff81070315>] irq_exit+0x105/0x110
[ 911.877442] [<ffffffff81740066>] do_IRQ+0x56/0xc0
[ 911.881654] [<ffffffff817356ed>] common_interrupt+0x6d/0x6d
[ 911.885832] <EOI>
[ 911.885896]
[ 911.889937] [<ffffffff815dc982>] ? cpuidle_enter_state+0x52/0xc0
[ 911.894036] [<ffffffff815dc978>] ? cpuidle_enter_state+0x48/0xc0
[ 911.898017] [<ffffffff815dcacc>] cpuidle_idle_call+0xdc/0x220
[ 911.901888] [<ffffffff8101e44e>] arch_cpu_idle+0xe/0x30
[ 911.905643] [<ffffffff810c2b31>] cpu_startup_entry+0xc1/0x2b0
[ 911.909308] [<ffffffff8171b2e7>] rest_init+0x77/0x80
[ 911.912842] [<ffffffff81d34f6a>] start_kernel+0x432/0x43d
[ 911.916281] [<ffffffff81d34941>] ? repair_env_string+0x5c/0x5c
[ 911.919767] [<ffffffff81d34120>] ? early_idt_handler_array+0x120/0x120
[ 911.923347] [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
[ 911.926859] [<ffffffff81d34733>] x86_64_start_kernel+0x143/0x152
[ 911.930305] Code: a0 e8 8c 46 ea e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 30 49 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
[ 911.940880] RIP [<ffffffffa01c61c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
[ 911.944483] RSP <ffff88046fc03bb0>

############## dmesg.201701071542

[23738.192626] ------------[ cut here ]------------
[23738.192782] kernel BUG at /build/linux-mi9H1O/linux-3.13.0/include/linux/skbuff.h:1486!
[23738.193031] invalid opcode: 0000 [#1] SMP
[23738.193167] Modules linked in: xt_nat xt_conntrack ip6table_filter ip6_tables iptable_filter xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c ipmi_devintf gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul cdc_ether crc32_pclmul usbnet mii ghash_clmulni_intel aesni_intel aes_x86_64 lrw lpc_ich sb_edac gf128mul glue_helper ablk_helper cryptd edac_core bonding wmi ipmi_si mac_hid shpchp lp ioatdma parport ahci libahci igb dca sfc e1000e mtd i2c_algo_bit ptp pps_core megaraid_sas mdio
[23738.195169] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.13.0-106-generic #153-Ubuntu
[23738.195410] Hardware name: IBM System x3650 M4 : -[7915AC1]-/00Y8473, BIOS -[VVE136AUS-1.60]- 12/12/2013
[23738.195706] task: ffff880869959800 ti: ffff880469da4000 task.ti: ffff880469da4000
[23738.195936] RIP: 0010:[<ffffffffa02441c9>] [<ffffffffa02441c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
[23738.196238] RSP: 0018:ffff88046fd03bb0 EFLAGS: 00010297
[23738.196402] RAX: ffff880453cad7e6 RBX: ffff88045d1e7200 RCX: ffff880453cad780
[23738.196622] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fd03c98
[23738.196842] RBP: ffff88046fd03bb0 R08: 0000000000000000 R09: ffff880453cad780
[23738.197062] R10: ffff88045d1e7200 R11: 0000000000000000 R12: ffff88046fd03c18
[23738.197283] R13: ffff880466dbc0c0 R14: ffff88046fd03c18 R15: ffff880462a32f00
[23738.202738] FS: 0000000000000000(0000) GS:ffff88046fd00000(0000) knlGS:0000000000000000
[23738.213771] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[23738.219381] CR2: 00007efcd7eee090 CR3: 0000000001c0e000 CR4: 00000000001407e0
[23738.224978] Stack:
[23738.230390] ffff88046fd03be0 ffffffffa023dbc5 ffffffff81cdaf00 ffff88045d1e7200
[23738.241516] ffffe8fbefd04770 ffff880466dbc0c0 ffff88046fd03cc8 ffffffffa023c8c5
[23738.252668] ffffffff81cdaf00 0000000000000001 ffff880462a54244 ffff88045d1c4100
[23738.263818] Call Trace:
[23738.269200] <IRQ>
[23738.269264]
[23738.274454] [<ffffffffa023dbc5>] ovs_flow_extract+0x935/0xb30 [openvswitch]
[23738.279737] [<ffffffffa023c8c5>] ovs_dp_process_received_packet+0x55/0x120 [openvswitch]
[23738.290071] [<ffffffffa0242b5a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[23738.295339] [<ffffffffa0243ba3>] gre_rcv+0xa3/0xc0 [openvswitch]
[23738.300513] [<ffffffffa0206745>] gre_cisco_rcv+0x65/0xba [gre]
[23738.305587] [<ffffffffa02062cd>] gre_rcv+0x5d/0x80 [gre]
[23738.310531] [<ffffffff81666358>] ip_local_deliver_finish+0xa8/0x210
[23738.315420] [<ffffffff81666658>] ip_local_deliver+0x48/0x80
[23738.320146] [<ffffffff81665fdd>] ip_rcv_finish+0x7d/0x350
[23738.324743] [<ffffffff81666928>] ip_rcv+0x298/0x3d0
[23738.329244] [<ffffffff8162f566>] __netif_receive_skb_core+0x696/0x870
[23738.333744] [<ffffffff8162f758>] __netif_receive_skb+0x18/0x60
[23738.338158] [<ffffffff8163030e>] process_backlog+0xae/0x1a0
[23738.342576] [<ffffffff8162fb3a>] net_rx_action+0x14a/0x270
[23738.347025] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310
[23738.351463] [<ffffffff81070315>] irq_exit+0x105/0x110
[23738.355804] [<ffffffff81740066>] do_IRQ+0x56/0xc0
[23738.360010] [<ffffffff817356ed>] common_interrupt+0x6d/0x6d
[23738.364183] <EOI>
[23738.364246]
[23738.368280] [<ffffffff815dc982>] ? cpuidle_enter_state+0x52/0xc0
[23738.372372] [<ffffffff815dc978>] ? cpuidle_enter_state+0x48/0xc0
[23738.376347] [<ffffffff815dcacc>] cpuidle_idle_call+0xdc/0x220
[23738.380212] [<ffffffff8101e44e>] arch_cpu_idle+0xe/0x30
[23738.383958] [<ffffffff810c2b31>] cpu_startup_entry+0xc1/0x2b0
[23738.387612] [<ffffffff810427cd>] start_secondary+0x21d/0x2d0
[23738.391156] Code: a0 e8 8c 66 e2 e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 50 41 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
[23738.402433] RIP [<ffffffffa02441c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
[23738.406297] RSP <ffff88046fd03bb0>

###########################
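
All three traces above stop in __skb_pull, reached from
ovs_flow_extract. In upstream Linux 3.13 that helper subtracts the
pulled length from skb->len and then asserts
BUG_ON(skb->len < skb->data_len); the Ubuntu build may differ in line
numbering. The following is a simplified userspace model of that
condition, not kernel code. The struct skb_model type and the model_*
function names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Simplified stand-in for the two length fields of struct sk_buff.
 * len      - total bytes in the packet (linear area plus fragments)
 * data_len - bytes held in paged fragments, i.e. not in the linear area */
struct skb_model {
    unsigned int len;
    unsigned int data_len;
};

/* Bytes available in the linear area (same arithmetic as the kernel's
 * skb_headlen()). */
static unsigned int model_headlen(const struct skb_model *skb)
{
    return skb->len - skb->data_len;
}

/* Returns true if pulling n bytes off the front stays within the linear
 * area. Ignoring unsigned wraparound, this is the condition under which
 * the kernel's post-subtraction check (len >= data_len) still holds.
 * The model reports the would-be failure instead of crashing. */
static bool model_pull_ok(const struct skb_model *skb, unsigned int n)
{
    return n <= model_headlen(skb);
}
```

For example, a model packet with len = 100 and data_len = 90 has only
10 linear bytes, so a 14-byte Ethernet header pull would fail the
check.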

** Affects: openvswitch (Ubuntu)
     Importance: Undecided
         Status: New

** Attachment added: "full dmesg example"
   https://bugs.launchpad.net/bugs/1655117/+attachment/4802045/+files/dmesg.201701051502

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/1655117
