[Bug 1736390] Re: openvswitch: kernel oops destroying interfaces on i386
Christian Ehrhardt
1736390 at bugs.launchpad.net
Fri Sep 14 05:23:59 UTC 2018
Thanks, I installed that in the test environment. After a manual reboot I got:
$ uname -a
Linux autopkgtest 4.15.0-34-generic #38~lp1736390Commit12064551Reverted SMP Thu Sep 13 13:28:33 UTC i686 i686 i686 GNU/Linux
The change persists into the autopkgtest run:
autopkgtest [05:11:17]: testbed running kernel: Linux 4.15.0-34-generic #38~lp1736390Commit12064551Reverted SMP Thu Sep 13 13:28:33 UTC
The test kernel works fine where the other one failed.
To be sure, I ran it multiple times and with different CPU options enabled in KVM (e.g. to also run the DPDK tests, which need SSE3).
They all passed, with no crash.
So yes, reverting that change seems to be the solution.
But what was it needed for, and what would break if it is reverted?
commit 120645513f55a4ac5543120d9e79925d30a0156f
Author: Jarno Rajahalme <jarno at ovn.org>
Date: Fri Apr 21 16:48:06 2017 -0700
openvswitch: Add eventmask support to CT action.
Add a new optional conntrack action attribute OVS_CT_ATTR_EVENTMASK,
which can be used in conjunction with the commit flag
(OVS_CT_ATTR_COMMIT) to set the mask of bits specifying which
conntrack events (IPCT_*) should be delivered via the Netfilter
netlink multicast groups. Default behavior depends on the system
configuration, but typically a lot of events are delivered. This can be
very chatty for the NFNLGRP_CONNTRACK_UPDATE group, even if only some
types of events are of interest.
Netfilter core init_conntrack() adds the event cache extension, so we
only need to set the ctmask value. However, if the system is
configured without support for events, the setting will be skipped due
to extension not being found.
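To make the "mask of bits" in the commit message concrete, here is a minimal sketch of how such an event mask could be composed, one bit per IPCT_* event. The enum values are assumed to mirror enum ip_conntrack_events in the kernel's uapi header nf_conntrack_common.h and should be checked against the tree in use; the helper names event_mask and events_in, and the choice of events, are purely illustrative and not part of OVS or the kernel.

```python
from enum import IntEnum


# Assumed to mirror enum ip_conntrack_events from the kernel uapi header
# nf_conntrack_common.h; verify the values against your kernel tree.
class IPCT(IntEnum):
    NEW = 0
    RELATED = 1
    DESTROY = 2
    REPLY = 3
    ASSURED = 4
    PROTOINFO = 5
    HELPER = 6
    MARK = 7
    SEQADJ = 8
    SECMARK = 9
    LABEL = 10
    SYNPROXY = 11


def event_mask(*events):
    """OR together one bit per event; the bit position is the enum value."""
    mask = 0
    for ev in events:
        mask |= 1 << ev
    return mask


def events_in(mask):
    """List the events whose bit is set in mask."""
    return [ev for ev in IPCT if mask & (1 << ev)]


# Hypothetical selection: only report new and destroyed connections,
# which would keep the NFNLGRP_CONNTRACK_UPDATE group quiet.
quiet_mask = event_mask(IPCT.NEW, IPCT.DESTROY)
print(hex(quiet_mask), [ev.name for ev in events_in(quiet_mask)])
```

A mask like this would be the value carried by OVS_CT_ATTR_EVENTMASK alongside the commit flag; how it is passed down from userspace is not shown here.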
That is odd: I thought we had previously identified an Ubuntu-sauce patch, but this is a normal upstream change.
I'd hope that others are affected as well and that this got fixed, or could it be that we are affected by 1206455 only due to some Ubuntu sauce?
To check whether the latest upstream (no sauce, 4.19-rc3) behaves
better, I ran the latest mainline kernel.
autopkgtest [05:21:50]: testbed running kernel: Linux 4.19.0-041900rc3-generic #201809120832 SMP Wed Sep 12 12:47:16 UTC 2018
But that still crashes.
@James: can you estimate what we lose on non-i386 when reverting that change for now?
@Joseph: what would we do now? Report upstream? If so, what exactly: a description and link sent to the author and the ML, as we don't have a fix yet?
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/1736390
Title:
openvswitch: kernel oops destroying interfaces on i386
Status in linux package in Ubuntu:
In Progress
Status in openvswitch package in Ubuntu:
Invalid
Status in linux source package in Artful:
Won't Fix
Status in openvswitch source package in Artful:
Invalid
Status in linux source package in Bionic:
In Progress
Status in openvswitch source package in Bionic:
Invalid
Status in linux source package in Cosmic:
In Progress
Status in openvswitch source package in Cosmic:
Invalid
Bug description:
Reproducible on bionic using the autopkgtests from openvswitch on
i386:
[ 41.420568] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 41.421000] IP: igmp_group_dropped+0x21/0x220
[ 41.421246] *pdpt = 000000001d62c001 *pde = 0000000000000000
[ 41.421659] Oops: 0000 [#1] SMP
[ 41.421852] Modules linked in: veth openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack libcrc32c 9p fscache ppdev kvm_intel kvm 9pnet_virtio irqbypass input_leds joydev 9pnet parport_pc serio_raw parport i2c_piix4 qemu_fw_cfg mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs xor raid6_pq psmouse virtio_blk virtio_net pata_acpi floppy
[ 41.423855] CPU: 0 PID: 5 Comm: kworker/u2:0 Tainted: G W 4.13.0-18-generic #21-Ubuntu
[ 41.424355] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 41.424849] Workqueue: netns cleanup_net
[ 41.425071] task: db8fba80 task.stack: dba10000
[ 41.425346] EIP: igmp_group_dropped+0x21/0x220
[ 41.425656] EFLAGS: 00010202 CPU: 0
[ 41.425864] EAX: 00000000 EBX: dd726360 ECX: dba11e6c EDX: 00000002
[ 41.426335] ESI: 00000000 EDI: dd4db500 EBP: dba11dcc ESP: dba11d94
[ 41.426687] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 41.426990] CR0: 80050033 CR2: 00000000 CR3: 1e6d6d60 CR4: 000006f0
[ 41.427340] Call Trace:
[ 41.427485] ? __wake_up+0x36/0x40
[ 41.427680] ip_mc_down+0x27/0x90
[ 41.427869] inetdev_event+0x398/0x4e0
[ 41.428082] ? skb_dequeue+0x5b/0x70
[ 41.428286] ? wireless_nlevent_flush+0x4c/0x90
[ 41.428541] notifier_call_chain+0x4e/0x70
[ 41.428772] raw_notifier_call_chain+0x11/0x20
[ 41.429023] call_netdevice_notifiers_info+0x2a/0x60
[ 41.429301] dev_close_many+0x9d/0xe0
[ 41.429509] rollback_registered_many+0xd7/0x380
[ 41.429768] unregister_netdevice_many.part.102+0x10/0x80
[ 41.430075] default_device_exit_batch+0x134/0x160
[ 41.430344] ? do_wait_intr_irq+0x80/0x80
[ 41.430650] ops_exit_list.isra.8+0x4d/0x60
[ 41.430886] cleanup_net+0x18e/0x260
[ 41.431090] process_one_work+0x1a0/0x390
[ 41.431317] worker_thread+0x37/0x450
[ 41.431525] kthread+0xf3/0x110
[ 41.431714] ? process_one_work+0x390/0x390
[ 41.431941] ? kthread_create_on_node+0x20/0x20
[ 41.432187] ret_from_fork+0x19/0x24
[ 41.432382] Code: 90 90 90 90 90 90 90 90 90 90 3e 8d 74 26 00 55 89 e5 57 56 53 89 c3 83 ec 2c 8b 33 65 a1 14 00 00 00 89 45 f0 31 c0 80 7b 4b 00 <8b> 06 8b b8 20 03 00 00 8b 43 04 0f 85 5e 01 00 00 3d e0 00 00
[ 41.433405] EIP: igmp_group_dropped+0x21/0x220 SS:ESP: 0068:dba11d94
[ 41.433750] CR2: 0000000000000000
[ 41.433961] ---[ end trace 595db54cab84070c ]---
The system then becomes unresponsive; no further interfaces can be created.
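For anyone comparing several of these traces, the following is a small sketch of how the symbol+offset/size notation and the Code: line of the oops above could be picked apart. The function names are made up for illustration, and the regular expression only covers the formats seen in this report. As far as the x86 encoding goes, the marked bytes 8b 06 are a 32-bit load through ESI, which the register dump shows as 00000000, matching CR2 and the NULL dereference.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" notation used in the trace,
# including compiler-generated suffixes such as ".part.102".
SYM_RE = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_location(text):
    """Return (symbol, offset, size) from a trace line, or None."""
    m = SYM_RE.search(text)
    if m is None:
        return None
    return m.group("sym"), int(m.group("off"), 16), int(m.group("size"), 16)


def faulting_bytes(code_line):
    """Return the code bytes starting at the byte marked <xx>."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            first = int(tok.strip("<>"), 16)
            return [first] + [int(t, 16) for t in tokens[i + 1:]]
    return []


# Lines copied (the Code: line shortened) from the oops in this report.
eip = parse_location("EIP: igmp_group_dropped+0x21/0x220")
part = parse_location("unregister_netdevice_many.part.102+0x10/0x80")
fault = faulting_bytes("Code: 80 7b 4b 00 <8b> 06 8b b8")
print(eip, part, [hex(b) for b in fault])
```

The offset 0x21 into the 0x220-byte function is what one would feed to a disassembler or addr2line together with the matching debug symbols.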
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1736390/+subscriptions