[Bug 1336555] Re: ovs-vswitchd crashed with SIGSEGV in nl_attr_get_size()
James Page
james.page at ubuntu.com
Mon May 11 07:50:29 UTC 2015
Proxied from Joe Stringer at VMware:
"I wonder if this patch fixes the issue:
https://github.com/openvswitch/ovs/commit/546953509095cec6fad42663b659171618b765d2
Note that the last three lines within the bad_key_len / bad_mask_len
conditional statement are the following:
    format_generic_odp_key(ma, ds);
    ds_put_char(ds, ')');
    return;
This is the same logic as at the end of the function, which is where
the backtrace reports the crash to be. Jarno pointed out that the
compiler could optimize away the first copy of this code, replacing it
with a jump instruction that jumps inside the if (!is_exact) statement.
Hence the confusing call stack shown in the backtrace.
Note that this problem would only present itself if there is:
A) A mismatch between a newer kernel version and an older userspace
(OVS<2.3), where
B) The kernel has a new flow match field available which ovs-vswitchd
doesn't understand, and
C) A flow_del command fails for some reason."
It would be great if we could confirm this by taking the existing build
of OVS and applying the patch above.
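
For anyone puzzled by the backtrace, the effect Joe and Jarno describe can
be illustrated with a small, self-contained sketch. This is not the OVS
code: struct buf, buf_put(), format_generic() and format_attr() are
hypothetical stand-ins, and the control flow is only assumed to resemble
the real function in having the same three statements at an early exit
and at the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the OVS dynamic string; not the real API. */
struct buf {
    char data[128];
    size_t len;
};

/* Append s to b, silently dropping it if it would not fit. */
static void buf_put(struct buf *b, const char *s)
{
    assert(b != NULL && s != NULL);
    size_t n = strlen(s);
    if (b->len + n < sizeof b->data) {
        memcpy(b->data + b->len, s, n + 1);   /* copies the terminator too */
        b->len += n;
    }
}

/* Hypothetical stand-in for a generic attribute formatter. */
static void format_generic(struct buf *b, const char *attr)
{
    buf_put(b, attr);
}

static void format_attr(struct buf *b, const char *attr,
                        bool bad_len, bool is_exact)
{
    buf_put(b, "key(");

    if (bad_len) {
        /* Early exit: the same three statements as the end of the function. */
        format_generic(b, attr);
        buf_put(b, ")");
        return;
    }

    if (!is_exact) {
        buf_put(b, "mask/");
    }

    /* Normal exit. An optimizing compiler may keep only this copy and turn
     * the early exit above into a jump here, so a crash reached through the
     * bad_len path can be attributed to these source lines instead. */
    format_generic(b, attr);
    buf_put(b, ")");
    return;
}
```

Both paths produce correct output; only the mapping from machine code back
to source lines becomes ambiguous. With GCC, this kind of merging is done
by cross-jumping (-fcrossjumping, enabled at -O2), so comparing against an
-O0 build can help when a call stack looks implausible.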
** Description changed:
+ [Impact]
+ The Open vSwitch daemon (ovs-vswitchd) crashes, causing flow data to be lost and, in an OpenStack cloud, loss of instance connectivity.
+
+ [Test Case]
+ <trivialized step> Install an OpenStack cloud using Neutron + ML2 plugin and Open vSwitch
+ Run the cloud for some time - ovs-vswitchd will crash, causing loss of instance connectivity.
+
+
+ [Regression Potential]
+ Minimal - this code has been in versions > 2.0.2 for some time.
+
+ [Original Bug Report]
Hi, I find that every 2 days or so I lose part of my cluster.
It seems that openvswitch is crashing. The only message left in syslog
is as follows:
syslog:Jul 1 22:52:32 blue-compute kernel: [530482.190688] ovs-vswitchd[1935]: segfault at 0 ip 0000000000459110 sp 00007fff85804758 error 4 in ovs-vswitchd[400000+133000]
And this is the last message. I'm unable to reboot gracefully; I have to
reset. (This may also be because ceph is not giving up.)
I can also see a lot of traffic on the network. There is so much
traffic that some low-end routers/switches fail. This may be due to
another problem (machines stalled because of the ovs fault while others
try to connect, or failures caused by the sheer amount of traffic), but
I mention it for completeness.
Now some info:
Linux version 3.13.0-30-generic (buildd at allspice) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #54-Ubuntu SMP Mon Jun 9 22:45:01 UTC 2014 (Ubuntu 3.13.0-30.54-generic 3.13.11.2)
vendor_id : AuthenticAMD
cpu family : 16
model : 4
model name : AMD Phenom(tm) II X4 810 Processor
Ubuntu 14.04 LTS (server).
ovs-vsctl --version
ovs-vsctl (Open vSwitch) 2.0.1
Compiled Feb 23 2014 14:42:32
I can attach full logs, but I don't think they contain anything useful:
only one line refers to the problem.
NOTE: restarting ovs does not solve the problem.
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/1336555