[Bug 1906922] Re: Unpredictable behaviour on conflicting flow actions
Frode Nordahl
1906922 at bugs.launchpad.net
Tue Feb 16 09:28:32 UTC 2021
On a bionic-ussuri deployment, following the steps described in the
Test Case, connectivity works before the security groups are added:
$ ping 10.78.95.52
PING 10.78.95.52 (10.78.95.52) 56(84) bytes of data.
64 bytes from 10.78.95.52: icmp_seq=1 ttl=63 time=2.65 ms
64 bytes from 10.78.95.52: icmp_seq=2 ttl=63 time=1.52 ms
64 bytes from 10.78.95.52: icmp_seq=3 ttl=63 time=0.914 ms
^C
--- 10.78.95.52 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.914/1.691/2.645/0.717 ms
After adding the security groups I can see the NXBAC_BAD_CONJUNCTION messages in /var/log/ovn/ovn-controller.log:
2021-02-16T09:25:00.292Z|00022|ofp_actions|WARN|"conjunction" actions may be used along with "note" but not any other kind of action (such as the "resubmit" action used here)
2021-02-16T09:25:00.292Z|00023|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.3) (xid=0x1bf): NXBAC_BAD_CONJUNCTION
OFPT_FLOW_MOD (OF1.3) (xid=0x1bf): ***decode error: NXBAC_BAD_CONJUNCTION***
00000000 04 0e 00 b0 00 00 01 bf-00 00 00 00 7f 0d 9c 96 |................|
00000010 00 00 00 00 00 00 00 00-2c 00 00 00 00 00 07 d2 |........,.......|
00000020 ff ff ff ff ff ff ff ff-ff ff ff ff 00 00 00 00 |................|
00000030 00 01 00 53 80 00 0a 02-08 00 80 00 14 01 01 00 |...S............|
00000040 01 1e 04 00 00 00 03 00-01 d3 08 00 00 00 22 00 |..............".|
00000050 00 00 2b 00 01 d9 20 00-00 00 00 00 00 00 00 00 |..+... .........|
00000060 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 |................|
00000070 00 00 00 00 00 00 01 80-00 04 08 00 00 00 00 00 |................|
00000080 00 00 03 00 00 00 00 00-00 04 00 28 00 00 00 00 |...........(....|
00000090 ff ff 00 10 00 00 23 20-00 0e ff f8 2d 00 00 00 |......# ....-...|
000000a0 ff ff 00 10 00 00 23 20-00 22 01 02 00 00 00 09 |......# ."......|
And connectivity to the server is impacted:
$ ping -c 3 10.78.95.52
PING 10.78.95.52 (10.78.95.52) 56(84) bytes of data.
--- 10.78.95.52 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2030ms
Installing the ovn packages from -proposed removes the messages from the log and connectivity is restored:
$ ping -c 3 10.78.95.52
PING 10.78.95.52 (10.78.95.52) 56(84) bytes of data.
64 bytes from 10.78.95.52: icmp_seq=1 ttl=63 time=2.23 ms
64 bytes from 10.78.95.52: icmp_seq=2 ttl=63 time=1.67 ms
64 bytes from 10.78.95.52: icmp_seq=3 ttl=63 time=0.816 ms
--- 10.78.95.52 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.816/1.573/2.229/0.581 ms
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
$ dpkg -l | grep ovn
ii neutron-ovn-metadata-agent 2:16.2.0-0ubuntu2~cloud0 all Neutron is a virtual network service for Openstack - OVN metadata agent
ii ovn-common 20.03.1-0ubuntu1.2~cloud0 amd64 OVN common components
ii ovn-host 20.03.1-0ubuntu1.2~cloud0 amd64 OVN host components
** Tags removed: verification-ussuri-needed
** Tags added: verification-ussuri-done
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1906922
Title:
Unpredictable behaviour on conflicting flow actions
Status in Ubuntu Cloud Archive:
In Progress
Status in Ubuntu Cloud Archive ussuri series:
Fix Committed
Status in Ubuntu Cloud Archive victoria series:
In Progress
Status in ovn package in Ubuntu:
Fix Released
Status in ovn source package in Focal:
Fix Committed
Status in ovn source package in Groovy:
Fix Committed
Bug description:
[Impact]
When the CMS configures ACLs with overlapping rules, the flow rules
that OVN programs into Open vSwitch may lead to unpredictable
forwarding behavior, such as every other packet being dropped.
[Test Case]
How to reproduce with OpenStack as CMS:
- Update the "default" group to accept ICMP, then:
openstack security group create a
openstack security group create b
openstack security group create c
openstack security group rule create --ingress --ethertype IPv4 --protocol icmp --remote-group b b
openstack security group rule create --ingress --ethertype IPv6 --protocol icmp --remote-group b b
openstack security group rule create --ingress --ethertype IPv4 --protocol icmp --remote-group c c
openstack security group rule create --ingress --ethertype IPv6 --protocol icmp --remote-group c c
openstack server add security group
for server in zaza-neutrontests-ins-1 zaza-neutrontests-ins-2; do
  for group in a b c; do
    openstack server add security group "$server" "$group"
  done
done
Look for bad conjunction messages in ovn-controller log and monitor
ICMP reachability to the instances.
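The log check in the step above can be scripted. The following is a minimal sketch, not part of the original test case: the function name count_bad_conjunctions is hypothetical, and it assumes the log format shown in this report, where each rejected flow produces one OFPT_ERROR line followed by a decode error line.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): count the OpenFlow
# errors of type NXBAC_BAD_CONJUNCTION recorded in an ovn-controller log.
# Only the "OFPT_ERROR ... NXBAC_BAD_CONJUNCTION" line is matched, so each
# rejected flow is counted once even though the decode error line that
# follows it repeats the error name.
count_bad_conjunctions() {
    # $1: path to the log file, e.g. /var/log/ovn/ovn-controller.log
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps that from aborting callers running under "set -e".
    grep -c 'OFPT_ERROR.*NXBAC_BAD_CONJUNCTION' "$1" || true
}
```

Running count_bad_conjunctions /var/log/ovn/ovn-controller.log before and after adding the security groups, alongside ping -c 3 against the instance address, should show whether the count rises while packets are being lost.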
[Regression potential]
The fixes all apply to a single file and area of the OVN controller operation, except for the patches to its tests. 6 of the patches have been in the wild since the 20.09 release of September 2020. 10 of them have been in the wild since the 20.12 release of December 2020. No bugs have been reported since, nor have there been further updates touching this area of the code. We have also had the code in the wild through Ubuntu Groovy with OVN 20.06 (the parts that are in 20.06) and Ubuntu Hirsute (all of them). The code paths are executed by anyone using OVN, so if any of these patches had caused a regression, chances are very high it would have bubbled up somewhere by now. For extra caution we have had the packages in -proposed for an extended period, and the packages have also been consumed in other recent large-scale internal networking tests, such as the PS5 project.
[Other Info]
Fixed upstream:
https://github.com/ovn-org/ovn/commit/986b3d5e4ad6f05245d021ba699c957246294a22
Other bug trackers:
https://bugzilla.redhat.com/1871931
Symptoms:
Every other packet does not arrive.
2020-12-05T10:33:38.304Z|00016|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.3) (xid=0x1af): NXBAC_BAD_CONJUNCTION
OFPT_FLOW_MOD (OF1.3) (xid=0x1af): ***decode error: NXBAC_BAD_CONJUNCTION***
00000000 04 0e 00 b0 00 00 01 af-00 00 00 00 e6 89 28 3a |..............(:|
00000010 00 00 00 00 00 00 00 00-2c 00 00 00 00 00 07 d2 |........,.......|
00000020 ff ff ff ff ff ff ff ff-ff ff ff ff 00 00 00 00 |................|
00000030 00 01 00 53 80 00 0a 02-08 00 80 00 14 01 01 00 |...S............|
00000040 01 1e 04 00 00 00 03 00-01 d3 08 00 00 00 22 00 |..............".|
00000050 00 00 2b 00 01 d9 20 00-00 00 00 00 00 00 00 00 |..+... .........|
00000060 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 |................|
00000070 00 00 00 00 00 00 01 80-00 04 08 00 00 00 00 00 |................|
00000080 00 00 03 00 00 00 00 00-00 04 00 28 00 00 00 00 |...........(....|
00000090 ff ff 00 10 00 00 23 20-00 0e ff f8 2d 00 00 00 |......# ....-...|
000000a0 ff ff 00 10 00 00 23 20-00 22 01 02 00 00 00 09 |......# ."......|
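As a reading aid for the dump above, its first eight bytes are the standard OpenFlow message header: version (1 byte), type (1 byte), length (2 bytes) and xid (4 bytes), all big-endian. The sketch below decodes them; the function name decode_ofp_header is hypothetical and the script is not part of the original report.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): decode the 8-byte
# OpenFlow header from the start of a hex dump line.
# Arguments $1..$8 are the first eight bytes as two-digit hex strings.
decode_ofp_header() {
    version=$((0x$1))        # wire protocol version (0x04 is OpenFlow 1.3)
    type=$((0x$2))           # message type (14 is OFPT_FLOW_MOD)
    length=$((0x$3$4))       # total message length in bytes, big-endian
    xid=$((0x$5$6$7$8))      # transaction id, big-endian
    printf 'version=0x%02x type=%d length=%d xid=0x%x\n' \
        "$version" "$type" "$length" "$xid"
}

# First eight bytes of the dump above.
decode_ofp_header 04 0e 00 b0 00 00 01 af
```

The decoded xid matches the xid=0x1af reported in the log line, and the length of 176 bytes matches the eleven 16-byte rows of the dump.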
I have been able to backport this fix to 20.03.1 with minor adaptation
using these commits from master; however, a flaky test may need some
more investigation:
commit 986b3d5e4ad6f05245d021ba699c957246294a22
commit 33c15c145988daa6172928dc870f3a0225515f50
commit 107bb25029350bd0f7dfeeb0ef3053adbd504e3e
commit e49ce9a33f38f29c44e3c30afcc189b5f6a9ef8e
commit dadae4f800ccb1f2759378f0bd804dd002e31605
commit 7cab7bd1268ba67429954da4f73de91090acf779
commit 9d2e8d32fb9865513b70408a665184a67564390d
commit f4e508dd7a6cfbfc2e3250a8c11a8d0fdc1dfdd0
commit 6f0b1e02d9ab3a94048c4818f2d382938cad4b71
commit 23063cf4178c05f5d6b3e4ec6d323ccc88df6101
commit 354d3853d40cbce89a434632f67daed7fc992d8b
The list of commits is quite long because of how much
controller/ofctrl.c has changed since 20.03.1 was cut, but
the nature of the changes looks sane to me.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1906922/+subscriptions