[Bug 1905933] Re: Incorrect ARP processing when enable_distributed_floating_ip=True

Frode Nordahl 1905933@bugs.launchpad.net
Mon Feb 15 16:48:05 UTC 2021


With a focal deployment, as deployed by the OpenStack Charms using the
in-distro released packages, I can create an instance with a
floating IP and confirm that it is able to reach the Ubuntu archive:

$ chmod 600 /tmp/zaza-467221d1afd6/id_rsa_zaza
$ ssh -i /tmp/zaza-467221d1afd6/id_rsa_zaza ubuntu@10.78.95.103
ubuntu@zaza-neutrontests-ins-1:~$ ping archive.ubuntu.com
PING archive.ubuntu.com (91.189.88.142) 56(84) bytes of data.
64 bytes from aerodent.canonical.com (91.189.88.142): icmp_seq=1 ttl=60 time=74.3 ms
64 bytes from aerodent.canonical.com (91.189.88.142): icmp_seq=2 ttl=60 time=67.5 ms
^C
--- archive.ubuntu.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 67.568/70.938/74.308/3.370 ms

If I then enable Distributed Floating IP for OVN by issuing the following command and wait for the deployment to settle:
$ juju config neutron-api-plugin-ovn enable-distributed-floating-ip=true

I can then repeat the attempt to reach the Ubuntu archive:
ubuntu@zaza-neutrontests-ins-1:~$ ping archive.ubuntu.com
PING archive.ubuntu.com (91.189.88.152) 56(84) bytes of data.
^C
--- archive.ubuntu.com ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2032ms

Dumping packets on the external port of the hypervisor hosting the instance, I can see an ARP request for the archive address itself:
# tcpdump -nevvi enp6s0 arp
tcpdump: listening on enp6s0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:37:54.290577 fa:16:3e:fa:6a:11 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 91.189.88.152 tell 10.78.95.103, length 28

Adding -proposed to the ovn-central and hypervisor units and upgrading the OVN packages:
juju run --application ovn-central 'sudo sh -c "echo deb http://archive.ubuntu.com/ubuntu focal-proposed multiverse restricted main universe >> /etc/apt/sources.list"'
juju run --application ovn-central 'apt update'
juju run --application ovn-central 'sudo apt -y install ovn-central ovn-common'
juju run --application neutron-api 'systemctl restart neutron-server'

for machine in 0 1 2; do juju run --machine "$machine" 'sudo sh -c "echo deb http://archive.ubuntu.com/ubuntu focal-proposed multiverse restricted main universe >> /etc/apt/sources.list"' & done
for machine in 0 1 2; do juju run --machine "$machine" 'sudo apt update' & done
for machine in 0 1 2; do juju run --machine "$machine" 'sudo apt -y install ovn-common ovn-host' & done
wait
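
The `echo ... >> /etc/apt/sources.list` commands above append the -proposed line every time they are run, so repeating the verification leaves duplicate entries. A hedged sketch of an idempotent variant is shown here; `add_proposed` and `PROPOSED_LINE` are hypothetical names introduced for illustration and are not part of the charms or of juju.

```shell
# The APT source line used in the commands above.
PROPOSED_LINE='deb http://archive.ubuntu.com/ubuntu focal-proposed multiverse restricted main universe'

# Append PROPOSED_LINE to the given sources file (default:
# /etc/apt/sources.list) only if that exact line is not already present.
add_proposed() {
    sources_file=${1:-/etc/apt/sources.list}
    grep -qxF "$PROPOSED_LINE" "$sources_file" 2>/dev/null ||
        echo "$PROPOSED_LINE" >> "$sources_file"
}
```

On a unit this would have to run as root against the real sources file; calling it with a scratch file path makes it easy to try without touching the system.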

I can now repeat the attempt to ping the Ubuntu archive from my instance:
ubuntu@zaza-neutrontests-ins-1:~$ ping archive.ubuntu.com
PING archive.ubuntu.com (91.189.88.142) 56(84) bytes of data.
64 bytes from aerodent.canonical.com (91.189.88.142): icmp_seq=1 ttl=60 time=44.5 ms
64 bytes from aerodent.canonical.com (91.189.88.142): icmp_seq=2 ttl=60 time=43.8 ms
^C
--- archive.ubuntu.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 43.883/44.197/44.512/0.378 ms

Success!
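
The difference between the failing and the passing run is the loss figure in the ping summary line. If the check needs to be scripted, a small filter along these lines could extract it; `ping_loss` is a hypothetical helper name, and the pattern assumes the iputils summary format shown in the transcripts above.

```shell
# Print the packet-loss percentage found in a ping summary read on stdin.
ping_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example using the summary lines from the transcripts above.
echo '2 packets transmitted, 2 received, 0% packet loss, time 1002ms' | ping_loss
echo '3 packets transmitted, 0 received, 100% packet loss, time 2032ms' | ping_loss
```

The two example lines print 0 and 100 respectively, matching the results before and after enabling Distributed Floating IP on the unpatched packages.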

** Tags removed: verification-needed verification-needed-focal
** Tags added: verification-done verification-done-focal

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ovn in Ubuntu.
https://bugs.launchpad.net/bugs/1905933

Title:
  Incorrect ARP processing when enable_distributed_floating_ip=True

Status in ovn package in Ubuntu:
  Fix Released
Status in ovn source package in Focal:
  Fix Committed
Status in ovn source package in Groovy:
  Fix Released
Status in ovn source package in Hirsute:
  Fix Released

Bug description:
  [Impact]
  Enabling `enable-distributed-floating-ip` on a cloud with OVN 20.03 results in loss of external connectivity for instances with floating IPs.

  [Test Case]
  Launch two instances and assign floating IPs to them. Toggle the `enable-distributed-floating-ip` configuration option and attempt to access an IP address on the internet that is not reachable in the external network L2 broadcast domain.

  Observe that the instances attempt to reach the IP by resolving
  its MAC address through ARP directly rather than applying
  L3 routing.

  The functional test gate of the neutron-api-plugin-ovn charm may be
  useful for verification.

  [Regression Potential]
  We have cherry-picked a patch from upstream that reverts, in its entirety, the change that introduced the erratic behavior. The optimization has since been replaced by a new set of patches that is available in newer versions. As such, the regression potential is minimal.

  [Original Bug Report]
  In a focal-ussuri deployment, when `enable-distributed-floating-ip` is enabled, traffic from instances with FIPs should exit the HV directly and not go through a gateway chassis.

  However, due to a bug, each HV will attempt to do ARP processing locally
  even for IP addresses not in the external network CIDR.

  This results in loss of connectivity for instances with FIPs.

  The issue is not present in Groovy with OVN 20.06, and I suspect it is fixed by this commit:
  https://github.com/ovn-org/ovn/commit/d9ed450713eda62af1bec5009694b2d206c9f435

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1905933/+subscriptions


