[Bug 2017494] Re: "nova.exception.PortBindingFailed: Binding failed" for OpenStack Zed in Juju deployment

Thomas Dreibholz 2017494 at bugs.launchpad.net
Mon Apr 24 10:01:47 UTC 2023


Some further debugging: I entered the instance container for ovn-central/0, i.e.:
juju ssh ovn-central/0

In /etc/ovn/ovn-northd-db-params.conf, I found the TCP and SSL parameters for the OVN NB and SB databases, which are needed to run a check (based on https://numans.blog/2018/01/05/debugging-ovn-external-connectivity-part-1/). In my case:
sudo ovn-nbctl -c /etc/ovn/cert_host -C /etc/ovn/ovn-central.crt -p /etc/ovn/key_host --db=ssl:172.31.255.114:6641,ssl:172.31.255.115:6641,ssl:172.31.255.116:6641  show
sudo ovn-sbctl -c /etc/ovn/cert_host -C /etc/ovn/ovn-central.crt -p /etc/ovn/key_host --db=ssl:172.31.255.114:16642,ssl:172.31.255.115:16642,ssl:172.31.255.116:16642  show

Connecting to the DBs works.
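
A minimal sketch of how the two long --db arguments above could be assembled from the cluster addresses, so the same list is not typed twice. The helper name build_db_remotes is hypothetical; the addresses, ports, and certificate paths are the ones from my setup and will differ elsewhere.

```shell
# Build an OVSDB remote list such as "ssl:IP1:PORT,ssl:IP2:PORT" from a
# protocol, a port, and one or more addresses.
# build_db_remotes is a hypothetical helper, not part of OVN.
build_db_remotes() {
    proto="$1"
    port="$2"
    shift 2
    remotes=""
    for ip in "$@"; do
        # Prepend a comma only when the list is already non-empty.
        remotes="${remotes:+$remotes,}${proto}:${ip}:${port}"
    done
    printf '%s\n' "$remotes"
}

NB_DB=$(build_db_remotes ssl 6641 172.31.255.114 172.31.255.115 172.31.255.116)
SB_DB=$(build_db_remotes ssl 16642 172.31.255.114 172.31.255.115 172.31.255.116)

# Print the commands instead of running them, so they can be reviewed first.
echo "sudo ovn-nbctl -c /etc/ovn/cert_host -C /etc/ovn/ovn-central.crt -p /etc/ovn/key_host --db=$NB_DB show"
echo "sudo ovn-sbctl -c /etc/ovn/cert_host -C /etc/ovn/ovn-central.crt -p /etc/ovn/key_host --db=$SB_DB show"
```

The printed lines can be pasted into the shell inside the ovn-central unit once the addresses have been checked.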

SB lists all my 8 nodes, i.e.:
...
Chassis P52S11.maas
    hostname: P52S11.maas
    Encap geneve
        ip: "172.31.255.100"
        options: {csum="true"}
...
This seems to look okay.

NB seems to even list my test VM's port, i.e.:
switch b2e5de93-e19a-458d-af63-e80d44629614 (neutron-97c5c0a1-5c29-4fc4-852b-ce818c972a6d) (aka smil-network4)
    port ec87abb0-b261-49fd-8e95-9cefdf32798e (aka Port-warrnambool.fire.smil)
        addresses: ["unknown"]
...
But "addresses" only contains "unknown".
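
To spot such ports without reading the whole listing, the "show" output can be filtered. This is only a sketch: the function name ports_with_unknown_address is hypothetical, and it assumes the output layout shown above (a "port" line followed by an "addresses:" line).

```shell
# Print the ID of every logical switch port whose "addresses" line is
# exactly ["unknown"]. Reads "ovn-nbctl show" output on stdin.
# ports_with_unknown_address is a hypothetical helper name.
ports_with_unknown_address() {
    awk '
        $1 == "port" { port = $2 }
        $1 == "addresses:" && NF == 2 && $2 == "[\"unknown\"]" { print port }
    '
}

# Sample taken from the NB output above.
sample='switch b2e5de93-e19a-458d-af63-e80d44629614 (neutron-97c5c0a1-5c29-4fc4-852b-ce818c972a6d) (aka smil-network4)
    port ec87abb0-b261-49fd-8e95-9cefdf32798e (aka Port-warrnambool.fire.smil)
        addresses: ["unknown"]'

unknown_ports=$(printf '%s\n' "$sample" | ports_with_unknown_address)
echo "$unknown_ports"
```

On a live system, the real "ovn-nbctl ... show" output would be piped into the function instead of the sample.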

Destroying the failed instance removes the port, and a new attempt
creates a new one. So, I assume that at least some communication with
the OVN system is working.

It seems that something goes wrong somewhere after creating this port,
but without any information in any of the log files. Is there any hint
on where to look for further debugging?

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to neutron in Ubuntu.
https://bugs.launchpad.net/bugs/2017494

Title:
  "nova.exception.PortBindingFailed: Binding failed" for OpenStack Zed
  in Juju deployment

Status in charm-ovn-central:
  New
Status in neutron package in Ubuntu:
  New
Status in ovn package in Ubuntu:
  New

Bug description:
  I made an OpenStack deployment with Juju, as documented in the Charm
  deployment guide (https://docs.openstack.org/project-deploy-
  guide/charm-deployment-guide/latest/install-juju.html). The setup
  consists of 8 nodes. The deployment itself is successful, Dashboard,
  Glance, etc. are running. But when trying to instantiate a VM, the
  deployment fails with "nova.exception.PortBindingFailed: Binding
  failed <UUID>, please check neutron logs for more information." (in
  the Nova log, i.e. /var/log/nova/nova-compute.log).

  There is the hint to check "neutron logs", but there is actually no
  useful information there.

  So, I checked the configuration first:

  neutron.yaml for deployment of "neutron-api" and "ovn-chassis":
  ovn-chassis:
    debug: true
    bridge-interface-mappings: >-
      br-simulamet:<MAC_NODE_1>
      br-simulamet:<MAC_NODE_2>
      br-simulamet:...
      ...
    ovn-bridge-mappings: physnet2:br-simulamet

  neutron-api:
    verbose: true
    enable-ml2-port-security: true
    neutron-security-groups: true
    enable-vlan-trunking: false

    vlan-ranges: physnet2
    flat-network-providers:

  This looks okay. The network interfaces are mapped into the bridge "br-simulamet", and it actually exists on all nodes, e.g.:
  root@P52S11:/var/log# ovs-vsctl get open . external_ids:ovn-bridge-mappings
  "physnet2:br-simulamet"

  
  The network/subnet configuration in OpenStack should also be okay, e.g.:
  network create smil-network4 --external --provider-network-type vlan --provider-physical-network physnet2 --provider-segment 0204 --share
  subnet create smil-network4-ipv4 --network smil-network4 --ip-version 4 --description "VLAN0204-SMIL-Network4" --subnet-range 10.193.4.0/24 --no-dhcp --allocation-pool start=10.193.4.200,end=10.193.4.254

  So, the network should correctly map to "physnet2", with a VLAN tag
  (here: 204).
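
The link from the provider physical network to the bridge can also be checked mechanically on each node. A minimal sketch, assuming the mapping value has the "physnet:bridge[,physnet:bridge...]" form printed by ovs-vsctl above; the function name bridge_for_physnet is hypothetical.

```shell
# Print the bridge mapped to a physical network in an ovn-bridge-mappings
# value ("physnetA:br-a,physnetB:br-b"). Returns non-zero if the physical
# network is not mapped. bridge_for_physnet is a hypothetical helper name.
bridge_for_physnet() {
    mappings="$1"
    physnet="$2"
    printf '%s\n' "$mappings" |
        tr -d '"' |          # ovs-vsctl prints the value in double quotes
        tr ',' '\n' |        # one mapping per line
        awk -F: -v p="$physnet" '
            $1 == p { print $2; found = 1 }
            END { exit !found }
        '
}

# Value as printed by ovs-vsctl on my nodes.
bridge=$(bridge_for_physnet '"physnet2:br-simulamet"' physnet2)
echo "$bridge"
```

On a node, the first argument could be taken directly from "$(ovs-vsctl get open . external_ids:ovn-bridge-mappings)".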

  (For debugging, I also tried to use the network interface as "flat
  network" without VLANs. This does not change anything.)

  
  The deployment (from "juju status") also looks okay for Neutron and OVN:
  ...
  neutron-api               21.0.0           active      1  neutron-api             zed/stable     546  no       Unit is ready
  neutron-api-mysql-router  8.0.32           active      1  mysql-router            8.0/stable      35  no       Unit is ready
  neutron-api-plugin-ovn    21.0.0           active      1  neutron-api-plugin-ovn  zed/stable      45  no       Unit is ready
  nova-cloud-controller     26.1.0           active      1  nova-cloud-controller   zed/stable     633  no       Unit is ready
  ...
  ovn-central               22.09.0          active      3  ovn-central             22.09/stable    75  no       Unit is ready (leader: ovnsb_db)
  ovn-chassis               22.09.1          active      8  ovn-chassis             22.09/stable   109  no       Unit is ready
  ...

  
  /var/log/ovn/ovn-controller.log does not provide useful information about the port binding failure, even after enabling "debug = true" in /etc/neutron/ovn.ini and restarting the services. Also, increasing the OVN log level did not reveal more information here, i.e.:
  ovn-appctl vlog/set dbg
  ovn-appctl vlog/disable-rate-limit

  
  Increasing the Open vSwitch log level also did not reveal more insight, i.e.:
  ovs-appctl vlog/set dbg
  ovs-appctl vlog/disable-rate-limit

  
  So, maybe the issue is related to some component around OVN? One strange thing I noticed: There are two "ovsdb-server" processes running, each with a "--log-file" parameter, referring to /var/log/ovn/ovsdb-server-nb.log and /var/log/ovn/ovsdb-server-sb.log:

  root@P52S11:/var/log/openvswitch# ps ax | grep ovn
   129278 ?        Ssl    4:58 ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db=ssl:172.31.255.116:6641,ssl:172.31.255.115:6641,ssl:172.31.255.114:6641 --ovnsb-db=ssl:172.31.255.116:16642,ssl:172.31.255.115:16642,ssl:172.31.255.114:16642 -c /etc/ovn/cert_host -C /etc/ovn/ovn-central.crt -p /etc/ovn/key_host --no-chdir --log-file=/var/log/ovn/ovn-northd.log --pidfile=/var/run/ovn/ovn-northd.pid --detach
   130048 ?        Ssl   34:42 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/var/run/ovn/ovnnb_db.sock --pidfile=/var/run/ovn/ovnnb_db.pid --unixctl=/var/run/ovn/ovnnb_db.ctl --remote=db:OVN_Northbound,NB_Global,connections --private-key=/etc/ovn/key_host --certificate=/etc/ovn/cert_host --ca-cert=/etc/ovn/ovn-central.crt --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers /var/lib/ovn/ovnnb_db.db
   130251 ?        Ssl   47:13 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-sb.log --remote=punix:/var/run/ovn/ovnsb_db.sock --pidfile=/var/run/ovn/ovnsb_db.pid --unixctl=/var/run/ovn/ovnsb_db.ctl --remote=db:OVN_Southbound,SB_Global,connections --private-key=/etc/ovn/key_host --certificate=/etc/ovn/cert_host --ca-cert=/etc/ovn/ovn-central.crt --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers /var/lib/ovn/ovnsb_db.db

  The logs are inside the containers. Checking them:

  /var/snap/lxd/common/lxd/storage-pools/default/containers/juju-3083dc-2-lxd-1/rootfs/var/log/ovn/ovsdb-server-sb.log:
  ...
  2023-04-24T08:30:50.962Z|21781|stream_ssl|WARN|SSL_accept: error:0A000126:SSL routines::unexpected eof while reading
  2023-04-24T08:30:50.962Z|21782|jsonrpc|WARN|ssl:127.0.0.1:35594: receive error: Protocol error
  2023-04-24T08:30:50.962Z|21783|reconnect|WARN|ssl:127.0.0.1:35594: connection dropped (Protocol error)
  2023-04-24T08:35:00.857Z|21784|stream_ssl|WARN|SSL_accept: error:0A000126:SSL routines::unexpected eof while reading
  2023-04-24T08:35:00.857Z|21785|jsonrpc|WARN|ssl:127.0.0.1:34324: receive error: Protocol error
  2023-04-24T08:35:00.857Z|21786|reconnect|WARN|ssl:127.0.0.1:34324: connection dropped (Protocol error)

  /var/snap/lxd/common/lxd/storage-pools/default/containers/juju-3083dc-2-lxd-1/rootfs/var/log/ovn/ovsdb-server-nb.log:
  ...
  2023-04-24T08:30:50.960Z|22445|stream_ssl|WARN|SSL_accept: error:0A000126:SSL routines::unexpected eof while reading
  2023-04-24T08:30:50.960Z|22446|jsonrpc|WARN|ssl:127.0.0.1:48028: receive error: Protocol error
  2023-04-24T08:30:50.960Z|22447|reconnect|WARN|ssl:127.0.0.1:48028: connection dropped (Protocol error)
  2023-04-24T08:35:00.855Z|22448|stream_ssl|WARN|SSL_accept: error:0A000126:SSL routines::unexpected eof while reading
  2023-04-24T08:35:00.855Z|22449|jsonrpc|WARN|ssl:127.0.0.1:52444: receive error: Protocol error
  2023-04-24T08:35:00.855Z|22450|reconnect|WARN|ssl:127.0.0.1:52444: connection dropped (Protocol error)

  The containers belong to the deployment of "ovn-central", so I assume
  something is wrong here.
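
To see which peers the dropped connections come from, the log lines can be grouped by peer address. This is a rough sketch under the assumption that the log format is the "|"-separated one shown above; the function name dropped_by_peer is hypothetical. In the excerpts above, every dropped connection comes from 127.0.0.1, i.e. from inside the container itself.

```shell
# Count "connection dropped" warnings per peer (source port stripped)
# from ovsdb-server log lines on stdin.
# dropped_by_peer is a hypothetical helper name.
dropped_by_peer() {
    awk -F'|' '
        $3 == "reconnect" && $5 ~ /connection dropped/ {
            # $5 looks like: ssl:127.0.0.1:35594: connection dropped (...)
            split($5, parts, ":")
            count[parts[1] ":" parts[2]]++
        }
        END { for (peer in count) print peer, count[peer] }
    '
}

# Sample taken from ovsdb-server-sb.log above.
sample='2023-04-24T08:30:50.962Z|21783|reconnect|WARN|ssl:127.0.0.1:35594: connection dropped (Protocol error)
2023-04-24T08:35:00.857Z|21786|reconnect|WARN|ssl:127.0.0.1:34324: connection dropped (Protocol error)'

summary=$(printf '%s\n' "$sample" | dropped_by_peer)
echo "$summary"
```

Running the function on the full ovsdb-server-nb.log and ovsdb-server-sb.log files would show whether any remote (non-local) peer is affected as well.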

  
  The issue appears on all 8 nodes I have set up. So, it is reproducible. I can provide log files, etc. on request.

  
  Could this issue be a bug in an OpenStack package (maybe ovn-central?), a problem with the Juju charms used to deploy OpenStack Zed, or some issue with my setup?

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ovn-central/+bug/2017494/+subscriptions



