[Bug 2017494] Re: "nova.exception.PortBindingFailed: Binding failed" for OpenStack Zed in Juju deployment
Thomas Dreibholz
2017494 at bugs.launchpad.net
Mon Apr 24 10:01:47 UTC 2023
Some further debugging: I entered the instance container for ovn-central/0, i.e.:
juju ssh ovn-central/0
In /etc/ovn/ovn-northd-db-params.conf, I found the TCP and SSL parameters of the OVN NB and SB databases, which I used to run some checks (based on https://numans.blog/2018/01/05/debugging-ovn-external-connectivity-part-1/). In my case:
sudo ovn-nbctl -c /etc/ovn/cert_host -C /etc/ovn/ovn-central.crt -p /etc/ovn/key_host --db=ssl:172.31.255.114:6641,ssl:172.31.255.115:6641,ssl:172.31.255.116:6641 show
sudo ovn-sbctl -c /etc/ovn/cert_host -C /etc/ovn/ovn-central.crt -p /etc/ovn/key_host --db=ssl:172.31.255.114:16642,ssl:172.31.255.115:16642,ssl:172.31.255.116:16642 show
Connecting to the DBs works.
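A minimal sketch of how the long --db argument could be assembled instead of typed by hand. The function name build_db_remotes is hypothetical; the addresses, ports, and certificate paths in the usage comment are the ones from the commands above.

```shell
# Hypothetical helper: join hosts into an OVSDB remote list such as
# "ssl:HOST1:PORT,ssl:HOST2:PORT", suitable for the --db option.
# The body runs in a subshell, so its variables do not leak to the caller.
build_db_remotes() (
    proto=$1
    port=$2
    shift 2
    remotes=""
    for host in "$@"; do
        # Prefix the earlier entries plus a comma only when there are any.
        remotes="${remotes:+$remotes,}${proto}:${host}:${port}"
    done
    printf '%s\n' "$remotes"
)

# Example (NB database, same certificate options as above):
# sudo ovn-nbctl -c /etc/ovn/cert_host -C /etc/ovn/ovn-central.crt \
#     -p /etc/ovn/key_host \
#     --db="$(build_db_remotes ssl 6641 172.31.255.114 172.31.255.115 172.31.255.116)" show
```

The same helper with port 16642 would give the SB remote list used for ovn-sbctl.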
SB lists all my 8 nodes, i.e.:
...
Chassis P52S11.maas
    hostname: P52S11.maas
    Encap geneve
        ip: "172.31.255.100"
        options: {csum="true"}
...
This seems to look okay.
NB even seems to list my test VM's port, i.e.:
switch b2e5de93-e19a-458d-af63-e80d44629614 (neutron-97c5c0a1-5c29-4fc4-852b-ce818c972a6d) (aka smil-network4)
    port ec87abb0-b261-49fd-8e95-9cefdf32798e (aka Port-warrnambool.fire.smil)
        addresses: ["unknown"]
...
But "addresses" only contains "unknown".
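To find every port in this state without reading the whole listing, the "show" output can be filtered. This is only a sketch: the function name ports_with_unknown_address is hypothetical, and it assumes the "port" / "addresses:" line layout seen in the output above.

```shell
# Hypothetical filter: read "ovn-nbctl show" output on stdin and print the
# UUID of every port whose addresses line is exactly ["unknown"].
ports_with_unknown_address() {
    awk '
        # Remember the most recent port UUID.
        $1 == "port" { port = $2 }
        # Match an addresses line that carries only the "unknown" entry.
        $1 == "addresses:" && NF == 2 && $2 == "[\"unknown\"]" { print port }
    '
}

# Example (DB and certificate options as in the commands above):
# sudo ovn-nbctl ... show | ports_with_unknown_address
```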
Destroying the failed instance removes the port, and a new attempt
creates a new one. So, I assume that at least some communication with
the OVN system is working.
It seems that something goes wrong somewhere after creating this port,
but none of the log files contains any information about it. Is there
any hint on where to look for further debugging?
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to neutron in Ubuntu.
https://bugs.launchpad.net/bugs/2017494
Title:
"nova.exception.PortBindingFailed: Binding failed" for OpenStack Zed
in Juju deployment
Status in charm-ovn-central:
New
Status in neutron package in Ubuntu:
New
Status in ovn package in Ubuntu:
New
Bug description:
I made an OpenStack deployment with Juju, as documented in the Charm
deployment guide
(https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/install-juju.html).
The setup consists of 8 nodes. The deployment itself is successful;
Dashboard, Glance, etc. are running. But when trying to instantiate a
VM, the instantiation fails with "nova.exception.PortBindingFailed:
Binding failed <UUID>, please check neutron logs for more information."
(in the Nova log, i.e. /var/log/nova/nova-compute.log).
There is the hint to check "neutron logs", but there is actually no
useful information there.
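To correlate the Nova error with the ports seen in OVN, the affected port UUIDs can be pulled out of the Nova log. A minimal sketch, assuming the log lines carry the usual "Binding failed for port <UUID>" wording of this exception; the function name failed_port_ids is hypothetical.

```shell
# Hypothetical helper: read nova-compute log lines on stdin and print the
# unique port UUIDs named in "Binding failed for port <UUID>" messages.
failed_port_ids() {
    # Capture the 36-character UUID that follows the message prefix;
    # other UUIDs on the line (instance, request) are ignored.
    sed -n 's/.*Binding failed for port \([0-9a-f-]\{36\}\).*/\1/p' | sort -u
}

# Example:
# failed_port_ids < /var/log/nova/nova-compute.log
```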
So, I checked the configuration first:
neutron.yaml for deployment of "neutron-api" and "ovn-chassis":
ovn-chassis:
  debug: true
  bridge-interface-mappings: >-
    br-simulamet:<MAC_NODE_1>
    br-simulamet:<MAC_NODE_2>
    br-simulamet:...
    ...
  ovn-bridge-mappings: physnet2:br-simulamet
neutron-api:
  verbose: true
  enable-ml2-port-security: true
  neutron-security-groups: true
  enable-vlan-trunking: false
  vlan-ranges: physnet2
  flat-network-providers:
This looks okay. The network interfaces are mapped into the bridge "br-simulamet", and it actually exists on all nodes, e.g.:
root@P52S11:/var/log# ovs-vsctl get open . external_ids:ovn-bridge-mappings
"physnet2:br-simulamet"
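The same check can be scripted for all nodes. This is a sketch only: bridge_for_physnet is a hypothetical helper that looks up the bridge for a physical network in a mappings string of the form shown above (comma-separated physnet:bridge pairs, surrounding quotes optional).

```shell
# Hypothetical helper: print the bridge mapped to a physical network.
#   $1: physical network name, e.g. physnet2
#   $2: mappings string, e.g. "physnet2:br-simulamet"
# Prints nothing when the physical network is not mapped.
bridge_for_physnet() {
    printf '%s\n' "$2" | tr -d '"' | tr ',' '\n' |
        awk -F: -v net="$1" '$1 == net { print $2; exit }'
}

# Example, run on a compute node:
# br=$(bridge_for_physnet physnet2 \
#     "$(ovs-vsctl get open . external_ids:ovn-bridge-mappings)")
# ovs-vsctl br-exists "$br" && echo "bridge $br exists"
```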
The network/subnet configuration in OpenStack should also be okay, e.g.:
network create smil-network4 --external --provider-network-type vlan --provider-physical-network physnet2 --provider-segment 0204 --share
subnet create smil-network4-ipv4 --network smil-network4 --ip-version 4 --description "VLAN0204-SMIL-Network4" --subnet-range 10.193.4.0/24 --no-dhcp --allocation-pool start=10.193.4.200,end=10.193.4.254
So, the network should correctly map to "physnet2", with a VLAN tag
(here: 204).
(For debugging, I also tried to use the network interface as "flat
network" without VLANs. This does not change anything.)
The deployment (from "juju status") also looks okay for Neutron and OVN:
...
neutron-api 21.0.0 active 1 neutron-api zed/stable 546 no Unit is ready
neutron-api-mysql-router 8.0.32 active 1 mysql-router 8.0/stable 35 no Unit is ready
neutron-api-plugin-ovn 21.0.0 active 1 neutron-api-plugin-ovn zed/stable 45 no Unit is ready
nova-cloud-controller 26.1.0 active 1 nova-cloud-controller zed/stable 633 no Unit is ready
...
ovn-central 22.09.0 active 3 ovn-central 22.09/stable 75 no Unit is ready (leader: ovnsb_db)
ovn-chassis 22.09.1 active 8 ovn-chassis 22.09/stable 109 no Unit is ready
...
/var/log/ovn/ovn-controller.log does not provide useful information about the port binding failure, even after enabling "debug = true" in /etc/neutron/ovn.ini and restarting the services. Also, increasing the OVN log level did not reveal more information here, i.e.:
ovn-appctl vlog/set dbg
ovn-appctl vlog/disable-rate-limit
Increasing the Open vSwitch log level also did not reveal more insight, i.e.:
ovs-appctl vlog/set dbg
ovs-appctl vlog/disable-rate-limit
So, maybe the issue is related to some component around OVN? One strange thing I noticed: There are two "ovsdb-server" processes running, each with a "--log-file" parameter, referring to /var/log/ovn/ovsdb-server-nb.log and /var/log/ovn/ovsdb-server-sb.log, respectively:
root@P52S11:/var/log/openvswitch# ps ax | grep ovn
129278 ? Ssl 4:58 ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db=ssl:172.31.255.116:6641,ssl:172.31.255.115:6641,ssl:172.31.255.114:6641 --ovnsb-db=ssl:172.31.255.116:16642,ssl:172.31.255.115:16642,ssl:172.31.255.114:16642 -c /etc/ovn/cert_host -C /etc/ovn/ovn-central.crt -p /etc/ovn/key_host --no-chdir --log-file=/var/log/ovn/ovn-northd.log --pidfile=/var/run/ovn/ovn-northd.pid --detach
130048 ? Ssl 34:42 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/var/run/ovn/ovnnb_db.sock --pidfile=/var/run/ovn/ovnnb_db.pid --unixctl=/var/run/ovn/ovnnb_db.ctl --remote=db:OVN_Northbound,NB_Global,connections --private-key=/etc/ovn/key_host --certificate=/etc/ovn/cert_host --ca-cert=/etc/ovn/ovn-central.crt --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers /var/lib/ovn/ovnnb_db.db
130251 ? Ssl 47:13 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-sb.log --remote=punix:/var/run/ovn/ovnsb_db.sock --pidfile=/var/run/ovn/ovnsb_db.pid --unixctl=/var/run/ovn/ovnsb_db.ctl --remote=db:OVN_Southbound,SB_Global,connections --private-key=/etc/ovn/key_host --certificate=/etc/ovn/cert_host --ca-cert=/etc/ovn/ovn-central.crt --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers /var/lib/ovn/ovnsb_db.db
The logs are inside the LXD containers. Checking them:
/var/snap/lxd/common/lxd/storage-pools/default/containers/juju-3083dc-2-lxd-1/rootfs/var/log/ovn/ovsdb-server-sb.log:
...
2023-04-24T08:30:50.962Z|21781|stream_ssl|WARN|SSL_accept: error:0A000126:SSL routines::unexpected eof while reading
2023-04-24T08:30:50.962Z|21782|jsonrpc|WARN|ssl:127.0.0.1:35594: receive error: Protocol error
2023-04-24T08:30:50.962Z|21783|reconnect|WARN|ssl:127.0.0.1:35594: connection dropped (Protocol error)
2023-04-24T08:35:00.857Z|21784|stream_ssl|WARN|SSL_accept: error:0A000126:SSL routines::unexpected eof while reading
2023-04-24T08:35:00.857Z|21785|jsonrpc|WARN|ssl:127.0.0.1:34324: receive error: Protocol error
2023-04-24T08:35:00.857Z|21786|reconnect|WARN|ssl:127.0.0.1:34324: connection dropped (Protocol error)
/var/snap/lxd/common/lxd/storage-pools/default/containers/juju-3083dc-2-lxd-1/rootfs/var/log/ovn/ovsdb-server-nb.log:
...
2023-04-24T08:30:50.960Z|22445|stream_ssl|WARN|SSL_accept: error:0A000126:SSL routines::unexpected eof while reading
2023-04-24T08:30:50.960Z|22446|jsonrpc|WARN|ssl:127.0.0.1:48028: receive error: Protocol error
2023-04-24T08:30:50.960Z|22447|reconnect|WARN|ssl:127.0.0.1:48028: connection dropped (Protocol error)
2023-04-24T08:35:00.855Z|22448|stream_ssl|WARN|SSL_accept: error:0A000126:SSL routines::unexpected eof while reading
2023-04-24T08:35:00.855Z|22449|jsonrpc|WARN|ssl:127.0.0.1:52444: receive error: Protocol error
2023-04-24T08:35:00.855Z|22450|reconnect|WARN|ssl:127.0.0.1:52444: connection dropped (Protocol error)
The containers belong to the deployment of "ovn-central", so I assume
something is wrong here.
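To get an overview of how often these warnings occur, the log can be summarised per module. A minimal sketch, assuming the pipe-separated layout (timestamp|sequence|module|level|message) seen in the excerpts above; the function name warn_counts is hypothetical.

```shell
# Hypothetical helper: count WARN entries per module in an OVS/OVN log.
# Reads the files given as arguments, or stdin when none are given.
warn_counts() {
    awk -F'|' '
        $4 == "WARN" { count[$3]++ }
        END { for (m in count) print m, count[m] }
    ' "$@" | sort
}

# Example, run inside the ovn-central container:
# warn_counts /var/log/ovn/ovsdb-server-sb.log
```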
The issue appears on all 8 nodes I have set up. So, it is reproducible. I can provide log files, etc. on request.
Could this issue be a bug in an OpenStack package (maybe ovn-central?), a problem with the Juju charms for deploying OpenStack Zed, or some issue with the setup?