[Bug 1834045] Re: Live-migration double binding doesn't work with OVN

Balazs Gibizer balazs.gibizer at est.tech
Thu Apr 30 15:25:21 UTC 2020


Do we need anything else on top of https://review.opendev.org/673803 to
make live migration with OVN work? Reading the commit message of the
patch, it seems the issue is resolved. Please set this back to New
if you disagree.

** Changed in: nova
       Status: New => Incomplete

https://bugs.launchpad.net/bugs/1834045

Title:
  Live-migration double binding doesn't work with OVN

Status in networking-ovn:
  New
Status in neutron:
  New
Status in OpenStack Compute (nova):
  Incomplete
Status in neutron package in Ubuntu:
  New

Bug description:
  For ml2/OVN, live-migration doesn't work. After spending some time
  debugging this issue I found that it's potentially more complicated and
  not related to OVN itself.

  Here is the full story behind live-migration not working with OVN on
  the latest u/s master.

  To speed up live-migration, double binding was introduced in neutron [1] and nova [2], implementing this blueprint [3]. In short, it creates two bindings (ACTIVE on the source host and INACTIVE on the destination) to verify that the network binding can be made on the destination host before starting the live-migration (so that no time is wasted in case of a rollback); the API flow is sketched below.
  This mechanism became the default in Stein [4]. So before the actual qemu live-migration, neutron should send 'network-vif-plugged' to nova, and only then is the migration run.
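
  For illustration, here is a minimal sketch of the port bindings API
  calls involved, following the Neutron API reference. This is not the
  actual nova code; the endpoint URL, token and destination host name
  are placeholders.

  ----------------------------------------------------------------------------
  # Hedged sketch of the double-binding flow; URL, token and host are
  # placeholders, not taken from a real deployment.
  import requests

  NEUTRON = 'http://controller:9696/v2.0'         # assumed neutron endpoint
  HEADERS = {'X-Auth-Token': '<keystone-token>'}  # assumed valid token
  PORT_ID = '3704a567-ef4c-4f6d-9557-a1191de07c4a'
  DEST_HOST = 'dest-host'

  # Before the migration starts: create an INACTIVE binding on the
  # destination host to verify the port can be bound there.
  resp = requests.post(f'{NEUTRON}/ports/{PORT_ID}/bindings',
                       json={'binding': {'host': DEST_HOST}},
                       headers=HEADERS)
  resp.raise_for_status()

  # After the migration succeeds: activate the destination binding
  # (neutron deactivates the old source binding as part of this call).
  resp = requests.put(f'{NEUTRON}/ports/{PORT_ID}/bindings/{DEST_HOST}/activate',
                      headers=HEADERS)
  resp.raise_for_status()
  ----------------------------------------------------------------------------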

  With OVN this mechanism doesn't work: the 'network-vif-plugged'
  notification is never sent, so live-migration gets stuck at the very
  beginning.

  Let's check how those notifications are sent. On every change of the
  'status' field of a neutron.ports row (via a sqlalchemy event) [5],
  function [6] is executed; it is responsible for sending the 'network-
  vif-unplugged' and 'network-vif-plugged' notifications.
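
  As a minimal standalone sketch of that mechanism (not the actual
  neutron code; see [5] and [6] for that), a sqlalchemy attribute event
  on the status column works like this:

  ----------------------------------------------------------------------------
  # Standalone sketch of a sqlalchemy 'set' event on a status column;
  # the model and handler are illustrative, not neutron's.
  from sqlalchemy import Column, String, create_engine, event
  from sqlalchemy.orm import Session, declarative_base

  Base = declarative_base()

  class Port(Base):
      __tablename__ = 'ports'
      id = Column(String(36), primary_key=True)
      status = Column(String(16))

  @event.listens_for(Port.status, 'set')
  def record_port_status_changed(target, value, oldvalue, initiator):
      # In neutron this is the hook that queues 'network-vif-plugged' /
      # 'network-vif-unplugged' notifications to nova.
      print(f'port status change: {oldvalue!r} -> {value!r}')

  engine = create_engine('sqlite://')
  Base.metadata.create_all(engine)
  with Session(engine) as session:
      port = Port(id='p1', status='DOWN')
      session.add(port)
      session.flush()
      port.status = 'UP'  # fires the listener above
      session.commit()
  ----------------------------------------------------------------------------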

  During the pre_live_migration tasks, two bindings and their binding levels are created. At the end of this process commit_port_binding() is executed [7]; at that point the neutron port status in the DB is DOWN.
  For ml2/ovs, at the end of commit_port_binding() [8], after the neutron_lib.callbacks.registry notification is sent, the port status moves to UP. For ml2/OVN it stays DOWN. This is the first difference I found between ml2/ovs and ml2/ovn.
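
  To make that neutron_lib.callbacks.registry mechanism concrete, here is
  a hedged sketch of a subscribe and a publish; the handler body and the
  publish arguments are illustrative, not neutron's actual code.

  ----------------------------------------------------------------------------
  # Hedged sketch of the neutron_lib callback registry; the handler body
  # is illustrative only.
  from neutron_lib.callbacks import events, registry, resources

  def handle_event(resource, event, trigger, payload=None):
      # ovo_rpc._ObjectChangeHandler registers a handler like this; on
      # AFTER_UPDATE it pushes the new port OVO revision to the agents.
      print(f'{resource} {event}: {payload.resource_id}')

  registry.subscribe(handle_event, resources.PORT, events.AFTER_UPDATE)

  # At the end of commit_port_binding() the server publishes the event,
  # which invokes every subscribed handler (context elided to None here):
  registry.publish(resources.PORT, events.AFTER_UPDATE, None,
                   payload=events.DBEventPayload(
                       None,
                       resource_id='3704a567-ef4c-4f6d-9557-a1191de07c4a'))
  ----------------------------------------------------------------------------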

  After a bit of digging I figured out how 'network-vif-plugged' is
  triggered in ml2/ovs. Let's see how this is done.

  1. Among the registered callbacks in ml2/ovs [8] there is a callback
  from the class ovo_rpc._ObjectChangeHandler [9] (wired up as in the
  sketch above), and at the end of commit_port_binding() this callback is
  invoked:

  -------------------------------------------------------------
  neutron.plugins.ml2.ovo_rpc._ObjectChangeHandler.handle_event
  -------------------------------------------------------------

  2. It is responsible for pushing new port object revisions to the
  agents, for example:

  ----------------------------------------------------------------------------
  Jun 24 10:01:01 test-migrate-1 neutron-server[3685]: DEBUG neutron.api.rpc.handlers.resources_rpc [None req-1430f349-d644-4d33-8833-90fad0124dcd service neutron] Pushing event updated for resources: {'Port': ['ID=3704a567-ef4c-4f6d-9557-a1191de07c4a,revision_number=10']} {{(pid=3697) push /opt/stack/neutron/neutron/api/rpc/handlers/resources_rpc.py:243}}
  ----------------------------------------------------------------------------

  3. The OVS agent consumes it and sends an RPC back to the neutron server reporting that the port is actually UP (on the source node!):
  ------------------------------------------------------------------------------------------------------------
  Jun 24 10:01:01 test-migrate-1 neutron-openvswitch-agent[18660]: DEBUG neutron.agent.resource_cache [None req-1430f349-d644-4d33-8833-90fad0124dcd service neutron] Resource Port 3704a567-ef4c-4f6d-9557-a1191de07c4a updated (revision_number 8->10). Old fields: {'status': u'ACTIVE', 'bindings': [PortBinding(host='test-migrate-1',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={},status='INACTIVE',vif_details={"port_filter": true, "bridge_name": "br-int", "datapath_type": "system", "ovs_hybrid_plug": false},vif_type='ovs',vnic_type='normal'), PortBinding(host='test-migrate-2',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={"migrating_to": "test-migrate-1"},status='ACTIVE',vif_details={"port_filter": true, "bridge_name": "br-int", "datapath_type": "system", "ovs_hybrid_plug": false},vif_type='ovs',vnic_type='normal')], 'binding_levels': [PortBindingLevel(driver='openvswitch',host='test-migrate-1',level=0,port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,segment=NetworkSegment(c6866834-4577-497f-a6c8-ff9724a82e59),segment_id=c6866834-4577-497f-a6c8-ff9724a82e59), PortBindingLevel(driver='openvswitch',host='test-migrate-2',level=0,port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,segment=NetworkSegment(c6866834-4577-497f-a6c8-ff9724a82e59),segment_id=c6866834-4577-497f-a6c8-ff9724a82e59)]} New fields: {'status': u'DOWN', 'bindings': [PortBinding(host='test-migrate-1',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={},status='ACTIVE',vif_details={"port_filter": true, "bridge_name": "br-int", "datapath_type": "system", "ovs_hybrid_plug": false},vif_type='ovs',vnic_type='normal'), PortBinding(host='test-migrate-2',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={"migrating_to": "test-migrate-1"},status='INACTIVE',vif_details=None,vif_type='unbound',vnic_type='normal')], 'binding_levels': [PortBindingLevel(driver='openvswitch',host='test-migrate-1',level=0,port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,segment=NetworkSegment(c6866834-4577-497f-a6c8-ff9724a82e59),segment_id=c6866834-4577-497f-a6c8-ff9724a82e59)]} {{(pi
  Jun 24 10:01:01 test-migrate-1 neutron-openvswitch-agent[18660]: d=18660) record_resource_update /opt/stack/neutron/neutron/agent/resource_cache.py:186}}
  ...

  Jun 24 10:01:02 test-migrate-1 neutron-openvswitch-agent[18660]: DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-9daaf112-57f4-49bb-8390-4b65a5c5e674 None None] Setting status for 3704a567-ef4c-4f6d-9557-a1191de07c4a to UP {{(pid=18660) _bind_devices /opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:1088}}
  ------------------------------------------------------------------------------------------------------------

  4. The neutron server consumes it:
  ------------------------------------------------------------------------------------------------------------
  Jun 24 10:01:02 test-migrate-1 neutron-server[3685]: DEBUG neutron.plugins.ml2.rpc [None req-62e69669-fa7e-4f70-9e38-38cb3e2c30a7 None None] Device 3704a567-ef4c-4f6d-9557-a1191de07c4a up at agent ovs-agent-test-migrate-1 {{(pid=3698) update_device_up /opt/stack/neutron/neutron/plugins/ml2/rpc.py:269}}
  ...
  Jun 24 10:01:02 test-migrate-1 neutron-server[3685]: DEBUG neutron.db.provisioning_blocks [None req-62e69669-fa7e-4f70-9e38-38cb3e2c30a7 None None] Provisioning for port 3704a567-ef4c-4f6d-9557-a1191de07c4a completed by entity L2. {{(pid=3698) provisioning_complete /opt/stack/neutron/neutron/db/provisioning_blocks.py:133}}
  ...
  Jun 24 10:01:02 test-migrate-1 neutron-server[3685]: DEBUG neutron.db.provisioning_blocks [None req-62e69669-fa7e-4f70-9e38-38cb3e2c30a7 None None] Provisioning complete for port 3704a567-ef4c-4f6d-9557-a1191de07c4a triggered by entity L2. {{(pid=3698) provisioning_complete /opt/stack/neutron/neutron/db/provisioning_blocks.py:140}}
  ------------------------------------------------------------------------------------------------------------

  and then generates the internal event "PROVISIONING_COMPLETE" [10].
  This event is consumed by [11], and port_provisioned() updates the port
  status in the DB to UP [12]. Finally it emits the 'network-vif-plugged'
  notification and nova continues the migration; this flow is sketched
  below.
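
  A hedged sketch of the server-side flow using the provisioning_blocks
  helpers named above; the context is a placeholder here, in real code it
  is a neutron admin context.

  ----------------------------------------------------------------------------
  # Hedged sketch of the provisioning_blocks flow described above.
  from neutron.db import provisioning_blocks
  from neutron_lib.callbacks import resources

  port_id = '3704a567-ef4c-4f6d-9557-a1191de07c4a'
  context = None  # placeholder: a neutron admin context in real code

  # When the binding is committed, the L2 entity places a block on the port:
  provisioning_blocks.add_provisioning_component(
      context, port_id, resources.PORT, provisioning_blocks.L2_AGENT_ENTITY)

  # When the agent reports the device up, the block is cleared; once the
  # last block for the port is gone, PROVISIONING_COMPLETE is published
  # and port_provisioned() [11][12] moves the port to UP, which in turn
  # emits 'network-vif-plugged'.
  provisioning_blocks.provisioning_complete(
      context, port_id, resources.PORT, provisioning_blocks.L2_AGENT_ENTITY)
  ----------------------------------------------------------------------------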

  
  In ml2/OVN there are no agents, so ovo_rpc is not used. That's why live-migration doesn't work for ml2/OVN.

  It looks like a general bug somewhere between nova and neutron. Neutron shouldn't send the 'network-vif-plugged' notification while the double binding is being configured from the source host, as it does today (see step 3 above).
  Maybe we could consider using a more precise name, like 'neutron-vif-inactive-binding-set'?
  Or maybe nova could watch for the inactive binding being created [13] and then start the live-migration, instead of waiting for the neutron notification? A sketch of such a check follows below.
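
  A hedged sketch of that alternative, polling the bindings list [13]
  until the INACTIVE binding for the destination host appears; the
  endpoint path follows the Neutron API reference, while the URL, token
  and host name are placeholders.

  ----------------------------------------------------------------------------
  # Hedged sketch: poll the port bindings list until the destination
  # host has an INACTIVE binding. URL and token are placeholders.
  import time
  import requests

  NEUTRON = 'http://controller:9696/v2.0'
  HEADERS = {'X-Auth-Token': '<keystone-token>'}
  PORT_ID = '3704a567-ef4c-4f6d-9557-a1191de07c4a'
  DEST_HOST = 'dest-host'

  def dest_binding_ready(timeout=60, interval=2):
      """Return True once an INACTIVE binding exists on the destination."""
      deadline = time.monotonic() + timeout
      while time.monotonic() < deadline:
          resp = requests.get(f'{NEUTRON}/ports/{PORT_ID}/bindings',
                              headers=HEADERS)
          resp.raise_for_status()
          for binding in resp.json().get('bindings', []):
              if (binding['host'] == DEST_HOST
                      and binding['status'] == 'INACTIVE'):
                  return True
          time.sleep(interval)
      return False
  ----------------------------------------------------------------------------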

  
  Thanks,
  Maciej


  [1] https://review.opendev.org/#/q/topic:bp/live-migration-portbinding+(status:open+OR+status:merged)
  [2] https://review.opendev.org/#/c/558001/
  [3] https://blueprints.launchpad.net/nova/+spec/neutron-new-port-binding-api 
  [4] https://review.opendev.org/#/c/635360/
  [5] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/db/db_base_plugin_v2.py#L173
  [6] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/notifiers/nova.py#L182
  [7] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L505
  [8] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L713
  [9] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/ovo_rpc.py#L51
  [10] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/db/provisioning_blocks.py#L140
  [11] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L285
  [12] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L316
  [13] https://specs.openstack.org/openstack/neutron-specs/specs/backlog/pike/portbinding_information_for_nova.html#list-bindings
