[Bug 1357368] Re: Source side post Live Migration Logic cannot disconnect multipath iSCSI devices cleanly

Billy Olsen billy.olsen at canonical.com
Fri Jul 1 23:29:13 UTC 2016


** Description changed:

+ [Impact]
+ 
  When a volume is attached to a VM on the source compute node through
  multipath, the related files in /dev/disk/by-path/ look like this:
  
  stack@ubuntu-server12:~/devstack$ ls /dev/disk/by-path/*24
  /dev/disk/by-path/ip-192.168.3.50:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.a5-lun-24
  /dev/disk/by-path/ip-192.168.4.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b4-lun-24
  
  The corresponding multipath device looks like this:
  stack@ubuntu-server12:~/devstack$ sudo multipath -l 3600601602ba03400921130967724e411
  3600601602ba03400921130967724e411 dm-3 DGC,VRAID
  size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
  |-+- policy='round-robin 0' prio=-1 status=active
  | `- 19:0:0:24 sdl 8:176 active undef running
  `-+- policy='round-robin 0' prio=-1 status=enabled
    `- 18:0:0:24 sdj 8:144 active undef running
  
  But when the VM is migrated to the destination, the related information
  looks like the following example, because we CANNOT guarantee that all
  nodes can access the same iSCSI portals and see the same target LUN
  number. This information is then used to overwrite connection_info in
  the DB before the post live migration logic is executed.
  
  stack@ubuntu-server13:~/devstack$ ls /dev/disk/by-path/*100
  /dev/disk/by-path/ip-192.168.3.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b5-lun-100
  /dev/disk/by-path/ip-192.168.4.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b4-lun-100
  
  stack@ubuntu-server13:~/devstack$ sudo multipath -l 3600601602ba03400921130967724e411
  3600601602ba03400921130967724e411 dm-3 DGC,VRAID
  size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
  |-+- policy='round-robin 0' prio=-1 status=active
  | `- 19:0:0:100 sdf 8:176 active undef running
  `-+- policy='round-robin 0' prio=-1 status=enabled
    `- 18:0:0:100 sdg 8:144 active undef running
  
  As a result, if post live migration on the source side uses <IP>, <IQN> and <TARGET LUN Number> to find the devices to clean up, it may use 192.168.3.51, iqn.1992-04.com.emc:cx.fnm00124500890.b5 and 100.
  However, the correct values are 192.168.3.50, iqn.1992-04.com.emc:cx.fnm00124500890.a5 and 24.
  
  An approach similar to the one in
  https://bugs.launchpad.net/nova/+bug/1327497 can be used to fix it:
  leverage the unchanged multipath_id to find the correct devices to delete.
+ 
+ [Test Case]
+ 
+ Live migrate an instance which uses iSCSI multipath. Verify that the
+ correct target is removed on the source hypervisor.
+ 
+ [Regression Potential]
+ 
+ Low; the fix is already included in the next release (Juno). The change
+ introduces a check that uses a field already used by Fibre Channel
+ multipath connections but previously not used by the iSCSI multipath
+ code path on cleanup. If the check fails, the existing behavior of not
+ cleaning up iSCSI sessions/paths remains.

** Tags removed: in-stable-juno

** Tags removed: verification-needed
** Tags added: in-stable-juno verification-done

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1357368

Title:
  Source side post Live Migration Logic cannot disconnect multipath
  iSCSI devices cleanly

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) juno series:
  Fix Released
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Trusty:
  Fix Committed

Bug description:
  [Impact]

  When a volume is attached to a VM on the source compute node through
  multipath, the related files in /dev/disk/by-path/ look like this:

  stack@ubuntu-server12:~/devstack$ ls /dev/disk/by-path/*24
  /dev/disk/by-path/ip-192.168.3.50:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.a5-lun-24
  /dev/disk/by-path/ip-192.168.4.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b4-lun-24

  The corresponding multipath device looks like this:
  stack@ubuntu-server12:~/devstack$ sudo multipath -l 3600601602ba03400921130967724e411
  3600601602ba03400921130967724e411 dm-3 DGC,VRAID
  size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
  |-+- policy='round-robin 0' prio=-1 status=active
  | `- 19:0:0:24 sdl 8:176 active undef running
  `-+- policy='round-robin 0' prio=-1 status=enabled
    `- 18:0:0:24 sdj 8:144 active undef running

  But when the VM is migrated to the destination, the related
  information looks like the following example, because we CANNOT
  guarantee that all nodes can access the same iSCSI portals and see the
  same target LUN number. This information is then used to overwrite
  connection_info in the DB before the post live migration logic is
  executed.

  stack@ubuntu-server13:~/devstack$ ls /dev/disk/by-path/*100
  /dev/disk/by-path/ip-192.168.3.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b5-lun-100
  /dev/disk/by-path/ip-192.168.4.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b4-lun-100

  stack@ubuntu-server13:~/devstack$ sudo multipath -l 3600601602ba03400921130967724e411
  3600601602ba03400921130967724e411 dm-3 DGC,VRAID
  size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
  |-+- policy='round-robin 0' prio=-1 status=active
  | `- 19:0:0:100 sdf 8:176 active undef running
  `-+- policy='round-robin 0' prio=-1 status=enabled
    `- 18:0:0:100 sdg 8:144 active undef running

  As a result, if post live migration on the source side uses <IP>, <IQN> and <TARGET LUN Number> to find the devices to clean up, it may use 192.168.3.51, iqn.1992-04.com.emc:cx.fnm00124500890.b5 and 100.
  However, the correct values are 192.168.3.50, iqn.1992-04.com.emc:cx.fnm00124500890.a5 and 24.
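
The mismatch can be illustrated with a small sketch. The helper name by_path_name and the variable names are hypothetical and are not taken from the nova code; the path format is simply read off the /dev/disk/by-path/ listings above.

```python
def by_path_name(portal, iqn, lun):
    """Build a by-path device name (hypothetical helper; format copied
    from the listings in this report, not from nova itself)."""
    return "/dev/disk/by-path/ip-%s-iscsi-%s-lun-%s" % (portal, iqn, lun)


# Devices that actually exist on the source node (from the first listing).
source_devices = {
    "/dev/disk/by-path/ip-192.168.3.50:3260-iscsi-"
    "iqn.1992-04.com.emc:cx.fnm00124500890.a5-lun-24",
    "/dev/disk/by-path/ip-192.168.4.51:3260-iscsi-"
    "iqn.1992-04.com.emc:cx.fnm00124500890.b4-lun-24",
}

# Name built from connection_info after it was overwritten with the
# destination's portal, IQN and LUN.
stale = by_path_name(
    "192.168.3.51:3260", "iqn.1992-04.com.emc:cx.fnm00124500890.b5", 100)

# Name built from the values that were valid on the source.
correct = by_path_name(
    "192.168.3.50:3260", "iqn.1992-04.com.emc:cx.fnm00124500890.a5", 24)

print(stale in source_devices)    # False: nothing to clean up is found
print(correct in source_devices)  # True
```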

  An approach similar to the one in
  https://bugs.launchpad.net/nova/+bug/1327497 can be used to fix it:
  leverage the unchanged multipath_id to find the correct devices to
  delete.
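
As a rough sketch of that idea (not the actual nova patch), the path devices behind a multipath ID could be recovered by parsing the `multipath -l <id>` output shown above, rather than by rebuilding names from IP, IQN and LUN. The function name paths_for_multipath and the regular expression are assumptions made for illustration.

```python
import re

# Matches path lines such as "| `- 19:0:0:24 sdl 8:176 active undef running"
# and captures the H:C:T:L address and the kernel device name.
_PATH_LINE = re.compile(r"(\d+:\d+:\d+:\d+)\s+(\w+)\s+\d+:\d+\s")


def paths_for_multipath(multipath_output):
    """Return (hctl, device) pairs found in `multipath -l <id>` output."""
    paths = []
    for line in multipath_output.splitlines():
        match = _PATH_LINE.search(line)
        if match:
            paths.append((match.group(1), "/dev/" + match.group(2)))
    return paths


# Output captured on the source node, copied from this report.
SOURCE_OUTPUT = """\
3600601602ba03400921130967724e411 dm-3 DGC,VRAID
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=-1 status=active
| `- 19:0:0:24 sdl 8:176 active undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
  `- 18:0:0:24 sdj 8:144 active undef running
"""

print(paths_for_multipath(SOURCE_OUTPUT))
```

Because the multipath ID is the same on both nodes, a lookup of this kind run on the source returns the source's own paths (LUN 24 here) regardless of what connection_info now says.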

  [Test Case]

  Live migrate an instance which uses iSCSI multipath. Verify that the
  correct target is removed on the source hypervisor.

  [Regression Potential]

  Low; the fix is already included in the next release (Juno). The
  change introduces a check that uses a field already used by Fibre
  Channel multipath connections but previously not used by the iSCSI
  multipath code path on cleanup. If the check fails, the existing
  behavior of not cleaning up iSCSI sessions/paths remains.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1357368/+subscriptions
