[Bug 1452641] Re: Static Ceph mon IP addresses in connection_info can prevent VM startup

Tyler Stachecki 1452641 at bugs.launchpad.net
Fri Oct 23 13:56:52 UTC 2020


We have also been bitten by this.  Apologies if this does not help solve
the bug, but this issue has been floating around for quite a while and
the following may help future cloud operators...

In our case, we were trying to re-IP ALL of our Ceph Mons.  As Corey
mentioned, this bug report is about *Cinder volumes*... but note that all
of our instances were observed to use RBD-backed configuration drives,
which suffered from the same problem as the image-backed disks... so you
may suffer from both problems even if you boot all of your instances
exclusively from volume!

* RBD config drives AND Glance/image-based RBD volumes DID NOT have
their Ceph Mon addresses updated as part of a live migration, even with
the patch in #9.  The Ceph Mon addresses for these types of volumes IN
PARTICULAR are NOT stored anywhere in a database; rather, they seem to be
derived as needed when certain actions occur and are otherwise carted
around from hypervisor to hypervisor in the libvirt domain XML.  Again,
see the other LP bug for this.
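For reference, those addresses live in the `<host>` subelements of each RBD disk's `<source>` in the domain XML.  A minimal audit sketch, assuming you have already dumped each domain's live XML (e.g. with `virsh dumpxml`), could flag RBD disks that still point at Mons outside a given set.  The function names and the sample XML are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an RBD-backed config drive plus a local file disk.
SAMPLE_XML = """<domain type='kvm'>
  <name>instance-00000001</name>
  <devices>
    <disk type='network' device='cdrom'>
      <source protocol='rbd' name='vms/example_disk.config'>
        <host name='10.0.0.1' port='6789'/>
        <host name='10.0.0.2' port='6789'/>
      </source>
    </disk>
    <disk type='file' device='disk'>
      <source file='/var/lib/nova/instances/example/disk'/>
    </disk>
  </devices>
</domain>"""


def rbd_mon_hosts(domain_xml):
    """Map each RBD disk's image name to the (host, port) pairs in its <source>."""
    root = ET.fromstring(domain_xml)
    result = {}
    for source in root.findall("./devices/disk/source"):
        # Only network disks using the rbd protocol carry Ceph Mon addresses.
        if source.get("protocol") != "rbd":
            continue
        result[source.get("name")] = [
            (host.get("name"), host.get("port")) for host in source.findall("host")
        ]
    return result


def stale_rbd_disks(domain_xml, current_mons):
    """Return RBD image names whose <host> list mentions a Mon not in current_mons."""
    return [
        name
        for name, hosts in rbd_mon_hosts(domain_xml).items()
        if any(host not in current_mons for host, _port in hosts)
    ]


print(stale_rbd_disks(SAMPLE_XML, {"10.0.1.1", "10.0.1.2"}))  # ['vms/example_disk.config']
```

The file-backed disk is skipped because its `<source>` has no `protocol` attribute.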

* Trying to 'fix up' the Ceph Mon addresses via 'virsh edit' or
something comparable and then live-migrating an instance to pick up
those changes is futile.  The edited addresses do not take effect until
the VMM for that instance is hard-bounced, AND nova-compute ships the
running copy of the libvirt domain XML to the destination hypervisor,
NOT the copy on disk.

What we may end up doing (it worked in a lab environment) is to respin
a patch based on #9 and apply it to all of our compute nodes.  It
searches for all './devices/disk/source' elements in the domain XML that
have the 'rbd' protocol.  For each one, we replace the existing host
subelements with our new Ceph Mon addresses.  Then we live-migrate every
VM exactly once.
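The rewrite step can be sketched roughly as follows.  This is not the patch from #9 or our respin of it, just a standalone sketch using Python's `xml.etree.ElementTree`; `replace_rbd_mon_hosts` and the sample values are hypothetical, and where to hook it into nova's live-migration path is left out:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one Cinder RBD volume still pointing at the old Mons.
SAMPLE_XML = """<domain type='kvm'>
  <devices>
    <disk type='network' device='disk'>
      <auth username='cinder'>
        <secret type='ceph' uuid='00000000-0000-0000-0000-000000000000'/>
      </auth>
      <source protocol='rbd' name='volumes/volume-example'>
        <host name='10.0.0.1' port='6789'/>
        <host name='10.0.0.2' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>"""


def replace_rbd_mon_hosts(domain_xml, new_mons):
    """Swap the <host> children of every rbd <source> for new_mons [(addr, port), ...]."""
    root = ET.fromstring(domain_xml)
    for source in root.findall("./devices/disk/source"):
        if source.get("protocol") != "rbd":
            continue  # file/block disks have no Mon addresses to fix
        old_hosts = source.findall("host")
        # Re-insert where the first old <host> sat so other children keep their order.
        pos = list(source).index(old_hosts[0]) if old_hosts else len(source)
        for host in old_hosts:
            source.remove(host)
        for offset, (addr, port) in enumerate(new_mons):
            source.insert(pos + offset, ET.Element("host", name=addr, port=str(port)))
    return ET.tostring(root, encoding="unicode")


new_xml = replace_rbd_mon_hosts(SAMPLE_XML, [("10.0.1.1", 6789), ("10.0.1.2", 6789)])
print(new_xml)
```

One caveat: on serialization ElementTree renames namespace prefixes it has not been told about (nova's `<nova:instance>` metadata would come back as `ns0:...`) unless you call `ET.register_namespace` first, so a real patch would more naturally do this with lxml, which nova's libvirt driver already uses.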

This works for all kinds of RBD volumes and, unlike 'virsh edit', it is
effective because the in-memory libvirt domain XML is rewritten before
the VMM starts up on the destination host.  Note that while you are
doing the live migrations and updating the domain XMLs, you must keep at
least one of the old Ceph Mons and at least one of the new Ceph Mons
accessible at all times.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1452641

Title:
  Static Ceph mon IP addresses in connection_info can prevent VM startup

Status in OpenStack Compute (nova):
  In Progress
Status in nova package in Ubuntu:
  Triaged

Bug description:
  The Cinder rbd driver extracts the IP addresses of the Ceph mon servers from the Ceph mon map when the instance/volume connection is established. This info is then stored in nova's block-device-mapping table and is never re-validated afterwards.
  Changing the Ceph mon servers' IP addresses will prevent the instance from booting, as the stale connection info ends up in the instance's XML. One idea to fix this would be to use the mon address information from ceph.conf directly, which should point to an alias or a load balancer.
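The stale data the description refers to is, as far as we can tell, the parallel `hosts`/`ports` lists inside the RBD `connection_info` that Cinder returns and nova persists as JSON.  A hypothetical helper for re-validating it against the current Mons might look like this; the field layout is an assumption based on Cinder's RBD driver and should be checked against your release:

```python
import json


def refresh_rbd_connection_info(connection_info_json, mons):
    """Return connection_info JSON with its Mon list replaced by mons [(addr, port), ...].

    Assumed layout: {"driver_volume_type": "rbd",
    "data": {"hosts": [...], "ports": [...], ...}}, with ports kept as strings.
    """
    info = json.loads(connection_info_json)
    if info.get("driver_volume_type") != "rbd":
        return connection_info_json  # leave non-RBD attachments untouched
    data = info.setdefault("data", {})
    data["hosts"] = [addr for addr, _port in mons]
    data["ports"] = [str(port) for _addr, port in mons]
    return json.dumps(info)


stale = json.dumps({
    "driver_volume_type": "rbd",
    "data": {"name": "volumes/volume-example", "hosts": ["10.0.0.1"], "ports": ["6789"]},
})
print(refresh_rbd_connection_info(stale, [("10.0.1.1", 6789), ("10.0.1.2", 6789)]))
```

This only covers Cinder volumes; per the comment above, config drives and image-backed disks never go through this database record.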
