[Bug 1807077] Re: [SRU] mountall crashes on udev node with missing devname
Ryan Finnie
ryan.finnie at canonical.com
Tue Dec 25 04:38:12 UTC 2018
** Description changed:
[Impact]
- * udev block nodes without a devname will crash mountall, resulting in
+ * udev block nodes without a devname will crash mountall, resulting in
an unbootable system (emergency root shell)
- * While this is not likely to happen in a matched distro/kernel
+ * While this is not likely to happen in a matched distro/kernel
environment (it was discovered while needing to run a bionic 4.15 kernel
on trusty), it is possible.
- * The code in try_udev_device() assumes a block subsystem will always
+ * The code in try_udev_device() assumes a block subsystem will always
have a devname; the SRU patches explicitly check for a devname and
return if null.
[Test Case]
- * HPE DL385 Gen10, Samsung a822 NVMe controller, trusty install
+ * HPE DL385 Gen10, Samsung a822 NVMe controller, trusty install
- * Kernels <4.15 (or possibly 4.14) do not expose nvme0c33n1 and do not
+ * Kernels <4.15 (or possibly 4.14) do not expose nvme0c33n1 and do not
trigger the bug. Tested on 4.13, 4.4 and 3.13.
- * Kernel 4.15 exposes nvme0c33n1 in udev but does not have a devname,
+ * Kernel 4.15 exposes nvme0c33n1 in udev but does not have a devname,
mountall crash ensues.
[Regression Potential]
- * Patch might ignore legitimate block devices on existing
+ * Patch might ignore legitimate block devices on existing
installations. Unlikely, since the logic path for null devname leads
directly to a program crash.
[Other Info]
-
- * Additional context for Canonical employees: PS4.5 is a trusty backend cloud, but we now have Gen10 hardware incoming (this was discovered while adding new nova-compute hardware). Older kernels are not usable because Gen10 requires ilorest, which requires a >4.4 kernel (at least artful 4.13 is known good). So trusty+4.15 is the only viable combination for continued support of the cloud while adding new hardware. This is done via apt pinning of bionic for the kernel packages, and, mountall notwithstanding, is working fine so far.
+
+ * Additional context for Canonical employees: PS4.5 is a trusty backend
+ cloud, but we now have Gen10 hardware incoming (this was discovered
+ while adding new nova-compute hardware). Older kernels are not usable
+ because Gen10 requires ilorest, which requires a >4.4 kernel (at least
+ artful 4.13 is known good). So trusty+4.15 is the only viable
+ combination for continued support of the cloud while adding new
+ hardware. This is done via apt pinning of bionic for the kernel
+ packages, and, mountall notwithstanding, is working fine so far.
+
+ TEST CASE:
+ 1. Enable -proposed
+ 2. apt-get install mountall=2.53ubuntu1
+ 3. update-initramfs -k all -u
+ 4. Reboot
+
+ VERIFICATION DONE
+ Rebooted successfully on affected Gen10 systems. Confirmed no regression on unaffected systems.
Original description:
Running bionic's 4.15 kernel on trusty on an HPE DL385 Gen10 results in
a device node for the NVMe controller,
/devices/pci0000:40/0000:40:03.1/0000:43:00.0/nvme/nvme0/nvme0c33n1
which itself does not have a devname. When mountall gets to it:
fsck_update: updating check priorities
try_mount: /srv/nova/instances waiting for device
try_udev_device: ignored /dev/loop5 (not yet ready?)
try_udev_device: ignored /dev/loop6 (not yet ready?)
try_udev_device: ignored /dev/loop1 (not yet ready?)
try_udev_device: ignored /dev/loop0 (not yet ready?)
try_udev_device: block (null) (null) (null)
and then crashes, leaving the boot at an emergency root shell. A
successful scan looks like this for comparison:
try_udev_device: block /dev/sdb (null) (null)
try_udev_device: block /dev/sdb (null) (null)
try_udev_device: block /dev/sda (null) (null)
try_udev_device: block /dev/nvme0n1 ed56e3a9-60f7-4636-85a2-b53137b598e7 (null)
try_udev_device: block /dev/bcache0 756cb2c6-b999-4905-a021-c2e688e81a86 instances
The debdiffs check for a null devname in try_udev_device() and will not
attempt to process it.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mountall in Ubuntu.
https://bugs.launchpad.net/bugs/1807077
Title:
[SRU] mountall crashes on udev node with missing devname
Status in mountall package in Ubuntu:
Invalid
Status in mountall source package in Trusty:
Fix Committed
Status in mountall source package in Xenial:
Won't Fix
Bug description:
[Impact]
* udev block nodes without a devname will crash mountall, resulting
in an unbootable system (emergency root shell)
* While this is not likely to happen in a matched distro/kernel
environment (it was discovered while needing to run a bionic 4.15
kernel on trusty), it is possible.
* The code in try_udev_device() assumes a block subsystem will always
have a devname; the SRU patches explicitly check for a devname and
return if null.
[Test Case]
* HPE DL385 Gen10, Samsung a822 NVMe controller, trusty install
* Kernels <4.15 (or possibly 4.14) do not expose nvme0c33n1 and do
not trigger the bug. Tested on 4.13, 4.4 and 3.13.
   * Kernel 4.15 exposes nvme0c33n1 in udev, but the node has no
   devname, and a mountall crash ensues.
[Regression Potential]
   * The patch might ignore legitimate block devices on existing
   installations. This is unlikely, since the existing logic path for a
   null devname leads directly to a program crash.
[Other Info]
* Additional context for Canonical employees: PS4.5 is a trusty
backend cloud, but we now have Gen10 hardware incoming (this was
discovered while adding new nova-compute hardware). Older kernels are
not usable because Gen10 requires ilorest, which requires a >4.4
kernel (at least artful 4.13 is known good). So trusty+4.15 is the
only viable combination for continued support of the cloud while
adding new hardware. This is done via apt pinning of bionic for the
kernel packages, and, mountall notwithstanding, is working fine so
far.
TEST CASE:
1. Enable -proposed
2. apt-get install mountall=2.53ubuntu1
3. update-initramfs -k all -u
4. Reboot
VERIFICATION DONE
Rebooted successfully on affected Gen10 systems. Confirmed no regression on unaffected systems.
Original description:
Running bionic's 4.15 kernel on trusty on an HPE DL385 Gen10 results
in a device node for the NVMe controller,
/devices/pci0000:40/0000:40:03.1/0000:43:00.0/nvme/nvme0/nvme0c33n1
which itself does not have a devname. When mountall gets to it:
fsck_update: updating check priorities
try_mount: /srv/nova/instances waiting for device
try_udev_device: ignored /dev/loop5 (not yet ready?)
try_udev_device: ignored /dev/loop6 (not yet ready?)
try_udev_device: ignored /dev/loop1 (not yet ready?)
try_udev_device: ignored /dev/loop0 (not yet ready?)
try_udev_device: block (null) (null) (null)
and then crashes, leaving the boot at an emergency root shell. A
successful scan looks like this for comparison:
try_udev_device: block /dev/sdb (null) (null)
try_udev_device: block /dev/sdb (null) (null)
try_udev_device: block /dev/sda (null) (null)
try_udev_device: block /dev/nvme0n1 ed56e3a9-60f7-4636-85a2-b53137b598e7 (null)
try_udev_device: block /dev/bcache0 756cb2c6-b999-4905-a021-c2e688e81a86 instances
The debdiffs check for a null devname in try_udev_device() and will
not attempt to process it.
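  The essential shape of that guard can be sketched in C. This is an
  illustration only, not the actual debdiff: get_devnode() below is a
  hypothetical stand-in for libudev's udev_device_get_devnode()
  (hard-coded to mimic the two cases above), and try_udev_device_sketch()
  shows only the early return the patch adds.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical stand-in for libudev's udev_device_get_devnode():
 * returns the /dev path for a block node, or NULL for nodes such as
 * the hidden NVMe controller node nvme0c33n1 that have no devname.
 * (Hard-coded purely for illustration; mountall queries libudev.) */
static const char *get_devnode(const char *sysname)
{
    if (strcmp(sysname, "nvme0c33n1") == 0)
        return NULL;                 /* controller node: no devname */
    if (strcmp(sysname, "nvme0n1") == 0)
        return "/dev/nvme0n1";       /* ordinary namespace device */
    return NULL;
}

/* Sketch of the guarded entry point: the essential change is to
 * return early when devname is NULL instead of passing it on to code
 * that dereferences it (the crash after "block (null) (null) (null)"). */
static int try_udev_device_sketch(const char *sysname)
{
    const char *devname = get_devnode(sysname);

    if (devname == NULL)
        return 0;                    /* ignore the node; boot continues */

    printf("try_udev_device: block %s\n", devname);
    return 1;                        /* node processed normally */
}
```

  The design point is that the check happens before devname is ever
  dereferenced. (glibc's printf happens to render a NULL %s argument as
  "(null)", which is likely why the debug line above still appears
  before the crash.)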
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mountall/+bug/1807077/+subscriptions