[Bug 1869116] Re: smartctl-validate is borked in a recent release

Lee Trager lee.trager at canonical.com
Fri Mar 27 19:53:49 UTC 2020


I spoke with the LXD team and they're passing through the name they get
from the kernel. It seems like this may be a bug introduced in
util-linux that shows up with RAID devices. I tried running lsblk on
Focal against two NVMe drives and did not see an underscore used in
model names.
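
For anyone who wants to repeat that check on their own hardware, here is
a minimal sketch. It assumes the output of something like
`lsblk --json --nodeps --output NAME,MODEL,SERIAL` has been captured as a
string; the helper name and the NVMe entry in the sample are made up for
illustration, while the other three entries mirror the devices in the
bug description. An underscore in a model name is only a hint, since
some vendors may use one legitimately.

```python
import json


def models_with_underscores(lsblk_json):
    """Return (name, model) pairs whose model string contains an underscore."""
    data = json.loads(lsblk_json)
    return [
        (dev["name"], dev["model"])
        for dev in data.get("blockdevices", [])
        if dev.get("model") and "_" in dev["model"]
    ]


# Sample shaped like lsblk's JSON output. The first three entries mirror
# the bug description; the NVMe entry is a hypothetical placeholder.
sample = """
{"blockdevices": [
  {"name": "sda", "model": "PERC_6/i", "serial": "6842b2b0740e9900260e66c9220df4ac"},
  {"name": "sdb", "model": "PERC_6/i", "serial": "6842b2b0740e9900260e66f924ecece0"},
  {"name": "sr0", "model": "TEAC_DVD-ROM_DV-28SW", "serial": "10092013112645"},
  {"name": "nvme0n1", "model": "EXAMPLE NVMe DRIVE", "serial": "EXAMPLE0001"}
]}
"""

flagged = models_with_underscores(sample)
for name, model in flagged:
    print(f"{name}: {model}")
```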

@cees - Can you try using a different commissioning operating system and
see if the problem is resolved? You can download 16.04 and 20.04 on the
images page and then change the commissioning operating system on the
settings page.

** Also affects: util-linux-ng
   Importance: Undecided
       Status: New

** No longer affects: util-linux-ng

** Also affects: util-linux (Ubuntu)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1869116

Title:
  smartctl-validate is borked in a recent release

Status in lxd:
  Unknown
Status in MAAS:
  Triaged
Status in util-linux package in Ubuntu:
  New

Bug description:
  Bug (maybe?) details first, diatribe second.

  Bug Summary: machines with multiple disks (RAID with multiple drives /
  multiple virtual devices, or something along those lines) can no longer
  be commissioned: MAAS 2.4.x worked fine, 2.7.0 does not.

  Here is the script output of smartctl-validate:

  -----
  # /dev/sda (Model: PERC 6/i, Serial: 6842b2b0740e9900260e66c9220df4ac)

  Unable to run 'smartctl-validate': Storage device 'PERC 6/i' with serial '6842b2b0740e9900260e66c9220df4ac' not found!
  This indicates the storage device has been removed or the OS is unable to find it due to a hardware failure. Please re-commission this node to re-discover the storage devices, or delete this device manually.
  Given parameters:
  {'storage': {'argument_format': '{path}', 'type': 'storage', 'value': {'id_path': '/dev/disk/by-id/wwn-0x6842b2b0740e9900260e66c9220df4ac', 'model': 'PERC 6/i', 'name': 'sda', 'physical_blockdevice_id': 33, 'serial': '6842b2b0740e9900260e66c9220df4ac'}}}
  Discovered storage devices: [
      {'NAME': 'sda', 'MODEL': 'PERC_6/i', 'SERIAL': '6842b2b0740e9900260e66c9220df4ac'},
      {'NAME': 'sdb', 'MODEL': 'PERC_6/i', 'SERIAL': '6842b2b0740e9900260e66f924ecece0'},
      {'NAME': 'sr0', 'MODEL': 'TEAC_DVD-ROM_DV-28SW', 'SERIAL': '10092013112645'}
  ]
  Discovered interfaces: {'xx:xx:xx:xx:xx:xx': 'eno1'}
  -----
  -----
  # /dev/sdb (Model: PERC 6/i, Serial: 6842b2b0740e9900260e66f924ecece0)
  Unable to run 'smartctl-validate': Storage device 'PERC 6/i' with serial '6842b2b0740e9900260e66f924ecece0' not found!
  This indicates the storage device has been removed or the OS is unable to find it due to a hardware failure. Please re-commission this node to re-discover the storage devices, or delete this device manually.
  Given parameters: {'storage': {'argument_format': '{path}', 'type': 'storage', 'value': {'id_path': '/dev/disk/by-id/wwn-0x6842b2b0740e9900260e66f924ecece0', 'model': 'PERC 6/i', 'name': 'sdb', 'physical_blockdevice_id': 34, 'serial': '6842b2b0740e9900260e66f924ecece0'}}}
  Discovered storage devices: [
      {'NAME': 'sda', 'MODEL': 'PERC_6/i', 'SERIAL': '6842b2b0740e9900260e66c9220df4ac'},
      {'NAME': 'sdb', 'MODEL': 'PERC_6/i', 'SERIAL': '6842b2b0740e9900260e66f924ecece0'},
      {'NAME': 'sr0', 'MODEL': 'TEAC_DVD-ROM_DV-28SW', 'SERIAL': '10092013112645'}
  ]
  Discovered interfaces: {'xx:xx:xx:xx:xx:xx': 'eno1'}
  -----

  You can see that it says the storage device cannot be found and then
  immediately lists it as a discovered device. It does this for both
  tests (one for each drive), and on both servers.
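
In the output above, the only visible difference between the requested
device and the discovered one is the model string: 'PERC 6/i' in the
given parameters versus 'PERC_6/i' in the discovered list. The sketch
below shows how that alone could produce a "not found" result. It is not
the actual smartctl-validate code; it assumes, based only on the error
message, that the lookup compares model and serial exactly. The function
names and the `strict` flag are hypothetical.

```python
def normalize(value):
    """Treat underscores as spaces and collapse repeated whitespace."""
    return " ".join(value.replace("_", " ").split())


def find_device(discovered, model, serial, strict=True):
    """Return the NAME of the first device matching model and serial, else None.

    With strict=True the model strings must be identical; with
    strict=False both sides are normalized before comparing.
    """
    wanted = model if strict else normalize(model)
    for dev in discovered:
        found = dev["MODEL"] if strict else normalize(dev["MODEL"])
        if found == wanted and dev["SERIAL"] == serial:
            return dev["NAME"]
    return None


# Values copied from the script output in this report.
discovered = [
    {"NAME": "sda", "MODEL": "PERC_6/i", "SERIAL": "6842b2b0740e9900260e66c9220df4ac"},
    {"NAME": "sdb", "MODEL": "PERC_6/i", "SERIAL": "6842b2b0740e9900260e66f924ecece0"},
    {"NAME": "sr0", "MODEL": "TEAC_DVD-ROM_DV-28SW", "SERIAL": "10092013112645"},
]
serial = "6842b2b0740e9900260e66c9220df4ac"

strict_result = find_device(discovered, "PERC 6/i", serial)
relaxed_result = find_device(discovered, "PERC 6/i", serial, strict=False)
print(strict_result, relaxed_result)
```

Under that assumption, the exact comparison finds nothing even though
the serial matches, while the normalized comparison resolves to sda.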

  Bug Details:
  I ran MAAS 2.4.x for the longest time during my journey (see the diatribe below) and never had any problems re-commissioning (or deleting and re-discovering over PXE boot) 2 of my servers (r610, r710).

  r610 has an iPERC 6, four 10K X00GB drives configured in a RAID10, 1 virtual disk.
  r710 has an iPERC 6, 6x 2TB drives, configured in a RAID10, 2 virtual disks

  So, commission after commission while working through my journey, 0
  problems. After I finally got everything figured out on the juju,
  network/vlan, quad-nic end, I went to re-commission and could not.
  smartctl-validate fails on both, over and over again. I even destroyed
  and re-created the raid/VDs; still no luck.

  After spending so much time on it, I remembered that this was the first
  time I had tried to re-commission these two servers since upgrading
  from 2.4.x to 2.7 in an effort to use the updated KVM integration to
  add a couple more guests. Once I got everything figured out I went to
  re-commission everything and boom.

  [Upgrade path notes]
  In full disclosure, in case this matters: I was on an apt install of 2.4.x and tried the snap for 2.7, except it didn't work. So I read up on how to install 2.7 via apt and did that, and have not uninstalled the 2.7 snap yet. I want to migrate from apt to snap but do not know how to without losing all MAAS data, and could not find docs on it, so that is a problem for another day. But in case that is part of the problem for some odd reason, I wanted to share.

  
  [Diatribe]
  My journey to get maas+juju+openstack+kubernetes has been less than stellar. I have run into problem after problem, albeit some of them were my own. I am so close, after spending the last 6 months on/off when I had time, and really hardcore the last 4 days. The last half day of which has been this little gem. MAAS has been pretty fun to work with but some things have been the biggest pain in the a-hole to understand. Un/managed subnets come to mind: "Managed: we're going to use IPs, even with DHCP off. Unmanaged: We're still going to use IPs, but be different". Anyway, this doesn't belong here; if it gets modded out that's fine. It makes me feel a little better typing it, knowing that I *think* my last problem was solved to get this up and running; just trying to contribute back something that I can.

  I did want to say thanks to those who made/maintain MAAS. Despite the
  problems I somehow always run into, I have enjoyed figuring it out.

  
  -Red

To manage notifications about this bug go to:
https://bugs.launchpad.net/lxd/+bug/1869116/+subscriptions


