[Bug 1615021] Comment bridged from LTC Bugzilla

bugproxy bugproxy at us.ibm.com
Tue Sep 13 18:20:49 UTC 2016


------- Comment From dougmill at us.ibm.com 2016-09-13 14:16 EDT-------
This bug was opened because of a hang being experienced while booting the Ubuntu 16.04 network installer on Briggs & Stratton machines with their X710 ethernet adapters, using the i40e driver.

During the investigation, a problem/mistake was found in systemd, but it
is almost certainly not the cause of the hang. The fixed systemd was
supposedly being made available in the xenial-proposed repository, but so
far it does not seem to have appeared there.

This bug was placed in "verify" state, and it started causing email to be
sent several times a day reminding me to verify the fix. Since we don't
believe that the "fix to systemd" will fix the hang during the installer
boot, and since this new systemd has not been pushed out to the
xenial-proposed installer after 6 days, I have taken this bug out of
"verify" state by re-opening it.

When there actually is something to be tested, and it has made its way
into the xenial-proposed installer, then this bug can be set back to
"verify" and I will test the fix.

------- Comment From dougmill at us.ibm.com 2016-09-13 14:18 EDT-------
I should also amend my previous comment by saying that if Canonical has suggestions on how to gather more information to help debug this, they should let us know and we can make test runs for them.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1615021

Title:
  Unable to network boot Ubuntu 16.04 installer normally on Briggs

Status in busybox package in Ubuntu:
  Fix Released
Status in debian-installer package in Ubuntu:
  Triaged
Status in systemd package in Ubuntu:
  Fix Released
Status in busybox source package in Xenial:
  Won't Fix
Status in debian-installer source package in Xenial:
  Triaged
Status in systemd source package in Xenial:
  Fix Committed
Status in busybox source package in Yakkety:
  Fix Released
Status in debian-installer source package in Yakkety:
  Triaged
Status in systemd source package in Yakkety:
  Fix Released

Bug description:
  == Comment: #7 - Guilherme Guaglianoni Piccoli <gpiccoli at br.ibm.com> - 2016-08-19 10:08:07 ==
  The normal procedure to perform a Netboot installation of Ubuntu 16.04 is to download the latest vmlinux and initrd.gz files available, and kexec them with no parameters (at least in ppc64el).

  We're experiencing a strange issue in which the installer freezes
  before the menus are shown. The system hangs at the point indicated
  below, right after the i40e driver initialization:

  [   11.052832] i40e 0002:01:00.0 enP2p1s0f0: renamed from eth0
  [   11.073976] i40e 0002:01:00.1 enP2p1s0f1: renamed from eth1
  [   11.117799] i40e 0002:01:00.2 enP2p1s0f2: renamed from eth2
  [   11.225745] i40e 0002:01:00.3 enP2p1s0f3: renamed from eth3
  ***HANG***

  The most difficult part of this issue is that it seems to be a timing
  issue/race condition, and many debugging attempts end up preventing
  the issue from reproducing (a heisenbug).

  We were successful though in getting logs by booting the kernel with
  the command-line "BOOT_DEBUG=2" and by changing the initrd in order to
  enable systemd debug; only the files "init" and "start-udev" were
  changed in initrd, both attached here.

  We've attached here a saved screen session that shows the entire boot
  process until it gets flooded with lots of messages like:

  "starting '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'
  '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'(err) 'failed to execute '/bin/readlink' '/bin/readlink /etc/
  udev/rules.d/80-net-setup-link.rules': No such file or directory'

  seq 3244 queued, 'add' 'pci_bus'
  starting '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'
  passed 408 byte device to netlink monitor 0x1003cfe8020seq 3236 running'/bin/readlink /etc/udev/rules.d/80-net-setup-l
  ink.rules'(err) 'failed to execute '/bin/readlink' '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules': No such
  file or directory'
  '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'(err) 'failed to execute '/bin/readlink' '/bin/readlink /etc/
  udev/rules.d/80-net-setup-link.rules': No such file or directory'
  Process '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules' failed with exit code 2.
  PROGRAM '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules' /lib/udev/rules.d/73-usb-net-by-mac.rules:6
  passed device to netlink monitor 0x1003d01f730
  "
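To gauge how heavily a saved debug log is flooded by these messages, a small helper along the following lines could count them. This is only a sketch: the function name and the log file name are hypothetical, and it counts matching lines rather than individual occurrences.

```shell
# Count the lines in a saved boot log that report the failed readlink call.
# "$1" is the path to the log file (hypothetical; use your own capture).
count_readlink_failures() {
    grep -c "failed to execute '/bin/readlink'" "$1"
}

# Example usage (the file name is an assumption):
# count_readlink_failures saved-screen-session.log
```

Comparing the count before and after a change to the initrd gives a rough indication of whether the change had any effect on the flood.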

  The system then remains hung at this stage. We re-tested after
  changing the file 73-usb-net-by-mac.rules in the initrd, replacing
  "/etc/udev/rules.d/80-net-setup-link.rules" with
  "/lib/udev/rules.d/80-net-setup-link.rules", since the former does not
  exist whereas the latter does. The same issue was observed!

  Notice that if we boot the installer with command-line "net.ifnames=0"
  or "net.ifnames=1", the problem no longer reproduces.

  We would like to ask for Canonical's help in investigating this issue.
  Thanks,

  Guilherme

  
  SRU INFORMATION for systemd
  ===========================

  Test case:
   * Check what happens for uevents on devices which are not USB network interfaces:
     udevadm test /sys/devices/virtual/mem/null
     udevadm test /sys/class/net/lo

   With the current version these will run

    PROGRAM '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules' /lib/udev/rules.d/73-usb-net-by-mac.rules:6

   which is pointless. With the proposed version these should be gone.
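The check described above can be scripted roughly as follows. This is a sketch, not part of the SRU procedure itself: the helper name `readlink_rule_absent` is made up for illustration, and it assumes the `udevadm test` output is piped in with stderr merged into stdout.

```shell
# Succeeds (exit 0) when the udevadm test output on stdin does NOT contain
# the readlink PROGRAM call from 73-usb-net-by-mac.rules.
readlink_rule_absent() {
    ! grep -q "PROGRAM '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'"
}

# Example usage on a system with udev (left commented out here):
# for dev in /sys/devices/virtual/mem/null /sys/class/net/lo; do
#     if udevadm test "$dev" 2>&1 | readlink_rule_absent; then
#         echo "OK: $dev"
#     else
#         echo "still runs readlink: $dev"
#     fi
# done
```

With the current version both devices would be expected to report that readlink still runs; with the proposed version both should report OK.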

   * Ensure that the rule still works as intended by connecting a USB
  network device that has a permanent MAC address (e.g. Android
  tethering uses a temporary MAC): You should get a MAC-based name like
  "enx12345678" for it. Now disconnect it again, disable ifnames with

      sudo ln -s /dev/null /etc/udev/rules.d/80-net-setup-link.rules

  and reconnect the device. You should now get a kernel name like "usb0"
  for it.
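To make the expected outcome of this step easier to check, the resulting interface name could be classified with a helper like the one below. It is a sketch under assumptions: the function name is hypothetical, and it only tests for the "enx" prefix followed by hexadecimal digits, as in the example name given above.

```shell
# Succeeds when "$1" looks like a MAC-based interface name:
# the prefix "enx" followed by one or more hexadecimal digits.
is_mac_based_name() {
    case "$1" in
        enx*)
            rest=${1#enx}
            case "$rest" in
                ''|*[!0-9a-fA-F]*) return 1 ;;
                *) return 0 ;;
            esac
            ;;
        *) return 1 ;;
    esac
}
```

For instance, `is_mac_based_name enx12345678` succeeds, while `is_mac_based_name usb0` fails, which matches the two outcomes the test case expects before and after disabling ifnames.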

  * Regression potential: Errors in the rule could break persistent
  naming - or its disabling - of USB network interfaces. Running the
  above test carefully is important to ensure this keeps working. This
  has little to no actual effect on anything else on the system (aside
  from a performance impact and spamming logs), so overall the
  regression potential is low.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/busybox/+bug/1615021/+subscriptions


