[Bug 1437353] Re: UEFI network boot hangs at grub for adapter 82599ES 10-Gigabit SFI/SFP+
Mathieu Trudel-Lapierre
mathieu.tl at gmail.com
Thu May 17 13:49:11 UTC 2018
I have positively verified that an affected system (which has a 82599ES
10-Gigabit SFI/SFP+) exhibits the ARP storm behavior when booting with
MAAS using the grubnetx64.efi binary in xenial(-updates), leading to
stopping in the grub prompt; and with the grubnetx64.efi binary in
xenial-proposed no ARP storm is noticeable, and the system boots
normally as expected.
The bad binaries are any 2.02~beta2-36ubuntu3.17 and prior; the binary
installed by MAAS or I would most commonly expect to see on an affected
setup comes from xenial-updates (grub2 2.02~beta2-36ubuntu3.17) has a
sha256sum of:
b164561b4f42223b6d37e00f613adc32c22e5377c0fb6a6615e101c625d9b9cb
And the valid binary from xenial-proposed (until this SRU is released to xenial-updates), comes from grub2 2.02~beta2-36ubuntu3.18 and has a sha256sum of:
a92ed9943c6569a999b9b437e7ca07ccac7d30a4df1a18cdd68b406e5d45013c
If running 'sha256sum /var/lib/maas/boot-resources/current/bootloader/uefi/amd64/grubx64.efi' (or using the path appropriate to non-MAAS netboot setups) yields the same value as above (a92ed9943c6569a999b9b437e7ca07ccac7d30a4df1a18cdd68b406e5d45013c), then you are running the patched version of grub2. If you are still noticing issues, then you would be seeing a different bug, one that should be reported separately.
Marking this verification-done.
** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to grub2 in Ubuntu.
https://bugs.launchpad.net/bugs/1437353
Title:
UEFI network boot hangs at grub for adapter 82599ES 10-Gigabit
SFI/SFP+
Status in MAAS:
Invalid
Status in maas-images:
Triaged
Status in python-tx-tftp:
Invalid
Status in grub2 package in Ubuntu:
Fix Released
Status in grub2-signed package in Ubuntu:
New
Status in grub2 source package in Trusty:
New
Status in grub2-signed source package in Trusty:
New
Status in grub2 source package in Xenial:
Fix Committed
Status in grub2-signed source package in Xenial:
Fix Committed
Status in grub2 source package in Yakkety:
Won't Fix
Status in grub2-signed source package in Yakkety:
Won't Fix
Bug description:
[Impact]
MAAS commissioning may fail when deploying Xenial images or using grubx64.efi from Xenial due to hardware particularities of some Intel 82599-based network cards. Other network manufacturers may be affected as well. The main failure mode appears to be an infinite re-send of some packets because of an unexpected response from the network hardware.
[Test case]
1) Attempt to netboot on a system with a "82599ES 10-Gigabit SFI/SFP+" network adapter; in UEFI mode.
2) Validate that netbooting happens correctly, passing control over to the kernel as configured in grub.cfg.
3) Validate that netbooting another system, not using an Intel 82599
adapter, behaves normally when booting in UEFI mode.
4) Validate that netbooting another system, not using an Intel 82599
adapter, behaves normally when booting in LEGACY mode.
[Regression potential]
As this affects network in EFI mode; any failure to netboot using EFI should be considered a possible regression. Systems may fail to receive data from the network boot server and terminate the process with a timeout. Another possible failure scenario is to fail to receive complete data over the network, or data corruption.
----
I am using MAAS to commission and install machines. When I attempt to commission a machine with a "82599ES 10-Gigabit SFI/SFP+" network adapter the following happens:
1) TFTP Request — bootx64.efi
2) TFTP Request — /grubx64.efi
3) Console hangs at grub prompt
If I go into bios and force the adapter above into legacy mode then the machine is able to network boot and run through the commission process.
1) TFTP Request — ubuntu/amd64/generic/trusty/release/boot-initrd
2) TFTP Request — ubuntu/amd64/generic/trusty/release/boot-kernel
3) TFTP Request — ifcpu64.c32
4) PXE Request — power off
5) TFTP Request — pxelinux.cfg/01-90-e2-ba-52-23-78
6) TFTP Request — pxelinux.cfg/71e3f102-bd8b-11e4-b634-3c18a001c80a
7) TFTP Request — pxelinux.0
Also, if I disconnect the cable to the adapter above and connect a
cable to the integrated "I210 Gigabit" adapter which is configured for
UEFI mode. The machine is able to network boot grubx64.efi and run
through the commission process.
~$ dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=====================================-==================================-============-===============================================================================
ii maas 1.7.2+bzr3355-0ubuntu1~trusty1 all MAAS server all-in-one metapackage
ii maas-cli 1.7.2+bzr3355-0ubuntu1~trusty1 all MAAS command line API tool
ii maas-cluster-controller 1.7.2+bzr3355-0ubuntu1~trusty1 all MAAS server cluster controller
ii maas-common 1.7.2+bzr3355-0ubuntu1~trusty1 all MAAS server common files
ii maas-dhcp 1.7.2+bzr3355-0ubuntu1~trusty1 all MAAS DHCP server
ii maas-dns 1.7.2+bzr3355-0ubuntu1~trusty1 all MAAS DNS server
ii maas-proxy 1.7.2+bzr3355-0ubuntu1~trusty1 all MAAS Caching Proxy
ii maas-region-controller 1.7.2+bzr3355-0ubuntu1~trusty1 all MAAS server complete region controller
ii maas-region-controller-min 1.7.2+bzr3355-0ubuntu1~trusty1 all MAAS Server minimum region controller
ii python-django-maas 1.7.2+bzr3355-0ubuntu1~trusty1 all MAAS server Django web framework
ii python-maas-client 1.7.2+bzr3355-0ubuntu1~trusty1 all MAAS python API client
ii python-maas-provisioningserver 1.7.2+bzr3355-0ubuntu1~trusty1 all MAAS server provisioning libraries
~$
To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1437353/+subscriptions
More information about the foundations-bugs
mailing list