[Bug 2044549] Re: pxe boot on arm64 stopped working
Mate Kukri
2044549 at bugs.launchpad.net
Thu Jan 18 15:28:15 UTC 2024
A few bits of information about this:
- The regression is caused by the GRUB binary growing in size, not by any code or compiler change. Padding the last pre-regression binary to the same size as the first regressed one reproduces the same failures. (Padding to a much larger binary size seems to avoid this.)
- After considerable debugging effort, it was found that the failure happens because the firmware interface used by GRUB's efinet driver locks up and stops transmitting packets. (The symptom is that transmit buffers are never "recycled"; adding new buffers to the queue eventually fills it up, after which the driver permanently returns EFI_NOT_READY until platform reset.)
- This is either an UNDI driver / firmware bug on the target machines, or an issue in GRUB's usage of the EFI Simple Network Protocol that has always existed (I consider the latter rather unlikely).
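
To make the lock-up described above easier to picture, here is a toy Python model of a fixed-size transmit queue. It is only a sketch, not GRUB or firmware code: the class ToyNic, the function send_packets, the queue size of 4, and the "stalled" flag are all made up for illustration. The only thing taken from the report is the behaviour itself: once buffers stop being recycled, the queue fills and every later transmit attempt returns EFI_NOT_READY.

```python
from collections import deque

EFI_SUCCESS = "EFI_SUCCESS"
EFI_NOT_READY = "EFI_NOT_READY"


class ToyNic:
    """Toy model of a NIC with a fixed-size transmit queue (hypothetical)."""

    def __init__(self, queue_size, stalled=False):
        self.queue_size = queue_size
        self.stalled = stalled   # True models firmware that stopped transmitting
        self.pending = deque()   # buffers handed over but not yet recycled

    def transmit(self, buf):
        # A full queue cannot accept another buffer.
        if len(self.pending) >= self.queue_size:
            return EFI_NOT_READY
        self.pending.append(buf)
        return EFI_SUCCESS

    def get_recycled_buffer(self):
        # Hand back one completed buffer, or None if nothing has completed.
        if self.stalled or not self.pending:
            return None
        return self.pending.popleft()


def send_packets(nic, count):
    """Poll for a recycled buffer, then try to transmit, `count` times."""
    results = []
    for i in range(count):
        nic.get_recycled_buffer()
        results.append(nic.transmit(f"packet-{i}"))
    return results


if __name__ == "__main__":
    print(send_packets(ToyNic(queue_size=4), 10))
    print(send_packets(ToyNic(queue_size=4, stalled=True), 10))
```

In the healthy case every transmit succeeds because each poll frees a slot. In the stalled case the first four transmits succeed and all later ones fail, and nothing short of creating a new ToyNic (the model's stand-in for a platform reset) clears the condition.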
Unfortunately, there is no realistic code change in stable release GRUBs
that could fix this (unless a rather unlikely existing GRUB bug is
identified).
Proposed workarounds were:
- Padding GRUB binaries to a larger size, which experimentally seems to avoid locking up the network card (though this has not been proven).
- Using the UEFI-provided TFTP stack for TFTP netbooting in future GRUB releases.
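
For anyone wanting to repeat the size experiment, the following Python sketch pads a copy of a file to a chosen size. It rests on an assumption: this message does not say how the padding was actually done, so appending zero bytes is just one naive approach, and its effect on signed EFI binaries has not been checked here. The function name pad_copy and the file names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def pad_copy(src, dst, target_size):
    """Copy src to dst, then append zero bytes until dst is target_size bytes.

    Returns the number of padding bytes appended. Assumption: trailing
    zero bytes are an acceptable form of padding for the experiment.
    """
    src, dst = Path(src), Path(dst)
    current = src.stat().st_size
    if target_size < current:
        raise ValueError("target_size is smaller than the source file")
    shutil.copyfile(src, dst)
    with open(dst, "ab") as f:
        f.write(b"\0" * (target_size - current))
    return target_size - current
```

A call such as pad_copy("grubaa64.efi", "grubaa64-padded.efi", target_size), with target_size set to the size of the first regressed binary, would produce a padded copy to serve in place of the original.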
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to grub2 in Ubuntu.
https://bugs.launchpad.net/bugs/2044549
Title:
pxe boot on arm64 stopped working
Status in grub2 package in Ubuntu:
Incomplete
Bug description:
We have a lab with eight arm64 machines which are deployed from MAAS (also on arm64).
Two weeks ago we noticed that commissioning in MAAS is not working.
We can see that MAAS sends two files (bootaa64.efi and grubaa64.efi) as requested, and then the machine drops to the grub prompt.
@alexsander-souza from the MAAS team had a look and suspects a grub issue. The net_bootp command returns "error: couldn't send network packet".
Sometimes the machine can deploy again after being power cycled, but not always.
Any idea how to debug this?
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/2044549/+subscriptions