[SRU][J:linux-bluefield][PATCH v1 0/1] UBUNTU: SAUCE: mlxbf-gige: Vitesse PHY stuck in a bad state during reboot test
Asmaa Mnebhi
asmaa at nvidia.com
Mon Apr 29 20:01:14 UTC 2024
BugLink: https://bugs.launchpad.net/bugs/2062384
SRU Justification:
[Impact]
During the QA reboot test, the BF3 Vitesse PHY gets stuck in a bad state, resulting in no IP provisioning. The only way to recover is to power cycle.
We might have found a software workaround to avoid getting into this state in the first place: suspend the PHY during graceful shutdown. Suspending the PHY means powering it down, i.e. setting bit 11 to 1 in register 0 of the PHY. This workaround passed 1800 reboots on QA's setup.
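For illustration only, the register write described above can be sketched in user-space C. The constants MII_BMCR (register 0) and BMCR_PDOWN (bit 11, 0x0800) match the standard MII definitions in the Linux headers; the helper names are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

#define MII_BMCR   0x00    /* Basic Mode Control Register: PHY register 0 */
#define BMCR_PDOWN 0x0800  /* bit 11: power down the PHY */

/* Hypothetical helper: return the BMCR value with the power-down bit set,
 * leaving all other control bits unchanged. */
static uint16_t bmcr_set_power_down(uint16_t bmcr)
{
    return (uint16_t)(bmcr | BMCR_PDOWN);
}

/* Hypothetical helper: report whether a BMCR value has power-down set. */
static int bmcr_is_powered_down(uint16_t bmcr)
{
    return (bmcr & BMCR_PDOWN) != 0;
}
```

In a kernel driver the same bit would typically be set with a read-modify-write on MII_BMCR through the PHY library rather than with helpers like these.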
[Fix]
* During reboot, the mlxbf_gige_shutdown() function makes a call to phy_stop(). phy_stop() calls phy_suspend().
* Certain Linux PHY drivers, like the Vitesse PHY driver, don't implement suspend(), so phy_suspend() does not power down the PHY during shutdown.
* Our hardware also does not toggle the hard reset signal of the PHY during reboot.
* Hence, when the PHY is in a bad state, it stays in its bad state until powercycle.
* We have found a way to prevent the PHY from entering this bad state by suspending the PHY in the case of reboot.
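The sequence above can be modelled with a small user-space sketch. This is not the driver code: the model_* names and the force_power_down flag are hypothetical, and the model only assumes what is stated above, namely that the suspend path does nothing when the PHY driver provides no suspend() callback, so the power-down bit has to be set explicitly on the shutdown path.

```c
#include <stddef.h>
#include <stdint.h>

#define MII_BMCR   0x00    /* PHY register 0 */
#define BMCR_PDOWN 0x0800  /* bit 11: power down */

/* Hypothetical model of a PHY: 32 MII registers plus an optional
 * suspend callback (NULL when the PHY driver does not implement one). */
struct model_phy {
    uint16_t regs[32];
    int (*suspend)(struct model_phy *phy);
};

/* Models the suspend path: a no-op if the driver has no suspend(). */
static int model_phy_suspend(struct model_phy *phy)
{
    if (!phy->suspend)
        return 0;
    return phy->suspend(phy);
}

/* Models the shutdown path. With force_power_down set, the power-down
 * bit is written even when the driver has no suspend() callback. */
static void model_shutdown(struct model_phy *phy, int force_power_down)
{
    model_phy_suspend(phy);
    if (force_power_down)
        phy->regs[MII_BMCR] |= BMCR_PDOWN;
}
```

With suspend set to NULL, model_shutdown(&phy, 0) leaves register 0 untouched, which corresponds to the PHY staying powered across the reboot; model_shutdown(&phy, 1) sets bit 11.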
[Test Case]
* Do the reboot test (at least 2000 reboots): run 'reboot' from Linux.
* Check that the oob_net0 interface is up and the IP is assigned.
* Please note that if the OOB doesn't get an IP, try reloading the driver (rmmod/modprobe). If that solves the issue, that would be a different bug. In the bug at stake, nothing recovers the OOB IP except a power cycle.
[Regression Potential]
* Make sure the Redfish DHCP is still working during the reboot test.
* Make sure the OOB gets an IP.
[Other]
These changes were made in both the mlxbf-gige driver and UEFI.