APPLIED: [SRU][J:linux-bluefield][PATCH v1 0/1] UBUNTU: SAUCE: gpio-mlxbf3: During reboot test, ipmb driver fails to load intermittently
Bartlomiej Zolnierkiewicz
bartlomiej.zolnierkiewicz at canonical.com
Mon Jun 3 10:55:10 UTC 2024
Applied to jammy:linux-bluefield/master-next. Thanks.
--
Best regards,
Bartlomiej
On Tue, May 21, 2024 at 12:02 AM Asmaa Mnebhi <asmaa at nvidia.com> wrote:
>
> BugLink: https://bugs.launchpad.net/bugs/2066198
>
> SRU Justification:
>
> [Impact]
>
> The ipmb driver failing to load is only a symptom of i2c-mlxbf
> not receiving interrupts. In fact, any driver that depends on the
> i2c-mlxbf driver will not work.
>
> How to reproduce this issue?
>
> - modprobe gpio-mlxbf3
> - modprobe pwr-mlxbf
> - modprobe mlxbf-gige -> this calls into the gpio driver, which enables the PHY interrupt (GPIO10)
> - reboot Linux
> -> a graceful reboot does not remove modules, so mlxbf3_gpio_irq_disable is never called
> to disable the PHY interrupt. Hence, the interrupt remains enabled.
> - In Anolis, we don't enforce the dependency between gpio-mlxbf3 and mlxbf-gige.
> So the next time Linux boots and loads the drivers in the following order, we encounter the issue:
> - modprobe mlxbf-gige. The gige driver falls back to polling when it loads before the gpio
> driver. Note that the interrupt at GPIO10 is still enabled at this point, so if it
> triggers, nothing clears it.
> - modprobe gpio-mlxbf3
> - modprobe i2c-mlxbf. Its interrupt doesn't work here because it is shared with the gpio
> interrupts, and the stale GPIO10 interrupt was never cleared.
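The reproduction sequence above can be replayed as a small user-space model. This is a sketch, not driver code: `struct gpio_regs`, its `int_en`/`int_cause` fields, `PHY_GPIO` and the helper functions are hypothetical stand-ins for the GPIO block's interrupt enable and cause registers, and the rule that the shared line stays asserted while an enabled cause is latched is a simplifying assumption. The only property taken from the report is that a graceful reboot leaves the GPIO10 enable bit set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the GPIO block's interrupt registers. Per the
 * report, their contents survive a graceful reboot. */
struct gpio_regs {
    uint32_t int_en;    /* interrupt enable mask */
    uint32_t int_cause; /* latched interrupt causes */
};

#define PHY_GPIO 10u /* PHY interrupt pin used by mlxbf-gige */

/* Simplifying assumption: the IRQ line shared by gpio-mlxbf3 and
 * i2c-mlxbf stays asserted while any enabled cause is still latched. */
static bool shared_line_stuck(const struct gpio_regs *r)
{
    return (r->int_en & r->int_cause) != 0;
}

/* Returns true if i2c-mlxbf would find the shared line already stuck. */
static bool reproduce(void)
{
    struct gpio_regs r = { .int_en = 0, .int_cause = 0 };

    /* Boot 1: gpio-mlxbf3, pwr-mlxbf, then mlxbf-gige, which enables
     * the PHY interrupt through the gpio driver. */
    r.int_en |= 1u << PHY_GPIO;

    /* Graceful reboot: no module is removed, so mlxbf3_gpio_irq_disable
     * never runs and the registers are left untouched. */

    /* Boot 2: mlxbf-gige loads first and polls. When the PHY raises
     * GPIO10, the cause latches and no driver is there to clear it. */
    r.int_cause |= 1u << PHY_GPIO;

    /* gpio-mlxbf3 and i2c-mlxbf load next; nothing clears the stale
     * cause, so the shared line is still held. */
    return shared_line_stuck(&r);
}
```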
>
> [Fix]
>
> * Add a shutdown function to the gpio driver that clears and disables all interrupts.
> * Also clear the interrupt after disabling it in the irq disable function (mlxbf3_gpio_irq_disable).
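The two fixes can be sketched against the same kind of user-space register model. All names here are hypothetical, the cause register is assumed to be write-1-to-clear, and `model_shutdown()` only stands in for the new shutdown function. The reason a shutdown hook helps is that the kernel invokes a platform driver's `.shutdown` callback during a graceful reboot, even though it does not remove modules.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the GPIO interrupt registers. */
struct gpio_regs {
    uint32_t int_en;    /* interrupt enable mask */
    uint32_t int_cause; /* latched interrupt causes */
};

/* Assumption: writing 1 to a cause bit clears it (write-1-to-clear). */
static void cause_w1c(struct gpio_regs *r, uint32_t mask)
{
    r->int_cause &= ~mask;
}

/* Sketch of the patched irq-disable path: disable the pin, then also
 * clear any cause that latched before it was disabled. */
static void model_irq_disable(struct gpio_regs *r, unsigned int offset)
{
    uint32_t bit = 1u << offset;

    r->int_en &= ~bit;
    cause_w1c(r, bit);
}

/* Sketch of the new shutdown hook: disable and clear every interrupt so
 * the next boot starts from a clean state. */
static void model_shutdown(struct gpio_regs *r)
{
    r->int_en = 0;
    cause_w1c(r, UINT32_MAX);
}

/* Simplifying assumption: the shared line is held while an enabled
 * cause is latched. */
static bool shared_line_stuck(const struct gpio_regs *r)
{
    return (r->int_en & r->int_cause) != 0;
}

/* Replays the reproduction steps with or without the shutdown hook. */
static bool line_stuck_after_reboot(bool with_shutdown)
{
    /* Boot 1 end state: PHY interrupt (GPIO10) enabled via mlxbf-gige. */
    struct gpio_regs r = { .int_en = 1u << 10, .int_cause = 0 };

    if (with_shutdown)
        model_shutdown(&r);  /* graceful reboot now runs the hook */
    r.int_cause |= 1u << 10; /* boot 2: PHY fires before gpio loads */
    return shared_line_stuck(&r);
}
```

Without the hook the model reports a stuck line; with it, the enable bit is already cleared when the stale GPIO10 cause latches on the next boot.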
>
> [Test Case]
>
> * Run the reboot test (2000-3000 iterations).
> * Check that all of the following drivers load without errors: gpio-mlxbf3, pwr_mlxbf, mlxbf-gige, i2c-mlxbf.
> * Check that the ipmb drivers are loaded and functional (send an IPMB command to the BMC and vice versa).
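For the per-iteration driver check, a small helper can scan the text of `/proc/modules`, where the kernel lists loaded modules with dashes in their names replaced by underscores. This is a sketch; `required_modules`, `module_listed()` and `count_missing()` are hypothetical names. It only confirms that the modules are present, not that they probed without errors (that still needs the kernel log) or that IPMB traffic to and from the BMC works.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Modules the test case expects after every reboot, spelled the way
 * /proc/modules reports them (dashes become underscores). */
static const char *const required_modules[] = {
    "gpio_mlxbf3", "pwr_mlxbf", "mlxbf_gige", "i2c_mlxbf",
};

/* Returns true if `name` is the first field of some line in the text. */
static bool module_listed(const char *proc_modules, const char *name)
{
    size_t len = strlen(name);
    const char *line = proc_modules;

    while (line && *line) {
        /* Require a space after the name so "foo" does not match "foo2". */
        if (strncmp(line, name, len) == 0 && line[len] == ' ')
            return true;
        line = strchr(line, '\n');
        if (line)
            line++; /* skip past the newline to the next entry */
    }
    return false;
}

/* Returns how many required modules are missing, reporting each one. */
static int count_missing(const char *proc_modules)
{
    int missing = 0;
    size_t n = sizeof required_modules / sizeof required_modules[0];

    for (size_t i = 0; i < n; i++) {
        if (!module_listed(proc_modules, required_modules[i])) {
            fprintf(stderr, "missing module: %s\n", required_modules[i]);
            missing++;
        }
    }
    return missing;
}
```

On the target, read `/proc/modules` into a buffer after each boot and count the iteration as failed when `count_missing()` returns nonzero.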
>
> [Regression Potential]
>
> * No known regression.
>