APPLIED: [SRU][J:linux-bluefield][PATCH v1 0/1] UBUNTU: SAUCE: gpio-mlxbf3: During reboot test, ipmb driver fails to load intermittently

Bartlomiej Zolnierkiewicz bartlomiej.zolnierkiewicz at canonical.com
Mon Jun 3 10:55:10 UTC 2024


Applied to jammy:linux-bluefield/master-next. Thanks.

--
Best regards,
Bartlomiej

On Tue, May 21, 2024 at 12:02 AM Asmaa Mnebhi <asmaa at nvidia.com> wrote:
>
> BugLink: https://bugs.launchpad.net/bugs/2066198
>
> SRU Justification:
>
> [Impact]
>
>     The ipmb driver failing to load is only a symptom of i2c-mlxbf
>     not receiving interrupts. In fact, any driver that depends on
>     i2c-mlxbf will not work.
>
>     How to reproduce this issue?
>
>     - modprobe gpio-mlxbf3
>     - modprobe pwr-mlxbf
>     - modprobe mlxbf-gige -> this calls into the gpio driver, which enables the PHY interrupt (gpio10)
>     - reboot Linux
>       -> a graceful reboot does not remove modules, so mlxbf3_gpio_irq_disable is never called
>          and the PHY interrupt remains enabled.
>     - In Anolis, we don't enforce the dependency between gpio-mlxbf3 and mlxbf-gige.
>       So the next time Linux boots and loads the drivers in this order, we encounter the issue:
>     - modprobe mlxbf-gige. The gige driver uses polling when it loads before the gpio
>       driver. Note that the interrupt at GPIO10 is still enabled at this point, so if the interrupt
>       triggers, nothing clears it.
>     - modprobe gpio-mlxbf3
>     - modprobe i2c-mlxbf. The i2c interrupt does not work here because it is shared with the gpio
>       interrupt, which was never cleared.
>
> [Fix]
>
> * The solution is to add a shutdown function to the gpio driver that clears and disables all interrupts.
> * Also clear the interrupt after disabling it in the irq disable function (mlxbf3_gpio_irq_disable).
>
> [Test Case]
>
> * Run the reboot test (2000-3000 iterations).
> * Check that all of the following drivers load without errors: gpio-mlxbf3, pwr_mlxbf, mlxbf-gige, i2c-mlxbf
> * Check that the ipmb drivers are loaded and functional (send an ipmb command to the bmc and vice versa).
>
> [Regression Potential]
>
> * No known regression.
>
