[B][D][E][F][SRU][PATCH 0/1] bonding: fix state transition issue in link monitoring

Po-Hsu Lin po-hsu.lin at canonical.com
Wed Nov 13 10:51:17 UTC 2019

BTW, I just found out that this patch is already queued for the upstream stable kernels.

On Wed, Nov 13, 2019 at 3:33 PM Po-Hsu Lin <po-hsu.lin at canonical.com> wrote:
> == Justification ==
> From the well-explained commit message:
> Since de77ecd4ef02 ("bonding: improve link-status update in
> mii-monitoring"), the bonding driver has utilized two separate variables
> to indicate the next link state a particular slave should transition to.
> Each is used to communicate to a different portion of the link state
> change commit logic; one to the bond_miimon_commit function itself, and
> another to the state transition logic.
>         Unfortunately, the two variables can become unsynchronized,
> resulting in incorrect link state transitions within bonding.  This can
> cause slaves to become stuck in an incorrect link state until a
> subsequent carrier state transition.
>         The issue occurs when a special case in bond_slave_netdev_event
> sets slave->link directly to BOND_LINK_FAIL.  On the next pass through
> bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL
> case will set the proposed next state (link_new_state) to BOND_LINK_UP,
> but the new_link to BOND_LINK_DOWN.  The setting of the final link state
> from new_link comes after that from link_new_state, and so the slave
> will end up incorrectly in _DOWN state.
>         Resolve this by combining the two variables into one.
> == Fixes ==
> * 1899bb32 (bonding: fix state transition issue in link monitoring)
> This patch can be cherry-picked into E/F.
> For older releases like B/D, it will need to be backported, as they are
> missing the slave_err() printk macro added in 5237ff79 (bonding: add
> slave_foo printk macros) as well as the commit that replaces netdev_err()
> with slave_err(), e2a7420d (bonding/main: convert to using slave
> printk macros).
> For Xenial, the commit that causes this issue, de77ecd4, does not exist.
> == Test ==
> Test kernels can be found here:
> https://people.canonical.com/~phlin/kernel/lp-1852077-bonding/
> The X-hwe and Disco kernels were tested by the bug reporter, Aleksei;
> the patched kernels work as expected.
> == Regression Potential ==
> Low.
> This patch just unifies the variables used in the link state change commit
> logic to prevent the occurrence of an incorrect state, and the changes
> are limited to the bonding driver itself.
> (Although include/net/bonding.h is also used by other drivers, the
> changes to that file only affect bond_main.c.)
> Jay Vosburgh (1):
>   bonding: fix state transition issue in link monitoring
>  drivers/net/bonding/bond_main.c | 43 ++++++++++++++++++++---------------------
>  include/net/bonding.h           |  3 +--
>  2 files changed, 22 insertions(+), 24 deletions(-)
> --
> 2.7.4
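
For readers who want to see the desynchronization from the quoted commit message in miniature, here is a rough userspace sketch. It is not the kernel code: the struct layouts, the helper names old_monitor_pass() and new_monitor_pass(), and the enum values are assumptions made for illustration. Only the field names (link, link_new_state, new_link), the BOND_LINK_* state names, and the order in which the two proposals are applied are taken from the commit message.

```c
#include <assert.h>

/* State names follow the commit message; the numeric values are illustrative. */
enum {
    BOND_LINK_NOCHANGE = -1,
    BOND_LINK_UP,
    BOND_LINK_FAIL,
    BOND_LINK_DOWN
};

/* Simplified model of a slave before the fix: two proposal variables. */
struct old_slave {
    int link;            /* current link state */
    int link_new_state;  /* proposal read by the state transition logic */
    int new_link;        /* proposal read by the commit function */
};

/* Simplified model of a slave after the fix: one proposal variable. */
struct new_slave {
    int link;
    int link_new_state;
};

/*
 * Hypothetical helper: one monitor pass for a slave that was forced to
 * BOND_LINK_FAIL and has since gone carrier up. The two assignments stand
 * in for the inspect step as the commit message describes it.
 */
static int old_monitor_pass(struct old_slave *s)
{
    assert(s->link == BOND_LINK_FAIL);  /* only this case is modeled */

    /* Inspect: the two proposals disagree. */
    s->link_new_state = BOND_LINK_UP;
    s->new_link = BOND_LINK_DOWN;

    /* Commit: link_new_state is applied first... */
    s->link = s->link_new_state;
    /* ...and new_link is applied afterwards, overriding it. */
    if (s->new_link != BOND_LINK_NOCHANGE)
        s->link = s->new_link;

    return s->link;
}

/* Same pass with a single proposal variable: nothing left to disagree. */
static int new_monitor_pass(struct new_slave *s)
{
    assert(s->link == BOND_LINK_FAIL);  /* only this case is modeled */

    s->link_new_state = BOND_LINK_UP;
    s->link = s->link_new_state;

    return s->link;
}
```

In this model, calling old_monitor_pass() on a slave in BOND_LINK_FAIL returns BOND_LINK_DOWN, while new_monitor_pass() on the same starting state returns BOND_LINK_UP, which matches the stuck-in-_DOWN behaviour described above.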
