APPLIED: [SRU][F][I][PATCH 0/1] Fix network issues on AWS instances

Kleber Souza kleber.souza at canonical.com
Wed Feb 23 10:55:11 UTC 2022


On 23.02.22 11:24, Kleber Sacilotto de Souza wrote:
> BugLink: https://bugs.launchpad.net/bugs/1961968
> 
> [Impact]
> With the latest focal/linux (5.4.0-101.114) and impish/linux (5.13.0-31.34)
> kernels built for SRU cycle 2022.02.21, some AWS instances fail to boot. The
> affected instance types are mostly c4.large, c3.xlarge and x1e.xlarge, although
> not every instance deployed on those types fails. c4.large is hit hardest,
> failing in about 80-90% of deployments.
> 
> The boot failures were traced to the network interface failing to come up. The
> following console log snippets from 5.4.0-101-generic on a c4.large give some
> hints of what's going on:
> 
> [...]
> [ 3.990368] unchecked MSR access error: RDMSR from 0xc90 at rIP: 0xffffffff8ea733c8 (native_read_msr+0x8/0x40)
> [ 3.998463] Call Trace:
> [ 4.001164] ? set_rdt_options+0x91/0x91
> [ 4.004864] resctrl_late_init+0x592/0x63c
> [ 4.008711] ? set_rdt_options+0x91/0x91
> [ 4.012452] do_one_initcall+0x4a/0x200
> [ 4.016115] kernel_init_freeable+0x1c0/0x263
> [ 4.020402] ? rest_init+0xb0/0xb0
> [ 4.024889] kernel_init+0xe/0x110
> [ 4.029245] ret_from_fork+0x35/0x40
> [...]
> [ 7.718268] ena: The ena device sent a completion but the driver didn't receive a MSI-X interrupt (cmd 8), autopolling mode is OFF
> [ 7.727036] ena: Failed to submit get_feature command 12 error: -62
> [ 7.731691] ena 0000:00:03.0: Cannot init indirect table
> [ 7.735636] ena 0000:00:03.0: Cannot init RSS rc: -62
> [ 7.740700] ena: probe of 0000:00:03.0 failed with error -62
> [...]
> 
> [Fix]
> Reverting the following upstream stable commit fixes the issue:
> 
> 83dbf898a2d4 PCI/MSI: Mask MSI-X vectors only on success
> 
> [Test Case]
> Boot an affected AWS instance type with the focal/linux (5.4.0-101.114) and
> impish/linux (5.13.0-31.34) kernels with the commit above reverted. Then boot
> with the original kernels. The instance should boot successfully with the
> revert applied but fail with the original kernels.
> 
> [Regression Potential]
> The reverted commit's description mentions fixing an MSI-X issue with a Marvell
> NVMe device, which doesn't seem to follow the PCIe specification. Reverting this
> commit leaves that issue unfixed on systems with that particular NVMe device.
> As of now there is no follow-up fix for this commit upstream, so we should keep
> an eye on upstream changes and re-apply it if a fix is found.
> 
> Kleber Sacilotto de Souza (1):
>    UBUNTU: SAUCE: Revert "PCI/MSI: Mask MSI-X vectors only on success"
> 
>   drivers/pci/msi.c | 13 +++----------
>   1 file changed, 3 insertions(+), 10 deletions(-)
> 

Applied to focal/impish:linux.
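
For anyone re-running the test case quoted above, one way to tell the two
outcomes apart is to look for the ena probe failure in the saved console log.
A minimal Python sketch, assuming the console output is available as text; the
helper name find_ena_probe_failures and the regular expression are illustrative
only and not part of any existing tool:

```python
import re

# Matches the ena probe failure line from the console log quoted above.
# The pattern is an assumption based on that single sample line.
PROBE_FAIL = re.compile(
    r"ena: probe of (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"failed with error (?P<err>-?\d+)"
)


def find_ena_probe_failures(log_text):
    """Return (pci_address, error_code) for each failed ena probe in log_text."""
    return [
        (m.group("dev"), int(m.group("err")))
        for m in PROBE_FAIL.finditer(log_text)
    ]


# Two lines copied from the c4.large console log in the cover letter.
sample = """\
[ 7.735636] ena 0000:00:03.0: Cannot init RSS rc: -62
[ 7.740700] ena: probe of 0000:00:03.0 failed with error -62
"""

failures = find_ena_probe_failures(sample)
print(failures)  # [('0000:00:03.0', -62)]
```

An empty result on a kernel with the revert applied, and a non-empty one on the
original kernel, would match the expected behaviour described in the test case.
The error code -62 is -ETIME on Linux.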

Thanks,
Kleber



