APPLIED(X,B,E)/cmt: [PATCH 0/1] Multiple kexecs in AWS nitro instances fail

Guilherme Piccoli gpiccoli at canonical.com
Fri Apr 3 11:39:52 UTC 2020


Hi Khaled, thanks for raising this flag!

I understand Disco is EOL, but kernel 5.0 will still be released for
some flavors. This patch is only really useful on AWS (the ena driver
is present there), so if kernel 5.0 is still going to be released for
the -aws flavor, I'd say please apply it to 5.0 too. Otherwise, I don't
see a strong reason for it.

Cheers,


Guilherme

On Thu, Apr 2, 2020 at 11:36 PM Khaled Elmously
<khalid.elmously at canonical.com> wrote:
>
> Applied to X, B and E.
>
> Guilherme - does this need to go in Disco as well? Disco is EOL but there are still several 5.0 derivatives.
> For what it's worth, the Eoan/Focal version of the patch applies cleanly to 5.0
>
> Thanks
>
>
>
> On 2020-04-01 18:40:25 , Guilherme G. Piccoli wrote:
> > BugLink: https://bugs.launchpad.net/bugs/1869948
> >
> >
> > [Impact]
> >
> > * Currently, users cannot perform multiple kernel kexec loads on AWS Nitro
> > instances (KVM-based); after the 2nd or 3rd kexec, an initrd corruption is
> > observed, with the following signature:
> >
> > Initramfs unpacking failed: junk within compressed archive
> > [...]
> > Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc7-gpiccoli+ #26
> > Hardware name: Amazon EC2 t3.large/, BIOS 1.0 10/16/2017
> > Call Trace:
> >   dump_stack+0x6d/0x9a
> >   ? csum_partial_copy_generic+0x150/0x170
> >   panic+0x101/0x2e3
> >   ? do_execve+0x25/0x30
> >   ? rest_init+0xb0/0xb0
> >   kernel_init+0xfb/0x100
> >   ret_from_fork+0x35/0x40
> >
> > * After investigation (see LP comment 2), it was noticed that the Amazon ena network
> > driver doesn't provide a shutdown() handler, hence the device could still be performing
> > a DMA transaction to a previously valid address during boot of the new kernel, which
> > would then corrupt kernel memory. The following patch was proposed and fixed the issue,
> > allowing 1000 kexecs to be executed successfully with no issues observed:
> > 428c491332bc ("net: ena: Add PCI shutdown handler to allow safe kexec")
> > [ git.kernel.org/linus/428c491332bc ].
> >
> > * Hence, we are hereby requesting SRU for this patch. It was tested in all
> > supported series (4.4, 4.15 and 5.3) in Amazon Nitro instances with success,
> > and reviewed/acked by the ena driver team and a kexec developer from another distro.
> > It is worth mentioning that we proposed an upstream multi-vendor discussion about
> > this issue: marc.info/?l=kexec&m=158299605013194 .
> >
> > [Test case]
> >
> > * The basic test procedure consists of performing multiple kexecs sequentially;
> > AWS does not provide a full console, so in case of failures one could check
> > the instance screenshot or use pstore/ramoops in order to collect dmesg after
> > a crash in a preserved memory area. The commands used to perform kexec are:
> >
> > kexec -l <kernel file> --initrd <initrd file> --reuse-cmdline
> > systemctl kexec
> >
> > Alternatively, one could use "--append=" instead of "--reuse-cmdline" if a
> > change in kexec command-line is desired; also, to execute the kexec-loaded
> > kernel, both "kexec -e" and "systemctl kexec" are equally valid.
> >
> > * On LP (comment 3) we proposed a script/approach to auto-test kexecs, used
> > here to perform 1000 kexecs with the proposed patch.
> >
> > [Regression Potential]
> >
> > * Although the patch proposed here introduces a PCI shutdown handler, it keeps the
> > remove handler identical and bases shutdown strongly on ena_remove(), changing just
> > the netdev handling, following other upstream drivers. It was extensively tested
> > and presented no issues. Also, it's self-contained and affects only one driver,
> > so other cloud providers or non-cloud environments wouldn't even be affected
> > by the patch.
> >
> > * In case of a potential regression, it could manifest as a delay or issue
> > on the reboot/shutdown path, and only if the ena driver is in use.
> >
> > Guilherme G. Piccoli (1):
> >   net: ena: Add PCI shutdown handler to allow safe kexec
> >
> >  drivers/net/ethernet/amazon/ena/ena_netdev.c | 51 ++++++++++++++++----
> >  1 file changed, 41 insertions(+), 10 deletions(-)
> >
> > --
> > 2.25.2
> >
> >
> > --
> > kernel-team mailing list
> > kernel-team at lists.ubuntu.com
> > https://lists.ubuntu.com/mailman/listinfo/kernel-team
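
The sequential-kexec procedure from the quoted [Test case] section could be
automated roughly as below. This is only a minimal sketch, not the script
posted on the Launchpad bug: the state file path, the MAX_KEXECS limit and
the function names are all hypothetical. It assumes something (for example
a boot-time service) calls run_iteration once per boot.

```shell
#!/bin/sh
# Hypothetical sketch of a kexec test loop; NOT the script from LP comment 3.
# COUNT_FILE, MAX_KEXECS and all function names are illustrative assumptions.

COUNT_FILE="${COUNT_FILE:-/var/tmp/kexec-count}"
MAX_KEXECS="${MAX_KEXECS:-1000}"

# Print the load command from the test case for a given kernel and initrd.
build_load_cmd() {
    printf 'kexec -l %s --initrd %s --reuse-cmdline\n' "$1" "$2"
}

# Increment the persistent iteration counter and print the new value.
bump_count() {
    count=0
    [ -f "$COUNT_FILE" ] && count=$(cat "$COUNT_FILE")
    count=$((count + 1))
    echo "$count" > "$COUNT_FILE"
    echo "$count"
}

# One iteration: stop once the limit has been reached, otherwise load the
# given kernel/initrd and kexec into it.
run_iteration() {
    n=$(bump_count)
    if [ "$n" -gt "$MAX_KEXECS" ]; then
        echo "done after $MAX_KEXECS kexecs"
        return 0
    fi
    kexec -l "$1" --initrd "$2" --reuse-cmdline && systemctl kexec
}
```

On each boot one would then call, as root, something like
run_iteration "/boot/vmlinuz-$(uname -r)" "/boot/initrd.img-$(uname -r)".
The counter lives on disk, so it survives each kexec. If the initrd gets
corrupted, the loop simply stops at the panic and the counter shows how many
kexecs had been started.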


