APPLIED: [SRU][J][PATCH v2 0/2] KVM: arm64: fix softlockups in stage2_apply_range

Roxana Nicolescu roxana.nicolescu at canonical.com
Mon Mar 11 08:50:48 UTC 2024


On 06/03/2024 01:50, Krister Johansen wrote:
> BugLink: https://bugs.launchpad.net/bugs/2056227
>
> [Impact]
>
> Tearing down KVM VMs on arm64 can cause softlockups to appear on the console.
> When terminating VMs with more than 100GB of memory and 4k pages, the memory
> unmap times often exceed 20 seconds, which can trigger the softlockup detector.
> Portions of the unmap path also run with interrupts disabled while TLB
> invalidation instructions execute, which can further contribute to latency
> problems.  My team has observed networking latency problems when the CPU
> performing the teardown is also assigned to handle a NIC interrupt.
>
> Fortunately, a solution has been in place since Linux 6.1.  A small pair of
> patches modify stage2_apply_range to operate on smaller memory ranges before
> performing a cond_resched.  With these patches applied, softlockups are no
> longer observed when tearing down VMs with large amounts of memory.
>
> Although I also submitted the patches to 5.15 LTS (link to LTS submission in
> "Backport" section), I'd appreciate it if Ubuntu were willing to take this
> submission in parallel since the impact has left us unable to utilize arm64 for
> kvm until we can either migrate our hypervisors to hugepages, pick up this fix,
> or some combination of the two.
>
> [Backport]
>
> Backport the following fixes from linux 6.1:
>
> 3b5c082bbf KVM: arm64: Work out supported block level at compile time
> 5994bc9e05 KVM: arm64: Limit stage2_apply_range() batch size to largest block
>
> The fix is in 5994bc9e05 and 3b5c082bbf is a dependency that was submitted as
> part of the series.  The original submission is here:
>
> https://lore.kernel.org/all/20221007234151.461779-1-oliver.upton@linux.dev/
>
> I've also submitted the patches to 5.15 LTS here:
>
> https://lore.kernel.org/stable/cover.1709665227.git.kjlx@templeofstupid.com/
>
> Both fixes cherry picked cleanly and there were no conflicts.
>
> [Test]
>
> Executed a variation of the test from 5994bc9e05 as well as my own run of
> kvm_page_table_test on a VM with 4k pages and more than 100GB of memory.
> Without the patches, softlockups were observed in both tests.  With the patches
> applied, the tests ran without incident.
>
> This was tested against both LTS 5.15.150 and linux-aws-5.15.0-1055.
>
> [Potential Regression]
>
> Regression potential is low.  These patches have been present in Linux since 6.1
> and appear to have needed no further maintenance.
>
> [Change in v2]
>
> For v1 I ran git format-patch without the --from option, so the first series
> was generated without Oliver in place as the author.  The v2 should retain the
> correct authorship.  Apologies for the mistake.
>
>
> Oliver Upton (2):
>    KVM: arm64: Work out supported block level at compile time
>    KVM: arm64: Limit stage2_apply_range() batch size to largest block
>
>   arch/arm64/include/asm/kvm_pgtable.h    | 18 +++++++++++++-----
>   arch/arm64/include/asm/stage2_pgtable.h | 20 --------------------
>   arch/arm64/kvm/mmu.c                    |  9 ++++++++-
>   3 files changed, 21 insertions(+), 26 deletions(-)
Applied to jammy master-next branch. Thanks!
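
For readers unfamiliar with the fix, the batching idea described in the cover
letter can be sketched in userspace C.  This is only an illustration, not the
kernel code: apply_range(), range_addr_end(), yield_point() and count_batch()
are hypothetical stand-ins, and the 1 GiB batch size is an assumption for a
4k-page configuration.  In the kernel the yield point is a conditional
reschedule rather than a counter.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed batch size: 1 GiB (illustrative value for a 4k-page setup). */
#define BATCH_SIZE (1ULL << 30)

static unsigned long yields;   /* number of times we offered to yield */
static unsigned long calls;    /* number of batches processed */

/* Hypothetical stand-in for the kernel's conditional reschedule. */
static void yield_point(void)
{
    yields++;
}

/* Return the end of the current batch: the next BATCH_SIZE-aligned
 * boundary after addr, clamped to end.  The "- 1" comparison keeps the
 * clamp correct even if end were to wrap to 0. */
static uint64_t range_addr_end(uint64_t addr, uint64_t end)
{
    uint64_t boundary = (addr + BATCH_SIZE) & ~(BATCH_SIZE - 1);

    return (boundary - 1 < end - 1) ? boundary : end;
}

/* Apply fn to [addr, end) one batch at a time, offering to yield
 * between batches so that no single stretch of work runs too long. */
static int apply_range(uint64_t addr, uint64_t end,
                       int (*fn)(uint64_t addr, uint64_t size))
{
    uint64_t next;
    int ret;

    assert(addr < end);

    do {
        next = range_addr_end(addr, end);
        ret = fn(addr, next - addr);
        if (ret)
            break;
        if (next != end)
            yield_point();
    } while (addr = next, addr != end);

    return ret;
}

/* Example callback: counts batches and checks each one is bounded. */
static int count_batch(uint64_t addr, uint64_t size)
{
    (void)addr;
    assert(size <= BATCH_SIZE);
    calls++;
    return 0;
}
```

Calling apply_range(0, 100ULL << 30, count_batch) walks a 100 GiB range in
1 GiB steps, so the amount of work done between yield points is bounded by the
batch size rather than by the size of the whole range.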