ACK: [SRU][N][PATCH 0/1] Add 'mm: hold PTL from the first PTE while reclaiming a large folio' to fix L2 Guest hang during LTP Test (LP: 2076147)

Mehmet Basaran mehmet.basaran at canonical.com
Thu Sep 26 15:16:14 UTC 2024


Acked-by: Mehmet Basaran <mehmet.basaran at canonical.com>

frank.heimes at canonical.com writes:

> BugLink: https://bugs.launchpad.net/bugs/2076147
>
> SRU Justification:
>
>  * A KVM second-level (L2) guest, i.e. a KVM VM that runs nested on top of
>    a Power 10 PowerVM hypervisor, hangs during the LTP (Linux Test Project)
>    test suite.
>
>  * It hangs with:
>    "Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab"
>
>  * Diagnosing the issue points to this fix/upstream commit:
>    [commit message, by Barry Song <v-songbaohua at oppo.com>]
>    Within try_to_unmap_one(), page_vma_mapped_walk() races with other PTE
>    modifications preceded by pte clear. While iterating over PTEs of a large folio,
>    it only starts acquiring PTL from the first valid (present) PTE.
>    PTE modifications can temporarily set PTEs to pte_none.
>    Consequently, the initial PTEs of a large folio might be skipped
>    in try_to_unmap_one().
>    For example, for an anon folio, if we skip PTE0, we may have PTE0 which is
>    still present, while PTE1 ~ PTE(nr_pages - 1) are swap entries after
>    try_to_unmap_one().
>    So folio will be still mapped, the folio fails to be reclaimed and is put
>    back to LRU in this round.
>    This also breaks up PTE optimizations such as CONT-PTE on this large folio
>    and may lead to an accidental folio_split() afterwards.
>    And since a part of PTEs are now swap entries, accessing those parts will
>    introduce overhead - do_swap_page.
>    Although the kernel can withstand all of the above issues, the situation
>    still seems quite awkward and warrants making it more ideal.
>    The same race also occurs with small folios, but they have only one PTE,
>    thus, it won't be possible for them to be partially unmapped.
>    This patch [see below] holds PTL from PTE0, allowing us to avoid reading
>    PTE values that are in the process of being transformed. With stable PTE
>    values, we can ensure that this large folio is either completely reclaimed
>    or that all PTEs remain untouched in this round.
>    A corner case is that if we hold PTL from PTE0 and most initial PTEs have
>    been really unmapped before that, we may increase the duration of holding
>    PTL. Thus we only apply this optimization to folios which are still entirely
>    mapped (not in deferred_split list).
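
The race described in the commit message above can be illustrated with a small, deterministic toy model. This is only a sketch under stated assumptions, not kernel code: the function names, the string PTE states, and the single writer that is caught halfway through updating PTE0 are all hypothetical and chosen purely for illustration.

```python
PRESENT, NONE, SWAP = "present", "none", "swap"


def unmap_large_folio(nr_pages, lock_from_first_pte):
    """Toy model of one unmap pass racing with a writer on PTE0.

    Assumption: the writer updates PTE0 under the page table lock in two
    steps, first clearing it (pte_none), then writing a new present value.
    """
    ptes = [PRESENT] * nr_pages

    if lock_from_first_pte:
        # The walker takes the lock before reading PTE0, so the writer's
        # clear-and-rewrite has either finished or not started yet.
        # Either way the walker reads a stable, present PTE0.
        writer_pending = False
    else:
        # The walker peeks at PTE0 without the lock and catches the
        # writer between its two steps.
        ptes[0] = NONE
        writer_pending = True

    locked = lock_from_first_pte
    for i in range(nr_pages):
        if not locked:
            if ptes[i] != PRESENT:
                continue  # skipped without ever taking the lock
            if writer_pending:
                # Taking the lock waits for the writer, which then
                # finishes by making PTE0 present again.
                ptes[0] = PRESENT
                writer_pending = False
            locked = True  # lock held from the first present PTE onward
        if ptes[i] == PRESENT:
            ptes[i] = SWAP  # unmap: mapping replaced by a swap entry

    if writer_pending:
        ptes[0] = PRESENT  # the writer completes after the walk
    return ptes


def still_mapped(ptes):
    """A folio stays mapped while any of its PTEs is still present."""
    return any(pte == PRESENT for pte in ptes)
```

In this model, `unmap_large_folio(4, lock_from_first_pte=False)` leaves PTE0 present while the remaining entries become swap entries, so the folio is still mapped and cannot be reclaimed in this round. With `lock_from_first_pte=True`, every entry becomes a swap entry. With `nr_pages=1` and no early lock, the single PTE simply stays present, which matches the remark that small folios cannot end up partially unmapped.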
>
> [ Fix ]
>
>  * 73bc32875ee9 73bc32875ee9b1881dd780308c6793fe463fe803
>    "mm: hold PTL from the first PTE while reclaiming a large folio"
>
> [ Test Plan ]
>
>  * An IBM Power 10 system (where PowerVM is mandatory)
>    running Ubuntu Server 24.04 (kernel 6.8) or later
>    with (nested) KVM setup (so KVM on top of PowerVM).
>
>  * Run LTP test suite
>    Tests running: SLS(io,base)
>
>  * Without the patch the above test will hang with
>    Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab
>
> [ Where problems could occur ]
>
>  * This is a common code change in the memory management sub-system,
>    hence great care needs to be taken, even if it was discussed upfront
>    at the https://lore.kernel.org/ mailing list and the upstream commit
>    provenance shows that many eyes had a look at this.
>
>  * The modification is relatively small with just one if statement
>    (across two lines) in mm/vmscan.c.
>
>  * This change is to assist 'try_to_unmap' to acquire page table locks (PTL)
>    from the first page table entry (PTE) and to eliminate the influence of
>    temporary and volatile PTE values.
>
>  * If done wrong, it can have a negative impact, especially for large folios:
>    wrong hints might be given to try_to_unmap,
>    which may lead to bad page swapping.
>
>  * In case of an issue with this patch the result can also be decreased
>    performance and efficiency in the page table handling - the opposite
>    of what the patch is supposed to address.
>
>  * Fortunately several developers had their eyes on this commit,
>    as the provenance of the patch and the discussion at LKML shows.
>
>  * Further upstream conversation:
>    Link: https://lkml.kernel.org/r/20240306095219.71086-1-21cnbao@gmail.com
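
The condition that the cover letter describes (one if statement that asks try_to_unmap to take the PTL from the first PTE, applied only to large folios that are still entirely mapped) can be sketched as a simple predicate. This is a hedged model, not the patch itself: the helper name and its boolean parameters are hypothetical, the flag name `TTU_SYNC` is an assumption about what the upstream commit sets, and the numeric flag values are placeholders.

```python
# Placeholder flag values for illustration only; not taken from the kernel.
TTU_BATCH_FLUSH = 0x1
TTU_SYNC = 0x2  # assumed name of the "hold PTL from the first PTE" hint


def reclaim_unmap_flags(is_large_folio, on_deferred_split_list,
                        base_flags=TTU_BATCH_FLUSH):
    """Return the unmap flags a reclaim pass would use in this toy model.

    The hint is added only for large folios that are still entirely
    mapped, i.e. not queued on the deferred split list, so that partially
    unmapped folios do not pay for a longer lock hold time.
    """
    flags = base_flags
    if is_large_folio and not on_deferred_split_list:
        flags |= TTU_SYNC
    return flags
```

Under these assumptions, a small folio and a large folio that is already on the deferred split list both keep the base flags, and only a fully mapped large folio gets the extra hint.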
>
> [ Other Info ]
>
>  * The commit is upstream since v6.10(-rc1), hence it will be included
>    in oracular with the planned target kernel of 6.11.
>
>  * And since (nested) KVM virtualization on ppc64el was (re-)introduced
>    just with noble, no Ubuntu releases older than noble are affected.
>
> Barry Song (1):
>   mm: hold PTL from the first PTE while reclaiming a large folio
>
>  mm/vmscan.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> -- 
> 2.34.1