ACK/Cmnt: [SRU][J][I][F][PATCH v2 0/2] rcu stalls with many storage key guests (LP: 1975582)

Stefan Bader stefan.bader at canonical.com
Wed Jun 15 07:50:27 UTC 2022


On 10.06.22 14:55, frank.heimes at canonical.com wrote:
> BugLink: https://bugs.launchpad.net/bugs/1975582
> 
> SRU Justification:
> 
> [Impact]
> 
>   * Ubuntu on s390x KVM environments with lots of large guests with storage
>     keys can be affected by rcu stalls.
> 
>   * These rcu stalls can cause the system to crash/dump.
> 
> [Fix]
> 
>   * 3ae11dbcfac906a8c3a480e98660a823130dc16a "s390/mm: use non-quiescing sske for KVM switch to keyed guest"
> 
>   * 6d5946274df1fff539a7eece458a43be733d1db8 "s390/gmap: voluntarily schedule during key setting"
> 
> [Test Plan]
> 
>   * There is no trigger, direct test, or way to re-create the
>     problem situation, but:
> 
>   * an IBM z13 or LinuxONE (or newer) LPAR is needed that
>     runs Ubuntu Server 20.04 LTS or 18.04 LTS with the HWE kernel
>     and acts as a KVM host with several large guests running
>     on top that use storage keys.
> 
>   * Let such a system run for days under significant load
>     and watch the logs for rcu issues.
> 
>   * Prior to the submission of this SRU, patched test kernels
>     for focal 5.4 and bionic hwe-5.4 were created and tested.
>     They ran for days in a staging environment at IBM
>     without further issues.
> 
>   * The modifications are all limited to s390x.
> 
>   * A test kernel was built (see below) that ran in a test environment
>     at IBM under appropriate load for several days.
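
The "watch the logs for rcu issues" step can be partly automated. Below is a minimal sketch of a log filter, assuming that RCU stall reports contain both "rcu" and "detected stall" (as in the usual "rcu_sched detected stalls on CPUs/tasks" and "rcu_sched self-detected stall on CPU" wording). The helper names is_rcu_stall_line and count_rcu_stalls are hypothetical and not part of any Ubuntu or IBM tooling.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical filter: treats a kernel log line as an RCU stall report if it
 * mentions "rcu" and "detected stall". This matching rule is an assumption
 * based on the usual wording, e.g.
 *   "rcu: INFO: rcu_sched detected stalls on CPUs/tasks:"
 *   "INFO: rcu_sched self-detected stall on CPU" */
static int is_rcu_stall_line(const char *line)
{
    return strstr(line, "rcu") != NULL &&
           strstr(line, "detected stall") != NULL;
}

/* Reads log lines from `in` (e.g. piped dmesg output), echoes every matching
 * line to `out` when `out` is not NULL, and returns the number of matches.
 * Lines longer than the buffer are processed in chunks. */
static unsigned long count_rcu_stalls(FILE *in, FILE *out)
{
    char line[4096];
    unsigned long stalls = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (is_rcu_stall_line(line)) {
            stalls++;
            if (out != NULL)
                fputs(line, out);
        }
    }
    return stalls;
}
```

A small main() could call count_rcu_stalls(stdin, stdout) and be fed with "dmesg" or "journalctl -k" output at regular intervals during the multi-day run; a non-zero count would indicate the problem is still present.
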
> 
> [Where problems could occur]
> 
>   * Due to the change of the KVM switch to a keyed guest
>     from classic sske to non-quiescing sske,
>     KVM behaviour might change and storage keys could be harmed.
> 
>   * The now more generous scheduling while setting keys
>     has an impact on guest memory management and mapping,
>     which may lead to different performance characteristics.
> 
>   * This, together with the introduction of __s390_enable_skey_pmd
>     and cond_resched, might increase the overhead in certain
>     situations, but improves responsiveness over time and
>     hence avoids rcu stalls.
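
The idea behind "voluntarily schedule during key setting" can be illustrated outside the kernel: a long loop over many pages gives up the CPU at regular intervals instead of running to completion. The sketch below is a userspace analogy only. It uses POSIX sched_yield() in place of the kernel-only cond_resched(), and the batch size of 256 pages (roughly one s390 1 MB segment of 4 KB pages) is an assumption chosen to mirror a per-PMD yield point. PAGES_PER_YIELD and set_default_keys are hypothetical names, not the upstream code.

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical batch size: yield once per 256 pages, loosely mirroring one
 * yield point per PMD-level entry (assumed 1 MB segment of 4 KB pages). */
#define PAGES_PER_YIELD 256

/* Hypothetical stand-in for setting a key on every guest page. Returns the
 * number of times the loop voluntarily yielded the CPU. */
static size_t set_default_keys(unsigned char *keys, size_t npages,
                               unsigned char key)
{
    size_t yields = 0;

    for (size_t i = 0; i < npages; i++) {
        keys[i] = key;
        /* Give other runnable tasks a chance after each full batch, so a
         * very large walk does not monopolise the CPU. */
        if ((i + 1) % PAGES_PER_YIELD == 0) {
            sched_yield();
            yields++;
        }
    }
    return yields;
}
```

The trade-off matches the risk described in the bullets: each yield adds a little overhead, but it bounds how long the loop can hold the CPU without rescheduling, which is what keeps RCU grace periods progressing on a busy host.
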
> 
> [Other Info]
>
>   * Since the patches are upstream in 5.19-rc1,
>     they will be included in the kernel that is planned for kinetic (5.19).
> 
>   * Hence this is an SRU to jammy, impish and focal.
> 
> v2: this SRU targets not only J, but also I and F
> 
> Christian Borntraeger (2):
>    s390/gmap: voluntarily schedule during key setting
>    s390/mm: use non-quiescing sske for KVM switch to keyed guest
> 
>   arch/s390/mm/gmap.c    | 14 ++++++++++++++
>   arch/s390/mm/pgtable.c |  2 +-
>   2 files changed, 15 insertions(+), 1 deletion(-)
> 

For Impish, there is a chance that this will not make it. There is only one
cycle left until EOL, so if this is important, it would be good if you
explicitly mentioned this (as a reply here).

Acked-by: Stefan Bader <stefan.bader at canonical.com>