APPLIED: [SRU X] [PATCH 0/2] Hard lockups due to unrestricted lapic timer delay
Khaled Elmously
khalid.elmously at canonical.com
Wed Feb 27 22:35:46 UTC 2019
On 2019-02-27 16:19:48 , Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1817918
>
> [Impact]
>
> * There is a long-standing report of an issue with the TSC delay present
> in wait_lapic_expire() - basically, the guest could configure an expiration
> timer in a way that makes the host wait a long time (with preemption
> disabled), so there is a potential scenario for host lockups.
>
> * The stack trace we have access to (from a user report of this issue)
> is summarized below:
>
> NMI watchdog: Watchdog detected hard LOCKUP on cpu 16
> [...]
> CPU: 16 PID: 3024910 Comm: CPU 0/KVM Not tainted 4.4.0-139-generic #165-Ubuntu
> RIP: 0010:[<addr>] [<addr>] delay_tsc+0x20/0x60
> [...]
> __delay+0x15/0x20
> wait_lapic_expire+0xc3/0x150 [kvm]
> vcpu_enter_guest+0x743/0x11d0 [kvm]
> kvm_arch_vcpu_ioctl_run+0xe6/0x410 [kvm]
> kvm_vcpu_ioctl+0x33d/0x620 [kvm]
> do_vfs_ioctl+0x2af/0x4b0
> ? __do_page_fault+0x1c1/0x410
> ? fire_user_return_notifiers+0x3e/0x50
> SyS_ioctl+0x79/0x90
> entry_SYSCALL_64_fastpath+0x22/0xc1
>
> This matches the reported problem in the KVM mailing-list:
> https://marc.info/?l=kvm&m=146374488028339
>
> * A fix was proposed in the above thread, but discarded in favor of the
> following approach: https://marc.info/?l=kvm&m=146647260109315
> The patch was merged in Linus' tree, hence we hereby request the SRU of:
> b606f189c7d5 ("KVM: LAPIC: cap __delay at lapic_timer_advance_ns").
> One additional patch is needed, which is just a header adjustment
> that exports a necessary function.
>
> * The patch is missing only in the 4.4 kernel series; Bionic (4.15) and the
> other newer releases already have the patch.
>
> [Test Case]
>
> * Unfortunately this is a hard-to-reproduce issue; we have reports of
> this lockup from a user, hence the SRU request here.
> Also, the patch was originally introduced in kernel 4.7, approx. 2.5 years
> ago, so we are confident that the community has been running this code long
> enough without reported errors. We also checked Linus' tree, and no fixes
> for this code have been introduced since kernel 4.7.
>
> [Regression Potential]
>
> * The code modification requested here affects the amount of delay in
> a specific timer; the patch introduces a maximum time for the delay,
> preventing unbounded delays in the host.
> The regression potential is considered low; given the nature of the
> modification, latency issues in guests are likely to be the most problematic
> regression we could see.
>
> Marcelo Tosatti (2):
> KVM: x86: move nsec_to_cycles from x86.c to x86.h
> KVM: LAPIC: cap __delay at lapic_timer_advance_ns
>
> arch/x86/kvm/lapic.c | 3 ++-
> arch/x86/kvm/x86.c | 6 ------
> arch/x86/kvm/x86.h | 8 ++++++++
> 3 files changed, 10 insertions(+), 7 deletions(-)
>
> --
> 2.20.1
>
>
> --
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
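As a reading aid for the [Regression Potential] note above, here is a minimal
userspace sketch of the capping idea: the busy-wait is bounded by a fixed
advance window instead of by whatever deadline the guest programmed. It is an
illustration under stated assumptions, not the kernel patch itself. The names
sketch_nsec_to_cycles() and sketch_capped_delay(), and the tsc_khz parameter,
are hypothetical; the real patch uses the kernel's nsec_to_cycles() and
lapic_timer_advance_ns.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's nsec_to_cycles(): converts
 * nanoseconds to TSC cycles for a TSC running at tsc_khz kHz.
 * (The kernel scales with a per-vCPU multiplier/shift instead.)
 */
static uint64_t sketch_nsec_to_cycles(uint64_t nsec, uint64_t tsc_khz)
{
    return nsec * tsc_khz / 1000000ULL;
}

/*
 * Returns how many cycles to busy-wait before entering the guest.
 * Without the cap, the result would be (tsc_deadline - guest_tsc),
 * which a guest-programmed deadline can make arbitrarily large.
 */
static uint64_t sketch_capped_delay(uint64_t guest_tsc, uint64_t tsc_deadline,
                                    uint64_t advance_ns, uint64_t tsc_khz)
{
    uint64_t remaining, cap;

    if (guest_tsc >= tsc_deadline)
        return 0; /* deadline already reached: no wait at all */

    remaining = tsc_deadline - guest_tsc;
    cap = sketch_nsec_to_cycles(advance_ns, tsc_khz);

    /* Never spin longer than the advance window. */
    return remaining < cap ? remaining : cap;
}
```

With a 2 GHz TSC (tsc_khz = 2000000) and a 1000 ns advance window, the wait
is never longer than 2000 cycles, however far away the guest deadline is. The
trade-off is the one named in the cover letter: a timer may be delivered to
the guest slightly earlier than its deadline rather than stalling the host.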