APPLIED: [SRU X] [PATCH 0/2] Hard lockups due to unrestricted lapic timer delay

Khaled Elmously khalid.elmously at canonical.com
Wed Feb 27 22:35:46 UTC 2019


On 2019-02-27 16:19:48 , Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1817918
> 
> [Impact]
> 
> * There is a long-time report of an issue with the TSC delay present
> in wait_lapic_expire() - basically the guest could have an expiration
> timer configured in a way it induces host to wait a long time (with
> preemption disabled), so there's a potential scenario for host lockups.
> 
> * The stack trace we have access (from an user report of this issue)
> is (summarized) below:
> 
> NMI watchdog: Watchdog detected hard LOCKUP on cpu 16
> [...]
> CPU: 16 PID: 3024910 Comm: CPU 0/KVM Not tainted 4.4.0-139-generic #165-Ubuntu
> RIP: 0010:[<addr>] [<addr>] delay_tsc+0x20/0x60
> [...]
>  __delay+0x15/0x20
> wait_lapic_expire+0xc3/0x150 [kvm]
> vcpu_enter_guest+0x743/0x11d0 [kvm]
> kvm_arch_vcpu_ioctl_run+0xe6/0x410 [kvm]
> kvm_vcpu_ioctl+0x33d/0x620 [kvm]
> do_vfs_ioctl+0x2af/0x4b0
> ? __do_page_fault+0x1c1/0x410
> ? fire_user_return_notifiers+0x3e/0x50
> SyS_ioctl+0x79/0x90
> entry_SYSCALL_64_fastpath+0x22/0xc1
> 
> This matches the reported problem in the KVM mailing-list:
> https://marc.info/?l=kvm&m=146374488028339
> 
> * A fix was proposed in the above thread, but discarded in favor of the
> following approach: https://marc.info/?l=kvm&m=146647260109315
> The patch was merged in Linus tree, hence we hereby request the SRU:
> b606f189c7d5 ("KVM: LAPIC: cap __delay at lapic_timer_advance_ns").
> There's one additional patch needed, which is just the header adjustment
> for exporting a necessary function.
> 
> * The patch is missing only in 4.4 kernel series; Bionic (4.15) and the other
> * newer releases have the patch already.
> 
> [Test Case]
> 
> * Unfortunately this is a hard to reproduce issue; we have reports of
> this lockup from an user, hence the SRU request here.
> Also, the patch was introduced originally in kernel 4.7, approx. 2.5 years
> ago. So, we are confident that community is running this code long enough
> without errors reported. Also, checked in the Linus tree and no fixes
> for this code were introduced since kernel 4.7.
> 
> [Regression Potential]
> 
> * The code modification requested here affects the amount of delay in
> a specific timer; the patch introduces a maximum time for delay, preventing
> unbounded delays in host.
> The regression potential is considered low, and given the nature of the
> modification, latency issues in guests are likely to be the most problematic
> regression potential we have.
> 
> Marcelo Tosatti (2):
>   KVM: x86: move nsec_to_cycles from x86.c to x86.h
>   KVM: LAPIC: cap __delay at lapic_timer_advance_ns
> 
>  arch/x86/kvm/lapic.c | 3 ++-
>  arch/x86/kvm/x86.c   | 6 ------
>  arch/x86/kvm/x86.h   | 8 ++++++++
>  3 files changed, 10 insertions(+), 7 deletions(-)
> 
> -- 
> 2.20.1
> 
> 
> -- 
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team



More information about the kernel-team mailing list