NACK/Cmnt: [SRU][J:linux-bluefield][PATCH] tick/rcu: Stop allowing RCU_SOFTIRQ in idle
Stefan Bader
stefan.bader at canonical.com
Wed Dec 7 08:40:02 UTC 2022
On 06.12.22 15:54, Bodong Wang wrote:
> 1. Only one patch in this thread, the commit msg is self explained. Why do we need a cover letter?
Because the same patch also is wanted for F and it is the same issue (bug report).
> 2. There is no conflict from J but only on F, are we OK to use the same patch?
So you would have
[SRU J,F:linux-bluefield][PATCH 0/1] ...
+- [SRU J:linux-bluefield][PATCH 1/1] ...
+- [SRU F:linux-bluefield][PATCH 1/1] ...
Then both patches can be reviewed in one run, also compare J and F versions more
easily and have them applied around the same time.
-Stefan
>
> -----Original Message-----
> From: Stefan Bader <stefan.bader at canonical.com>
> Sent: Tuesday, December 6, 2022 2:19 AM
> To: Bodong Wang <bodong at nvidia.com>; kernel-team at lists.ubuntu.com
> Cc: Vladimir Sokolovsky <vlad at nvidia.com>
> Subject: NACK/Cmnt: [SRU][J:linux-bluefield][PATCH] tick/rcu: Stop allowing RCU_SOFTIRQ in idle
>
> On 05.12.22 22:44, Bodong Wang wrote:
>> From: Frederic Weisbecker <frederic at kernel.org>
>>
>> BugLink: https://bugs.launchpad.net/bugs/1998857
>>
>> RCU_SOFTIRQ used to be special in that it could be raised on purpose
>> within the idle path to prevent from stopping the tick. Some code
>> still prevents from unnecessary warnings related to this specific
>> behaviour while entering in dynticks-idle mode.
>>
>> However the nohz layout has changed quite a bit in ten years, and the
>> removal of CONFIG_RCU_FAST_NO_HZ has been the final straw to this
>> safe-conduct. Now the RCU_SOFTIRQ vector is expected to be raised from
>> sane places.
>>
>> A remaining corner case is admitted though when the vector is invoked
>> in fragile hotplug path.
>>
>> Signed-off-by: Frederic Weisbecker <frederic at kernel.org>
>> Cc: Thomas Gleixner <tglx at linutronix.de>
>> Cc: Peter Zijlstra <peterz at infradead.org>
>> Cc: Paul E. McKenney <paulmck at kernel.org>
>> Cc: Paul Menzel <pmenzel at molgen.mpg.de> (cherry picked from commit
>> 0345691b24c076655ce8f0f4bfd24cba3467ccbd)
>> Signed-off-by: Bodong Wang <bodong at nvidia.com>
>>
>> ---
>
> This should be a thread with cover email and the J and F patches underneath.
>
> -Stefan
>
>> include/linux/interrupt.h | 8 ++++++-
>> kernel/time/tick-sched.c | 50 +++++++++++++++++++++++++++++++--------
>> 2 files changed, 47 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
>> index 1f22a30c0963..3406da21e787 100644
>> --- a/include/linux/interrupt.h
>> +++ b/include/linux/interrupt.h
>> @@ -528,7 +528,13 @@ enum
>> NR_SOFTIRQS
>> };
>>
>> -#define SOFTIRQ_STOP_IDLE_MASK (~(1 << RCU_SOFTIRQ))
>> +/*
>> + * Ignoring the RCU vector after ksoftirqd is parked is fine
>> + * because:
>> + * 1) rcutree_migrate_callbacks() takes care of the queue.
>> + * 2) rcu_report_dead() reports the final quiescent states.
>> + */
>> +#define SOFTIRQ_HOTPLUG_SAFE_MASK (BIT(RCU_SOFTIRQ))
>>
>> /* map softirq index to softirq name. update 'softirq_to_name' in
>> * kernel/softirq.c when adding a new softirq.
>> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index
>> 9c6f661fb436..69ff819ddbbb 100644
>> --- a/kernel/time/tick-sched.c
>> +++ b/kernel/time/tick-sched.c
>> @@ -983,6 +983,45 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
>> __tick_nohz_full_update_tick(ts, ktime_get());
>> }
>>
>> +/*
>> + * A pending softirq outside an IRQ (or softirq disabled section)
>> +context
>> + * should be waiting for ksoftirqd to handle it. Therefore we
>> +shouldn't
>> + * reach here due to the need_resched() early check in can_stop_idle_tick().
>> + *
>> + * However if we are between CPUHP_AP_SMPBOOT_THREADS and
>> +CPU_TEARDOWN_CPU on the
>> + * cpu_down() process, softirqs can still be raised while ksoftirqd
>> +is parked,
>> + * triggering the below since wakep_softirqd() is ignored.
>> + *
>> + */
>> +static bool report_idle_softirq(void) {
>> + static int ratelimit;
>> + unsigned int pending = local_softirq_pending();
>> +
>> + if (likely(!pending))
>> + return false;
>> +
>> + /* Some softirqs claim to be safe against hotplug and ksoftirqd parking */
>> + if (!cpu_active(smp_processor_id())) {
>> + pending &= ~SOFTIRQ_HOTPLUG_SAFE_MASK;
>> + if (!pending)
>> + return false;
>> + }
>> +
>> + if (ratelimit < 10)
>> + return false;
>> +
>> + /* On RT, softirqs handling may be waiting on some lock */
>> + if (!local_bh_blocked())
>> + return false;
>> +
>> + pr_warn("NOHZ tick-stop error: local softirq work is pending, handler #%02x!!!\n",
>> + pending);
>> + ratelimit++;
>> +
>> + return true;
>> +}
>> +
>> static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>> {
>> /*
>> @@ -1009,17 +1048,8 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>> if (need_resched())
>> return false;
>>
>> - if (unlikely(local_softirq_pending())) {
>> - static int ratelimit;
>> -
>> - if (ratelimit < 10 && !local_bh_blocked() &&
>> - (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
>> - pr_warn("NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #%02x!!!\n",
>> - (unsigned int) local_softirq_pending());
>> - ratelimit++;
>> - }
>> + if (unlikely(report_idle_softirq()))
>> return false;
>> - }
>>
>> if (tick_nohz_full_enabled()) {
>> /*
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20221207/a104807a/attachment-0001.sig>
More information about the kernel-team
mailing list