mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-01-22 16:06:04 -05:00
linux/kernel/sched
Chengming Zhou 7d9da04057 psi: Fix race when task wakes up before psi_sched_switch() adjusts flags
When running hackbench in a cgroup with bandwidth throttling enabled,
the following PSI splat was observed:

    psi: inconsistent task state! task=1831:hackbench cpu=8 psi_flags=14 clear=0 set=4

When investigating the series of events leading up to the splat, the
following sequence was observed:

    [008] d..2.: sched_switch: ... ==> next_comm=hackbench next_pid=1831 next_prio=120
        ...
    [008] dN.2.: dequeue_entity(task delayed): task=hackbench pid=1831 cfs_rq->throttled=0
    [008] dN.2.: pick_task_fair: check_cfs_rq_runtime() throttled cfs_rq on CPU8
    # CPU8 goes into newidle balance and releases the rq lock
        ...
    # CPU15 on the same LLC domain is trying to wake up hackbench(pid=1831)
    [015] d..4.: psi_flags_change: psi: task state: task=1831:hackbench cpu=8 psi_flags=14 clear=0 set=4 final=14 # Splat (cfs_rq->throttled=1)
    [015] d..4.: sched_wakeup: comm=hackbench pid=1831 prio=120 target_cpu=008 # Task has woken on a throttled hierarchy
    [008] d..2.: sched_switch: prev_comm=hackbench prev_pid=1831 prev_prio=120 prev_state=S ==> ...

psi_dequeue() relies on psi_sched_switch() to set the correct PSI flags
for the blocked entity. However, with the introduction of DELAY_DEQUEUE,
the blocked task can wake up when newidle balance drops the runqueue
lock during __schedule().

If a task wakes before psi_sched_switch() adjusts the PSI flags, skip
any modifications in psi_enqueue() which would still see the flags of a
running task and not a blocked one. Instead, rely on psi_sched_switch()
to do the right thing.
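
The early bailout described above can be illustrated with a small user-space model. This is only a sketch, not the kernel code: the struct, the function names, and the MODEL_TSK_* values are hypothetical. The values are assumed to mirror the TSK_RUNNING and TSK_ONCPU bits, and the splat is assumed to print flags in hexadecimal, so psi_flags=14 reads as "on CPU and running" and set=4 as "set running". Unlike the kernel, the model rejects an inconsistent change instead of logging it and carrying on.

```c
#include <stdbool.h>

/* Hypothetical flag values, assumed to match the kernel's PSI task flags. */
#define MODEL_TSK_RUNNING (1u << 2) /* 0x04 */
#define MODEL_TSK_ONCPU   (1u << 4) /* 0x10 */

/* Hypothetical stand-in for the few task fields this sketch needs. */
struct psi_model_task {
	unsigned int psi_flags;
	bool on_cpu; /* still the current task on its CPU */
};

/*
 * Model of the consistency check: setting a flag that is already set,
 * or clearing one that is not set, is an inconsistent transition.
 * Returns false (and changes nothing) in that case.
 */
static bool model_psi_flags_change(struct psi_model_task *t,
				   unsigned int clear, unsigned int set)
{
	if ((t->psi_flags & set) || (t->psi_flags & clear) != clear)
		return false; /* the "inconsistent task state" case */

	t->psi_flags &= ~clear;
	t->psi_flags |= set;
	return true;
}

/*
 * Model of the enqueue path. With bail_if_on_cpu set, a task that is
 * still on its CPU keeps its flags untouched and the later switch-time
 * accounting is trusted to fix them up.
 */
static bool model_psi_enqueue(struct psi_model_task *t, bool bail_if_on_cpu)
{
	if (bail_if_on_cpu && t->on_cpu)
		return true;

	return model_psi_flags_change(t, 0, MODEL_TSK_RUNNING);
}
```

Calling model_psi_enqueue() on a task with psi_flags = MODEL_TSK_ONCPU | MODEL_TSK_RUNNING and on_cpu = true fails without the bailout, which corresponds to the reported clear=0 set=4 transition, and succeeds without touching the flags when the bailout is enabled.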

Since the status returned by try_to_block_task() may no longer be
accurate by the time __schedule() reaches psi_sched_switch(), check
whether the task is blocked using a combination of task_on_rq_queued()
and p->se.sched_delayed checks.
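
One plausible form of that combined check is sketched below as a user-space model. The message does not spell out the exact expression, so the predicate here is an assumption: a task counts as blocked if it is no longer queued on the runqueue, or if it is still queued only because its dequeue was delayed. All type and function names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

#define MODEL_ON_RQ_QUEUED 1 /* hypothetical "queued on a runqueue" state */

/* Hypothetical stand-ins for the scheduling entity and task fields used. */
struct rq_model_se {
	bool sched_delayed; /* dequeue was deferred by DELAY_DEQUEUE */
};

struct rq_model_task {
	int on_rq;
	struct rq_model_se se;
};

static bool model_task_on_rq_queued(const struct rq_model_task *p)
{
	return p->on_rq == MODEL_ON_RQ_QUEUED;
}

/*
 * Assumed combination: evaluated at switch time rather than taken from
 * an earlier return value, so a wakeup that raced in between (task
 * queued again, delay cleared) is seen as "not blocked".
 */
static bool model_task_is_blocked(const struct rq_model_task *p)
{
	return !model_task_on_rq_queued(p) || p->se.sched_delayed;
}
```

Under this assumption, a fully dequeued task and a delayed-dequeue task both count as blocked, while a task that a racing wakeup has put back on the runqueue does not.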

[ prateek: Commit message, testing, early bailout in psi_enqueue() ]

Fixes: 152e11f6df ("sched/fair: Implement delayed dequeue") # 1a6151017e
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Link: https://lore.kernel.org/r/20241227061941.2315-1-kprateek.nayak@amd.com
2025-01-13 14:10:26 +01:00
autogroup.c
autogroup.h
build_policy.c
build_utility.c
clock.c
completion.c
core.c psi: Fix race when task wakes up before psi_sched_switch() adjusts flags 2025-01-13 14:10:26 +01:00
core_sched.c
cpuacct.c
cpudeadline.c
cpudeadline.h
cpufreq.c
cpufreq_schedutil.c
cpupri.c
cpupri.h
cputime.c sched: Define sched_clock_irqtime as static key 2025-01-13 14:10:25 +01:00
deadline.c sched: deadline: Cleanup goto label in pick_earliest_pushable_dl_task 2024-12-10 15:07:06 +01:00
debug.c sched/debug: Change need_resched warnings to pr_err 2025-01-13 14:10:23 +01:00
ext.c
ext.h
fair.c sched/fair: Do not compute overloaded status unnecessarily during lb 2025-01-13 14:10:25 +01:00
features.h sched/fair: Untangle NEXT_BUDDY and pick_next_task() 2024-12-09 11:48:13 +01:00
idle.c
isolation.c sched/isolation: Consolidate housekeeping cpumasks that are always identical 2024-12-02 12:24:28 +01:00
loadavg.c
Makefile
membarrier.c
pelt.c sched/fair: Use the new cfs_rq.h_nr_runnable 2024-12-09 11:48:11 +01:00
pelt.h
psi.c sched, psi: Don't account irq time if sched_clock_irqtime is disabled 2025-01-13 14:10:26 +01:00
rt.c
sched-pelt.h
sched.h sched: Define sched_clock_irqtime as static key 2025-01-13 14:10:25 +01:00
smp.h
stats.c docs: Update Schedstat version to 17 2024-12-20 15:31:18 +01:00
stats.h psi: Fix race when task wakes up before psi_sched_switch() adjusts flags 2025-01-13 14:10:26 +01:00
stop_task.c
swait.c
syscalls.c sched/fair: Encapsulate set custom slice in a __setparam_fair() function 2025-01-13 14:10:22 +01:00
topology.c sched: Move sched domain name out of CONFIG_SCHED_DEBUG 2024-12-20 15:31:17 +01:00
wait.c
wait_bit.c