1
0
Fork 0
mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-01-22 16:06:04 -05:00

sched_ext: Change for v6.13

- Improve the default select_cpu() implementation making it topology aware
   and handle WAKE_SYNC better.
 
 - set_arg_maybe_null() was used to inform the verifier which ops args could
   be NULL in a rather hackish way. Use the new __nullable CFI stub tags
   instead.
 
 - On Sapphire Rapids multi-socket systems, a BPF scheduler, by hammering on
   the same queue across sockets, could live-lock the system to the point
   where the system couldn't make reasonable forward progress. This could
   lead to soft-lockup triggered resets or stalling out bypass mode switch
   and thus BPF scheduler ejection for tens of minutes if not hours. After
   trying a number of mitigations, the following set worked reliably:
 
   - Injecting artificial cpu_relax() loops in two places while sched_ext is
     trying to turn on the bypass mode.
 
   - Triggering scheduler ejection when soft-lockup detection is imminent (a
     quarter of threshold left).
 
   While not the prettiest, the impact both in terms of code complexity and
   overhead is minimal.
 
 - A common complaint on the API is the overuse of the word "dispatch" and
   the confusion around "consume". This is due to how the dispatch queues
   became more generic over time. Rename the affected kfuncs for clarity.
   Thanks to BPF's compatibility features, this change can be made in a way
   that's both forward and backward compatible. The compatibility code will
   be dropped in a few releases.
 
 - Pull sched_ext/for-6.12-fixes to receive a prerequisite change. Other misc
   changes.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZztuXA4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGePUAP4nFTDaUDngVlxGv5hpYz8/Gcv1bPsWEydRRmH/
 3F+pNgEAmGIGAEwFYfc9Zn8Kbjf0eJAduf2RhGRatQO6F/+GSwo=
 =AcyC
 -----END PGP SIGNATURE-----

Merge tag 'sched_ext-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext updates from Tejun Heo:

 - Improve the default select_cpu() implementation making it topology
   aware and handle WAKE_SYNC better.

 - set_arg_maybe_null() was used to inform the verifier which ops args
   could be NULL in a rather hackish way. Use the new __nullable CFI
   stub tags instead.

 - On Sapphire Rapids multi-socket systems, a BPF scheduler, by
   hammering on the same queue across sockets, could live-lock the
   system to the point where the system couldn't make reasonable forward
   progress.

   This could lead to soft-lockup triggered resets or stalling out
   bypass mode switch and thus BPF scheduler ejection for tens of
   minutes if not hours. After trying a number of mitigations, the
   following set worked reliably:

     - Injecting artificial cpu_relax() loops in two places while
       sched_ext is trying to turn on the bypass mode.

     - Triggering scheduler ejection when soft-lockup detection is
       imminent (a quarter of threshold left).

   While not the prettiest, the impact both in terms of code complexity
   and overhead is minimal.

 - A common complaint on the API is the overuse of the word "dispatch"
   and the confusion around "consume". This is due to how the dispatch
   queues became more generic over time. Rename the affected kfuncs for
   clarity. Thanks to BPF's compatibility features, this change can be
   made in a way that's both forward and backward compatible. The
   compatibility code will be dropped in a few releases.

 - Other misc changes

* tag 'sched_ext-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (21 commits)
  sched_ext: Replace scx_next_task_picked() with switch_class() in comment
  sched_ext: Rename scx_bpf_dispatch[_vtime]_from_dsq*() -> scx_bpf_dsq_move[_vtime]*()
  sched_ext: Rename scx_bpf_consume() to scx_bpf_dsq_move_to_local()
  sched_ext: Rename scx_bpf_dispatch[_vtime]() to scx_bpf_dsq_insert[_vtime]()
  sched_ext: scx_bpf_dispatch_from_dsq_set_*() are allowed from unlocked context
  sched_ext: add a missing rcu_read_lock/unlock pair at scx_select_cpu_dfl()
  sched_ext: Clarify sched_ext_ops table for userland scheduler
  sched_ext: Enable the ops breather and eject BPF scheduler on softlockup
  sched_ext: Avoid live-locking bypass mode switching
  sched_ext: Fix incorrect use of bitwise AND
  sched_ext: Do not enable LLC/NUMA optimizations when domains overlap
  sched_ext: Introduce NUMA awareness to the default idle selection policy
  sched_ext: Replace set_arg_maybe_null() with __nullable CFI stub tags
  sched_ext: Rename CFI stubs to names that are recognized by BPF
  sched_ext: Introduce LLC awareness to the default idle selection policy
  sched_ext: Clarify ops.select_cpu() for single-CPU tasks
  sched_ext: improve WAKE_SYNC behavior for default idle CPU selection
  sched_ext: Use btf_ids to resolve task_struct
  sched/ext: Use tg_cgroup() to elieminate duplicate code
  sched/ext: Fix unmatch trailing comment of CONFIG_EXT_GROUP_SCHED
  ...
This commit is contained in:
Linus Torvalds 2024-11-20 10:08:00 -08:00
commit 8f7c8b88bd
11 changed files with 877 additions and 393 deletions

View file

@ -130,7 +130,7 @@ optional. The following modified excerpt is from
* Decide which CPU a task should be migrated to before being
* enqueued (either at wakeup, fork time, or exec time). If an
* idle core is found by the default ops.select_cpu() implementation,
* then dispatch the task directly to SCX_DSQ_LOCAL and skip the
* then insert the task directly into SCX_DSQ_LOCAL and skip the
* ops.enqueue() callback.
*
* Note that this implementation has exactly the same behavior as the
@ -148,15 +148,15 @@ optional. The following modified excerpt is from
cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &direct);
if (direct)
scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
return cpu;
}
/*
* Do a direct dispatch of a task to the global DSQ. This ops.enqueue()
* callback will only be invoked if we failed to find a core to dispatch
* to in ops.select_cpu() above.
* Do a direct insertion of a task to the global DSQ. This ops.enqueue()
* callback will only be invoked if we failed to find a core to insert
* into in ops.select_cpu() above.
*
* Note that this implementation has exactly the same behavior as the
* default ops.enqueue implementation, which just dispatches the task
@ -166,7 +166,7 @@ optional. The following modified excerpt is from
*/
void BPF_STRUCT_OPS(simple_enqueue, struct task_struct *p, u64 enq_flags)
{
scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
}
s32 BPF_STRUCT_OPS_SLEEPABLE(simple_init)
@ -202,14 +202,13 @@ and one local dsq per CPU (``SCX_DSQ_LOCAL``). The BPF scheduler can manage
an arbitrary number of dsq's using ``scx_bpf_create_dsq()`` and
``scx_bpf_destroy_dsq()``.
A CPU always executes a task from its local DSQ. A task is "dispatched" to a
DSQ. A non-local DSQ is "consumed" to transfer a task to the consuming CPU's
local DSQ.
A CPU always executes a task from its local DSQ. A task is "inserted" into a
DSQ. A task in a non-local DSQ is "move"d into the target CPU's local DSQ.
When a CPU is looking for the next task to run, if the local DSQ is not
empty, the first task is picked. Otherwise, the CPU tries to consume the
global DSQ. If that doesn't yield a runnable task either, ``ops.dispatch()``
is invoked.
empty, the first task is picked. Otherwise, the CPU tries to move a task
from the global DSQ. If that doesn't yield a runnable task either,
``ops.dispatch()`` is invoked.
Scheduling Cycle
----------------
@ -229,26 +228,26 @@ The following briefly shows how a waking task is scheduled and executed.
scheduler can wake up any cpu using the ``scx_bpf_kick_cpu()`` helper,
using ``ops.select_cpu()`` judiciously can be simpler and more efficient.
A task can be immediately dispatched to a DSQ from ``ops.select_cpu()`` by
calling ``scx_bpf_dispatch()``. If the task is dispatched to
``SCX_DSQ_LOCAL`` from ``ops.select_cpu()``, it will be dispatched to the
A task can be immediately inserted into a DSQ from ``ops.select_cpu()``
by calling ``scx_bpf_dsq_insert()``. If the task is inserted into
``SCX_DSQ_LOCAL`` from ``ops.select_cpu()``, it will be inserted into the
local DSQ of whichever CPU is returned from ``ops.select_cpu()``.
Additionally, dispatching directly from ``ops.select_cpu()`` will cause the
Additionally, inserting directly from ``ops.select_cpu()`` will cause the
``ops.enqueue()`` callback to be skipped.
Note that the scheduler core will ignore an invalid CPU selection, for
example, if it's outside the allowed cpumask of the task.
2. Once the target CPU is selected, ``ops.enqueue()`` is invoked (unless the
task was dispatched directly from ``ops.select_cpu()``). ``ops.enqueue()``
task was inserted directly from ``ops.select_cpu()``). ``ops.enqueue()``
can make one of the following decisions:
* Immediately dispatch the task to either the global or local DSQ by
calling ``scx_bpf_dispatch()`` with ``SCX_DSQ_GLOBAL`` or
* Immediately insert the task into either the global or local DSQ by
calling ``scx_bpf_dsq_insert()`` with ``SCX_DSQ_GLOBAL`` or
``SCX_DSQ_LOCAL``, respectively.
* Immediately dispatch the task to a custom DSQ by calling
``scx_bpf_dispatch()`` with a DSQ ID which is smaller than 2^63.
* Immediately insert the task into a custom DSQ by calling
``scx_bpf_dsq_insert()`` with a DSQ ID which is smaller than 2^63.
* Queue the task on the BPF side.
@ -257,23 +256,23 @@ The following briefly shows how a waking task is scheduled and executed.
run, ``ops.dispatch()`` is invoked which can use the following two
functions to populate the local DSQ.
* ``scx_bpf_dispatch()`` dispatches a task to a DSQ. Any target DSQ can
be used - ``SCX_DSQ_LOCAL``, ``SCX_DSQ_LOCAL_ON | cpu``,
``SCX_DSQ_GLOBAL`` or a custom DSQ. While ``scx_bpf_dispatch()``
* ``scx_bpf_dsq_insert()`` inserts a task to a DSQ. Any target DSQ can be
used - ``SCX_DSQ_LOCAL``, ``SCX_DSQ_LOCAL_ON | cpu``,
``SCX_DSQ_GLOBAL`` or a custom DSQ. While ``scx_bpf_dsq_insert()``
currently can't be called with BPF locks held, this is being worked on
and will be supported. ``scx_bpf_dispatch()`` schedules dispatching
and will be supported. ``scx_bpf_dsq_insert()`` schedules insertion
rather than performing them immediately. There can be up to
``ops.dispatch_max_batch`` pending tasks.
* ``scx_bpf_consume()`` tranfers a task from the specified non-local DSQ
to the dispatching DSQ. This function cannot be called with any BPF
locks held. ``scx_bpf_consume()`` flushes the pending dispatched tasks
before trying to consume the specified DSQ.
* ``scx_bpf_move_to_local()`` moves a task from the specified non-local
DSQ to the dispatching DSQ. This function cannot be called with any BPF
locks held. ``scx_bpf_move_to_local()`` flushes the pending insertions
tasks before trying to move from the specified DSQ.
4. After ``ops.dispatch()`` returns, if there are tasks in the local DSQ,
the CPU runs the first one. If empty, the following steps are taken:
* Try to consume the global DSQ. If successful, run the task.
* Try to move from the global DSQ. If successful, run the task.
* If ``ops.dispatch()`` has dispatched any tasks, retry #3.
@ -286,14 +285,14 @@ Note that the BPF scheduler can always choose to dispatch tasks immediately
in ``ops.enqueue()`` as illustrated in the above simple example. If only the
built-in DSQs are used, there is no need to implement ``ops.dispatch()`` as
a task is never queued on the BPF scheduler and both the local and global
DSQs are consumed automatically.
DSQs are executed automatically.
``scx_bpf_dispatch()`` queues the task on the FIFO of the target DSQ. Use
``scx_bpf_dispatch_vtime()`` for the priority queue. Internal DSQs such as
``scx_bpf_dsq_insert()`` inserts the task on the FIFO of the target DSQ. Use
``scx_bpf_dsq_insert_vtime()`` for the priority queue. Internal DSQs such as
``SCX_DSQ_LOCAL`` and ``SCX_DSQ_GLOBAL`` do not support priority-queue
dispatching, and must be dispatched to with ``scx_bpf_dispatch()``. See the
function documentation and usage in ``tools/sched_ext/scx_simple.bpf.c`` for
more information.
dispatching, and must be dispatched to with ``scx_bpf_dsq_insert()``. See
the function documentation and usage in ``tools/sched_ext/scx_simple.bpf.c``
for more information.
Where to Look
=============

View file

@ -204,11 +204,13 @@ struct sched_ext_entity {
void sched_ext_free(struct task_struct *p);
void print_scx_info(const char *log_lvl, struct task_struct *p);
void scx_softlockup(u32 dur_s);
#else /* !CONFIG_SCHED_CLASS_EXT */
static inline void sched_ext_free(struct task_struct *p) {}
static inline void print_scx_info(const char *log_lvl, struct task_struct *p) {}
static inline void scx_softlockup(u32 dur_s) {}
#endif /* CONFIG_SCHED_CLASS_EXT */
#endif /* _LINUX_SCHED_EXT_H */

File diff suppressed because it is too large Load diff

View file

@ -644,6 +644,14 @@ static int is_softlockup(unsigned long touch_ts,
need_counting_irqs())
start_counting_irqs();
/*
* A poorly behaving BPF scheduler can live-lock the system into
* soft lockups. Tell sched_ext to try ejecting the BPF
* scheduler when close to a soft lockup.
*/
if (time_after_eq(now, period_ts + get_softlockup_thresh() * 3 / 4))
scx_softlockup(now - touch_ts);
/* Warn about unreasonable delays. */
if (time_after(now, period_ts + get_softlockup_thresh()))
return now - touch_ts;

View file

@ -36,15 +36,15 @@ static inline void ___vmlinux_h_sanity_check___(void)
s32 scx_bpf_create_dsq(u64 dsq_id, s32 node) __ksym;
s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, bool *is_idle) __ksym;
void scx_bpf_dispatch(struct task_struct *p, u64 dsq_id, u64 slice, u64 enq_flags) __ksym;
void scx_bpf_dispatch_vtime(struct task_struct *p, u64 dsq_id, u64 slice, u64 vtime, u64 enq_flags) __ksym;
void scx_bpf_dsq_insert(struct task_struct *p, u64 dsq_id, u64 slice, u64 enq_flags) __ksym __weak;
void scx_bpf_dsq_insert_vtime(struct task_struct *p, u64 dsq_id, u64 slice, u64 vtime, u64 enq_flags) __ksym __weak;
u32 scx_bpf_dispatch_nr_slots(void) __ksym;
void scx_bpf_dispatch_cancel(void) __ksym;
bool scx_bpf_consume(u64 dsq_id) __ksym;
void scx_bpf_dispatch_from_dsq_set_slice(struct bpf_iter_scx_dsq *it__iter, u64 slice) __ksym __weak;
void scx_bpf_dispatch_from_dsq_set_vtime(struct bpf_iter_scx_dsq *it__iter, u64 vtime) __ksym __weak;
bool scx_bpf_dispatch_from_dsq(struct bpf_iter_scx_dsq *it__iter, struct task_struct *p, u64 dsq_id, u64 enq_flags) __ksym __weak;
bool scx_bpf_dispatch_vtime_from_dsq(struct bpf_iter_scx_dsq *it__iter, struct task_struct *p, u64 dsq_id, u64 enq_flags) __ksym __weak;
bool scx_bpf_dsq_move_to_local(u64 dsq_id) __ksym;
void scx_bpf_dsq_move_set_slice(struct bpf_iter_scx_dsq *it__iter, u64 slice) __ksym;
void scx_bpf_dsq_move_set_vtime(struct bpf_iter_scx_dsq *it__iter, u64 vtime) __ksym;
bool scx_bpf_dsq_move(struct bpf_iter_scx_dsq *it__iter, struct task_struct *p, u64 dsq_id, u64 enq_flags) __ksym __weak;
bool scx_bpf_dsq_move_vtime(struct bpf_iter_scx_dsq *it__iter, struct task_struct *p, u64 dsq_id, u64 enq_flags) __ksym __weak;
u32 scx_bpf_reenqueue_local(void) __ksym;
void scx_bpf_kick_cpu(s32 cpu, u64 flags) __ksym;
s32 scx_bpf_dsq_nr_queued(u64 dsq_id) __ksym;
@ -74,8 +74,8 @@ struct rq *scx_bpf_cpu_rq(s32 cpu) __ksym;
struct cgroup *scx_bpf_task_cgroup(struct task_struct *p) __ksym __weak;
/*
* Use the following as @it__iter when calling
* scx_bpf_dispatch[_vtime]_from_dsq() from within bpf_for_each() loops.
* Use the following as @it__iter when calling scx_bpf_dsq_move[_vtime]() from
* within bpf_for_each() loops.
*/
#define BPF_FOR_EACH_ITER (&___it)

View file

@ -20,19 +20,110 @@
(bpf_ksym_exists(scx_bpf_task_cgroup) ? \
scx_bpf_task_cgroup((p)) : NULL)
/* v6.12: 4c30f5ce4f7a ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()") */
#define __COMPAT_scx_bpf_dispatch_from_dsq_set_slice(it, slice) \
(bpf_ksym_exists(scx_bpf_dispatch_from_dsq_set_slice) ? \
scx_bpf_dispatch_from_dsq_set_slice((it), (slice)) : (void)0)
#define __COMPAT_scx_bpf_dispatch_from_dsq_set_vtime(it, vtime) \
(bpf_ksym_exists(scx_bpf_dispatch_from_dsq_set_vtime) ? \
scx_bpf_dispatch_from_dsq_set_vtime((it), (vtime)) : (void)0)
#define __COMPAT_scx_bpf_dispatch_from_dsq(it, p, dsq_id, enq_flags) \
(bpf_ksym_exists(scx_bpf_dispatch_from_dsq) ? \
scx_bpf_dispatch_from_dsq((it), (p), (dsq_id), (enq_flags)) : false)
#define __COMPAT_scx_bpf_dispatch_vtime_from_dsq(it, p, dsq_id, enq_flags) \
(bpf_ksym_exists(scx_bpf_dispatch_vtime_from_dsq) ? \
scx_bpf_dispatch_vtime_from_dsq((it), (p), (dsq_id), (enq_flags)) : false)
/*
* v6.13: The verb `dispatch` was too overloaded and confusing. kfuncs are
* renamed to unload the verb.
*
* Build error is triggered if old names are used. New binaries work with both
* new and old names. The compat macros will be removed on v6.15 release.
*
* scx_bpf_dispatch_from_dsq() and friends were added during v6.12 by
* 4c30f5ce4f7a ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()").
* Preserve __COMPAT macros until v6.15.
*/
void scx_bpf_dispatch___compat(struct task_struct *p, u64 dsq_id, u64 slice, u64 enq_flags) __ksym __weak;
void scx_bpf_dispatch_vtime___compat(struct task_struct *p, u64 dsq_id, u64 slice, u64 vtime, u64 enq_flags) __ksym __weak;
bool scx_bpf_consume___compat(u64 dsq_id) __ksym __weak;
void scx_bpf_dispatch_from_dsq_set_slice___compat(struct bpf_iter_scx_dsq *it__iter, u64 slice) __ksym __weak;
void scx_bpf_dispatch_from_dsq_set_vtime___compat(struct bpf_iter_scx_dsq *it__iter, u64 vtime) __ksym __weak;
bool scx_bpf_dispatch_from_dsq___compat(struct bpf_iter_scx_dsq *it__iter, struct task_struct *p, u64 dsq_id, u64 enq_flags) __ksym __weak;
bool scx_bpf_dispatch_vtime_from_dsq___compat(struct bpf_iter_scx_dsq *it__iter, struct task_struct *p, u64 dsq_id, u64 enq_flags) __ksym __weak;
#define scx_bpf_dsq_insert(p, dsq_id, slice, enq_flags) \
(bpf_ksym_exists(scx_bpf_dsq_insert) ? \
scx_bpf_dsq_insert((p), (dsq_id), (slice), (enq_flags)) : \
scx_bpf_dispatch___compat((p), (dsq_id), (slice), (enq_flags)))
#define scx_bpf_dsq_insert_vtime(p, dsq_id, slice, vtime, enq_flags) \
(bpf_ksym_exists(scx_bpf_dsq_insert_vtime) ? \
scx_bpf_dsq_insert_vtime((p), (dsq_id), (slice), (vtime), (enq_flags)) : \
scx_bpf_dispatch_vtime___compat((p), (dsq_id), (slice), (vtime), (enq_flags)))
#define scx_bpf_dsq_move_to_local(dsq_id) \
(bpf_ksym_exists(scx_bpf_dsq_move_to_local) ? \
scx_bpf_dsq_move_to_local((dsq_id)) : \
scx_bpf_consume___compat((dsq_id)))
#define __COMPAT_scx_bpf_dsq_move_set_slice(it__iter, slice) \
(bpf_ksym_exists(scx_bpf_dsq_move_set_slice) ? \
scx_bpf_dsq_move_set_slice((it__iter), (slice)) : \
(bpf_ksym_exists(scx_bpf_dispatch_from_dsq_set_slice___compat) ? \
scx_bpf_dispatch_from_dsq_set_slice___compat((it__iter), (slice)) : \
(void)0))
#define __COMPAT_scx_bpf_dsq_move_set_vtime(it__iter, vtime) \
(bpf_ksym_exists(scx_bpf_dsq_move_set_vtime) ? \
scx_bpf_dsq_move_set_vtime((it__iter), (vtime)) : \
(bpf_ksym_exists(scx_bpf_dispatch_from_dsq_set_vtime___compat) ? \
scx_bpf_dispatch_from_dsq_set_vtime___compat((it__iter), (vtime)) : \
(void) 0))
#define __COMPAT_scx_bpf_dsq_move(it__iter, p, dsq_id, enq_flags) \
(bpf_ksym_exists(scx_bpf_dsq_move) ? \
scx_bpf_dsq_move((it__iter), (p), (dsq_id), (enq_flags)) : \
(bpf_ksym_exists(scx_bpf_dispatch_from_dsq___compat) ? \
scx_bpf_dispatch_from_dsq___compat((it__iter), (p), (dsq_id), (enq_flags)) : \
false))
#define __COMPAT_scx_bpf_dsq_move_vtime(it__iter, p, dsq_id, enq_flags) \
(bpf_ksym_exists(scx_bpf_dsq_move_vtime) ? \
scx_bpf_dsq_move_vtime((it__iter), (p), (dsq_id), (enq_flags)) : \
(bpf_ksym_exists(scx_bpf_dispatch_vtime_from_dsq___compat) ? \
scx_bpf_dispatch_vtime_from_dsq___compat((it__iter), (p), (dsq_id), (enq_flags)) : \
false))
#define scx_bpf_dispatch(p, dsq_id, slice, enq_flags) \
_Static_assert(false, "scx_bpf_dispatch() renamed to scx_bpf_dsq_insert()")
#define scx_bpf_dispatch_vtime(p, dsq_id, slice, vtime, enq_flags) \
_Static_assert(false, "scx_bpf_dispatch_vtime() renamed to scx_bpf_dsq_insert_vtime()")
#define scx_bpf_consume(dsq_id) ({ \
_Static_assert(false, "scx_bpf_consume() renamed to scx_bpf_dsq_move_to_local()"); \
false; \
})
#define scx_bpf_dispatch_from_dsq_set_slice(it__iter, slice) \
_Static_assert(false, "scx_bpf_dispatch_from_dsq_set_slice() renamed to scx_bpf_dsq_move_set_slice()")
#define scx_bpf_dispatch_from_dsq_set_vtime(it__iter, vtime) \
_Static_assert(false, "scx_bpf_dispatch_from_dsq_set_vtime() renamed to scx_bpf_dsq_move_set_vtime()")
#define scx_bpf_dispatch_from_dsq(it__iter, p, dsq_id, enq_flags) ({ \
_Static_assert(false, "scx_bpf_dispatch_from_dsq() renamed to scx_bpf_dsq_move()"); \
false; \
})
#define scx_bpf_dispatch_vtime_from_dsq(it__iter, p, dsq_id, enq_flags) ({ \
_Static_assert(false, "scx_bpf_dispatch_vtime_from_dsq() renamed to scx_bpf_dsq_move_vtime()"); \
false; \
})
#define __COMPAT_scx_bpf_dispatch_from_dsq_set_slice(it__iter, slice) \
_Static_assert(false, "__COMPAT_scx_bpf_dispatch_from_dsq_set_slice() renamed to __COMPAT_scx_bpf_dsq_move_set_slice()")
#define __COMPAT_scx_bpf_dispatch_from_dsq_set_vtime(it__iter, vtime) \
_Static_assert(false, "__COMPAT_scx_bpf_dispatch_from_dsq_set_vtime() renamed to __COMPAT_scx_bpf_dsq_move_set_vtime()")
#define __COMPAT_scx_bpf_dispatch_from_dsq(it__iter, p, dsq_id, enq_flags) ({ \
_Static_assert(false, "__COMPAT_scx_bpf_dispatch_from_dsq() renamed to __COMPAT_scx_bpf_dsq_move()"); \
false; \
})
#define __COMPAT_scx_bpf_dispatch_vtime_from_dsq(it__iter, p, dsq_id, enq_flags) ({ \
_Static_assert(false, "__COMPAT_scx_bpf_dispatch_vtime_from_dsq() renamed to __COMPAT_scx_bpf_dsq_move_vtime()"); \
false; \
})
/*
* Define sched_ext_ops. This may be expanded to define multiple variants for

View file

@ -118,14 +118,14 @@ void BPF_STRUCT_OPS(central_enqueue, struct task_struct *p, u64 enq_flags)
*/
if ((p->flags & PF_KTHREAD) && p->nr_cpus_allowed == 1) {
__sync_fetch_and_add(&nr_locals, 1);
scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_INF,
enq_flags | SCX_ENQ_PREEMPT);
scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_INF,
enq_flags | SCX_ENQ_PREEMPT);
return;
}
if (bpf_map_push_elem(&central_q, &pid, 0)) {
__sync_fetch_and_add(&nr_overflows, 1);
scx_bpf_dispatch(p, FALLBACK_DSQ_ID, SCX_SLICE_INF, enq_flags);
scx_bpf_dsq_insert(p, FALLBACK_DSQ_ID, SCX_SLICE_INF, enq_flags);
return;
}
@ -158,7 +158,7 @@ static bool dispatch_to_cpu(s32 cpu)
*/
if (!bpf_cpumask_test_cpu(cpu, p->cpus_ptr)) {
__sync_fetch_and_add(&nr_mismatches, 1);
scx_bpf_dispatch(p, FALLBACK_DSQ_ID, SCX_SLICE_INF, 0);
scx_bpf_dsq_insert(p, FALLBACK_DSQ_ID, SCX_SLICE_INF, 0);
bpf_task_release(p);
/*
* We might run out of dispatch buffer slots if we continue dispatching
@ -172,7 +172,7 @@ static bool dispatch_to_cpu(s32 cpu)
}
/* dispatch to local and mark that @cpu doesn't need more */
scx_bpf_dispatch(p, SCX_DSQ_LOCAL_ON | cpu, SCX_SLICE_INF, 0);
scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL_ON | cpu, SCX_SLICE_INF, 0);
if (cpu != central_cpu)
scx_bpf_kick_cpu(cpu, SCX_KICK_IDLE);
@ -219,13 +219,13 @@ void BPF_STRUCT_OPS(central_dispatch, s32 cpu, struct task_struct *prev)
}
/* look for a task to run on the central CPU */
if (scx_bpf_consume(FALLBACK_DSQ_ID))
if (scx_bpf_dsq_move_to_local(FALLBACK_DSQ_ID))
return;
dispatch_to_cpu(central_cpu);
} else {
bool *gimme;
if (scx_bpf_consume(FALLBACK_DSQ_ID))
if (scx_bpf_dsq_move_to_local(FALLBACK_DSQ_ID))
return;
gimme = ARRAY_ELEM_PTR(cpu_gimme_task, cpu, nr_cpu_ids);

View file

@ -341,7 +341,7 @@ s32 BPF_STRUCT_OPS(fcg_select_cpu, struct task_struct *p, s32 prev_cpu, u64 wake
if (is_idle) {
set_bypassed_at(p, taskc);
stat_inc(FCG_STAT_LOCAL);
scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
}
return cpu;
@ -377,10 +377,12 @@ void BPF_STRUCT_OPS(fcg_enqueue, struct task_struct *p, u64 enq_flags)
*/
if (p->nr_cpus_allowed == 1 && (p->flags & PF_KTHREAD)) {
stat_inc(FCG_STAT_LOCAL);
scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, enq_flags);
scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL,
enq_flags);
} else {
stat_inc(FCG_STAT_GLOBAL);
scx_bpf_dispatch(p, FALLBACK_DSQ, SCX_SLICE_DFL, enq_flags);
scx_bpf_dsq_insert(p, FALLBACK_DSQ, SCX_SLICE_DFL,
enq_flags);
}
return;
}
@ -391,7 +393,7 @@ void BPF_STRUCT_OPS(fcg_enqueue, struct task_struct *p, u64 enq_flags)
goto out_release;
if (fifo_sched) {
scx_bpf_dispatch(p, cgrp->kn->id, SCX_SLICE_DFL, enq_flags);
scx_bpf_dsq_insert(p, cgrp->kn->id, SCX_SLICE_DFL, enq_flags);
} else {
u64 tvtime = p->scx.dsq_vtime;
@ -402,8 +404,8 @@ void BPF_STRUCT_OPS(fcg_enqueue, struct task_struct *p, u64 enq_flags)
if (vtime_before(tvtime, cgc->tvtime_now - SCX_SLICE_DFL))
tvtime = cgc->tvtime_now - SCX_SLICE_DFL;
scx_bpf_dispatch_vtime(p, cgrp->kn->id, SCX_SLICE_DFL,
tvtime, enq_flags);
scx_bpf_dsq_insert_vtime(p, cgrp->kn->id, SCX_SLICE_DFL,
tvtime, enq_flags);
}
cgrp_enqueued(cgrp, cgc);
@ -663,7 +665,7 @@ static bool try_pick_next_cgroup(u64 *cgidp)
goto out_free;
}
if (!scx_bpf_consume(cgid)) {
if (!scx_bpf_dsq_move_to_local(cgid)) {
bpf_cgroup_release(cgrp);
stat_inc(FCG_STAT_PNC_EMPTY);
goto out_stash;
@ -743,7 +745,7 @@ void BPF_STRUCT_OPS(fcg_dispatch, s32 cpu, struct task_struct *prev)
goto pick_next_cgroup;
if (vtime_before(now, cpuc->cur_at + cgrp_slice_ns)) {
if (scx_bpf_consume(cpuc->cur_cgid)) {
if (scx_bpf_dsq_move_to_local(cpuc->cur_cgid)) {
stat_inc(FCG_STAT_CNS_KEEP);
return;
}
@ -783,7 +785,7 @@ void BPF_STRUCT_OPS(fcg_dispatch, s32 cpu, struct task_struct *prev)
pick_next_cgroup:
cpuc->cur_at = now;
if (scx_bpf_consume(FALLBACK_DSQ)) {
if (scx_bpf_dsq_move_to_local(FALLBACK_DSQ)) {
cpuc->cur_cgid = 0;
return;
}

View file

@ -226,7 +226,7 @@ void BPF_STRUCT_OPS(qmap_enqueue, struct task_struct *p, u64 enq_flags)
*/
if (tctx->force_local) {
tctx->force_local = false;
scx_bpf_dispatch(p, SCX_DSQ_LOCAL, slice_ns, enq_flags);
scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, slice_ns, enq_flags);
return;
}
@ -234,7 +234,7 @@ void BPF_STRUCT_OPS(qmap_enqueue, struct task_struct *p, u64 enq_flags)
if (!(enq_flags & SCX_ENQ_CPU_SELECTED) &&
(cpu = pick_direct_dispatch_cpu(p, scx_bpf_task_cpu(p))) >= 0) {
__sync_fetch_and_add(&nr_ddsp_from_enq, 1);
scx_bpf_dispatch(p, SCX_DSQ_LOCAL_ON | cpu, slice_ns, enq_flags);
scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL_ON | cpu, slice_ns, enq_flags);
return;
}
@ -247,7 +247,7 @@ void BPF_STRUCT_OPS(qmap_enqueue, struct task_struct *p, u64 enq_flags)
if (enq_flags & SCX_ENQ_REENQ) {
s32 cpu;
scx_bpf_dispatch(p, SHARED_DSQ, 0, enq_flags);
scx_bpf_dsq_insert(p, SHARED_DSQ, 0, enq_flags);
cpu = scx_bpf_pick_idle_cpu(p->cpus_ptr, 0);
if (cpu >= 0)
scx_bpf_kick_cpu(cpu, SCX_KICK_IDLE);
@ -262,7 +262,7 @@ void BPF_STRUCT_OPS(qmap_enqueue, struct task_struct *p, u64 enq_flags)
/* Queue on the selected FIFO. If the FIFO overflows, punt to global. */
if (bpf_map_push_elem(ring, &pid, 0)) {
scx_bpf_dispatch(p, SHARED_DSQ, slice_ns, enq_flags);
scx_bpf_dsq_insert(p, SHARED_DSQ, slice_ns, enq_flags);
return;
}
@ -294,10 +294,10 @@ static void update_core_sched_head_seq(struct task_struct *p)
}
/*
* To demonstrate the use of scx_bpf_dispatch_from_dsq(), implement silly
* selective priority boosting mechanism by scanning SHARED_DSQ looking for
* highpri tasks, moving them to HIGHPRI_DSQ and then consuming them first. This
* makes minor difference only when dsp_batch is larger than 1.
* To demonstrate the use of scx_bpf_dsq_move(), implement silly selective
* priority boosting mechanism by scanning SHARED_DSQ looking for highpri tasks,
* moving them to HIGHPRI_DSQ and then consuming them first. This makes minor
* difference only when dsp_batch is larger than 1.
*
* scx_bpf_dispatch[_vtime]_from_dsq() are allowed both from ops.dispatch() and
* non-rq-lock holding BPF programs. As demonstration, this function is called
@ -318,11 +318,11 @@ static bool dispatch_highpri(bool from_timer)
if (tctx->highpri) {
/* exercise the set_*() and vtime interface too */
__COMPAT_scx_bpf_dispatch_from_dsq_set_slice(
__COMPAT_scx_bpf_dsq_move_set_slice(
BPF_FOR_EACH_ITER, slice_ns * 2);
__COMPAT_scx_bpf_dispatch_from_dsq_set_vtime(
__COMPAT_scx_bpf_dsq_move_set_vtime(
BPF_FOR_EACH_ITER, highpri_seq++);
__COMPAT_scx_bpf_dispatch_vtime_from_dsq(
__COMPAT_scx_bpf_dsq_move_vtime(
BPF_FOR_EACH_ITER, p, HIGHPRI_DSQ, 0);
}
}
@ -340,9 +340,9 @@ static bool dispatch_highpri(bool from_timer)
else
cpu = scx_bpf_pick_any_cpu(p->cpus_ptr, 0);
if (__COMPAT_scx_bpf_dispatch_from_dsq(BPF_FOR_EACH_ITER, p,
SCX_DSQ_LOCAL_ON | cpu,
SCX_ENQ_PREEMPT)) {
if (__COMPAT_scx_bpf_dsq_move(BPF_FOR_EACH_ITER, p,
SCX_DSQ_LOCAL_ON | cpu,
SCX_ENQ_PREEMPT)) {
if (cpu == this_cpu) {
dispatched = true;
__sync_fetch_and_add(&nr_expedited_local, 1);
@ -374,7 +374,7 @@ void BPF_STRUCT_OPS(qmap_dispatch, s32 cpu, struct task_struct *prev)
if (dispatch_highpri(false))
return;
if (!nr_highpri_queued && scx_bpf_consume(SHARED_DSQ))
if (!nr_highpri_queued && scx_bpf_dsq_move_to_local(SHARED_DSQ))
return;
if (dsp_inf_loop_after && nr_dispatched > dsp_inf_loop_after) {
@ -385,7 +385,7 @@ void BPF_STRUCT_OPS(qmap_dispatch, s32 cpu, struct task_struct *prev)
*/
p = bpf_task_from_pid(2);
if (p) {
scx_bpf_dispatch(p, SCX_DSQ_LOCAL, slice_ns, 0);
scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, slice_ns, 0);
bpf_task_release(p);
return;
}
@ -431,7 +431,7 @@ void BPF_STRUCT_OPS(qmap_dispatch, s32 cpu, struct task_struct *prev)
update_core_sched_head_seq(p);
__sync_fetch_and_add(&nr_dispatched, 1);
scx_bpf_dispatch(p, SHARED_DSQ, slice_ns, 0);
scx_bpf_dsq_insert(p, SHARED_DSQ, slice_ns, 0);
bpf_task_release(p);
batch--;
@ -439,7 +439,7 @@ void BPF_STRUCT_OPS(qmap_dispatch, s32 cpu, struct task_struct *prev)
if (!batch || !scx_bpf_dispatch_nr_slots()) {
if (dispatch_highpri(false))
return;
scx_bpf_consume(SHARED_DSQ);
scx_bpf_dsq_move_to_local(SHARED_DSQ);
return;
}
if (!cpuc->dsp_cnt)

View file

@ -35,6 +35,8 @@ print(f'enabled : {read_static_key("__scx_ops_enabled")}')
print(f'switching_all : {read_int("scx_switching_all")}')
print(f'switched_all : {read_static_key("__scx_switched_all")}')
print(f'enable_state : {ops_state_str(enable_state)} ({enable_state})')
print(f'in_softlockup : {prog["scx_in_softlockup"].value_()}')
print(f'breather_depth: {read_atomic("scx_ops_breather_depth")}')
print(f'bypass_depth : {prog["scx_ops_bypass_depth"].value_()}')
print(f'nr_rejected : {read_atomic("scx_nr_rejected")}')
print(f'enable_seq : {read_atomic("scx_enable_seq")}')

View file

@ -31,10 +31,10 @@ UEI_DEFINE(uei);
/*
* Built-in DSQs such as SCX_DSQ_GLOBAL cannot be used as priority queues
* (meaning, cannot be dispatched to with scx_bpf_dispatch_vtime()). We
* (meaning, cannot be dispatched to with scx_bpf_dsq_insert_vtime()). We
* therefore create a separate DSQ with ID 0 that we dispatch to and consume
* from. If scx_simple only supported global FIFO scheduling, then we could
* just use SCX_DSQ_GLOBAL.
* from. If scx_simple only supported global FIFO scheduling, then we could just
* use SCX_DSQ_GLOBAL.
*/
#define SHARED_DSQ 0
@ -65,7 +65,7 @@ s32 BPF_STRUCT_OPS(simple_select_cpu, struct task_struct *p, s32 prev_cpu, u64 w
cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &is_idle);
if (is_idle) {
stat_inc(0); /* count local queueing */
scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
}
return cpu;
@ -76,7 +76,7 @@ void BPF_STRUCT_OPS(simple_enqueue, struct task_struct *p, u64 enq_flags)
stat_inc(1); /* count global queueing */
if (fifo_sched) {
scx_bpf_dispatch(p, SHARED_DSQ, SCX_SLICE_DFL, enq_flags);
scx_bpf_dsq_insert(p, SHARED_DSQ, SCX_SLICE_DFL, enq_flags);
} else {
u64 vtime = p->scx.dsq_vtime;
@ -87,14 +87,14 @@ void BPF_STRUCT_OPS(simple_enqueue, struct task_struct *p, u64 enq_flags)
if (vtime_before(vtime, vtime_now - SCX_SLICE_DFL))
vtime = vtime_now - SCX_SLICE_DFL;
scx_bpf_dispatch_vtime(p, SHARED_DSQ, SCX_SLICE_DFL, vtime,
enq_flags);
scx_bpf_dsq_insert_vtime(p, SHARED_DSQ, SCX_SLICE_DFL, vtime,
enq_flags);
}
}
void BPF_STRUCT_OPS(simple_dispatch, s32 cpu, struct task_struct *prev)
{
scx_bpf_consume(SHARED_DSQ);
scx_bpf_dsq_move_to_local(SHARED_DSQ);
}
void BPF_STRUCT_OPS(simple_running, struct task_struct *p)