mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-01-22 07:53:11 -05:00
linux/mm
Linus Torvalds 1d6d399223 Kthreads affinity follow either of 4 existing different patterns:
1) Per-CPU kthreads must stay affine to a single CPU and never execute
    relevant code on any other CPU. This is currently handled by smpboot
    code which takes care of CPU-hotplug operations. Affinity here is
    a correctness constraint.
 
 2) Some kthreads _have_ to be affine to a specific set of CPUs and can't
    run anywhere else. The affinity is set through kthread_bind_mask()
    and the subsystem takes care by itself to handle CPU-hotplug
    operations. Affinity here is assumed to be a correctness constraint.
 
 3) Per-node kthreads _prefer_ to be affine to a specific NUMA node. This
    is not a correctness constraint but merely a preference in terms of
    memory locality. kswapd and kcompactd both fall into this category.
    The affinity is set manually like for any other task and CPU-hotplug
    is supposed to be handled by the relevant subsystem so that the task
    is properly reaffined whenever a given CPU from the node comes up.
    Also care should be taken so that the node affinity doesn't cross
    isolated (nohz_full) cpumask boundaries.
 
 4) Similar to the previous point except kthreads have a _preferred_
    affinity different than a node. Both RCU boost kthreads and RCU
    exp kworkers fall into this category as they refer to "RCU nodes"
    from a distinctly distributed tree.
 
 Currently the preferred affinity patterns (3 and 4) have at least 4
 identified users, with varying success when it comes to handling
 CPU-hotplug operations and CPU isolation. Each of them does it in its
 own ad-hoc way.
 
 This is an infrastructure proposal to handle this with the following API
 changes:
 
 - kthread_create_on_node() automatically affines the created kthread to
   its target node unless it has been set as per-cpu or bound with
   kthread_bind[_mask]() before the first wake-up.
 
 - kthread_affine_preferred() is a new function that can be called right
   after kthread_create_on_node() to specify a preferred affinity
   different than the specified node.
 
 When the preferred affinity can't be applied because the possible
 targets are offline or isolated (nohz_full), the kthread is affine
 to the housekeeping CPUs (which means to all online CPUs most of the
 time or only the non-nohz_full CPUs when nohz_full= is set).
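
 The fallback rule described above can be illustrated with a small
 userspace sketch. This is not the kernel implementation: CPU sets are
 modeled as 64-bit bitmasks, and the names `cpuset_bits` and
 `effective_affinity()` are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical model: bit N set means CPU N belongs to the set. */
typedef uint64_t cpuset_bits;

/*
 * Return the CPUs a kthread with a preferred affinity would be allowed
 * to run on in this model: the preferred CPUs that are both online and
 * housekeeping, or the housekeeping CPUs when no preferred CPU qualifies.
 */
cpuset_bits effective_affinity(cpuset_bits preferred,
                               cpuset_bits online,
                               cpuset_bits housekeeping)
{
    cpuset_bits usable = preferred & online & housekeeping;

    if (usable)
        return usable;

    /* Every preferred CPU is offline or isolated: fall back. */
    return housekeeping;
}
```

 For example, with CPUs 0-3 online, CPUs 0-1 as housekeeping and a
 preference for CPUs 2-3, `effective_affinity(0x0C, 0x0F, 0x03)` returns
 `0x03`, because both preferred CPUs are isolated.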
 
 kswapd, kcompactd, RCU boost kthreads and RCU exp kworkers have been
 converted, along with a few old drivers.
 
 Summary of the changes:
 
 * Consolidate a bunch of ad-hoc implementations of kthread_run_on_cpu()
 
 * Introduce task_cpu_fallback_mask(), which defines the default
   last-resort affinity of a task, so that it becomes nohz_full aware
 
 * Add a correctness check to ensure kthread_bind() is always called
   before the first kthread wake-up.
 
 * Affine a kthread to its preferred node by default.
 
 * Convert kswapd / kcompactd and remove their halfway working ad-hoc
   affinity implementation
 
 * Implement kthreads preferred affinity
 
 * Unify the style of the kthread worker and kthread APIs
 
 * Convert RCU kthreads to the new API and remove the ad-hoc affinity
   implementation.
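
 The "bind before the first wake-up" rule from the summary can be
 modeled in userspace as follows. This is a minimal sketch under stated
 assumptions: `struct model_kthread`, `model_bind_mask()` and
 `model_wake_up()` are hypothetical names, and the message above does not
 say how the real check reports a violation, so the sketch simply returns
 an error code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a kthread; not the kernel's structure. */
struct model_kthread {
    bool started;        /* set on the first wake-up */
    uint64_t bound_mask; /* 0 means "not bound" */
};

/* Bind to a CPU mask; refuse once the thread has been woken up. */
int model_bind_mask(struct model_kthread *k, uint64_t mask)
{
    if (k->started)
        return -1; /* too late: the thread may already have run elsewhere */

    k->bound_mask = mask;
    return 0;
}

/* First wake-up: from here on, binding is no longer allowed. */
void model_wake_up(struct model_kthread *k)
{
    k->started = true;
}
```

 Binding a freshly created `struct model_kthread` succeeds, while a second
 call made after `model_wake_up()` fails and leaves the earlier mask
 unchanged.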

Merge tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks

Pull kthread updates from Frederic Weisbecker:

* tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks:
  kthread: modify kernel-doc function name to match code
  rcu: Use kthread preferred affinity for RCU exp kworkers
  treewide: Introduce kthread_run_worker[_on_cpu]()
  kthread: Unify kthread_create_on_cpu() and kthread_create_worker_on_cpu() automatic format
  rcu: Use kthread preferred affinity for RCU boost
  kthread: Implement preferred affinity
  mm: Create/affine kswapd to its preferred node
  mm: Create/affine kcompactd to its preferred node
  kthread: Default affine kthread to its preferred NUMA node
  kthread: Make sure kthread hasn't started while binding it
  sched,arm64: Handle CPU isolation on last resort fallback rq selection
  arm64: Exclude nohz_full CPUs from 32bits el0 support
  lib: test_objpool: Use kthread_run_on_cpu()
  kallsyms: Use kthread_run_on_cpu()
  soc/qman: test: Use kthread_run_on_cpu()
  arm/bL_switcher: Use kthread_run_on_cpu()
2025-01-21 17:10:05 -08:00
..
damon mm/damon/core: fix ignored quota goals and filters of newly committed schemes 2024-12-30 17:59:11 -08:00
kasan 24 hotfixes. 17 are cc:stable. 15 are MM and 9 are non-MM. 2024-12-08 11:26:13 -08:00
kfence mm/kfence: add a new kunit test test_use_after_free_read_nofault() 2024-11-14 22:49:19 -08:00
kmsan mm, kasan, kmsan: instrument copy_from/to_kernel_nofault 2024-11-06 20:11:14 -08:00
backing-dev.c
balloon_compaction.c
bootmem_info.c bootmem: stop using page->index 2024-11-07 14:38:07 -08:00
cma.c cma: enforce non-zero pageblock_order during cma_init_reserved_mem() 2024-11-14 22:49:19 -08:00
cma.h
cma_debug.c
cma_sysfs.c
compaction.c mm: Create/affine kcompactd to its preferred node 2025-01-08 18:15:03 +01:00
debug.c mm: open-code page_folio() in dump_page() 2024-12-05 19:54:45 -08:00
debug_page_alloc.c
debug_page_ref.c
debug_vm_pgtable.c
dmapool.c
dmapool_test.c
early_ioremap.c
execmem.c alloc_tag: populate memory for module tags as needed 2024-11-07 14:25:16 -08:00
fadvise.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
fail_page_alloc.c
failslab.c
filemap.c mm: fix assertion in folio_end_read() 2025-01-12 19:03:38 -08:00
folio-compat.c mm/writeback: add folio_mark_dirty_lock() 2024-11-05 11:14:32 +01:00
gup.c Performance events changes for v6.14: 2025-01-21 10:52:03 -08:00
gup_test.c
gup_test.h
highmem.c
hmm.c
huge_memory.c mm: clear uffd-wp PTE/PMD state on mremap() 2025-01-12 19:03:37 -08:00
hugetlb.c mm: clear uffd-wp PTE/PMD state on mremap() 2025-01-12 19:03:37 -08:00
hugetlb_cgroup.c
hugetlb_vmemmap.c
hugetlb_vmemmap.h
hwpoison-inject.c
init-mm.c mm: convert mm_lock_seq to a proper seqcount 2024-12-02 12:01:38 +01:00
internal.h mm, madvise: fix potential workingset node list_lru leaks 2024-12-30 17:59:11 -08:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig arm64 updates for 6.13: 2024-11-18 18:10:37 -08:00
Kconfig.debug
khugepaged.c mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma 2025-01-15 21:15:43 -08:00
kmemleak.c mm/kmemleak: fix percpu memory leak detection failure 2025-01-12 19:03:34 -08:00
ksm.c - The series "zram: optimal post-processing target selection" from 2024-11-23 09:58:07 -08:00
list_lru.c mm/list_lru: fix false warning of negative counter 2024-12-30 17:59:10 -08:00
maccess.c kasan: migrate copy_user_test to kunit 2024-11-11 00:26:44 -08:00
madvise.c mm: madvise: implement lightweight guard page mechanism 2024-11-11 00:26:45 -08:00
Makefile mm: move the page fragment allocator from page_alloc into its own file 2024-11-11 10:56:26 -08:00
mapping_dirty_helpers.c
memblock.c memblock: allow zero threshold in validate_numa_converage() 2024-12-01 21:08:56 +02:00
memcontrol-v1.c - The series "zram: optimal post-processing target selection" from 2024-11-23 09:58:07 -08:00
memcontrol-v1.h mm: memcg: declare do_memsw_account inline 2024-12-05 19:54:46 -08:00
memcontrol.c memcg/hugetlb: add hugeTLB counters to memcg 2024-11-14 22:49:19 -08:00
memfd.c mm: reinstate ability to map write-sealed memfd mappings read-only 2024-12-30 17:59:06 -08:00
memory-failure.c mm/memory-failure: replace sprintf() with sysfs_emit() 2024-11-11 00:26:46 -08:00
memory-tiers.c memory tiers: use default_dram_perf_ref_source in log message 2024-09-26 14:01:44 -07:00
memory.c mm: use clear_user_(high)page() for arch with special user folio handling 2024-12-18 19:04:43 -08:00
memory_hotplug.c kaslr: rename physmem_end and PHYSMEM_END to direct_map_physmem_end 2024-11-06 20:11:11 -08:00
mempolicy.c mm/mempolicy: count MPOL_WEIGHTED_INTERLEAVE to "interleave_hit" 2025-01-12 19:03:35 -08:00
mempool.c
memremap.c
memtest.c
migrate.c mm/codetag: swap tags when migrate pages 2024-12-05 19:54:46 -08:00
migrate_device.c
mincore.c
mlock.c mm/mlock: set the correct prev on failure 2024-11-07 14:14:58 -08:00
mm_init.c memblock: updates for 6.13-rc1 2024-11-27 11:13:25 -08:00
mm_slot.h
mmap.c mm: don't try THP alignment for FS without get_unmapped_area 2024-12-30 17:59:06 -08:00
mmap_lock.c mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount 2024-11-11 17:22:28 -08:00
mmu_gather.c
mmu_notifier.c
mmzone.c
mprotect.c mm: add PTE_MARKER_GUARD PTE marker 2024-11-11 00:26:44 -08:00
mremap.c mm: clear uffd-wp PTE/PMD state on mremap() 2025-01-12 19:03:37 -08:00
mseal.c mm: madvise: implement lightweight guard page mechanism 2024-11-11 00:26:45 -08:00
msync.c
nommu.c nommu: pass NULL argument to vma_iter_prealloc() 2024-11-11 17:20:23 -08:00
numa.c
numa_emulation.c
numa_memblks.c mm: numa_clear_kernel_node_hotplug: Add NUMA_NO_NODE check for node id 2024-10-28 21:40:40 -07:00
oom_kill.c mm: move mm flags to mm_types.h 2024-11-05 16:56:26 -08:00
page-writeback.c mm: fix div by zero in bdi_ratio_from_pages 2025-01-12 19:03:36 -08:00
page_alloc.c mm: page_alloc: fix missed updates of lowmem_reserve in adjust_managed_page_count 2025-01-15 21:15:42 -08:00
page_counter.c kernel/cgroup: Add "dmem" memory accounting cgroup 2025-01-06 17:24:38 +01:00
page_ext.c
page_frag_cache.c mm: page_frag: use __alloc_pages() to replace alloc_pages_node() 2024-11-11 10:56:27 -08:00
page_idle.c
page_io.c mm: add per-order mTHP swpin counters 2024-11-11 00:26:43 -08:00
page_isolation.c
page_owner.c
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c
page_vma_mapped.c mm: mass constification of folio/page pointers 2024-11-07 14:38:07 -08:00
pagewalk.c mm: pagewalk: add the ability to install PTEs 2024-11-11 00:26:44 -08:00
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm: use page->private instead of page->index in percpu 2024-11-07 14:38:07 -08:00
pgalloc-track.h
pgtable-generic.c mm: add RCU annotation to pte_offset_map(_lock) 2024-12-18 19:04:43 -08:00
process_vm_access.c mm: refactor mm_access() to not return NULL 2024-11-05 16:56:23 -08:00
ptdump.c
readahead.c mm/readahead: fix large folio support in async readahead 2024-12-30 17:59:07 -08:00
rmap.c mm: mass constification of folio/page pointers 2024-11-07 14:38:07 -08:00
rodata_test.c
secretmem.c secretmem: disable memfd_secret() if arch cannot set direct map 2024-10-09 12:47:19 -07:00
shmem.c vfs-6.14-rc1.libfs 2025-01-20 11:00:53 -08:00
shmem_quota.c
show_mem.c mm/show_mem: use str_yes_no() helper in show_free_areas() 2024-11-07 14:38:08 -08:00
shrinker.c mm: shrinker: avoid memleak in alloc_shrinker_info 2024-10-31 20:27:04 -07:00
shrinker_debug.c
shuffle.c
shuffle.h
slab.h mm/slab: fix kernel-doc func param names 2025-01-13 10:22:04 +01:00
slab_common.c mm/slab: Move kvfree_rcu() into SLAB 2025-01-11 20:39:43 +01:00
slub.c memcg: slub: fix SUnreclaim for post charged objects 2024-12-10 09:25:39 +01:00
sparse-vmemmap.c mm: define general function pXd_init() 2024-11-11 17:22:27 -08:00
sparse.c bootmem: stop using page->index 2024-11-07 14:38:07 -08:00
swap.c - The series "zram: optimal post-processing target selection" from 2024-11-23 09:58:07 -08:00
swap.h
swap_cgroup.c
swap_slots.c
swap_state.c mm: swap: use str_true_false() helper function 2024-11-06 20:11:14 -08:00
swapfile.c mm, swap: fix allocation and scanning race with swapoff 2024-11-14 15:25:07 -08:00
truncate.c - The series "zram: optimal post-processing target selection" from 2024-11-23 09:58:07 -08:00
usercopy.c
userfaultfd.c mm: remove unused hugepage for vma_alloc_folio() 2024-11-06 20:11:12 -08:00
util.c mm/util: make memdup_user_nul() similar to memdup_user() 2024-12-30 17:59:11 -08:00
vma.c mm: correctly reference merged VMA 2024-12-18 19:04:42 -08:00
vma.h mm: isolate mmap internal logic to mm/vma.c 2024-11-06 20:11:19 -08:00
vma_internal.h mm: isolate mmap internal logic to mm/vma.c 2024-11-06 20:11:19 -08:00
vmalloc.c vmalloc: fix accounting with i915 2024-12-18 19:04:45 -08:00
vmpressure.c
vmscan.c Kthreads affinity follow either of 4 existing different patterns: 2025-01-21 17:10:05 -08:00
vmstat.c vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-12 19:03:38 -08:00
workingset.c mm/list_lru: simplify the list_lru walk callback function 2024-11-11 17:22:26 -08:00
z3fold.c
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc: use memcpy_from/to_page whereever possible 2024-11-07 14:38:07 -08:00
zswap.c mm: zswap: move allocations during CPU init outside the lock 2025-01-15 21:15:43 -08:00