mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-01-22 07:53:11 -05:00
linux/mm
Linus Torvalds 1d6d399223

Merge tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks

Pull kthread updates from Frederic Weisbecker:
 "Kthreads affinity follow either of 4 existing different patterns:

   1) Per-CPU kthreads must stay affine to a single CPU and never
      execute relevant code on any other CPU. This is currently handled
      by smpboot code which takes care of CPU-hotplug operations.
      Affinity here is a correctness constraint.

   2) Some kthreads _have_ to be affine to a specific set of CPUs and
      can't run anywhere else. The affinity is set through
      kthread_bind_mask() and the subsystem itself takes care of
      handling CPU-hotplug operations. Affinity here is assumed to be a
      correctness constraint (a minimal sketch of this pattern appears
      after this list).

   3) Per-node kthreads _prefer_ to be affine to a specific NUMA node.
      This is not a correctness constraint but merely a preference in
      terms of memory locality. kswapd and kcompactd both fall into this
      category. The affinity is set manually like for any other task and
      CPU-hotplug is supposed to be handled by the relevant subsystem so
      that the task is properly reaffined whenever a given CPU from the
      node comes up. Also care should be taken so that the node affinity
      doesn't cross isolated (nohz_full) cpumask boundaries.

   4) Similar to the previous point except kthreads have a _preferred_
      affinity different than a node. Both RCU boost kthreads and RCU
      exp kworkers fall into this category as they refer to "RCU nodes"
      from a distinctly distributed tree.
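
  As a minimal sketch of pattern 2 above (not taken from any in-tree
  user: the helper, thread function and cpumask are hypothetical), using
  the existing kthread_create(), kthread_bind_mask() and
  wake_up_process() APIs:

     #include <linux/kthread.h>
     #include <linux/cpumask.h>

     /* Hypothetical helper: hard-affine a new kthread to a
      * caller-provided cpumask before its first wake-up. */
     static int start_pinned_kthread(int (*fn)(void *), void *data,
                                     const struct cpumask *allowed)
     {
             struct task_struct *t;

             t = kthread_create(fn, data, "my_subsys_worker");
             if (IS_ERR(t))
                     return PTR_ERR(t);

             /* Bind before the first wake-up: affinity is a correctness
              * constraint here and the subsystem must handle CPU
              * hotplug on its own. */
             kthread_bind_mask(t, allowed);
             wake_up_process(t);
             return 0;
     }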

  Currently the preferred affinity patterns (3 and 4) have at least four
  identified users, each handling CPU-hotplug operations and CPU
  isolation with more or less success, in its own ad-hoc way.

  This is an infrastructure proposal to handle this with the following
  API changes:

   - kthread_create_on_node() automatically affines the created kthread
     to its target node unless it has been set as per-cpu or bound with
     kthread_bind[_mask]() before the first wake-up.

   - kthread_affine_preferred() is a new function that can be called
     right after kthread_create_on_node() to specify a preferred
     affinity different than the specified node.

  When the preferred affinity can't be applied because the possible
  targets are offline or isolated (nohz_full), the kthread is affined to
  the housekeeping CPUs (which means all online CPUs most of the time,
  or only the non-nohz_full CPUs when nohz_full= is set).
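
  A rough sketch of how a pattern-4 user of the new API could look (the
  helper, thread name and "group" cpumask are hypothetical;
  kthread_affine_preferred() is assumed to take the task and the
  preferred cpumask, as described above):

     #include <linux/kthread.h>
     #include <linux/cpumask.h>

     /* Hypothetical pattern-4 user: a kthread whose preferred CPUs are
      * not a whole NUMA node. */
     static struct task_struct *start_group_kthread(int (*fn)(void *),
                                                    void *data, int node,
                                                    const struct cpumask *group)
     {
             struct task_struct *t;

             /* Created but not yet woken; with no further call it would
              * default to being affined to 'node' on first wake-up. */
             t = kthread_create_on_node(fn, data, node, "group_worker");
             if (IS_ERR(t))
                     return t;

             /* A preference, not a hard constraint: if all CPUs in
              * 'group' are offline or isolated (nohz_full), the kthread
              * falls back to the housekeeping CPUs. */
             kthread_affine_preferred(t, group);
             wake_up_process(t);
             return t;
     }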

  kswapd, kcompactd, RCU boost kthreads and RCU exp kworkers have been
  converted, along with a few old drivers.

  Summary of the changes:

   - Consolidate a bunch of ad-hoc implementations of
     kthread_run_on_cpu() (a minimal usage sketch follows this summary)

   - Introduce task_cpu_fallback_mask(), which defines a task's default
     last-resort affinity and makes it nohz_full aware

   - Add correctness checks to ensure kthread_bind() is always called
     before the first kthread wake-up.

   - Affine kthreads to their preferred node by default.

   - Convert kswapd / kcompactd and remove their halfway working ad-hoc
     affinity implementation

   - Implement kthread preferred affinity

   - Unify the kthread worker and kthread APIs' style

   - Convert RCU kthreads to the new API and remove the ad-hoc affinity
     implementation"

* tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks:
  kthread: modify kernel-doc function name to match code
  rcu: Use kthread preferred affinity for RCU exp kworkers
  treewide: Introduce kthread_run_worker[_on_cpu]()
  kthread: Unify kthread_create_on_cpu() and kthread_create_worker_on_cpu() automatic format
  rcu: Use kthread preferred affinity for RCU boost
  kthread: Implement preferred affinity
  mm: Create/affine kswapd to its preferred node
  mm: Create/affine kcompactd to its preferred node
  kthread: Default affine kthread to its preferred NUMA node
  kthread: Make sure kthread hasn't started while binding it
  sched,arm64: Handle CPU isolation on last resort fallback rq selection
  arm64: Exclude nohz_full CPUs from 32bits el0 support
  lib: test_objpool: Use kthread_run_on_cpu()
  kallsyms: Use kthread_run_on_cpu()
  soc/qman: test: Use kthread_run_on_cpu()
  arm/bL_switcher: Use kthread_run_on_cpu()
2025-01-21 17:10:05 -08:00
damon mm/damon/core: fix ignored quota goals and filters of newly committed schemes 2024-12-30 17:59:11 -08:00
kasan 24 hotfixes. 17 are cc:stable. 15 are MM and 9 are non-MM. 2024-12-08 11:26:13 -08:00
kfence mm/kfence: add a new kunit test test_use_after_free_read_nofault() 2024-11-14 22:49:19 -08:00
kmsan mm, kasan, kmsan: instrument copy_from/to_kernel_nofault 2024-11-06 20:11:14 -08:00
backing-dev.c
balloon_compaction.c
bootmem_info.c bootmem: stop using page->index 2024-11-07 14:38:07 -08:00
cma.c cma: enforce non-zero pageblock_order during cma_init_reserved_mem() 2024-11-14 22:49:19 -08:00
cma.h
cma_debug.c
cma_sysfs.c
compaction.c mm: Create/affine kcompactd to its preferred node 2025-01-08 18:15:03 +01:00
debug.c mm: open-code page_folio() in dump_page() 2024-12-05 19:54:45 -08:00
debug_page_alloc.c
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries 2024-09-17 01:07:01 -07:00
dmapool.c
dmapool_test.c
early_ioremap.c
execmem.c alloc_tag: populate memory for module tags as needed 2024-11-07 14:25:16 -08:00
fadvise.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
fail_page_alloc.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
failslab.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
filemap.c mm: fix assertion in folio_end_read() 2025-01-12 19:03:38 -08:00
folio-compat.c mm/writeback: add folio_mark_dirty_lock() 2024-11-05 11:14:32 +01:00
gup.c Performance events changes for v6.14: 2025-01-21 10:52:03 -08:00
gup_test.c
gup_test.h
highmem.c
hmm.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
huge_memory.c mm: clear uffd-wp PTE/PMD state on mremap() 2025-01-12 19:03:37 -08:00
hugetlb.c mm: clear uffd-wp PTE/PMD state on mremap() 2025-01-12 19:03:37 -08:00
hugetlb_cgroup.c mm: memcg: don't call propagate_protected_usage() needlessly 2024-09-01 20:25:50 -07:00
hugetlb_vmemmap.c mm/hugetlb_vmemmap: don't synchronize_rcu() without HVO 2024-09-01 20:25:45 -07:00
hugetlb_vmemmap.h
hwpoison-inject.c
init-mm.c mm: convert mm_lock_seq to a proper seqcount 2024-12-02 12:01:38 +01:00
internal.h mm, madvise: fix potential workingset node list_lru leaks 2024-12-30 17:59:11 -08:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig arm64 updates for 6.13: 2024-11-18 18:10:37 -08:00
Kconfig.debug slub: Introduce CONFIG_SLUB_RCU_DEBUG 2024-08-27 14:12:51 +02:00
khugepaged.c mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma 2025-01-15 21:15:43 -08:00
kmemleak.c mm/kmemleak: fix percpu memory leak detection failure 2025-01-12 19:03:34 -08:00
ksm.c - The series "zram: optimal post-processing target selection" from 2024-11-23 09:58:07 -08:00
list_lru.c mm/list_lru: fix false warning of negative counter 2024-12-30 17:59:10 -08:00
maccess.c kasan: migrate copy_user_test to kunit 2024-11-11 00:26:44 -08:00
madvise.c mm: madvise: implement lightweight guard page mechanism 2024-11-11 00:26:45 -08:00
Makefile mm: move the page fragment allocator from page_alloc into its own file 2024-11-11 10:56:26 -08:00
mapping_dirty_helpers.c
memblock.c memblock: allow zero threshold in validate_numa_converage() 2024-12-01 21:08:56 +02:00
memcontrol-v1.c - The series "zram: optimal post-processing target selection" from 2024-11-23 09:58:07 -08:00
memcontrol-v1.h mm: memcg: declare do_memsw_account inline 2024-12-05 19:54:46 -08:00
memcontrol.c memcg/hugetlb: add hugeTLB counters to memcg 2024-11-14 22:49:19 -08:00
memfd.c mm: reinstate ability to map write-sealed memfd mappings read-only 2024-12-30 17:59:06 -08:00
memory-failure.c mm/memory-failure: replace sprintf() with sysfs_emit() 2024-11-11 00:26:46 -08:00
memory-tiers.c memory tiers: use default_dram_perf_ref_source in log message 2024-09-26 14:01:44 -07:00
memory.c mm: use clear_user_(high)page() for arch with special user folio handling 2024-12-18 19:04:43 -08:00
memory_hotplug.c kaslr: rename physmem_end and PHYSMEM_END to direct_map_physmem_end 2024-11-06 20:11:11 -08:00
mempolicy.c mm/mempolicy: count MPOL_WEIGHTED_INTERLEAVE to "interleave_hit" 2025-01-12 19:03:35 -08:00
mempool.c
memremap.c
memtest.c
migrate.c mm/codetag: swap tags when migrate pages 2024-12-05 19:54:46 -08:00
migrate_device.c mm: remap unused subpages to shared zeropage when splitting isolated thp 2024-09-09 16:39:03 -07:00
mincore.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
mlock.c mm/mlock: set the correct prev on failure 2024-11-07 14:14:58 -08:00
mm_init.c memblock: updates for 6.13-rc1 2024-11-27 11:13:25 -08:00
mm_slot.h
mmap.c mm: don't try THP alignment for FS without get_unmapped_area 2024-12-30 17:59:06 -08:00
mmap_lock.c mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount 2024-11-11 17:22:28 -08:00
mmu_gather.c
mmu_notifier.c mm: move internal core VMA manipulation functions to own file 2024-09-01 20:25:54 -07:00
mmzone.c mm: improve code consistency with zonelist_* helper functions 2024-09-01 20:25:55 -07:00
mprotect.c mm: add PTE_MARKER_GUARD PTE marker 2024-11-11 00:26:44 -08:00
mremap.c mm: clear uffd-wp PTE/PMD state on mremap() 2025-01-12 19:03:37 -08:00
mseal.c mm: madvise: implement lightweight guard page mechanism 2024-11-11 00:26:45 -08:00
msync.c
nommu.c nommu: pass NULL argument to vma_iter_prealloc() 2024-11-11 17:20:23 -08:00
numa.c mm: make range-to-target_node lookup facility a part of numa_memblks 2024-09-03 21:15:32 -07:00
numa_emulation.c mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
numa_memblks.c mm: numa_clear_kernel_node_hotplug: Add NUMA_NO_NODE check for node id 2024-10-28 21:40:40 -07:00
oom_kill.c mm: move mm flags to mm_types.h 2024-11-05 16:56:26 -08:00
page-writeback.c mm: fix div by zero in bdi_ratio_from_pages 2025-01-12 19:03:36 -08:00
page_alloc.c mm: page_alloc: fix missed updates of lowmem_reserve in adjust_managed_page_count 2025-01-15 21:15:42 -08:00
page_counter.c kernel/cgroup: Add "dmem" memory accounting cgroup 2025-01-06 17:24:38 +01:00
page_ext.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
page_frag_cache.c mm: page_frag: use __alloc_pages() to replace alloc_pages_node() 2024-11-11 10:56:27 -08:00
page_idle.c
page_io.c mm: add per-order mTHP swpin counters 2024-11-11 00:26:43 -08:00
page_isolation.c mm: remove migration for HugePage in isolate_single_pageblock() 2024-09-03 21:15:40 -07:00
page_owner.c
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c
page_vma_mapped.c mm: mass constification of folio/page pointers 2024-11-07 14:38:07 -08:00
pagewalk.c mm: pagewalk: add the ability to install PTEs 2024-11-11 00:26:44 -08:00
percpu-internal.h mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm: use page->private instead of page->index in percpu 2024-11-07 14:38:07 -08:00
pgalloc-track.h
pgtable-generic.c mm: add RCU annotation to pte_offset_map(_lock) 2024-12-18 19:04:43 -08:00
process_vm_access.c mm: refactor mm_access() to not return NULL 2024-11-05 16:56:23 -08:00
ptdump.c
readahead.c mm/readahead: fix large folio support in async readahead 2024-12-30 17:59:07 -08:00
rmap.c mm: mass constification of folio/page pointers 2024-11-07 14:38:07 -08:00
rodata_test.c
secretmem.c secretmem: disable memfd_secret() if arch cannot set direct map 2024-10-09 12:47:19 -07:00
shmem.c vfs-6.14-rc1.libfs 2025-01-20 11:00:53 -08:00
shmem_quota.c shmem_quota: build the object file conditionally to the config option 2024-09-01 20:25:45 -07:00
show_mem.c mm/show_mem: use str_yes_no() helper in show_free_areas() 2024-11-07 14:38:08 -08:00
shrinker.c mm: shrinker: avoid memleak in alloc_shrinker_info 2024-10-31 20:27:04 -07:00
shrinker_debug.c mm: shrinker: use min() to improve shrinker_debugfs_scan_write() 2024-09-03 21:15:40 -07:00
shuffle.c
shuffle.h
slab.h mm/slab: fix kernel-doc func param names 2025-01-13 10:22:04 +01:00
slab_common.c mm/slab: Move kvfree_rcu() into SLAB 2025-01-11 20:39:43 +01:00
slub.c memcg: slub: fix SUnreclaim for post charged objects 2024-12-10 09:25:39 +01:00
sparse-vmemmap.c mm: define general function pXd_init() 2024-11-11 17:22:27 -08:00
sparse.c bootmem: stop using page->index 2024-11-07 14:38:07 -08:00
swap.c - The series "zram: optimal post-processing target selection" from 2024-11-23 09:58:07 -08:00
swap.h mm: fix swap_read_folio_zeromap() for large folios with partial zeromap 2024-09-17 01:07:01 -07:00
swap_cgroup.c mm: attempt to batch free swap entries for zap_pte_range() 2024-09-03 21:15:33 -07:00
swap_slots.c
swap_state.c mm: swap: use str_true_false() helper function 2024-11-06 20:11:14 -08:00
swapfile.c mm, swap: fix allocation and scanning race with swapoff 2024-11-14 15:25:07 -08:00
truncate.c - The series "zram: optimal post-processing target selection" from 2024-11-23 09:58:07 -08:00
usercopy.c
userfaultfd.c mm: remove unused hugepage for vma_alloc_folio() 2024-11-06 20:11:12 -08:00
util.c mm/util: make memdup_user_nul() similar to memdup_user() 2024-12-30 17:59:11 -08:00
vma.c mm: correctly reference merged VMA 2024-12-18 19:04:42 -08:00
vma.h mm: isolate mmap internal logic to mm/vma.c 2024-11-06 20:11:19 -08:00
vma_internal.h mm: isolate mmap internal logic to mm/vma.c 2024-11-06 20:11:19 -08:00
vmalloc.c vmalloc: fix accounting with i915 2024-12-18 19:04:45 -08:00
vmpressure.c
vmscan.c Kthreads affinity follow either of 4 existing different patterns: 2025-01-21 17:10:05 -08:00
vmstat.c vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-12 19:03:38 -08:00
workingset.c mm/list_lru: simplify the list_lru walk callback function 2024-11-11 17:22:26 -08:00
z3fold.c mm/z3fold: add __percpu annotation to *unbuddied pointer in struct z3fold_pool 2024-09-01 20:25:56 -07:00
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc: use memcpy_from/to_page whereever possible 2024-11-07 14:38:07 -08:00
zswap.c mm: zswap: move allocations during CPU init outside the lock 2025-01-15 21:15:43 -08:00