mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-01-24 01:09:38 -05:00
linux/drivers
David Hildenbrand 3fcebf9020 mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy
Currently, the "auto-movable" online policy does not allow for hotplugged
KERNEL (ZONE_NORMAL) memory to increase the amount of MOVABLE memory we
can have, primarily because there is no coordination across memory
devices and we don't want to create zone imbalances accidentally when
unplugging memory.

However, within a single memory device it's different.  Let's allow for
KERNEL memory within a dynamic memory group to allow for more MOVABLE
within the same memory group.  The only thing we have to take care of is
that the managing driver avoids zone imbalances by unplugging MOVABLE
memory first; otherwise, there can be corner cases where unplugging memory
could result in (accidental) zone imbalances.

virtio-mem is the only user of dynamic memory groups and recently added
support for prioritizing unplug of ZONE_MOVABLE over ZONE_NORMAL, so we
don't need a new toggle to enable it for dynamic memory groups.

We limit this handling to dynamic memory groups, because:

* We want to keep the runtime overhead for collecting stats when
  onlining a single memory block small.  We tend to have only a handful of
  dynamic memory groups, but we can have quite a few static memory groups
  (e.g., 256 DIMMs).

* It doesn't make too much sense for static memory groups, as we try
  onlining all applicable memory blocks either completely to ZONE_MOVABLE
  or not.  In ordinary operation, we won't have a mixture of zones within
  a static memory group.

When adding memory to a dynamic memory group, we'll first online memory to
ZONE_MOVABLE as long as early KERNEL memory allows for it.  Then, we'll
online the next unit(s) to ZONE_NORMAL, until we can online the next
unit(s) to ZONE_MOVABLE.

For a simple virtio-mem device with a MOVABLE:KERNEL ratio of 3:1, it will
result in a layout like:

  [M][M][M][M][M][M][M][M][N][M][M][M][N][M][M][M]...
  ^ movable memory due to early kernel memory
			   ^ allows for more movable memory ...
			      ^-----^ ... here
				       ^ allows for more movable memory ...
				          ^-----^ ... here
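
The onlining order shown above can be reproduced with a small toy model.
This is only a sketch and not the kernel implementation: the function name
online_layout(), its parameters, and the chosen unit sizes (8 units of
early KERNEL memory, 3 units per memory block) are assumptions picked so
that early KERNEL memory allows for exactly 8 MOVABLE blocks, as in the
diagram.

```python
def online_layout(num_blocks, early_kernel, block_size, ratio=3):
    """Toy model of one dynamic memory group (hypothetical helper).

    Returns one character per onlined block: 'M' for ZONE_MOVABLE,
    'N' for ZONE_NORMAL. Sizes are in arbitrary units.
    """
    kernel = early_kernel  # KERNEL memory that counts toward the ratio
    movable = 0            # MOVABLE memory onlined so far
    layout = []
    for _ in range(num_blocks):
        # Online to ZONE_MOVABLE as long as MOVABLE:KERNEL stays <= ratio:1.
        if movable + block_size <= ratio * kernel:
            movable += block_size
            layout.append("M")
        else:
            # Otherwise online to ZONE_NORMAL, which allows for more
            # MOVABLE memory within the same group afterwards.
            kernel += block_size
            layout.append("N")
    return "".join(layout)


print(online_layout(16, early_kernel=8, block_size=3))
# MMMMMMMMNMMMNMMM
```

Each 'N' block raises the KERNEL total by one block, which in this model
is what permits the next three 'M' blocks at a 3:1 ratio.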

While the created layout is sub-optimal when it comes to contiguous zones,
it gives us the maximum flexibility when dynamically growing/shrinking a
device; we can grow small VMs really big in small steps, and still shrink
reliably to, e.g., 1/4 of the maximum VM size in this example, removing
full memory blocks along with their metadata more reliably.
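
The 1/4 figure can be checked against the example layout with a few lines
of arithmetic. This is a rough sketch under stated assumptions: the helper
shrink_floor() is hypothetical, the unit sizes (8 units of early KERNEL
memory, 3 units per block) are chosen to match the 3:1 example, and it
assumes that every MOVABLE block can actually be unplugged.

```python
from fractions import Fraction


def shrink_floor(layout, early_kernel, block_size):
    """Fraction of the maximum size left once all 'M' blocks are unplugged.

    Hypothetical helper; 'layout' uses 'M' for ZONE_MOVABLE blocks and
    'N' for ZONE_NORMAL blocks, sizes are in arbitrary units.
    """
    total = early_kernel + len(layout) * block_size
    # Only early KERNEL memory and the ZONE_NORMAL blocks remain.
    remaining = early_kernel + layout.count("N") * block_size
    return Fraction(remaining, total)


print(shrink_floor("MMMMMMMMNMMMNMMM", early_kernel=8, block_size=3))
# 1/4
```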

Mark dynamic memory groups in the xarray such that we can efficiently
iterate over them when collecting stats.  In usual setups, we have one
virtio-mem device per NUMA node, and usually only a small number of NUMA
nodes.

Note: for now, there seems to be no compelling reason to make this
behavior configurable.

Link: https://lkml.kernel.org/r/20210806124715.17090-10-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hui Zhu <teawater@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:23 -07:00
accessibility
acpi ACPI: memhotplug: use a single static memory group for a single memory device 2021-09-08 11:50:23 -07:00
amba
android
ata
atm
auxdisplay
base mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy 2021-09-08 11:50:23 -07:00
bcma
block block-5.14-2021-08-27 2021-08-27 16:08:29 -07:00
bluetooth
bus Networking fixes for 5.14(-rc8?), including fixes from can and bpf. 2021-08-26 13:20:22 -07:00
cdrom
char
clk One hot fix for a NULL pointer deref in the Renesas usb clk driver 2021-08-29 12:52:17 -07:00
clocksource
comedi
connector
counter
cpufreq Merge branch 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm 2021-08-17 20:52:07 +02:00
cpuidle
crypto
cxl
dax dax/kmem: use a single static memory group for a single probed unit 2021-09-08 11:50:23 -07:00
dca
devfreq
dio
dma dmaengine fixes for v5.14 2021-08-06 11:08:24 -07:00
dma-buf
edac
eisa
extcon
firewire
firmware Ard says: 2021-08-15 06:38:26 -10:00
fpga
fsi
gnss
gpio
gpu drm/imx: imx-drm alignment and plane offset fixes 2021-08-27 10:49:53 +10:00
greybus
hid
hsi
hv
hwmon
hwspinlock
hwtracing
i2c i2c: dev: zero out array used for i2c reads from userspace 2021-08-10 22:54:10 +02:00
i3c
idle
iio
infiniband RDMA/rxe: Zero out index member of struct rxe_queue 2021-08-20 15:48:58 -03:00
input
interconnect Revert "interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate" 2021-08-12 09:24:39 +03:00
iommu iommu/vt-d: Fix incomplete cache flush in intel_pasid_tear_down_entry() 2021-08-18 13:15:58 +02:00
ipack ipack: tpci200: fix memory leak in the tpci200_register 2021-08-13 10:24:37 +02:00
irqchip
isdn
leds
lightnvm
macintosh
mailbox
mcb
md block-5.14-2021-08-07 2021-08-07 10:26:21 -07:00
media media: ipu3-cio2: Drop reference on error path in cio2_bridge_connect_sensor() 2021-08-26 18:52:30 +02:00
memory
memstick
message
mfd
misc
mmc Revert "mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711" 2021-08-27 16:30:36 +02:00
most
mtd MTD core fixes: 2021-08-16 06:36:01 -10:00
mux
net Revert "net: really fix the build..." 2021-08-26 11:08:32 -07:00
nfc
ntb
nubus
nvdimm libnvdimm/region: Fix label activation vs errors 2021-08-11 11:54:43 -07:00
nvme
nvmem
of
opp opp: core: Check for pending links before reading required_opp pointers 2021-08-23 12:44:55 +05:30
parisc
parport
pci PCI/MSI: Skip masking MSI-X on Xen PV 2021-08-27 00:27:15 +02:00
pcmcia
perf
phy
pinctrl pinctrl: amd: Fix an issue with shutdown when system set to s0ix 2021-08-12 11:16:40 +02:00
platform platform/x86: gigabyte-wmi: add support for B450M S2H V2 2021-08-18 19:39:31 +02:00
pnp
power
powercap
pps
ps3
ptp ptp_pch: Restore dependency on PCI 2021-08-16 11:11:06 +01:00
pwm
rapidio
ras
regulator
remoteproc
reset reset: reset-zynqmp: Fixed the argument data type 2021-08-23 12:55:18 +02:00
rpmsg
rtc
s390 Networking fixes for 5.14-rc6, including fixes from netfilter, bpf, 2021-08-12 16:24:03 -10:00
sbus
scsi SCSI fixes on 20210828 2021-08-28 11:39:16 -07:00
sh
siox
slimbus slimbus: ngd: reset dma setup during runtime pm 2021-08-13 10:22:30 +02:00
soc NXP/FSL SoC driver fixes for v5.14 2021-08-16 22:42:02 +02:00
soundwire
spi spi: Fixes for v5.14 2021-08-06 11:15:02 -07:00
spmi
ssb
staging Revert "media: dvb header files: move some headers to staging" 2021-08-23 09:49:09 -07:00
target
tc
tee
thermal
thunderbolt
tty
uio
usb usb: gadget: u_audio: fix race condition on endpoint stop 2021-08-27 16:07:23 +02:00
vdpa virtio,vhost,vdpa: bugfixes 2021-08-16 06:16:25 -10:00
vfio
vhost vringh: Use wiov->used to check for read/write desc order 2021-08-11 06:44:24 -04:00
video
virt
virtio virtio-mem: use a single dynamic memory group for a single virtio-mem device 2021-09-08 11:50:23 -07:00
visorbus
vlynq
vme
w1
watchdog
xen xen: branch for v5.14-rc6 2021-08-14 06:31:22 -10:00
zorro
Kconfig
Makefile