mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-01-23 08:35:19 -05:00
9d33edb20f
- Core: The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device. IMS (Interrupt Message Store] is a new specification which allows device manufactures to provide implementation defined storage for MSI messages contrary to the uniform and specification defined storage mechanisms for PCI/MSI and PCI/MSI-X. IMS not only allows to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device. There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background. When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific and while there were attempts to create common infrastructure the commonalities were rudimentary and just providing shared data structures and interfaces so that drivers could be written in an architecture agnostic way. The initial PCI/MSI[-X] support obviously plugged into this model which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific. This model is still supported today to keep museum architectures and notorious stranglers alive. In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation. At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller. This resulted in a fundamental redesign of interrupt management and provided the today prevailing concept of hierarchical interrupt domains. This allowed to disentangle the interactions between x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way. The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this: |--- device 1 [Vector]---[Remapping]---[PCI/MSI]--|... |--- device N where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. This reduced the required interaction between the domains pretty much to the initialization phase where it is obviously required to establish the proper parent relation ship in the components of the hierarchy. While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strict a per PCI device entity. Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also allowed to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in turn made it simple to keep the existing architecture specific management alive. A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack a IP block specific domain on top of the generic MSI domain this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation. In course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse. Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in an works by chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way how this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them. But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems. Last but not least the "global" PCI/MSI-X domain approach prevents to utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is not longer providing a uniform storage and configuration model. The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this: |--- [PCI/MSI] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N which in turn allows to provide support for multiple domains per device: |--- [PCI/MSI] device 1 |--- [PCI/IMS] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N |--- [PCI/IMS] device N This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver. There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well. - Drivers: - Updates for the LoongArch interrupt chip drivers - Support for MTK CIRQv2 - The usual small fixes and updates all over the place -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUsygTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoYXiD/40tXKzCzf0qFIqUlZLia1N3RRrwrNC DVTixuLtR9MrjwE+jWLQILa85SHInV8syXHSd35SzhsGDxkURFGi+HBgVWmysODf br9VSh3Gi+kt7iXtIwAg8WNWviGNmS3kPksxCko54F0YnJhMY5r5bhQVUBQkwFG2 wES1C9Uzd4pdV2bl24Z+WKL85cSmZ+pHunyKw1n401lBABXnTF9c4f13zC14jd+y wDxNrmOxeL3mEH4Pg6VyrDuTOURSf3TjJjeEq3EYqvUo0FyLt9I/cKX0AELcZQX7 fkRjrQQAvXNj39RJfeSkojDfllEPUHp7XSluhdBu5aIovSamdYGCDnuEoZ+l4MJ+ CojIErp3Dwj/uSaf5c7C3OaDAqH2CpOFWIcrUebShJE60hVKLEpUwd6W8juplaoT gxyXRb1Y+BeJvO8VhMN4i7f3232+sj8wuj+HTRTTbqMhkElnin94tAx8rgwR1sgR BiOGMJi4K2Y8s9Rqqp0Dvs01CW4guIYvSR4YY+WDbbi1xgiev89OYs6zZTJCJe4Y NUwwpqYSyP1brmtdDdBOZLqegjQm+TwUb6oOaasFem4vT1swgawgLcDnPOx45bk5 /FWt3EmnZxMz99x9jdDn1+BCqAZsKyEbEY1avvhPVMTwoVIuSX2ceTBMLseGq+jM 03JfvdxnueM3gw== =9erA -----END PGP SIGNATURE----- Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt core and driver subsystem: The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device. IMS (Interrupt Message Store] is a new specification which allows device manufactures to provide implementation defined storage for MSI messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified message store which is uniform accross all devices). The PCI/MSI[-X] uniformity allowed us to get away with "global" PCI/MSI domains. IMS not only allows to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device. There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background. When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific and while there were attempts to create common infrastructure the commonalities were rudimentary and just providing shared data structures and interfaces so that drivers could be written in an architecture agnostic way. The initial PCI/MSI[-X] support obviously plugged into this model which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific. This model is still supported today to keep museum architectures and notorious stragglers alive. In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation. At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller. This resulted in a fundamental redesign of interrupt management and provided the today prevailing concept of hierarchical interrupt domains. This allowed to disentangle the interactions between x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way. The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this: |--- device 1 [Vector]---[Remapping]---[PCI/MSI]--|... |--- device N where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. This reduced the required interaction between the domains pretty much to the initialization phase where it is obviously required to establish the proper parent relation ship in the components of the hierarchy. While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strict a per PCI device entity. Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also allowed to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in turn made it simple to keep the existing architecture specific management alive. A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack a IP block specific domain on top of the generic MSI domain this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation. In course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse. Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in an works by chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way how this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them. But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems. Last but not least the "global" PCI/MSI-X domain approach prevents to utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is not longer providing a uniform storage and configuration model. The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this: |--- [PCI/MSI] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N which in turn allows to provide support for multiple domains per device: |--- [PCI/MSI] device 1 |--- [PCI/IMS] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N |--- [PCI/IMS] device N This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver. There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well. Drivers: - Updates for the LoongArch interrupt chip drivers - Support for MTK CIRQv2 - The usual small fixes and updates all over the place" * tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits) irqchip/ti-sci-inta: Fix kernel doc irqchip/gic-v2m: Mark a few functions __init irqchip/gic-v2m: Include arm-gic-common.h irqchip/irq-mvebu-icu: Fix works by chance pointer assignment iommu/amd: Enable PCI/IMS iommu/vt-d: Enable PCI/IMS x86/apic/msi: Enable PCI/IMS PCI/MSI: Provide pci_ims_alloc/free_irq() PCI/MSI: Provide IMS (Interrupt Message Store) support genirq/msi: Provide constants for PCI/IMS support x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X PCI/MSI: Provide prepare_desc() MSI domain op PCI/MSI: Split MSI-X descriptor setup genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN genirq/msi: Provide msi_domain_alloc_irq_at() genirq/msi: Provide msi_domain_ops:: Prepare_desc() genirq/msi: Provide msi_desc:: Msi_data genirq/msi: Provide struct msi_map x86/apic/msi: Remove arch_create_remap_msi_irq_domain() ...
206 lines
6.2 KiB
Text
206 lines
6.2 KiB
Text
# SPDX-License-Identifier: GPL-2.0-only
|
|
#
|
|
# Performance Monitor Drivers
|
|
#
|
|
|
|
menu "Performance monitor support"
|
|
depends on PERF_EVENTS
|
|
|
|
config ARM_CCI_PMU
|
|
tristate "ARM CCI PMU driver"
|
|
depends on (ARM && CPU_V7) || ARM64
|
|
select ARM_CCI
|
|
help
|
|
Support for PMU events monitoring on the ARM CCI (Cache Coherent
|
|
Interconnect) family of products.
|
|
|
|
If compiled as a module, it will be called arm-cci.
|
|
|
|
config ARM_CCI400_PMU
|
|
bool "support CCI-400"
|
|
default y
|
|
depends on ARM_CCI_PMU
|
|
select ARM_CCI400_COMMON
|
|
help
|
|
CCI-400 provides 4 independent event counters counting events related
|
|
to the connected slave/master interfaces, plus a cycle counter.
|
|
|
|
config ARM_CCI5xx_PMU
|
|
bool "support CCI-500/CCI-550"
|
|
default y
|
|
depends on ARM_CCI_PMU
|
|
help
|
|
CCI-500/CCI-550 both provide 8 independent event counters, which can
|
|
count events pertaining to the slave/master interfaces as well as the
|
|
internal events to the CCI.
|
|
|
|
config ARM_CCN
|
|
tristate "ARM CCN driver support"
|
|
depends on ARM || ARM64 || COMPILE_TEST
|
|
help
|
|
PMU (perf) driver supporting the ARM CCN (Cache Coherent Network)
|
|
interconnect.
|
|
|
|
config ARM_CMN
|
|
tristate "Arm CMN-600 PMU support"
|
|
depends on ARM64 || COMPILE_TEST
|
|
help
|
|
Support for PMU events monitoring on the Arm CMN-600 Coherent Mesh
|
|
Network interconnect.
|
|
|
|
config ARM_PMU
|
|
depends on ARM || ARM64
|
|
bool "ARM PMU framework"
|
|
default y
|
|
help
|
|
Say y if you want to use CPU performance monitors on ARM-based
|
|
systems.
|
|
|
|
config RISCV_PMU
|
|
depends on RISCV
|
|
bool "RISC-V PMU framework"
|
|
default y
|
|
help
|
|
Say y if you want to use CPU performance monitors on RISCV-based
|
|
systems. This provides the core PMU framework that abstracts common
|
|
PMU functionalities in a core library so that different PMU drivers
|
|
can reuse it.
|
|
|
|
config RISCV_PMU_LEGACY
|
|
depends on RISCV_PMU
|
|
bool "RISC-V legacy PMU implementation"
|
|
default y
|
|
help
|
|
Say y if you want to use the legacy CPU performance monitor
|
|
implementation on RISC-V based systems. This only allows counting
|
|
of cycle/instruction counter and doesn't support counter overflow,
|
|
or programmable counters. It will be removed in future.
|
|
|
|
config RISCV_PMU_SBI
|
|
depends on RISCV_PMU && RISCV_SBI
|
|
bool "RISC-V PMU based on SBI PMU extension"
|
|
default y
|
|
help
|
|
Say y if you want to use the CPU performance monitor
|
|
using SBI PMU extension on RISC-V based systems. This option provides
|
|
full perf feature support i.e. counter overflow, privilege mode
|
|
filtering, counter configuration.
|
|
|
|
config ARM_PMU_ACPI
|
|
depends on ARM_PMU && ACPI
|
|
def_bool y
|
|
|
|
config ARM_SMMU_V3_PMU
|
|
tristate "ARM SMMUv3 Performance Monitors Extension"
|
|
depends on (ARM64 && ACPI) || (COMPILE_TEST && 64BIT)
|
|
depends on GENERIC_MSI_IRQ
|
|
help
|
|
Provides support for the ARM SMMUv3 Performance Monitor Counter
|
|
Groups (PMCG), which provide monitoring of transactions passing
|
|
through the SMMU and allow the resulting information to be filtered
|
|
based on the Stream ID of the corresponding master.
|
|
|
|
config ARM_DSU_PMU
|
|
tristate "ARM DynamIQ Shared Unit (DSU) PMU"
|
|
depends on ARM64
|
|
help
|
|
Provides support for performance monitor unit in ARM DynamIQ Shared
|
|
Unit (DSU). The DSU integrates one or more cores with an L3 memory
|
|
system, control logic. The PMU allows counting various events related
|
|
to DSU.
|
|
|
|
config FSL_IMX8_DDR_PMU
|
|
tristate "Freescale i.MX8 DDR perf monitor"
|
|
depends on ARCH_MXC || COMPILE_TEST
|
|
help
|
|
Provides support for the DDR performance monitor in i.MX8, which
|
|
can give information about memory throughput and other related
|
|
events.
|
|
|
|
config QCOM_L2_PMU
|
|
bool "Qualcomm Technologies L2-cache PMU"
|
|
depends on ARCH_QCOM && ARM64 && ACPI
|
|
select QCOM_KRYO_L2_ACCESSORS
|
|
help
|
|
Provides support for the L2 cache performance monitor unit (PMU)
|
|
in Qualcomm Technologies processors.
|
|
Adds the L2 cache PMU into the perf events subsystem for
|
|
monitoring L2 cache events.
|
|
|
|
config QCOM_L3_PMU
|
|
bool "Qualcomm Technologies L3-cache PMU"
|
|
depends on ARCH_QCOM && ARM64 && ACPI
|
|
select QCOM_IRQ_COMBINER
|
|
help
|
|
Provides support for the L3 cache performance monitor unit (PMU)
|
|
in Qualcomm Technologies processors.
|
|
Adds the L3 cache PMU into the perf events subsystem for
|
|
monitoring L3 cache events.
|
|
|
|
config THUNDERX2_PMU
|
|
tristate "Cavium ThunderX2 SoC PMU UNCORE"
|
|
depends on ARCH_THUNDER2 || COMPILE_TEST
|
|
depends on NUMA && ACPI
|
|
default m
|
|
help
|
|
Provides support for ThunderX2 UNCORE events.
|
|
The SoC has PMU support in its L3 cache controller (L3C) and
|
|
in the DDR4 Memory Controller (DMC).
|
|
|
|
config XGENE_PMU
|
|
depends on ARCH_XGENE || (COMPILE_TEST && 64BIT)
|
|
bool "APM X-Gene SoC PMU"
|
|
default n
|
|
help
|
|
Say y if you want to use APM X-Gene SoC performance monitors.
|
|
|
|
config ARM_SPE_PMU
|
|
tristate "Enable support for the ARMv8.2 Statistical Profiling Extension"
|
|
depends on ARM64
|
|
help
|
|
Enable perf support for the ARMv8.2 Statistical Profiling
|
|
Extension, which provides periodic sampling of operations in
|
|
the CPU pipeline and reports this via the perf AUX interface.
|
|
|
|
config ARM_DMC620_PMU
|
|
tristate "Enable PMU support for the ARM DMC-620 memory controller"
|
|
depends on (ARM64 && ACPI) || COMPILE_TEST
|
|
help
|
|
Support for PMU events monitoring on the ARM DMC-620 memory
|
|
controller.
|
|
|
|
config MARVELL_CN10K_TAD_PMU
|
|
tristate "Marvell CN10K LLC-TAD PMU"
|
|
depends on ARCH_THUNDER || (COMPILE_TEST && 64BIT)
|
|
help
|
|
Provides support for Last-Level cache Tag-and-data Units (LLC-TAD)
|
|
performance monitors on CN10K family silicons.
|
|
|
|
config APPLE_M1_CPU_PMU
|
|
bool "Apple M1 CPU PMU support"
|
|
depends on ARM_PMU && ARCH_APPLE
|
|
help
|
|
Provides support for the non-architectural CPU PMUs present on
|
|
the Apple M1 SoCs and derivatives.
|
|
|
|
config ALIBABA_UNCORE_DRW_PMU
|
|
tristate "Alibaba T-Head Yitian 710 DDR Sub-system Driveway PMU driver"
|
|
depends on (ARM64 && ACPI) || COMPILE_TEST
|
|
help
|
|
Support for Driveway PMU events monitoring on Yitian 710 DDR
|
|
Sub-system.
|
|
|
|
source "drivers/perf/hisilicon/Kconfig"
|
|
|
|
config MARVELL_CN10K_DDR_PMU
|
|
tristate "Enable MARVELL CN10K DRAM Subsystem(DSS) PMU Support"
|
|
depends on ARCH_THUNDER || (COMPILE_TEST && 64BIT)
|
|
help
|
|
Enable perf support for Marvell DDR Performance monitoring
|
|
event on CN10K platform.
|
|
|
|
source "drivers/perf/arm_cspmu/Kconfig"
|
|
|
|
source "drivers/perf/amlogic/Kconfig"
|
|
|
|
endmenu
|