docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
The "KERNEL I/O BARRIER EFFECTS" section of memory-barriers.txt is vague, x86-centric, out-of-date, incomplete and demonstrably incorrect in places. This is largely because I/O ordering is a horrible can of worms, but also because the document has stagnated as our understanding has evolved. Attempt to address some of that, by rewriting the section based on recent(-ish) discussions with Arnd, BenH and others. Maybe one day we'll find a way to formalise this stuff, but for now let's at least try to make the English easier to understand. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Parri <andrea.parri@amarulasolutions.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Daniel Lustig <dlustig@nvidia.com> Cc: David Howells <dhowells@redhat.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: "Maciej W. Rozycki" <macro@linux-mips.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
commit 4614bbdee3
parent 79a3aaa7b8

1 changed file with 74 additions and 49 deletions
@@ -2599,72 +2599,97 @@ likely, then interrupt-disabling locks should be used to guarantee ordering.
 KERNEL I/O BARRIER EFFECTS
 ==========================
 
-When accessing I/O memory, drivers should use the appropriate accessor
-functions:
-
- (*) inX(), outX():
-
-     These are intended to talk to I/O space rather than memory space, but
-     that's primarily a CPU-specific concept. The i386 and x86_64 processors
-     do indeed have special I/O space access cycles and instructions, but many
-     CPUs don't have such a concept.
-
-     The PCI bus, amongst others, defines an I/O space concept which - on such
-     CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O
-     space. However, it may also be mapped as a virtual I/O space in the CPU's
-     memory map, particularly on those CPUs that don't support alternate I/O
-     spaces.
-
-     Accesses to this space may be fully synchronous (as on i386), but
-     intermediary bridges (such as the PCI host bridge) may not fully honour
-     that.
-
-     They are guaranteed to be fully ordered with respect to each other.
-
-     They are not guaranteed to be fully ordered with respect to other types of
-     memory and I/O operation.
+Interfacing with peripherals via I/O accesses is deeply architecture and device
+specific. Therefore, drivers which are inherently non-portable may rely on
+specific behaviours of their target systems in order to achieve synchronization
+in the most lightweight manner possible. For drivers intending to be portable
+between multiple architectures and bus implementations, the kernel offers a
+series of accessor functions that provide various degrees of ordering
+guarantees:
 
  (*) readX(), writeX():
 
-     Whether these are guaranteed to be fully ordered and uncombined with
-     respect to each other on the issuing CPU depends on the characteristics
-     defined for the memory window through which they're accessing. On later
-     i386 architecture machines, for example, this is controlled by way of the
-     MTRR registers.
-
-     Ordinarily, these will be guaranteed to be fully ordered and uncombined,
-     provided they're not accessing a prefetchable device.
-
-     However, intermediary hardware (such as a PCI bridge) may indulge in
-     deferral if it so wishes; to flush a store, a load from the same location
-     is preferred[*], but a load from the same device or from configuration
-     space should suffice for PCI.
-
-     [*] NOTE! attempting to load from the same location as was written to may
-         cause a malfunction - consider the 16550 Rx/Tx serial registers for
-         example.
-
-     Used with prefetchable I/O memory, an mmiowb() barrier may be required to
-     force stores to be ordered.
-
-     Please refer to the PCI specification for more information on interactions
-     between PCI transactions.
+     The readX() and writeX() MMIO accessors take a pointer to the peripheral
+     being accessed as an __iomem * parameter. For pointers mapped with the
+     default I/O attributes (e.g. those returned by ioremap()), then the
+     ordering guarantees are as follows:
+
+     1. All readX() and writeX() accesses to the same peripheral are ordered
+        with respect to each other. For example, this ensures that MMIO register
+        writes by the CPU to a particular device will arrive in program order.
+
+     2. A writeX() by the CPU to the peripheral will first wait for the
+        completion of all prior CPU writes to memory. For example, this ensures
+        that writes by the CPU to an outbound DMA buffer allocated by
+        dma_alloc_coherent() will be visible to a DMA engine when the CPU writes
+        to its MMIO control register to trigger the transfer.
+
+     3. A readX() by the CPU from the peripheral will complete before any
+        subsequent CPU reads from memory can begin. For example, this ensures
+        that reads by the CPU from an incoming DMA buffer allocated by
+        dma_alloc_coherent() will not see stale data after reading from the DMA
+        engine's MMIO status register to establish that the DMA transfer has
+        completed.
+
+     4. A readX() by the CPU from the peripheral will complete before any
+        subsequent delay() loop can begin execution. For example, this ensures
+        that two MMIO register writes by the CPU to a peripheral will arrive at
+        least 1us apart if the first write is immediately read back with readX()
+        and udelay(1) is called prior to the second writeX().
+
+     __iomem pointers obtained with non-default attributes (e.g. those returned
+     by ioremap_wc()) are unlikely to provide many of these guarantees.
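
As a rough illustration of guarantees 2-4 above (a sketch only, not part of the patch: the foo_dev device, register offsets and flag names are all hypothetical):

#include <linux/delay.h>
#include <linux/io.h>
#include <linux/types.h>

struct foo_dev {                        /* hypothetical device */
        void __iomem *regs;             /* from ioremap() */
        u32 *desc;                      /* outbound buffer from dma_alloc_coherent() */
        u32 *result;                    /* inbound buffer, likewise */
};

#define FOO_DOORBELL    0x00            /* made-up register layout */
#define FOO_STATUS      0x04
#define FOO_CTRL        0x08
#define FOO_STATUS_DONE 0x01
#define FOO_CTRL_RESET  0x01

static void foo_kick_dma(struct foo_dev *foo, u32 d)
{
        foo->desc[0] = d;               /* plain store to coherent DMA memory */

        /*
         * Guarantee 2: the descriptor update above is visible to the DMA
         * engine before this doorbell write arrives, with no explicit wmb().
         */
        writel(1, foo->regs + FOO_DOORBELL);
}

static u32 foo_poll_result(struct foo_dev *foo)
{
        /*
         * Guarantee 3: the readl() completes before the subsequent read of
         * the inbound buffer, so a DONE status is never paired with stale
         * buffer contents.
         */
        if (readl(foo->regs + FOO_STATUS) & FOO_STATUS_DONE)
                return foo->result[0];

        return 0;
}

static void foo_reset(struct foo_dev *foo)
{
        writel(FOO_CTRL_RESET, foo->regs + FOO_CTRL);
        readl(foo->regs + FOO_CTRL);    /* read back the posted write... */
        udelay(1);                      /* ...guarantee 4: at least 1us... */
        writel(0, foo->regs + FOO_CTRL); /* ...before the second write arrives */
}
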
 
- (*) readX_relaxed(), writeX_relaxed()
-
-     These are similar to readX() and writeX(), but provide weaker memory
-     ordering guarantees. Specifically, they do not guarantee ordering with
-     respect to normal memory accesses (e.g. DMA buffers) nor do they guarantee
-     ordering with respect to LOCK or UNLOCK operations. If the latter is
-     required, an mmiowb() barrier can be used. Note that relaxed accesses to
-     the same peripheral are guaranteed to be ordered with respect to each
-     other.
+ (*) readX_relaxed(), writeX_relaxed():
+
+     These are similar to readX() and writeX(), but provide weaker memory
+     ordering guarantees. Specifically, they do not guarantee ordering with
+     respect to normal memory accesses or delay() loops (i.e bullets 2-4 above)
+     but they are still guaranteed to be ordered with respect to other accesses
+     to the same peripheral when operating on __iomem pointers mapped with the
+     default I/O attributes.
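
A common idiom the relaxed accessors enable, continuing the hypothetical foo_dev sketch above (the FOO_RING_* and FOO_CTRL_START registers are likewise made up): program registers with writeX_relaxed() and reserve the full-strength writeX() for the access that must also be ordered against normal memory:

#include <linux/kernel.h>               /* lower_32_bits()/upper_32_bits() */

static void foo_start(struct foo_dev *foo, dma_addr_t ring, u32 len)
{
        /*
         * Relaxed writes: still ordered against each other for this
         * peripheral, but not against normal memory accesses.
         */
        writel_relaxed(lower_32_bits(ring), foo->regs + FOO_RING_LO);
        writel_relaxed(upper_32_bits(ring), foo->regs + FOO_RING_HI);
        writel_relaxed(len, foo->regs + FOO_RING_LEN);

        /*
         * The final kick uses writel(), which additionally waits for prior
         * CPU writes to memory (guarantee 2), e.g. the ring contents.
         */
        writel(FOO_CTRL_START, foo->regs + FOO_DOORBELL);
}
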
|
+
+ (*) readsX(), writesX():
+
+     The readsX() and writesX() MMIO accessors are designed for accessing
+     register-based, memory-mapped FIFOs residing on peripherals that are not
+     capable of performing DMA. Consequently, they provide only the ordering
+     guarantees of readX_relaxed() and writeX_relaxed(), as documented above.
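
For the FIFO case described above, a minimal sketch (FOO_FIFO is a made-up offset for a 32-bit data window register):

#include <linux/io.h>
#include <linux/types.h>

#define FOO_FIFO        0x40            /* made-up 32-bit data window */

/*
 * Push/pull a buffer through a DMA-less FIFO: the same register is accessed
 * repeatedly, with only readX_relaxed()/writeX_relaxed() ordering implied.
 */
static void foo_fifo_write(void __iomem *regs, const u32 *buf, unsigned int words)
{
        writesl(regs + FOO_FIFO, buf, words);
}

static void foo_fifo_read(void __iomem *regs, u32 *buf, unsigned int words)
{
        readsl(regs + FOO_FIFO, buf, words);
}
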
+
+ (*) inX(), outX():
+
+     The inX() and outX() accessors are intended to access legacy port-mapped
+     I/O peripherals, which may require special instructions on some
+     architectures (notably x86). The port number of the peripheral being
+     accessed is passed as an argument.
+
+     Since many CPU architectures ultimately access these peripherals via an
+     internal virtual memory mapping, the portable ordering guarantees provided
+     by inX() and outX() are the same as those provided by readX() and writeX()
+     respectively when accessing a mapping with the default I/O attributes.
+
+     Device drivers may expect outX() to emit a non-posted write transaction
+     that waits for a completion response from the I/O peripheral before
+     returning. This is not guaranteed by all architectures and is therefore
+     not part of the portable ordering semantics.
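
A sketch of the port-mapped case, modelled loosely on a 16550-style UART at the legacy COM1 port base (the offsets follow the classic 16550 layout but are illustrative, not a datasheet):

#include <linux/io.h>
#include <linux/types.h>

#define FOO_UART_BASE   0x3f8           /* legacy COM1-style port base */
#define FOO_UART_TX     0               /* transmit holding register */
#define FOO_UART_LSR    5               /* line status register */
#define FOO_UART_THRE   0x20            /* transmitter holding register empty */

static void foo_uart_putc(u8 c)
{
        /*
         * inb()/outb() to the same peripheral are ordered with respect to
         * each other, just like readX()/writeX() on a default mapping, so
         * this poll-then-write sequence needs no extra barriers.
         */
        while (!(inb(FOO_UART_BASE + FOO_UART_LSR) & FOO_UART_THRE))
                ;

        outb(c, FOO_UART_BASE + FOO_UART_TX);
}
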
+
+ (*) insX(), outsX():
+
+     As above, the insX() and outsX() accessors provide the same ordering
+     guarantees as readsX() and writesX() respectively when accessing a mapping
+     with the default I/O attributes.
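
And the string forms, continuing the same UART sketch (on a real 16550 the receive buffer shares offset 0 with the transmit register; still illustrative only):

#define FOO_UART_RX     0               /* receive buffer register */

static void foo_uart_drain(u8 *buf, unsigned long count)
{
        /*
         * Same ordering guarantees as readsX()/writesX(): suitable for a
         * DMA-less FIFO that sits behind a legacy I/O port.
         */
        insb(FOO_UART_BASE + FOO_UART_RX, buf, count);
}
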
 
  (*) ioreadX(), iowriteX()
 
      These will perform appropriately for the type of access they're actually
      doing, be it inX()/outX() or readX()/writeX().
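
These are typically used with a cookie returned by pci_iomap(), which may cover either an I/O-port BAR or a memory BAR; a sketch with a hypothetical BAR layout:

#include <linux/io.h>
#include <linux/pci.h>

static int foo_pci_enable(struct pci_dev *pdev)
{
        void __iomem *base;

        base = pci_iomap(pdev, 0, 0);   /* port BAR or memory BAR, either way */
        if (!base)
                return -ENOMEM;

        /*
         * ioread32()/iowrite32() pick the right kind of access for whatever
         * sort of BAR the mapping came from.
         */
        iowrite32(0x1, base + 0x10);    /* made-up enable register */
        return ioread32(base + 0x14) ? 0 : -ENODEV;     /* made-up ready flag */
}
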
 
+All of these accessors assume that the underlying peripheral is little-endian,
+and will therefore perform byte-swapping operations on big-endian architectures.
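
Where a peripheral's registers are big-endian instead, explicit byte-order variants such as ioread32be()/iowrite32be() are available; a one-function sketch (the status offset is made up):

#include <linux/io.h>
#include <linux/types.h>

static u32 foo_be_status(void __iomem *base)
{
        /*
         * Big-endian register: the value is byte-swapped on little-endian
         * CPUs and passed through unchanged on big-endian ones.
         */
        return ioread32be(base + 0x20); /* made-up status offset */
}
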
+
+Composing I/O ordering barriers with SMP ordering barriers and LOCK/UNLOCK
+operations is a dangerous sport which may require the use of mmiowb(). See the
+subsection "Acquires vs I/O accesses" for more information.
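
The hazard alluded to here is the classic case of MMIO writes from two CPUs being serialised by a spinlock, as described in the "Acquires vs I/O accesses" subsection; a sketch of the mmiowb() placement that subsection calls for, reusing the hypothetical foo_dev:

#include <linux/io.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(foo_lock);

static void foo_submit(struct foo_dev *foo, u32 cmd)
{
        spin_lock(&foo_lock);
        writel(cmd, foo->regs + FOO_DOORBELL);

        /*
         * Without this, the writel() above may still be sitting in a store
         * buffer when another CPU takes foo_lock and issues its own writel(),
         * letting the two commands reach the device out of lock order on
         * some architectures.
         */
        mmiowb();
        spin_unlock(&foo_lock);
}
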
 
 ========================================
 ASSUMED MINIMUM EXECUTION ORDERING MODEL