mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-01-22 07:53:11 -05:00
5c1806c41c
While fuzzing an arm64 kernel, Alexander Potapenko reported:
| BUG: KCSAN: data-race in ktime_get_mono_fast_ns / timekeeping_update
|
| write to 0xffffffc082e74248 of 56 bytes by interrupt on cpu 0:
| update_fast_timekeeper kernel/time/timekeeping.c:430 [inline]
| timekeeping_update+0x1d8/0x2d8 kernel/time/timekeeping.c:768
| timekeeping_advance+0x9e8/0xb78 kernel/time/timekeeping.c:2344
| update_wall_time+0x18/0x38 kernel/time/timekeeping.c:2360
| [...]
|
| read to 0xffffffc082e74258 of 8 bytes by task 5260 on cpu 1:
| __ktime_get_fast_ns kernel/time/timekeeping.c:372 [inline]
| ktime_get_mono_fast_ns+0x88/0x174 kernel/time/timekeeping.c:489
| init_srcu_struct_fields+0x40c/0x530 kernel/rcu/srcutree.c:263
| init_srcu_struct+0x14/0x20 kernel/rcu/srcutree.c:311
| [...]
|
| value changed: 0x000002f875d33266 -> 0x000002f877416866
|
| Reported by Kernel Concurrency Sanitizer on:
| CPU: 1 UID: 0 PID: 5260 Comm: syz.2.7483 Not tainted 6.12.0-rc3-dirty #78
This is a false positive data race between a seqcount latch writer and a reader
accessing stale data. Since its introduction, KCSAN has never understood the
seqcount_latch interface (due to being unannotated).
Unlike the regular seqlock interface, the seqcount_latch interface for latch
writers never has had a well-defined critical section, making it difficult to
teach tooling where the critical section starts and ends.
Introduce an instrumentable (non-raw) seqcount_latch interface, with
which we can clearly denote writer critical sections. This both helps
readability and tooling like KCSAN to understand when the writer is done
updating all latch copies.
Fixes: 88ecd153be
("seqlock, kcsan: Add annotations for KCSAN")
Reported-by: Alexander Potapenko <glider@google.com>
Co-developed-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241104161910.780003-4-elver@google.com
239 lines
7.5 KiB
ReStructuredText
239 lines
7.5 KiB
ReStructuredText
======================================
|
|
Sequence counters and sequential locks
|
|
======================================
|
|
|
|
Introduction
|
|
============
|
|
|
|
Sequence counters are a reader-writer consistency mechanism with
|
|
lockless readers (read-only retry loops), and no writer starvation. They
|
|
are used for data that's rarely written to (e.g. system time), where the
|
|
reader wants a consistent set of information and is willing to retry if
|
|
that information changes.
|
|
|
|
A data set is consistent when the sequence count at the beginning of the
|
|
read side critical section is even and the same sequence count value is
|
|
read again at the end of the critical section. The data in the set must
|
|
be copied out inside the read side critical section. If the sequence
|
|
count has changed between the start and the end of the critical section,
|
|
the reader must retry.
|
|
|
|
Writers increment the sequence count at the start and the end of their
|
|
critical section. After starting the critical section the sequence count
|
|
is odd and indicates to the readers that an update is in progress. At
|
|
the end of the write side critical section the sequence count becomes
|
|
even again which lets readers make progress.
|
|
|
|
A sequence counter write side critical section must never be preempted
|
|
or interrupted by read side sections. Otherwise the reader will spin for
|
|
the entire scheduler tick due to the odd sequence count value and the
|
|
interrupted writer. If that reader belongs to a real-time scheduling
|
|
class, it can spin forever and the kernel will livelock.
|
|
|
|
This mechanism cannot be used if the protected data contains pointers,
|
|
as the writer can invalidate a pointer that the reader is following.
|
|
|
|
|
|
.. _seqcount_t:
|
|
|
|
Sequence counters (``seqcount_t``)
|
|
==================================
|
|
|
|
This is the raw counting mechanism, which does not protect against
|
|
multiple writers. Write side critical sections must thus be serialized
|
|
by an external lock.
|
|
|
|
If the write serialization primitive is not implicitly disabling
|
|
preemption, preemption must be explicitly disabled before entering the
|
|
write side section. If the read section can be invoked from hardirq or
|
|
softirq contexts, interrupts or bottom halves must also be respectively
|
|
disabled before entering the write section.
|
|
|
|
If it's desired to automatically handle the sequence counter
|
|
requirements of writer serialization and non-preemptibility, use
|
|
:ref:`seqlock_t` instead.
|
|
|
|
Initialization::
|
|
|
|
/* dynamic */
|
|
seqcount_t foo_seqcount;
|
|
seqcount_init(&foo_seqcount);
|
|
|
|
/* static */
|
|
static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
|
|
|
|
/* C99 struct init */
|
|
struct {
|
|
.seq = SEQCNT_ZERO(foo.seq),
|
|
} foo;
|
|
|
|
Write path::
|
|
|
|
/* Serialized context with disabled preemption */
|
|
|
|
write_seqcount_begin(&foo_seqcount);
|
|
|
|
/* ... [[write-side critical section]] ... */
|
|
|
|
write_seqcount_end(&foo_seqcount);
|
|
|
|
Read path::
|
|
|
|
do {
|
|
seq = read_seqcount_begin(&foo_seqcount);
|
|
|
|
/* ... [[read-side critical section]] ... */
|
|
|
|
} while (read_seqcount_retry(&foo_seqcount, seq));
|
|
|
|
|
|
.. _seqcount_locktype_t:
|
|
|
|
Sequence counters with associated locks (``seqcount_LOCKNAME_t``)
|
|
-----------------------------------------------------------------
|
|
|
|
As discussed at :ref:`seqcount_t`, sequence count write side critical
|
|
sections must be serialized and non-preemptible. This variant of
|
|
sequence counters associate the lock used for writer serialization at
|
|
initialization time, which enables lockdep to validate that the write
|
|
side critical sections are properly serialized.
|
|
|
|
This lock association is a NOOP if lockdep is disabled and has neither
|
|
storage nor runtime overhead. If lockdep is enabled, the lock pointer is
|
|
stored in struct seqcount and lockdep's "lock is held" assertions are
|
|
injected at the beginning of the write side critical section to validate
|
|
that it is properly protected.
|
|
|
|
For lock types which do not implicitly disable preemption, preemption
|
|
protection is enforced in the write side function.
|
|
|
|
The following sequence counters with associated locks are defined:
|
|
|
|
- ``seqcount_spinlock_t``
|
|
- ``seqcount_raw_spinlock_t``
|
|
- ``seqcount_rwlock_t``
|
|
- ``seqcount_mutex_t``
|
|
- ``seqcount_ww_mutex_t``
|
|
|
|
The sequence counter read and write APIs can take either a plain
|
|
seqcount_t or any of the seqcount_LOCKNAME_t variants above.
|
|
|
|
Initialization (replace "LOCKNAME" with one of the supported locks)::
|
|
|
|
/* dynamic */
|
|
seqcount_LOCKNAME_t foo_seqcount;
|
|
seqcount_LOCKNAME_init(&foo_seqcount, &lock);
|
|
|
|
/* static */
|
|
static seqcount_LOCKNAME_t foo_seqcount =
|
|
SEQCNT_LOCKNAME_ZERO(foo_seqcount, &lock);
|
|
|
|
/* C99 struct init */
|
|
struct {
|
|
.seq = SEQCNT_LOCKNAME_ZERO(foo.seq, &lock),
|
|
} foo;
|
|
|
|
Write path: same as in :ref:`seqcount_t`, while running from a context
|
|
with the associated write serialization lock acquired.
|
|
|
|
Read path: same as in :ref:`seqcount_t`.
|
|
|
|
|
|
.. _seqcount_latch_t:
|
|
|
|
Latch sequence counters (``seqcount_latch_t``)
|
|
----------------------------------------------
|
|
|
|
Latch sequence counters are a multiversion concurrency control mechanism
|
|
where the embedded seqcount_t counter even/odd value is used to switch
|
|
between two copies of protected data. This allows the sequence counter
|
|
read path to safely interrupt its own write side critical section.
|
|
|
|
Use seqcount_latch_t when the write side sections cannot be protected
|
|
from interruption by readers. This is typically the case when the read
|
|
side can be invoked from NMI handlers.
|
|
|
|
Check `write_seqcount_latch()` for more information.
|
|
|
|
|
|
.. _seqlock_t:
|
|
|
|
Sequential locks (``seqlock_t``)
|
|
================================
|
|
|
|
This contains the :ref:`seqcount_t` mechanism earlier discussed, plus an
|
|
embedded spinlock for writer serialization and non-preemptibility.
|
|
|
|
If the read side section can be invoked from hardirq or softirq context,
|
|
use the write side function variants which disable interrupts or bottom
|
|
halves respectively.
|
|
|
|
Initialization::
|
|
|
|
/* dynamic */
|
|
seqlock_t foo_seqlock;
|
|
seqlock_init(&foo_seqlock);
|
|
|
|
/* static */
|
|
static DEFINE_SEQLOCK(foo_seqlock);
|
|
|
|
/* C99 struct init */
|
|
struct {
|
|
.seql = __SEQLOCK_UNLOCKED(foo.seql)
|
|
} foo;
|
|
|
|
Write path::
|
|
|
|
write_seqlock(&foo_seqlock);
|
|
|
|
/* ... [[write-side critical section]] ... */
|
|
|
|
write_sequnlock(&foo_seqlock);
|
|
|
|
Read path, three categories:
|
|
|
|
1. Normal Sequence readers which never block a writer but they must
|
|
retry if a writer is in progress by detecting change in the sequence
|
|
number. Writers do not wait for a sequence reader::
|
|
|
|
do {
|
|
seq = read_seqbegin(&foo_seqlock);
|
|
|
|
/* ... [[read-side critical section]] ... */
|
|
|
|
} while (read_seqretry(&foo_seqlock, seq));
|
|
|
|
2. Locking readers which will wait if a writer or another locking reader
|
|
is in progress. A locking reader in progress will also block a writer
|
|
from entering its critical section. This read lock is
|
|
exclusive. Unlike rwlock_t, only one locking reader can acquire it::
|
|
|
|
read_seqlock_excl(&foo_seqlock);
|
|
|
|
/* ... [[read-side critical section]] ... */
|
|
|
|
read_sequnlock_excl(&foo_seqlock);
|
|
|
|
3. Conditional lockless reader (as in 1), or locking reader (as in 2),
|
|
according to a passed marker. This is used to avoid lockless readers
|
|
starvation (too much retry loops) in case of a sharp spike in write
|
|
activity. First, a lockless read is tried (even marker passed). If
|
|
that trial fails (odd sequence counter is returned, which is used as
|
|
the next iteration marker), the lockless read is transformed to a
|
|
full locking read and no retry loop is necessary::
|
|
|
|
/* marker; even initialization */
|
|
int seq = 0;
|
|
do {
|
|
read_seqbegin_or_lock(&foo_seqlock, &seq);
|
|
|
|
/* ... [[read-side critical section]] ... */
|
|
|
|
} while (need_seqretry(&foo_seqlock, seq));
|
|
done_seqretry(&foo_seqlock, seq);
|
|
|
|
|
|
API documentation
|
|
=================
|
|
|
|
.. kernel-doc:: include/linux/seqlock.h
|