Caching the cluster list allows us to fill the two fields in the
InodeMetadata. While at it, don't cache the metadata as when we
have write support having to keep both InodeMetadata and FATEntry
correct is going to get very annoying.
dbgln() will always take its arguments by reference when possible, which
causes UB when dealing with packed structs. To avoid this, we now
explicitly copy all members whose alignment requirements aren't met.
This removes the allocate_tls syscall and adds an archctl option to set
the fs_base for the current thread on x86-64, since you can't set that
register from userspace. enter_thread_context loads the fs_base for the
next thread on each context switch.
This also moves tpidr_el0 (the thread pointer register on AArch64) to
the register state, so it gets properly saved/restored on context
switches.
The userspace TLS allocation code is kept pretty similar to the original
kernel TLS code, aside from a couple of style changes.
We also have to add a new argument "tls_pointer" to
SC_create_thread_params, as we otherwise can't prevent race conditions
between setting the thread pointer register and signal handling code
that might be triggered before the thread pointer was set, which could
use TLS.
If there's no fds to copy in a message with proper space for an
SCM_RIGHTS set of cmsg headers, then don't try to copy them.
This avoids a Kernel panic when recvmsg-ing, as copy_to_user(p, 0, 0)
hits a VERIFY.
These changes are compatible with clang-format 16 and will be mandatory
when we eventually bump clang-format version. So, since there are no
real downsides, let's commit them now.
Small position independent code model (which we end up using after this
change) is suitable for us since the kernel is not expected to grow more
than 2Gb in size. This might be a bit risky since this model is not
mentioned anywhere except for System V ABI document but experiments show
that the kernel compiled with this change works just fine.
This otherwise caused a race condition between the signal dispatcher
(which sets sepc to the signal trampoline) and sepc being updated in the
trap handler.
We obviously have to keep the sepc set by the signal dispatcher and not
increment it afterwards.
While this clutters Process.cpp a tiny bit, I feel that it's worth it:
- 2x speed on the kcov_loop benchmark. Likely more during fuzzing.
- Overall code complexity is going down with this change.
- By reducing the code reachable from __sanitizer_cov_trace_pc code,
we can now instrument more code.
Sticking this to the function source has multiple benefits:
- We instrument more code, by not excluding entire files.
- NO_SANITIZE_COVERAGE can be used in Header files.
- Keeping the info with the source code, means if a function or
file is moved around, the NO_SANITIZE_COVERAGE moves with it.
This reverts commit 9dbec601b0.
For KCOV to be performant (or at least not even slower) we need to
mmap the PC buffer from both user and kernel space at the same time.
You can't mmap a character device, so this change didn't make sense.
Plus even if we did invent a new method to exfiltrate the coverage
information out of the kernel, it would be incompatible with existing
kernel fuzzers. That would be kind of annoying. 🙃
This simple delay loop uses the time CSR to wait for the given amount
of time. The tick frequency of the CSR is read from the
/cpus/timebase-frequency devicetree property.
Instead, rewrite the region page fault handling code to not use
PageFault::type() on RISC-V.
I split Region::handle_fault into having a RISC-V-specific
implementation, as I am not sure if I cover all page fault handling edge
cases by solely relying on MM's own region metadata.
We should probably also take the processor-provided page fault reason
into account, if we decide to merge these two implementations in the
future.
This commit also removes the unnecessary user_sp RegisterState member.
We never use the kernel stack pointer on entry, so we can simply always
store the stack pointer of the previous privilege mode in sp.
Also remove the sp member from mcontext, as RISC-V doesn't have a
dedicated stack pointer register.
sp is defined to be x2 (x[1] in our case) by the ABI.
I probably accidentally included sp while copying the struct from
aarch64.
sepc has to be incremented before the call to syscall_handler,
as we otherwise would return to the ecall instruction, resulting in an
infinite trap loop.
We can't increment it after syscall_handler, as sepc might get changed
while handling the syscall.
The value of this field is incremented by one, as a value of 0 for this
field means 1 entry supported.
A value of 0xffff for CAP.MQES would incorrectly by truncated to 0x0000,
if we don't increase the bit width of the return type.
RFC9293 states that from the TimeWait state the TCPSocket
should wait the MSL (2mins) for delayed segments to expire
so that their sequence numbers do not clash with a new
connection's sequence numbers using the same ip address
and port number. The wait also ensures the remote TCP peer
has received the ACK to their FIN segment.
Please be aware that I only have NIC with chip version 6 so
this is the only one that I have tested. Rest was implemented
via looking at Linux rtl8169 driver. Also thanks to IdanHo
for some initial work.
Either we mount from a loop device or other source, the user might want
to obfuscate the given source for security reasons, so this option will
ensure this will happen.
If passed during a mount, the source will be hidden when reading from
the /sys/kernel/df node.
Similarly to DevPtsFS, this filesystem is about exposing loop device
nodes easily in /dev/loop, so userspace doesn't need to do anything in
order to use new devices immediately.
Instead of returning a raw pointer, which could be technically invalid
when using it in the caller function, we return a valid RefPtr of such
device.
This ensures that the code in DevPtsFS is now safe from a rare race
condition in which the SlavePTY device is gone but we still have a
pointer to it.
This device is a block device that allows a user to effectively treat an
Inode as a block device.
The static construction method is given an OpenFileDescription reference
but validates that:
- The description has a valid custody (so it's not some arbitrary file).
Failing this requirement will yield EINVAL.
- The description custody points to an Inode which is a regular file, as
we only support (seekable) regular files. Failing this requirement
will yield ENOTSUP.
LoopDevice can be used to mount a regular file on the filesystem like
other supported types of (physical) block devices.
Instead of using C-arrays, and manually counting their lengths, use
AK::Array. And pass these arrays around as spans, instead of as pointer-
and-length pairs.