
972 commits

Author SHA1 Message Date
Andreas Kling
8ed06ad814 Kernel: Guard Process "protected data" with a spinlock
This ensures that both mutable and immutable access to the protected
data of a process is serialized.

Note that there may still be multiple TOCTOU issues around this, as we
have a bunch of convenience accessors that make it easy to introduce
them. We'll need to audit those as well.
2022-08-21 12:25:14 +02:00
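Guarding protected data with a lock can be sketched in userspace C++. This is a hypothetical `Protected<T>` wrapper in the spirit of the kernel's SpinlockProtected, not its real interface. `std::mutex` stands in for the spinlock, and `ProtectedData` and its fields are made up for illustration. Doing a check-then-act inside a single `with()` call is one way to avoid the TOCTOU windows that separate convenience accessors can open.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical wrapper: the value is reachable only through with(), which
// holds the lock for the whole callback.
template<typename T>
class Protected {
public:
    template<typename... Args>
    explicit Protected(Args&&... args)
        : m_value(std::forward<Args>(args)...)
    {
    }

    // Runs `callback` with exclusive access to the value and returns its result.
    template<typename Callback>
    decltype(auto) with(Callback callback)
    {
        std::lock_guard<std::mutex> guard(m_lock);
        return callback(m_value);
    }

private:
    std::mutex m_lock; // stand-in for the kernel spinlock
    T m_value;
};

// Illustrative subset of per-process "protected data".
struct ProtectedData {
    int pid { 0 };
    bool dumpable { true };
};

// Check and update in one critical section, so no other thread can change
// `dumpable` between the check and the write.
inline bool clear_dumpable_if_set(Protected<ProtectedData>& data)
{
    return data.with([](ProtectedData& d) {
        if (!d.dumpable)
            return false;
        d.dumpable = false;
        return true;
    });
}
```

Copying a field out with one `with()` call and acting on it after a second call would reopen the race that this shape avoids.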
Andreas Kling
728c3fbd14 Kernel: Use RefPtr instead of LockRefPtr for Custody
By protecting all the RefPtr<Custody> objects that may be accessed from
multiple threads at the same time (with spinlocks), we remove the need
for using LockRefPtr<Custody> (which is basically a RefPtr with a
built-in spinlock.)
2022-08-21 12:25:14 +02:00
Liav A
5331d243c6 Kernel/Syscall: Make anon_create to not use Process::allocate_fd method
Instead, allocate the fd while holding the lock on the m_fds struct,
which is the safer way to mutate m_fds, because we don't take the big
process lock in this syscall.
2022-08-21 10:56:48 +01:00
Andreas Kling
619ac65302 Kernel: Get GID from credentials object in sys$setgroups()
I missed one instance of these. Thanks Anthony Iacono for spotting it!
2022-08-20 22:41:49 +02:00
Andreas Kling
9eeee24a39 Kernel+LibC: Enforce a limit on the number of supplementary group IDs
This patch adds the NGROUPS_MAX constant and enforces it in
sys$setgroups() to ensure that no process has more than 32 supplementary
group IDs.

The number doesn't mean anything in particular, just had to pick a
number. Perhaps one day we'll have a reason to change it.
2022-08-20 22:39:56 +02:00
Andreas Kling
998c1152ef Kernel: Mark syscalls that get/set user/group ID as not needing big lock
Now that these operate on the neatly atomic and immutable Credentials
object, they should no longer require the process big lock for
synchronization. :^)
2022-08-20 18:36:47 +02:00
Andreas Kling
122d7d9533 Kernel: Add Credentials to hold a set of user and group IDs
This patch adds a new object to hold a Process's user credentials:

- UID, EUID, SUID
- GID, EGID, SGID, extra GIDs

Credentials are immutable and child processes initially inherit the
Credentials object from their parent.

Whenever a process changes one or more of its user/group IDs, a new
Credentials object is constructed.

Any code that wants to inspect and act on a set of credentials can now
do so without worrying about data races.
2022-08-20 18:32:50 +02:00
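A minimal sketch of immutable credentials with copy-on-change. It assumes `std::shared_ptr` as a stand-in for the kernel's reference-counted pointers and a mutex that guards only the pointer swap. `Credentials`, `ProcessCredentials` and `set_euid()` are hypothetical names. A reader holds on to a snapshot that no later ID change can modify.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <sys/types.h>
#include <utility>
#include <vector>

// Immutable snapshot of a process's user and group IDs.
struct Credentials {
    uid_t uid, euid, suid;
    gid_t gid, egid, sgid;
    std::vector<gid_t> extra_gids;
};

class ProcessCredentials {
public:
    explicit ProcessCredentials(Credentials initial)
        : m_credentials(std::make_shared<Credentials>(std::move(initial)))
    {
    }

    // Returns the current snapshot. The object behind it never changes, so the
    // caller can inspect it without holding any lock.
    std::shared_ptr<Credentials const> credentials() const
    {
        std::lock_guard<std::mutex> guard(m_lock);
        return m_credentials;
    }

    // Changing an ID builds a new Credentials object and swaps the pointer;
    // existing snapshots keep the old values.
    void set_euid(uid_t euid)
    {
        std::lock_guard<std::mutex> guard(m_lock);
        Credentials updated = *m_credentials;
        updated.euid = euid;
        m_credentials = std::make_shared<Credentials>(std::move(updated));
    }

private:
    mutable std::mutex m_lock; // guards the pointer, not the snapshot
    std::shared_ptr<Credentials const> m_credentials;
};
```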
Andreas Kling
11eee67b85 Kernel: Make self-contained locking smart pointers their own classes
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:

- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable

This patch renames the Kernel classes so that they can coexist with
the original AK classes:

- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable

The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
2022-08-20 17:20:43 +02:00
Andreas Kling
e1476788ad Kernel: Make sys$anon_create() allocate physical pages immediately
This fixes an issue where a sharing process would map the "lazy
committed page" early and then get stuck with that page even after
it had been replaced in the VMObject by a page fault.

Regressed in 27c1135d30, which made it
happen every time with the backing bitmaps used for WebContent.
2022-08-18 20:59:04 +02:00
Andreas Kling
04c362b4dd Kernel: Fix TOCTOU in sys$unveil()
Make sure we reject the unveil attempt with EPERM if the veil was locked
by another thread while we were parsing the arguments (and not holding
the veil state spinlock.)

Thanks Brian for spotting this! :^)

Amendment to #14907.
2022-08-18 01:04:28 +02:00
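The fix amounts to re-checking the veil state under the lock, after the unlocked parsing step. Below is a hypothetical userspace sketch: `std::mutex` replaces the veil state spinlock, and `canonicalize()` is a placeholder for the real argument parsing.

```cpp
#include <cassert>
#include <cerrno>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical veil state; in the kernel this sits behind a spinlock.
struct VeilState {
    bool locked { false };
    std::vector<std::string> unveiled_paths;
};

class Process {
public:
    int unveil(std::string const& user_path)
    {
        // Parsing happens without holding the veil lock.
        std::string path = canonicalize(user_path);

        std::lock_guard<std::mutex> guard(m_veil_lock);
        // Re-check under the lock: another thread may have locked the veil
        // while we were parsing, so any earlier check is stale.
        if (m_veil.locked)
            return -EPERM;
        m_veil.unveiled_paths.push_back(std::move(path));
        return 0;
    }

    void lock_veil()
    {
        std::lock_guard<std::mutex> guard(m_veil_lock);
        m_veil.locked = true;
    }

private:
    // Placeholder for real argument parsing and validation.
    static std::string canonicalize(std::string const& path) { return path; }

    std::mutex m_veil_lock;
    VeilState m_veil;
};
```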
Andreas Kling
ae8558dd5c Kernel: Don't do path resolution in sys$chdir() while holding spinlock
Path resolution may do blocking I/O so we must not do it while holding
a spinlock. There are tons of problems like this throughout the kernel
and we need to find and fix all of them.
2022-08-18 00:58:34 +02:00
Samuel Bowman
b5a2f59320 Kernel: Make sys$unveil() not take the big process lock
The unveil syscall uses the UnveilData struct which is already
SpinlockProtected, so there is no need to take the big lock.
2022-08-18 00:04:31 +02:00
Linus Groh
146903a3b5 Kernel: Require semicolon after VERIFY_{NO_,}PROCESS_BIG_LOCK_ACQUIRED
This matches our general macro use, and specifically other verification
macros like VERIFY(), VERIFY_NOT_REACHED(), VERIFY_INTERRUPTS_ENABLED(),
and VERIFY_INTERRUPTS_DISABLED().
2022-08-17 22:56:51 +02:00
Andreas Kling
ce6e93d96b Kernel: Make sys$socketpair() not take the big lock
This system call mainly accesses the file descriptor table, and this is
already guarded by MutexProtected.
2022-08-16 20:43:23 +02:00
Andreas Kling
164c9617c3 Kernel: Only lock file descriptor table once in sys$pipe()
Instead of locking it twice, we now frontload all the work that doesn't
touch the fd table, and then only lock it towards the end of the
syscall.

The benefit here is simplicity. The downside is that we do a bit of
unnecessary work in the EMFILE error case, but we don't need to optimize
that case anyway.
2022-08-16 20:39:45 +02:00
Andreas Kling
b6d0636656 Kernel: Don't leak file descriptors in sys$pipe()
If the final copy_to_user() call fails when writing the file descriptors
to the output array, we have to make sure the file descriptors don't
remain in the process file descriptor table. Otherwise they are
basically leaked, as userspace is not aware of them.

This matches the behavior of our sys$socketpair() implementation.
2022-08-16 20:35:32 +02:00
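The sys$pipe() fd-table handling described in the commits above can be sketched as follows: there is no up-front EMFILE check, the lock is taken once, and the allocation is rolled back if copying the fds out fails. `FdTable`, `sys_pipe()` and the `copy_out` callback (standing in for copy_to_user()) are hypothetical. Copying while holding the table lock is an assumption of this sketch, not something the commits state.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical per-process fd table; a slot holds an open "description" id.
class FdTable {
public:
    explicit FdTable(std::size_t max_fds) : m_slots(max_fds) {}

    std::mutex& lock() { return m_lock; }

    // Caller must hold lock(). Returns the lowest free fd, or -EMFILE.
    int allocate_locked(int description)
    {
        for (std::size_t fd = 0; fd < m_slots.size(); ++fd) {
            if (!m_slots[fd]) {
                m_slots[fd] = description;
                return static_cast<int>(fd);
            }
        }
        return -EMFILE;
    }

    // Caller must hold lock().
    void release_locked(int fd) { m_slots[static_cast<std::size_t>(fd)].reset(); }

    std::size_t open_count()
    {
        std::lock_guard<std::mutex> guard(m_lock);
        std::size_t count = 0;
        for (auto const& slot : m_slots)
            count += slot.has_value();
        return count;
    }

private:
    std::mutex m_lock;
    std::vector<std::optional<int>> m_slots;
};

// Allocate both fds under a single lock acquisition; undo everything if the
// second allocation or the copy-out fails, so no fd is left behind.
template<typename CopyOut>
int sys_pipe(FdTable& table, int read_description, int write_description, CopyOut copy_out)
{
    std::lock_guard<std::mutex> guard(table.lock());
    int read_fd = table.allocate_locked(read_description);
    if (read_fd < 0)
        return read_fd;
    int write_fd = table.allocate_locked(write_description);
    if (write_fd < 0) {
        table.release_locked(read_fd);
        return write_fd;
    }
    if (!copy_out(read_fd, write_fd)) {
        // Userspace never learned about these fds, so drop them again.
        table.release_locked(read_fd);
        table.release_locked(write_fd);
        return -EFAULT;
    }
    return 0;
}
```

Letting `allocate_locked()` report EMFILE is what makes a separate pre-check unnecessary: the only place the limit is tested is inside the lock.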
Andreas Kling
307932857e Kernel: Make sys$pipe() not take the big lock
This system call mainly accesses the file descriptor table, and this is
already guarded by MutexProtected.
2022-08-16 20:20:11 +02:00
Andreas Kling
0b58fd5aef Kernel: Remove unnecessary TOCTOU bug in sys$pipe()
We don't need to explicitly check for EMFILE conditions before doing
anything in sys$pipe(). The fd allocation code will take care of it
for us anyway.
2022-08-16 20:16:17 +02:00
Andreas Kling
ae8f1c7dc8 Kernel: Leak a ref() on the new Process ASAP in sys$fork()
This fixes an issue where, if the fork failed due to OOM or another
error, we'd end up destroying the Process too early. By the time we got
to WaitBlockerSet::finalize(), it was long gone.
2022-08-15 00:53:28 +02:00
Brian Gianforcaro
09d5360be3 Kernel: Validate the sys$alarm signal send always succeeds
Previously we ignored this return code; now we use MUST(...) to make
sure it always succeeds.
2022-08-10 11:38:18 -04:00
Undefine
97cc33ca47 Everywhere: Make the codebase more architecture aware 2022-07-27 21:46:42 +00:00
zzLinus
ca74443012 Kernel/LibC: Implement posix syscall clock_getres() 2022-07-25 15:33:50 +02:00
Tim Schumacher
e79f0e2ee9 Kernel+LibC: Don't hardcode the maximum signal number everywhere 2022-07-22 10:07:15 -07:00
Idan Horowitz
3a80b25ed6 Kernel: Support F_SETLKW in fcntl 2022-07-21 16:39:22 +02:00
Idan Horowitz
9db10887a1 Kernel: Clean up sys$futex and add support for cross-process futexes 2022-07-21 16:39:22 +02:00
Idan Horowitz
55c7496200 Kernel: Propagate OOM conditions out of sys$futex 2022-07-21 16:39:22 +02:00
Idan Horowitz
364f6a9bf0 Kernel: Remove the Socket::{protocol,}connect ShouldBlock argument
This argument is always set to description.is_blocking(), but
description is also given as a separate argument, so there's no point
in piping it through separately.
2022-07-21 16:39:22 +02:00
Hendiadyoin1
c3e57bfccb Kernel: Try to set [cm]time in Inode::did_modify_contents
This indirectly resolves a FIXME in sys$msync
2022-07-15 12:42:43 +02:00
Hendiadyoin1
10d9bb93be Kernel: Handle multiple regions in sys$msync 2022-07-15 12:42:43 +02:00
Hendiadyoin1
d783389877 Kernel+LibC: Add posix_fallocate syscall 2022-07-15 12:42:43 +02:00
Hendiadyoin1
ad904cdcab Kernel: Use find_last_split_view to get the executable name in do_exec 2022-07-15 12:42:43 +02:00
sin-ack
fbc771efe9 Everywhere: Use default StringView constructor over nullptr
While null StringViews are just as bad, these prevent the removal of
StringView(char const*) as that constructor accepts a nullptr.

No functional changes.
2022-07-12 23:11:35 +02:00
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
Idan Horowitz
c1fe844da4 Kernel: Stop leaking first thread on errors in sys$fork
Until the thread is first set as Runnable at the end of sys$fork, its
state is Invalid, and as a result, the Finalizer which is searching for
Dying threads will never find it if the syscall short-circuits due to
an error condition like OOM. This also meant the parent Process of the
thread would be leaked as well.
2022-07-10 22:17:21 +03:00
gggggg-gggggg
d728017578 Kernel+LibC+LibCore: Pass fcntl extra argument as pointer-sized variable
The extra argument to fcntl is a pointer in the case of F_GETLK/F_SETLK
and we were pulling out a u32, leading to pointer truncation on x86_64.
Among other things, this fixes Assistant on x86_64 :^)
2022-07-10 20:09:11 +02:00
Idan Horowitz
68980bf711 Kernel: Stop reporting POLLHUP exclusively when available in sys$poll
As per Dr. Posix, unlike POLLERR and POLLNVAL, POLLHUP is only mutually
exclusive with POLLOUT, all other events may be reported together with
it.
2022-07-10 14:24:34 +02:00
Idan Horowitz
275e5cdb64 Kernel: Report POLLNVAL events in sys$poll instead of returning EBADF
As required by Dr. Posix.
2022-07-10 14:24:34 +02:00
Idan Horowitz
e32f6903f6 Kernel: Stop providing POLLRDHUP events in sys$poll by default
Dr. Posix specifies that only POLLERR, POLLHUP & POLLNVAL are provided
by default.
2022-07-10 14:24:34 +02:00
Idan Horowitz
5ca46abb51 Kernel: Set POLLHUP on WriteHangUp in sys$poll instead of POLLNVAL
POLLNVAL signifies an invalid fd, not a write hang up.
2022-07-10 14:24:34 +02:00
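The sys$poll() rules from the four commits above can be written as a single hypothetical `compute_revents()` function over a made-up `FdState`. The function follows the commit messages' reading that POLLERR and POLLNVAL are reported alone while POLLHUP only excludes POLLOUT. POLLRDHUP handling is left out.

```cpp
#include <cassert>
#include <poll.h>

// Hypothetical snapshot of an fd's state as seen by poll().
struct FdState {
    bool valid { true };
    bool error { false };
    bool hung_up { false }; // peer hung up; no more writes possible
    bool readable { false };
    bool writable { false };
};

// - an invalid fd reports POLLNVAL instead of failing the call with EBADF;
// - POLLERR is reported on its own;
// - POLLHUP excludes POLLOUT but may be combined with POLLIN;
// - POLLERR, POLLHUP and POLLNVAL do not need to be requested in `events`.
short compute_revents(FdState const& state, short events)
{
    if (!state.valid)
        return POLLNVAL;
    if (state.error)
        return POLLERR;
    short revents = 0;
    if (state.readable && (events & POLLIN))
        revents |= POLLIN;
    if (state.hung_up)
        revents |= POLLHUP;
    else if (state.writable && (events & POLLOUT))
        revents |= POLLOUT;
    return revents;
}
```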
Idan Horowitz
a6f237a247 Kernel: Accept SHUT_RD and SHUT_WR as shutdown() how values
The previous check for valid how values assumed this field was a bitmap
and that SHUT_RDWR was simply a bitwise OR of SHUT_RD and SHUT_WR,
which is not the case.
2022-07-10 14:24:34 +02:00
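SHUT_RD, SHUT_WR and SHUT_RDWR are distinct values rather than bit flags. On Linux, for example, they are 0, 1 and 2, so SHUT_RD | SHUT_WR is 1, not SHUT_RDWR. A hypothetical validation helper therefore has to match each value explicitly:

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>

// Accept exactly the three defined `how` values; anything else is EINVAL.
int validate_shutdown_how(int how)
{
    switch (how) {
    case SHUT_RD:
    case SHUT_WR:
    case SHUT_RDWR:
        return 0;
    default:
        return -EINVAL;
    }
}
```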
Tim Schumacher
cf0ad3715e Kernel: Implement sigsuspend using a SignalBlocker
`sigsuspend` was previously implemented using a poll on an empty set of
file descriptors. However, this broke quite a few assumptions in
`SelectBlocker`, as it verifies that at least one file descriptor is
ready after waking up and relies on being notified by the file
descriptor.

A bare-bones `sigsuspend` may also be implemented by relying on any of
the `sigwait` functions, but as `sigsuspend` features several (currently
unimplemented) restrictions on how returns work, it is a syscall on its
own.
2022-07-08 22:27:38 +00:00
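The contract sigsuspend has to provide can be checked from userspace with standard POSIX calls. A blocked, pending signal is delivered as soon as sigsuspend() installs a mask that unblocks it, the handler runs, and the call returns -1 with errno set to EINTR. This program exercises that contract only; it makes no claims about the kernel's SignalBlocker internals.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>

namespace {
volatile sig_atomic_t g_got_signal = 0;

void on_sigusr1(int) { g_got_signal = 1; }
}

// Blocks SIGUSR1, makes it pending, then atomically unblocks it and waits in
// sigsuspend(). Returns the errno observed after sigsuspend() returns.
int wait_for_pending_sigusr1()
{
    struct sigaction action {};
    action.sa_handler = on_sigusr1;
    sigemptyset(&action.sa_mask);
    sigaction(SIGUSR1, &action, nullptr);

    sigset_t block, old_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    raise(SIGUSR1); // stays pending because it is blocked

    sigset_t wait_mask = old_mask;
    sigdelset(&wait_mask, SIGUSR1);
    // sigsuspend() only returns after a handler ran, and then returns -1.
    int result = sigsuspend(&wait_mask);
    int saved_errno = errno;

    sigprocmask(SIG_SETMASK, &old_mask, nullptr);
    return result == -1 ? saved_errno : 0;
}
```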
Tim Schumacher
edbffb3c7a Kernel: Unblock SignalBlocker if a signal was just unmarked as pending
When updating the signal mask, there is a small window where we might set
up the receiving process for handling the signal and therefore remove
that signal from the list of pending signals before SignalBlocker has a
chance to block. In turn, this might cause SignalBlocker to never notice
that the signal arrived, so it will never unblock once blocked.

Track the currently handled signal separately and include it when
determining if SignalBlocker should be unblocking.
2022-07-08 22:27:38 +00:00
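The fix can be sketched as a mask check that considers the signal currently being handled alongside the pending set. `ThreadSignalState` and `signal_blocker_should_unblock()` are hypothetical, and the one-bit-per-signal layout is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-thread signal state, one bit per signal.
struct ThreadSignalState {
    uint64_t pending { 0 };
    uint64_t currently_handled { 0 }; // taken off `pending` for dispatch
};

// A signal that was just removed from the pending set (because the thread is
// being set up to handle it) must still wake a blocker waiting for it;
// otherwise the wakeup is lost.
bool signal_blocker_should_unblock(ThreadSignalState const& state, uint64_t wanted_mask)
{
    return ((state.pending | state.currently_handled) & wanted_mask) != 0;
}
```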
Tim Schumacher
5efa8e507b Kernel: Implement an axallowed mount option
Similar to `W^X` and `wxallowed`, this allows for anonymous executable
mappings.
2022-07-08 22:27:38 +00:00
Tim Schumacher
add4dd3589 Kernel: Do a POSIX-correct signal handler reset on exec 2022-07-05 20:58:38 +03:00
Andrew Kaster
455038d6fc Kernel: Add sysconf for IOV_MAX 2022-06-19 09:05:35 +02:00
Timon Kruiper
a4534678f9 Kernel: Implement InterruptDisabler using generic Processor functions
Now that the code does not use architecture-specific code, it is moved
to the generic Arch directory and the paths are modified accordingly.
2022-06-02 13:14:12 +01:00
Liav A
58acdce41f Kernel/FileSystem: Simplify even more the mount syscall
As with the previous commit, we distinguish between filesystems that
require a file description and those which don't, but now with a much
more readable mechanism: all initialization properties, as well as the
create static method, are grouped into the FileSystemInitializer
structure. Then, when we need to initialize an instance, we iterate over
a table of these structures, look for a matching entry, and validate the
arguments given from userspace against its requirements to ensure we can
create a valid instance of the requested filesystem.
2022-05-29 19:31:02 +01:00
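A hypothetical sketch of a FileSystemInitializer-style table. Each entry records whether the filesystem needs a source file description, and the lookup checks that requirement before creating anything, which also gives the early EBADF described in the next commit. Field names, filesystem type strings, and the ENODEV result for an unknown type are assumptions, not the kernel's actual definitions.

```cpp
#include <cassert>
#include <cerrno>
#include <string_view>

// Hypothetical registry entry: RAM-backed filesystems need no source file
// description, file-backed ones do.
struct FileSystemInitializer {
    std::string_view name;
    bool requires_open_file_description;
    int (*create)(int source_fd); // returns a filesystem id or -errno
};

int create_ram_fs(int) { return 1; }
int create_file_backed_fs(int source_fd) { return 100 + source_fd; }

// Illustrative table; not the kernel's full list.
constexpr FileSystemInitializer s_initializers[] = {
    { "tmp", false, create_ram_fs },
    { "proc", false, create_ram_fs },
    { "ext2", true, create_file_backed_fs },
};

// Finds fs_type, validates the userspace arguments against the entry's
// requirements, then creates the instance. source_fd < 0 means "none given".
int create_filesystem(std::string_view fs_type, int source_fd)
{
    for (auto const& initializer : s_initializers) {
        if (initializer.name != fs_type)
            continue;
        if (initializer.requires_open_file_description && source_fd < 0)
            return -EBADF;
        return initializer.create(source_fd);
    }
    return -ENODEV;
}
```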
Liav A
4c588441e3 Kernel: Simplify mount syscall flow for regular calls
We do this by distinguishing between two types of filesystems. The
first type is backed by RAM and includes TmpFS, ProcFS, SysFS, DevPtsFS
and DevTmpFS. Because these filesystems are backed by RAM, mounting them
doesn't require a source open file description.
The second type is filesystems that are backed by a file, so the
userspace program has to open the file (and thereby holds an open file
description for it) and provide the appropriate source open file
description.
With this distinction, we can check early whether the user tried to
mount a filesystem of the second type without a valid file description,
and fail with EBADF in that case.
Otherwise, we can proceed to mount either type of filesystem, provided
that the fs_type is valid.
2022-05-29 19:31:02 +01:00
Peter Elliott
f6943c85b0 Kernel: Fix EINVAL when mmaping with address and no MAP_FIXED
The current behavior accidentally tries to allocate 0 bytes when a
non-null address is provided and MAP_FIXED is not specified. This is
clearly a bug.
2022-05-23 00:13:26 +02:00
Ariel Don
8a854ba309 Kernel+LibC: Implement futimens(3)
Implement futimens() in terms of utimensat(). Now, utimensat() strays
from POSIX compliance because it also accepts a combination of a file
descriptor of a regular file and an empty path. utimensat() then uses
this file descriptor instead of the path to update the last access
and/or modification time of a file. That being said, its prior behavior
remains intact.

With the new behavior of utimensat(), `path` must point to a valid
string; given a null pointer instead of an empty string, utimensat()
sets `errno` to `EFAULT` and returns a failure.
2022-05-21 18:15:00 +02:00
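The extended lookup rule can be modeled as a small dispatch function. `Target`, `utimensat_target()` and `futimens_target()` are hypothetical helpers that only report which file would be updated. Timestamps, AT_FDCWD, flags, and the regular-file check are omitted.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical result: which file utimensat() would update, or an errno value.
struct Target {
    int error { 0 };
    std::string description;
};

// Models the lookup rule described in the commit message.
Target utimensat_target(int dirfd, char const* path)
{
    if (path == nullptr)
        return { EFAULT, "" };                        // null path: EFAULT
    if (path[0] == '\0')
        return { 0, "fd " + std::to_string(dirfd) }; // empty path: the fd itself
    return { 0, std::string("path ") + path };        // prior behavior: a path
}

// futimens(fd, times) expressed through the extended utimensat() rule.
Target futimens_target(int fd)
{
    return utimensat_target(fd, "");
}
```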