Commit graph

972 commits

Author SHA1 Message Date
Andreas Kling
09e644f0ba Kernel: Mark sys$emuctl() as not needing the big lock
This syscall doesn't do anything at all, and definitely doesn't need the
big lock. :^)
2022-03-09 16:43:00 +01:00
Andreas Kling
b4fefedd1d Kernel: Mark sys$chmod() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Andreas Kling
aa381c4a67 Kernel: Mark sys$fchmod() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Andreas Kling
d074aae422 Kernel: Mark sys$dup2() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Andreas Kling
8aad9e7448 Kernel: Mark sys$ftruncate() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Andreas Kling
69a6a4d927 Kernel: Mark sys$fstatvfs() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Hendiadyoin1
790d620b39 Kernel: Use an ArmedScopeGuard to revert changes after failed mmap 2022-03-08 15:58:51 -08:00
Andreas Kling
6354a9a030 Kernel: Mark sys$fsync() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
ef45ff4703 Kernel: Mark sys$readlink() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
2688ee28ff Kernel: Mark sys$stat() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
be7ec52ed0 Kernel: Mark sys$fstat() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
23822febd2 Kernel: Mark sys$fchdir() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
156ab0c47d Kernel: Mark sys$chdir() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
7597bef771 Kernel: Mark sys$getcwd() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
f630d0f095 Kernel: Mark sys$realpath() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
580d89f093 Kernel: Put Process unveil state in a SpinlockProtected container
This makes path resolution safe to perform without holding the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
24f02bd421 Kernel: Put Process's current directory in a SpinlockProtected
Also let's call it "current_directory" instead of "cwd" everywhere.
2022-03-08 00:19:49 +01:00
Andreas Kling
7543c34d07 Kernel: Mark sys$anon_create() as not needing the big lock
This syscall is already safe for no-big-lock since it doesn't access any
unprotected data.
2022-03-08 00:19:49 +01:00
Andreas Kling
baa6ff5649 Kernel: Wrap HIDManagement keymap data in SpinlockProtected
This serializes access to the current keymap data everywhere in the
kernel, allowing to mark sys$setkeymap() as not needing the big lock.
2022-03-07 16:35:23 +01:00
Ali Mohammad Pur
6608812e4b Kernel: Over-align the FPUState on the stack in sigreturn
The stack is misaligned at this point for some reason, this is a hack
that makes the resulting object "correctly" aligned, thus avoiding a
KUBSAN error.
2022-03-04 20:07:05 +01:00
Ali Mohammad Pur
88d7bf7362 Kernel: Save and restore FPU state on signal dispatch on i386/x86_64 2022-03-04 20:07:05 +01:00
Ali Mohammad Pur
4bd01b7fe9 Kernel: Add support for SA_SIGINFO
We currently don't really populate most of the fields, but that can
wait :^)
2022-03-04 20:07:05 +01:00
Ali Mohammad Pur
585054d68b Kernel: Comment the living daylights out of signal trampoline/sigreturn
Mere mortals like myself cannot understand more than two lines of
assembly without a million comments explaining what's happening, so do
that and make sure no one has to go on a wild stack state chase when
hacking on these.
2022-03-04 20:07:05 +01:00
Ali Mohammad Pur
848eaf2220 Kernel: Reject sigaction() with SA_SIGINFO
We can't handle this, so let sigaction() fail instead of PANIC()'ing
later when we try to dispatch a signal with SA_SIGINFO set.
2022-03-04 20:07:05 +01:00
Ali Mohammad Pur
cf63447044 Kernel: Move signal handlers from being thread state to process state
POSIX requires that sigaction() and friends set a _process-wide_ signal
handler, so move signal handlers and flags inside Process.
This also fixes a "pid/tid confusion" FIXME, as we can now send the
signal to the process and let that decide which thread should get the
signal (which is the thread with tid==pid, but that's now the Process's
problem).
Note that each thread still retains its signal mask, as that is local to
each thread.
2022-03-04 20:07:05 +01:00
Lucas CHOLLET
839d3d9f74 Kernel: Add getrusage() syscall
Only the two timeval fields are maintained, as required by the POSIX
standard.
2022-02-28 20:09:37 +01:00
Idan Horowitz
011bd06053 Kernel: Set CS selector when initializing thread context on x86_64
These are not technically required, since the Thread constructor
already sets these, but they are set on i686, so let's try and keep
consistent behaviour between the different archs.
2022-02-27 00:38:00 +02:00
Brian Gianforcaro
d05fa14e52 Kernel: Use TRY() when validating clock_id in TimeManagement
Gets rid of a bit of code duplication, and makes the API more consistent
with the style we are moving towards.
2022-02-21 15:47:51 -08:00
Brian Gianforcaro
70f3fa2dd2 Kernel: Set new process name in do_exec before waiting for the tracer
While investigating why gdb is failing when it calls `PT_CONTINUE`
against Serenity I noticed that the names of the programs in the
System Monitor didn't make sense. They were seemingly stale.

After inspecting the kernel code, it became apparent that the sequence
occurs as follows:

    1. Debugger calls `fork()`
    2. The forked child calls `PT_TRACE_ME`
    3. The `PT_TRACE_ME` instructs the forked process to block in the
       kernel waiting for a signal from the tracer on the next call
       to `execve(..)`.
    4. Debugger waits for forked child to spawn and stop, and then it
       calls `PT_ATTACH` followed by `PT_CONTINUE` on the child.
    5. Currently the `PT_CONTINUE` fails because of some other yet to
       be found bug.
    6. The process name is set immediately AFTER we are woken up by
       the `PT_CONTINUE` which never happens in the case I'm debugging.

This chain of events leaves the process suspended, with the name  of
the original (forked) process instead of the name we inherit from
the `execve(..)` call.

To avoid such confusion in the future, we set the new name before we
block waiting for the tracer.
2022-02-19 18:04:32 -08:00
Jakub Berkop
895a050e04 Kernel: Fixed argument passing for profiling_enable syscall
Arguments larger than 32bit need to be passed as a pointer on a 32bit
architectures. sys$profiling_enable has u64 event_mask argument,
which means that it needs to be passed as an pointer. Previously upper
32bits were filled by garbage.
2022-02-19 11:37:02 +01:00
Ali Mohammad Pur
a1cb2c371a AK+Kernel: OOM-harden most parts of Trie
The only part of Unveil that can't handle OOM gracefully is the
String::formatted() use in the node metadata.
2022-02-15 18:03:02 +02:00
Jakub Berkop
4916c892b2 Kernel/Profiling: Add profiling to read syscall
Syscalls to read can now be profiled, allowing us to monitor
filesystem usage by different applications.
2022-02-14 11:38:13 +01:00
Idan Horowitz
bd821982e0 Kernel: Use StringView::for_each_split_view() in sys$pledge
This let's us avoid the fallible Vector allocation that split_view()
entails.
2022-02-14 11:35:20 +01:00
Idan Horowitz
e384f62ee2 Kernel: Make master TLS region WeakPtr construction OOM-fallible 2022-02-14 11:35:20 +01:00
Idan Horowitz
c8ab7bde3b Kernel: Use try_make_weak_ptr() instead of make_weak_ptr() 2022-02-13 23:02:57 +01:00
Idan Horowitz
d6ea6c39a7 AK+Kernel: Rename try_make_weak_ptr to make_weak_ptr_if_nonnull
This matches the likes of the adopt_{own, ref}_if_nonnull family and
also frees up the name to allow us to eventually add OOM-fallible
versions of these functions.
2022-02-13 23:02:57 +01:00
Andrew Kaster
b4a7d148b1 Kernel: Expose maximum argument limit in sysconf
Move the definitions for maximum argument and environment size to
Process.h from execve.cpp. This allows sysconf(_SC_ARG_MAX) to return
the actual argument maximum of 128 KiB to userspace.
2022-02-13 22:06:54 +02:00
Idan Horowitz
57bce8ab97 Kernel: Set up Regions before adding them to a Process's AddressSpace
This reduces the amount of time in which not fully-initialized Regions
are present inside an AddressSpace's region tree.
2022-02-11 17:49:46 +02:00
Lenny Maiorani
c6acf64558 Kernel: Change static constexpr variables to constexpr where possible
Function-local `static constexpr` variables can be `constexpr`. This
can reduce memory consumption, binary size, and offer additional
compiler optimizations.

These changes result in a stripped x86_64 kernel binary size reduction
of 592 bytes.
2022-02-09 21:04:51 +00:00
Andreas Kling
cda56f8049 Kernel: Robustify and rename Inode bound socket API
Rename the bound socket accessor from socket() to bound_socket().
Also return RefPtr<LocalSocket> instead of a raw pointer, to make it
harder for callers to mess up.
2022-02-07 13:02:34 +01:00
sin-ack
24fd8fb16f Kernel: Ensure socket is suitable for writing in sys$sendmsg
Previously we would return a bytes written value of 0 if the writing end
of the socket was full. Now we either exit with EAGAIN if the socket
description is non-blocking, or block until the description can be
written to.

This is mostly a copy of the conditions in sys$write but with the "total
nwritten" parts removed as sys$sendmsg does not have that.
2022-02-07 12:21:45 +01:00
Andreas Kling
04539d4930 Kernel: Propagate sys$profiling_enable() buffer allocation failure
Caught a kernel panic when enabling profiling of all threads when there
was very little memory available.
2022-02-06 01:25:32 +01:00
Andreas Kling
3845c90e08 Kernel: Remove unnecessary includes from Thread.h
...and deal with the fallout by adding missing includes everywhere.
2022-01-30 16:21:59 +01:00
Idan Horowitz
e28af4a2fc Kernel: Stop using HashMap in Mutex
This commit removes the usage of HashMap in Mutex, thereby making Mutex
be allocation-free.

In order to achieve this several simplifications were made to Mutex,
removing unused code-paths and extra VERIFYs:
 * We no longer support 'upgrading' a shared lock holder to an
   exclusive holder when it is the only shared holder and it did not
   unlock the lock before relocking it as exclusive. NOTE: Unlike the
   rest of these changes, this scenario is not VERIFY-able in an
   allocation-free way, as a result the new LOCK_SHARED_UPGRADE_DEBUG
   debug flag was added, this flag lets Mutex allocate in order to
   detect such cases when debugging a deadlock.
 * We no longer support checking if a Mutex is locked by the current
   thread when the Mutex was not locked exclusively, the shared version
   of this check was not used anywhere.
 * We no longer support force unlocking/relocking a Mutex if the Mutex
   was not locked exclusively, the shared version of these functions
   was not used anywhere.
2022-01-29 16:45:39 +01:00
Andreas Kling
d748a3c173 Kernel: Only lock process file descriptor table once in sys$poll()
Grab the OpenFileDescriptions mutex once and hold on to it while
populating the SelectBlocker::FDVector.
2022-01-29 02:17:12 +01:00
Andreas Kling
b56646e293 Kernel: Switch process file descriptor table from spinlock to mutex
There's no reason for this to use a spinlock. Instead, let's allow
threads to block if someone else is using the descriptor table.
2022-01-29 02:17:09 +01:00
Andreas Kling
8ebec2938c Kernel: Convert process file descriptor table to a SpinlockProtected
Instead of manually locking in the various member functions of
Process::OpenFileDescriptions, simply wrap it in a SpinlockProtected.
2022-01-29 02:17:06 +01:00
Andreas Kling
b27b22a68c Kernel: Allocate entire SelectBlocker::FDVector at once
Use try_ensure_capacity() + unchecked_append() instead of repeatedly
doing try_append().
2022-01-28 23:41:18 +01:00
Andreas Kling
31c1094577 Kernel: Don't mess with thread state in Process::do_exec()
We were marking the execing thread as Runnable near the end of
Process::do_exec().

This was necessary for exec in processes that had never been scheduled
yet, which is a specific edge case that only applies to the very first
userspace process (normally SystemServer). At this point, such threads
are in the Invalid state.

In the common case (normal userspace-initiated exec), making the current
thread Runnable meant that we switched away from its current state:
Running. As the thread is indeed running, that's a bogus change!
This created a short time window in which the thread state was bogus,
and any attempt to block the thread would panic the kernel (due to a
bogus thread state in Thread::block() leading to VERIFY_NOT_REACHED().)

Fix this by not touching the thread state in Process::do_exec()
and instead make the first userspace thread Runnable directly after
calling Process::exec() on it in try_create_userspace_process().

It's unfortunate that exec() can be called both on the current thread,
and on a new thread that has never been scheduled. It would be good to
not have the latter edge case, but fixing that will require larger
architectural changes outside the scope of this fix.
2022-01-27 11:18:25 +01:00
Brian Gianforcaro
e954b4bdd4 Kernel: Return error from sys$execve() when called with zero arguments
There are many assumptions in the stack that argc is not zero, and
argv[0] points to a valid string. The recent pwnkit exploit on Linux
was able to exploit this assumption in the `pkexec` utility
(a SUID-root binary) to escalate from any user to root.

By convention `execve(..)` should always be called with at least one
valid argument, so lets enforce that semantic to harden the system
against vulnerabilities like pwnkit.

Reference: https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt
2022-01-26 13:05:59 +01:00