It makes much more sense to have these actions being performed via the
prctl syscall, as they both require 2 plain arguments to be passed to
the syscall layer, and in contrast to most syscalls, we don't get in
these removed syscalls an automatic representation of Userspace<T>, but
two FlatPtr(s) to perform casting on them in the prctl syscall which is
suited to what has been done in the removed syscalls.
Also, it makes sense to have these actions in the prctl syscall, because
they are strongly related to the process control concept of the prctl
syscall.
Any userspace cpp file that included <syscall.h> would end up with
a large glob of Kernel headers included, all the way down to
Kernel/Arch/x86_64/CPU.h and friends.
Only the kernel needs RegisterState, so hide it from userspace.
This allows us to get rid of an include to LibC/sys/ttydefaults.h in the
Kernel TTY implementation.
Also, move ttydefchars static const struct to another file called
Kernel/API/ttydefaultschars.h, so it could be used too in the Kernel TTY
implementation without the need to include anything from LibC.
Instead of using a special case of the annotate_mapping syscall, let's
introduce a new prctl option to disallow further annotations of Regions
as new syscall Region(s).
This is necessary to support the wayland protocol.
I also moved the CMSG_* macros to the kernel API since they are used in
both kernel and userspace.
this does not break ntpquery/SCM_TIMESTAMP.
This replaces manually grabbing the thread's main lock.
This lets us remove the `get_thread_name` and `set_thread_name` syscalls
from the big lock. :^)
This patch removes the x86 mechanism for calling syscalls, favoring
the more modern syscall instruction. It also moves architecture
dependent code from functions that are meant to be architecture
agnostic therefore paving the way for adding more architectures.
This header has always been fundamentally a Kernel API file. Move it
where it belongs. Include it directly in Kernel files, and make
Userland applications include it via sys/ioctl.h rather than directly.
Reduce inclusion of limits.h as much as possible at the same time.
This does mean that kmalloc.h is now including Kernel/API/POSIX/limits.h
instead of LibC/limits.h, but the scope could be limited a lot more.
Basically every file in the kernel includes kmalloc.h, and needs the
limits.h include for PAGE_SIZE.
Before this patch, Core::SessionManagement::parse_path_with_sid() would
figure out the root session ID by sifting through /sys/kernel/processes.
That file can take quite a while to generate (sometimes up to 40ms on my
machine, which is a problem on its own!) and with no caching, many of
our programs were effectively doing this multiple times on startup when
unveiling something in /tmp/session/%sid/
While we should find ways to make generating /sys/kernel/processes fast
again, this patch addresses the specific problem by introducing a new
syscall: sys$get_root_session_id(). This extracts the root session ID
by looking directly at the process table and takes <1ms instead of 40ms.
This cuts WebContent process startup time by ~100ms on my machine. :^)
We add this basic functionality to the Kernel so Userspace can request a
particular virtual memory mapping to be immutable. This will be useful
later on in the DynamicLoader code.
The annotation of a particular Kernel Region as immutable implies that
the following restrictions apply, so these features are prohibited:
- Changing the region's protection bits
- Unmapping the region
- Annotating the region with other virtual memory flags
- Applying further memory advises on the region
- Changing the region name
- Re-mapping the region
This syscall will be used later on to ensure we can declare virtual
memory mappings as immutable (which means that the underlying Region is
basically immutable for both future annotations or changing the
protection bits of it).
Some programs explicitly ask for a different initial stack size than
what the OS provides. This is implemented in ELF by having a
PT_GNU_STACK header which has its p_memsz set to the amount that the
program requires. This commit implements this policy by reading the
p_memsz of the header and setting the main thread stack size to that.
ELF::Image::validate_program_headers ensures that the size attribute is
a reasonable value.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
To accomplish this, we add another VeilState which is called
LockedInherited. The idea is to apply exec unveil data, similar to
execpromises of the pledge syscall, on the current exec'ed program
during the execve sequence. When applying the forced unveil data, the
veil state is set to be locked but the special state of LockedInherited
ensures that if the new program tries to unveil paths, the request will
silently be ignored, so the program will continue running without
receiving an error, but is still can only use the paths that were
unveiled before the exec syscall. This in turn, allows us to use the
unveil syscall with a special utility to sandbox other userland programs
in terms of what is visible to them on the filesystem, and is usable on
both programs that use or don't use the unveil syscall in their code.
Our implementation for Jails resembles much of how FreeBSD jails are
working - it's essentially only a matter of using a RefPtr in the
Process class to a Jail object. Then, when we iterate over all processes
in various cases, we could ensure if either the current process is in
jail and therefore should be restricted what is visible in terms of
PID isolation, and also to be able to expose metadata about Jails in
/sys/kernel/jails node (which does not reveal anything to a process
which is in jail).
A lifetime model for the Jail object is currently plain simple - there's
simpy no way to manually delete a Jail object once it was created. Such
feature should be carefully designed to allow safe destruction of a Jail
without the possibility of releasing a process which is in Jail from the
actual jail. Each process which is attached into a Jail cannot leave it
until the end of a Process (i.e. when finalizing a Process). All jails
are kept being referenced in the JailManagement. When a last attached
process is finalized, the Jail is automatically destroyed.
The priority range was changed several years ago, but the
userland-reported limits were just forgotten :skeleyak:. Move the thread
priority constants into an API header so that userland can use it
properly.
The syscalls are renamed as they no longer reflect the exact POSIX
functionality. They can now handle setting/getting scheduler parameters
for both threads and processes.
Previously we didn't send the SIGPIPE signal to processes when
sendto()/sendmsg()/etc. returned EPIPE. And now we do.
This also adds support for MSG_NOSIGNAL to suppress the signal.
This flag doesn't conform to any POSIX standard nor is found in any OS
out there. The idea behind this mount flag is to ensure that only
non-regular files will be placed in a filesystem, which includes device
nodes, symbolic links, directories, FIFOs and sockets. Currently, the
only valid case for using this mount flag is for TmpFS instances, where
we want to mount a TmpFS but disallow any kind of regular file and only
allow other types of files on the filesystem.
This remained undetected for a long time as HeaderCheck is disabled by
default. This commit makes the following file compile again:
// file: compile_me.cpp
#include <Kernel/API/POSIX/ucontext.h>
// That's it, this was enough to cause a compilation error.