If we find ourselves with a user-accessible, non-shared Region backed by
a SharedInodeVMObject, that's pretty bad news, so let's just panic the
kernel instead of getting abused.
There might be a better place for this kind of check, so I've added a
FIXME about putting more thought into that.
This was exploitable since the shared flag determines whether inode
permission checks are applied in sys$mprotect().
The bug was pretty hard to spot due to default arguments being used
instead. This patch removes the default arguments to make explicit
at each call site what's being done.
When passing nullptr for either promises or execpromises to pledge(),
the expected behaviour is to not change their current value at all - we
were accidentally resetting them to 0, effectively dropping previously
pledge()'d promises.
We now move the execpromises state into the regular promises, and clear
the execpromises state.
Also make sure to duplicate the promise state on fork.
This fixes an issue where "su" would launch a shell which immediately
crashed due to not having pledged "stdio".
Let's force callers to provide a VM range when allocating a region.
This makes ENOMEM error handling more visible and removes implicit
VM allocation which felt a bit magical.
This tells the kernel that the process wants to use pledge, but without
pledging anything - effectively restricting it to syscalls that don't
require a certain promise. This is part of OpenBSD's pledge() as well,
which served as basis for Serenity's.
We were enabling interrupts too early, before the first context switch to
a thread was complete. This could then trigger another context switch
within the context switch, which lead to a crash.
There is a window between acquiring/releasing the lock with the atomic
variables and subsequently waiting or waking threads. With a single
processor this window was closed by using a critical section, but
this doesn't prevent other processors from running these code paths.
To solve this, set a flag in the WaitQueue while holding m_lock which
determines if threads should be blocked at all.
Instead of letting each File subclass do range allocation in their
mmap() override, do it up front in sys$mmap().
This makes us honor alignment requests for file-backed memory mappings
and simplifies the code somwhat.
This was done with the help of several scripts, I dump them here to
easily find them later:
awk '/#ifdef/ { print "#cmakedefine01 "$2 }' AK/Debug.h.in
for debug_macro in $(awk '/#ifdef/ { print $2 }' AK/Debug.h.in)
do
find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/#ifdef '$debug_macro'/#if '$debug_macro'/' {} \;
done
# Remember to remove WRAPPER_GERNERATOR_DEBUG from the list.
awk '/#cmake/ { print "set("$2" ON)" }' AK/Debug.h.in
Booting old computers without RDRAND/RDSEED and without a disk makes
the system severely starved for entropy. Uses interrupts as a source
to side-step that issue.
Also warn whenever the system is starved of entropy, because that's
a non-obvious failure mode.
For some reason we were keeping the bits 04777 in file modes. That
doesn't seem right and I can't think of a reason why the set-uid bit
should be allowed to slip through.
(mode & S_IFDIR) is not enough to check if "mode" is a directory,
we have to check all the bits in the S_IFMT mask.
Use the is_directory() helper to fix this bug.
Since devices are enumerable and can compute their own name inside the
/dev hierarchy, there is no need to try and parse "root=/dev/xxx" by
hand.
This also makes any block device a candidate for the boot device, which
now includes ramdisk devices, so SerenityOS can now boot diskless too.
The disk image generated for QEMU is suitable, as long as it fits in
memory with room to spare for the rest of the system.
Besides removing the monolithic DevFSDeviceInode::determine_name()
method, being able to determine a device's name inside the /dev
hierarchy outside of DevFS has its uses.
The kernel ignored the first 8 MiB of RAM while parsing the memory map
because the kmalloc heaps and the super physical pages lived here. Move
all that stuff inside the .bss segment so that those memory regions are
accounted for, otherwise we risk overwriting boot modules placed next
to the kernel.