Commit graph

71 commits

Author SHA1 Message Date
Gunnar Beutner
2a78bf8596 Kernel: Fix the return type for syscalls
The Process::Handler type has KResultOr<FlatPtr> as its return type.
Using a different return type with an equally-sized template parameter
sort of works but breaks once that condition is no longer true, e.g.
for KResultOr<int> on x86_64.

Ideally the syscall handlers would also take FlatPtrs as their args
so we can get rid of the reinterpret_cast for the function pointer
but I didn't quite feel like cleaning that up as well.
2021-06-28 22:29:28 +02:00
Gunnar Beutner
158355e0d7 Kernel+LibELF: Add support for validating and loading ELF64 executables 2021-06-28 22:29:28 +02:00
Gunnar Beutner
f630299d49 Kernel: Add support for setting up a x86_64 GDT once in C++ land 2021-06-26 11:08:52 +02:00
Hendiadyoin1
7ca3d413f7 Kernel: Pull apart CPU.h
This does not add any functional changes
2021-06-24 00:38:23 +02:00
Gunnar Beutner
fe0ae3161a Kernel: Fix use-after-free in sys$mremap
Now that Region::name() has been changed to return a StringView we
can't rely on keeping a copy of the region's name past the region's
destruction just by holding a copy of the StringView.
2021-06-02 18:00:13 +02:00
Gunnar Beutner
95c2166ca9 Kernel: Move sys$munmap functionality into a helper method 2021-05-29 15:53:08 +02:00
Brian Gianforcaro
65d5f81afc Kernel: Make PrivateInodeVMObject factory APIs OOM safe 2021-05-29 09:04:05 +02:00
Andreas Kling
fc9ce22981 Kernel: Use KString for Region names
Replace the AK::String used for Region::m_name with a KString.

This seems beneficial across the board, but as a specific data point,
it reduces time spent in sys$set_mmap_name() by ~50% on test-js. :^)
2021-05-28 09:37:09 +02:00
Hendiadyoin1
ef425a02f7 Kernel: Implement mprotect for multiple Regions 2021-05-18 16:50:52 +02:00
Brian Gianforcaro
ccdcb6a635 Kernel: Add PerformanceManager static class, move perf event APIs there
The current method of emitting performance events requires a bit of
boiler plate at every invocation, as well as having to ignore the
return code which isn't used outside of the perf event syscall. This
change attempts to clean that up by exposing high level API's that
can be used around the code base.
2021-05-07 15:35:23 +02:00
Brian Gianforcaro
234c6ae32d Kernel: Change Inode::{read/write}_bytes interface to KResultOr<ssize_t>
The error handling in all these cases was still using the old style
negative values to indicate errors. We have a nicer solution for this
now with KResultOr<T>. This change switches the interface and then all
implementers to use the new style.
2021-05-02 13:27:37 +02:00
Itamar
6bbd2ebf83 Kernel+LibELF: Support initializing values of TLS data
Previously, TLS data was always zero-initialized.

To support initializing the values of TLS data, sys$allocate_tls now
receives a buffer with the desired initial data, and copies it to the
master TLS region of the process.

The DynamicLinker gathers the initial TLS image and passes it to
sys$allocate_tls.

We also now require the size passed to sys$allocate_tls to be
page-aligned, to make things easier. Note that this doesn't waste memory
as the TLS data has to be allocated in separate pages anyway.
2021-04-30 18:47:39 +02:00
Itamar
373e8bcbc7 Kernel: Give a name to the Master TLS region allocation 2021-04-30 18:47:39 +02:00
Brian Gianforcaro
0ca668f59c Kernel: Harden sys$munmap Vector usage against OOM.
Theoretically the append should never fail as we have in-line storage
of 2, which should be enough. However I'm planning on removing the
non-try variants of AK::Vector when compiling in kernel mode in the
future, so this will need to go eventually. I suppose it also protects
against some unforeseen bug where we we can append more than 2 items.
2021-04-29 20:31:15 +02:00
Gunnar Beutner
afeee35cbf Kernel: Avoid calling characters() where not necessary 2021-04-26 23:26:58 +02:00
Gunnar Beutner
eb798d5538 Kernel+Profiler: Improve profiling subsystem
This turns the perfcore format into more a log than it was before,
which lets us properly log process, thread and region
creation/destruction. This also makes it unnecessary to dump the
process' regions every time it is scheduled like we did before.

Incidentally this also fixes 'profile -c' because we previously ended
up incorrectly dumping the parent's region map into the profile data.

Log-based mmap support enables profiling shared libraries which
are loaded at runtime, e.g. via dlopen().

This enables profiling both the parent and child process for
programs which use execve(). Previously we'd discard the profiling
data for the old process.

The Profiler tool has been updated to not treat thread IDs as
process IDs anymore. This enables support for processes with more
than one thread. Also, there's a new widget to filter which
process should be displayed.
2021-04-26 17:13:55 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Linus Groh
2b0c361d04 Everywhere: Fix a bunch of typos 2021-04-18 10:30:03 +02:00
Gunnar Beutner
c3ee70591a Kernel: Read the ELF header from the inode rather than the mapped pages
Reading from the mapping doesn't work when the text segment has a non-zero
offset because in that case the first mapped page doesn't contain the ELF
header.
2021-04-14 13:12:52 +02:00
Gunnar Beutner
2d91761cf6 Kernel: Make sure the offset stays the same when using mremap()
When using mmap() on a file with a non-zero offset subsequent
calls to mremap() would incorrectly reset the offset to zero.
2021-04-14 13:12:52 +02:00
Idan Horowitz
497c759ab7 Kernel: Remove old region from process' regions vector before splitting
This does not affect functionality right now, but it means that the
regions vector will now never have any overlapping regions, which will
allow the use of balance binary search trees instead of a vector in the
future. (since they require keys to be exclusive)
2021-04-12 18:03:44 +02:00
Jean-Baptiste Boric
6698fd84ff Kernel: Refactor storage stack with u64 as mmap offset 2021-03-19 09:15:19 +01:00
Hendiadyoin1
eba3fa5e72 Kernel: Make munmap more posix compliant
In case someone tries to unmap a not mapped region (fallback) we should
not return an error, but silently do nothing
2021-03-13 10:00:46 +01:00
Hendiadyoin1
b7f1171a1c Kernel: munmap multiple regions at a time
This implements a fallback to munmap that unmaps multiple regions at a
time, with splitting some when needed.

The way it is implemented is possibly not optimal, due to it searching
without looking into the cache
2021-03-13 10:00:46 +01:00
Andreas Kling
5e7abea31e Kernel+Profiler: Capture metadata about all profiled processes
The perfcore file format was previously limited to a single process
since the pid/executable/regions data was top-level in the JSON.

This patch moves the process-specific data into a top-level array
named "processes" and we now add entries for each process that has
been sampled during the profile run.

This makes it possible to see samples from multiple threads when
viewing a perfcore file with Profiler. This is extremely cool! :^)
2021-03-02 22:38:06 +01:00
Andreas Kling
272c2e6ec5 Kernel: Use Userspace<T> in sys${munmap,mprotect,madvise,msyscall}() 2021-03-01 15:53:33 +01:00
Andreas Kling
ac71775de5 Kernel: Make all syscall functions return KResultOr<T>
This makes it a lot easier to return errors since we no longer have to
worry about negating EFOO errors and can just return them flat.
2021-03-01 13:54:32 +01:00
Andreas Kling
8f70528f30 Kernel: Take some baby steps towards x86_64
Make more of the kernel compile in 64-bit mode, and make some things
pointer-size-agnostic (by using FlatPtr.)

There's a lot of work to do here before the kernel will even compile.
2021-02-25 16:27:12 +01:00
Andreas Kling
53c6c29158 Kernel: Tighten some typing in Arch/i386/CPU.h
Use more appropriate types for some things.
2021-02-25 11:32:27 +01:00
Andreas Kling
5d180d1f99 Everywhere: Rename ASSERT => VERIFY
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)

Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.

We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
2021-02-23 20:56:54 +01:00
Andreas Kling
84b2d4c475 Kernel: Add "map_fixed" pledge promise
This is a new promise that guards access to mmap() with MAP_FIXED.

Fixed-address mappings are rarely used, but can be useful if you are
trying to groom the process address space for malicious purposes.

None of our programs need this at the moment, as the only user of
MAP_FIXED is DynamicLoader, but the fixed mappings are constructed
before the process has had a chance to pledge anything.
2021-02-21 01:08:48 +01:00
Andreas Kling
eb92ec3149 Kernel: Factor out mmap & friends range expansion to a helper function
sys$mmap() and related syscalls must pad to the nearest page boundary
below the base address *and* above the end address of the specified
range. Since we have to do this in many places, let's make a helper.
2021-02-18 18:04:58 +01:00
Andreas Kling
575c7ed414 Kernel: Make sys$msyscall() EFAULT on non-user address
Fixes #5361.
2021-02-16 11:32:00 +01:00
Andreas Kling
6ee499aeb0 Kernel: Round old address/size in sys$mremap() to page size multiples
Found by fuzz-syscalls. :^)
2021-02-14 13:15:05 +01:00
Andreas Kling
09b1b09c19 Kernel: Assert if rounding-up-to-page-size would wrap around to 0
If we try to align a number above 0xfffff000 to the next multiple of
the page size (4 KiB), it would wrap around to 0. This is most likely
never what we want, so let's assert if that happens.
2021-02-14 10:01:50 +01:00
Andreas Kling
c877612211 Kernel: Round down base of partial ranges provided to munmap/mprotect
We were failing to round down the base of partial VM ranges. This led
to split regions being constructed that could have a non-page-aligned
base address. This would then trip assertions in the VM code.

Found by fuzz-syscalls. :^)
2021-02-13 01:49:44 +01:00
Andreas Kling
7551090056 Kernel: Round up ranges to page size multiples in munmap and mprotect
This prevents passing bad inputs to RangeAllocator who then asserts.

Found by fuzz-syscalls. :^)
2021-02-13 01:18:03 +01:00
Andreas Kling
f1b5def8fd Kernel: Factor address space management out of the Process class
This patch adds Space, a class representing a process's address space.

- Each Process has a Space.
- The Space owns the PageDirectory and all Regions in the Process.

This allows us to reorganize sys$execve() so that it constructs and
populates a new Space fully before committing to it.

Previously, we would construct the new address space while still
running in the old one, and encountering an error meant we had to do
tedious and error-prone rollback.

Those problems are now gone, replaced by what's hopefully a set of much
smaller problems and missing cleanups. :^)
2021-02-08 18:27:28 +01:00
Andreas Kling
d4dd4a82bb Kernel: Don't allow sys$msyscall() on non-mmap regions 2021-02-02 20:16:13 +01:00
Andreas Kling
823186031d Kernel: Add a way to specify which memory regions can make syscalls
This patch adds sys$msyscall() which is loosely based on an OpenBSD
mechanism for preventing syscalls from non-blessed memory regions.

It works similarly to pledge and unveil, you can call it as many
times as you like, and when you're finished, you call it with a null
pointer and it will stop accepting new regions from then on.

If a syscall later happens and doesn't originate from one of the
previously blessed regions, the kernel will simply crash the process.
2021-02-02 20:13:44 +01:00
Andreas Kling
123c37e1c0 Kernel: Fix mix-up between MAP_STACK/MAP_ANONYMOUS in prot validation 2021-01-30 10:30:17 +01:00
Andreas Kling
e55ef70e5e Kernel: Remove "has made executable exception for dynamic loader" flag
As Idan pointed out, this flag is actually not needed, since we don't
allow transitioning from previously-executable to writable anyway.
2021-01-30 10:06:52 +01:00
Andreas Kling
d0c5979d96 Kernel: Add "prot_exec" pledge promise and require it for PROT_EXEC
This prevents sys$mmap() and sys$mprotect() from creating executable
memory mappings in pledged programs that don't have this promise.

Note that the dynamic loader runs before pledging happens, so it's
unaffected by this.
2021-01-29 18:56:34 +01:00
Andreas Kling
51df44534b Kernel: Disallow mapping anonymous memory as executable
This adds another layer of defense against introducing new code into a
running process. The only permitted way of doing so is by mmapping an
open file with PROT_READ | PROT_EXEC.

This does make any future JIT implementations slightly more complicated
but I think it's a worthwhile trade-off at this point. :^)
2021-01-29 14:52:34 +01:00
Andreas Kling
af3d3c5c4a Kernel: Enforce W^X more strictly (like PaX MPROTECT)
This patch adds enforcement of two new rules:

- Memory that was previously writable cannot become executable
- Memory that was previously executable cannot become writable

Unfortunately we have to make an exception for text relocations in the
dynamic loader. Since those necessitate writing into a private copy
of library code, we allow programs to transition from RW to RX under
very specific conditions. See the implementation of sys$mprotect()'s
should_make_executable_exception_for_dynamic_loader() for details.
2021-01-29 14:52:27 +01:00
Sahan Fernando
6876b9a514 Kernel: Prevent mmap-ing as both fixed and randomized 2021-01-29 07:45:00 +01:00
Jorropo
22b0ff05d4
Kernel: sys$mmap PAGE_ROUND_UP size before calling allocate_randomized (#5154)
`allocate_randomized` assert an already sanitized size but `mmap` were
just forwarding whatever the process asked so it was possible to
trigger a kernel panic from an unpriviliged process just by asking some
randomly placed memory and a size non alligned with the page size.
This fixes this issue by rounding up to the next page size before
calling `allocate_randomized`.

Fixes #5149
2021-01-28 22:36:20 +01:00
Andreas Kling
b6937e2560 Kernel+LibC: Add MAP_RANDOMIZED flag for sys$mmap()
This can be used to request random VM placement instead of the highly
predictable regular mmap(nullptr, ...) VM allocation strategy.

It will soon be used to implement ASLR in the dynamic loader. :^)
2021-01-28 16:23:38 +01:00
Andreas Kling
e67402c702 Kernel: Remove Range "valid" state and use Optional<Range> instead
It's easier to understand VM ranges if they are always valid. We can
simply use an empty Optional<Range> to encode absence when needed.
2021-01-27 21:14:42 +01:00
Andreas Kling
5ab27e4bdc Kernel: sys$mmap() without MAP_FIXED should consider address a hint
If we can't use that specific address, it's still okay to put it
anywhere else in VM.
2021-01-27 21:14:42 +01:00