Commit graph

268 commits

Author SHA1 Message Date
Andreas Kling
791b32e3c6 Kernel: Remove an unnecessary cast in sys$execve() 2020-12-25 14:16:35 +01:00
Andreas Kling
9c640e67ac Kernel: Don't fetch full inode metadata in sys$execve()
We only need the size, so let's not fetch all the metadata.
2020-12-25 14:15:33 +01:00
Andreas Kling
c3eddbcb49 Kernel: Add back missing ELF::Image validity check
If the image is not a valid ELF we should just fail ASAP.
2020-12-25 14:13:44 +01:00
Andreas Kling
4986f268a5 Kernel: Convert dbg() => dbgln() in sys$execve() 2020-12-25 12:51:35 +01:00
Andreas Kling
09129782de Kernel: Simplify ELF loading logic in sys$execve() somewhat
Get rid of the lambda functions and put the logic inline in the program
header traversal loop instead. This makes the code quite a bit shorter
and hopefully makes it easier to see what's going on.
2020-12-25 02:33:57 +01:00
Andreas Kling
1e4c010643 LibELF: Remove ELF::Loader and move everyone to ELF::Image
This commit gets rid of ELF::Loader entirely since its very ambiguous
purpose was actually to load executables for the kernel, and that is
now handled by the kernel itself.

This patch includes some drive-by cleanup in LibDebug and CrashDaemon
enabled by the fact that we no longer need to keep the ref-counted
ELF::Loader around.
2020-12-25 02:14:56 +01:00
Andreas Kling
7551a66f73 Kernel+LibELF: Move sys$execve()'s loading logic from LibELF to Kernel
It was really weird that ELF loading was performed by the ELF::Loader
class instead of just being done by the kernel itself. This patch moves
all the layout logic from ELF::Loader over to sys$execve().

The kernel no longer cares about ELF::Loader and instead only uses an
ELF::Image as an interpreting wrapper around executables.
2020-12-25 01:22:55 +01:00
Itamar
0cb636078a Kernel+LibELF: Allow Non ET_DYN executables to have an interpreter 2020-12-24 21:34:51 +01:00
Itamar
d64d0451e5 Kernel: Fix mmap with specific address for file backed mappings 2020-12-24 21:34:51 +01:00
Andreas Kling
1e21d49e86 Kernel: Fix wrong-looking overflow check in sys$execve()
This was harmless since sizeof(length) and sizeof(strings) are both 4
on x86 but let's check the right things regardless.
2020-12-23 20:34:22 +01:00
Andreas Kling
6bfbc5f5f5 Kernel: Don't allow modifying IOPL via sys$ptrace() or sys$sigreturn()
It was possible to overwrite the entire EFLAGS register since we didn't
do any masking in the ptrace and sigreturn syscalls.

This made it trivial to gain IO privileges by raising IOPL to 3 and
then you could talk to hardware to do all kinds of nasty things.

Thanks to @allesctf for finding these issues! :^)

Their exploit/write-up: https://github.com/allesctf/writeups/blob/master/2020/hxpctf/wisdom2/writeup.md
2020-12-22 19:38:25 +01:00
Andreas Kling
2dfe5751f3 Kernel: Abort core dump generation if any substep fails
And make an effort to propagate errors out from the inner parts.
This fixes an issue where the kernel would infinitely loop in coredump
generation if the TmpFS filled up.
2020-12-22 10:09:41 +01:00
Tom
5f51d85184 Kernel: Improve time keeping and dramatically reduce interrupt load
This implements a number of changes related to time:
* If a HPET is present, it is now used only as a system timer, unless
  the Local APIC timer is used (in which case the HPET timer will not
  trigger any interrupts at all).
* If a HPET is present, the current time can now be as accurate as the
  chip can be, independently from the system timer. We now query the
  HPET main counter for the current time in CPU #0's system timer
  interrupt, and use that as a base line. If a high precision time is
  queried, that base line is used in combination with quering the HPET
  timer directly, which should give a much more accurate time stamp at
  the expense of more overhead. For faster time stamps, the more coarse
  value based on the last interrupt will be returned. This also means
  that any missed interrupts should not cause the time to drift.
* The default system interrupt rate is reduced to about 250 per second.
* Fix calculation of Thread CPU usage by using the amount of ticks they
  used rather than the number of times a context switch happened.
* Implement CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE and use it
  for most cases where precise timestamps are not needed.
2020-12-21 18:26:12 +01:00
Lenny Maiorani
765936ebae
Everywhere: Switch from (void) to [[maybe_unused]] (#4473)
Problem:
- `(void)` simply casts the expression to void. This is understood to
  indicate that it is ignored, but this is really a compiler trick to
  get the compiler to not generate a warning.

Solution:
- Use the `[[maybe_unused]]` attribute to indicate the value is unused.

Note:
- Functions taking a `(void)` argument list have also been changed to
  `()` because this is not needed and shows up in the same grep
  command.
2020-12-21 00:09:48 +01:00
Andreas Kling
34e9df3c5e Kernel: Randomize memory location of the dynamic loader :^)
This should make it a little bit harder for those who would mess with
our loader.
2020-12-20 18:49:24 +01:00
Andreas Kling
02ef3f6343 Kernel: Ptrace should not assert on poke in non-mapped tracee memory 2020-12-20 18:49:24 +01:00
Andreas Kling
9bf02c32c0 Kernel: Activate SUID/SGID credentials earlier in sys$execve()
Switch on the new credentials before loading the new executable into
memory. This ensures that attempts to ptrace() the program from an
unprivileged process will fail.

This covers one bug that was exploited in the 2020 HXP CTF:
https://hxp.io/blog/79/hxp-CTF-2020-wisdom2/

Thanks to yyyyyyy for finding the bug! :^)
2020-12-20 18:49:18 +01:00
Andreas Kling
5505159a94 Kernel: Silence debug spam about select() being interrupted 2020-12-20 16:06:52 +01:00
Andreas Kling
e5eda151b4 Kernel: Silence debug spam when running dynamically linked programs 2020-12-20 16:06:39 +01:00
Andreas Kling
8e79bde2b7 Kernel: Move KBufferBuilder to the fallible KBuffer API
KBufferBuilder::build() now returns an OwnPtr<KBuffer> and can fail.
Clients of the API have been updated to handle that situation.
2020-12-18 19:22:26 +01:00
Tom
c4176b0da1 Kernel: Fix Lock race causing infinite spinning between two threads
We need to account for how many shared lock instances the current
thread owns, so that we can properly release such references when
yielding execution.

We also need to release the process lock when donating.
2020-12-16 23:38:17 +01:00
Andreas Kling
4befc2c282 Kernel: Avoid null dereference in sys$profiling_disable()
If we can't create a profiling coredump object, we shouldn't try to
call write() on it.
2020-12-15 11:25:51 +01:00
Andreas Kling
28c042e46f Kernel: Make CoreDump::m_num_program_headers const
This makes it an error to assign to it after construction.
2020-12-15 11:24:46 +01:00
Andreas Kling
ff8bf4db8d Kernel: Don't take LexicalPath as argument
LexicalPath is a big and heavy class that's really meant as a helper
for extracting parts of a path, not for storage or passing around.
Instead, pass paths around as strings and use LexicalPath locally
as needed.
2020-12-15 11:17:01 +01:00
Itamar
1efbbf3ac3 Kernel: Don't generate a backtrace when a process exists with non-zero
..status
2020-12-14 23:05:53 +01:00
Itamar
5392f42731 Kernel: Generate coredumps for profiled processes
These coredumps will be used by the Profile Viewer to symbolicate the
profiling samples.
2020-12-14 23:05:53 +01:00
Itamar
39890af833 Kernel: Pass full path of output coredump file to CoreDump 2020-12-14 23:05:53 +01:00
Itamar
b4842d33bb Kernel: Generate a coredump file when a process crashes
When a process crashes, we generate a coredump file and write it in
/tmp/coredumps/.

The coredump file is an ELF file of type ET_CORE.
It contains a segment for every userspace memory region of the process,
and an additional PT_NOTE segment that contains the registers state for
each thread, and a additional data about memory regions
(e.g their name).
2020-12-14 23:05:53 +01:00
Itamar
efe4da57df Loader: Stabilize loader & Use shared libraries everywhere :^)
The dynamic loader is now stable enough to be used everywhere in the
system - so this commit does just that.
No More .a Files, Long Live .so's!
2020-12-14 23:05:53 +01:00
Itamar
9ca1a0731f Kernel: Support TLS allocation from userspace
This adds an allocate_tls syscall through which a userspace process
can request the allocation of a TLS region with a given size.

This will be used by the dynamic loader to allocate TLS for the main
executable & its libraries.
2020-12-14 23:05:53 +01:00
Itamar
5b87904ab5 Kernel: Add ability to load interpreter instead of main program
When the main executable needs an interpreter, we load the requested
interpreter program, and pass to it an open file decsriptor to the main
executable via the auxiliary vector.

Note that we do not allocate a TLS region for the interpreter.
2020-12-14 23:05:53 +01:00
Tom
c455fc2030 Kernel: Change wait blocking to Process-only blocking
This prevents zombies created by multi-threaded applications and brings
our model back to closer to what other OSs do.

This also means that SIGSTOP needs to halt all threads, and SIGCONT needs
to resume those threads.
2020-12-12 21:28:12 +01:00
Tom
4bbee00650 Kernel: disown should unblock any potential waiters
This is necessary because if a process changes the state to Stopped
or resumes from that state, a wait entry is created in the parent
process. So, if a child process does this before disown is called,
we need to clear those entries to avoid leaking references/zombies
that won't be cleaned up until the former parent exits.

This also should solve an even more unlikely corner case where another
thread is waiting on a pid that is being disowned by another thread.
2020-12-12 21:28:12 +01:00
Tom
da5cc34ebb Kernel: Fix some issues related to fixes and block conditions
Fix some problems with join blocks where the joining thread block
condition was added twice, which lead to a crash when trying to
unblock that condition a second time.

Deferred block condition evaluation by File objects were also not
properly keeping the File object alive, which lead to some random
crashes and corruption problems.

Other problems were caused by the fact that the Queued state didn't
handle signals/interruptions consistently. To solve these issues we
remove this state entirely, along with Thread::wait_on and change
the WaitQueue into a BlockCondition instead.

Also, deliver signals even if there isn't going to be a context switch
to another thread.

Fixes #4336 and #4330
2020-12-12 21:28:12 +01:00
Andreas Kling
97d789c75b Kernel: Fix null dereference when execve'ing ELF without PT_TLS header
Fixes #4387.
2020-12-11 22:59:46 +01:00
Tom
12cf6f8650 Kernel: Add CLOCK_REALTIME support to the TimerQueue
This allows us to use blocking timeouts with either monotonic or
real time for all blockers. Which means that clock_nanosleep()
now also supports CLOCK_REALTIME.

Also, switch alarm() to use CLOCK_REALTIME as per specification.
2020-12-02 13:02:04 +01:00
Tom
4c1e27ec65 Kernel: Use TimerQueue for SIGALRM 2020-12-02 13:02:04 +01:00
Andrew Kaster
3f808b0dda LibELF+Kernel: Validate program headers in Image::parse
This should catch more malformed ELF files earlier than simply
checking the ELF header alone. Also change the API of
validate_program_headers to take the interpreter_path by pointer. This
makes it less awkward to call when we don't care about the interpreter,
and just want the validation.
2020-12-01 09:58:21 +01:00
Tom
9e32d79e02 Kernel: Fix leaking a reference on thread creation
New Thread objects should be adopted into a RefPtr upon creation.
If creating a thread failed (e.g. out of memory), releasing the RefPtr
will destruct the partially created object, but in the successful case
the thread will add an additional reference that it keeps until it
finishes execution. Adopting will drop it to 1 when returning from
create_thread, or 0 if the thread could not be fully constructed.
2020-12-01 09:26:37 +01:00
Tom
046d6855f5 Kernel: Move block condition evaluation out of the Scheduler
This makes the Scheduler a lot leaner by not having to evaluate
block conditions every time it is invoked. Instead evaluate them as
the states change, and unblock threads at that point.

This also implements some more waitid/waitpid/wait features and
behavior. For example, WUNTRACED and WNOWAIT are now supported. And
wait will now not return EINTR when SIGCHLD is delivered at the
same time.
2020-11-30 13:17:02 +01:00
Tom
6a620562cc Kernel: Allow passing a thread argument for new kernel threads
This adds the ability to pass a pointer to kernel thread/process.
Also add the ability to use a closure as thread function, which
allows passing information to a kernel thread more easily.
2020-11-30 13:17:02 +01:00
Tom
6cb640eeba Kernel: Move some time related code from Scheduler into TimeManagement
Use the TimerQueue to expire blocking operations, which is one less thing
the Scheduler needs to check on every iteration.

Also, add a BlockTimeout class that will automatically handle relative or
absolute timeouts as well as overriding timeouts (e.g. socket timeouts)
more consistently.

Also, rework the TimerQueue class to be able to fire events from
any processor, which requires Timer to be RefCounted. Also allow
creating id-less timers for use by blocking operations.
2020-11-30 13:17:02 +01:00
Tom
68abd1cb29 Kernel: Fix SharedBuffer reference counting on fork
We need to not only add a record for a reference, but we need
to copy the reference count on fork as well, because the code
in the fork assumes that it has the same amount of references,
still.

Also, once all references are dropped when a process is disowned,
delete the shared buffer.

Fixes #4076
2020-11-24 21:26:39 +01:00
Sergey Bugaev
098070b767 Kernel: Add unveil('b')
This is a new "browse" permission that lets you open (and subsequently list
contents of) directories underneath the path, but not regular files or any other
types of files.
2020-11-23 18:37:40 +01:00
Andreas Kling
086522537e Kernel: Don't leak ref on executable inode in sys$execve()
We were leaking a ref on the executed inode in successful calls to
sys$execve(). This meant that once a binary had ever been executed,
it was impossible to remove it from the file system.

The execve system call is particularly finicky since the function
does not return normally on success, so extra care must be taken to
ensure nothing is kept alive by stack variables.

There is a big NOTE comment about this, and yet the bug still got in.
It would be nice to enforce this, but I'm unsure how.
2020-11-23 16:08:42 +01:00
Tom
a89648e159 Kernel: Inherit shared buffers when forking
We need to create a reference for the new PID for each shared buffer
that the process had a reference to. If the process subsequently
get replaced through exec, those references will be dropped again.
But if exec for some reason fails then other code, such as global
destructors could still expect having access to them.

Fixes #4076
2020-11-23 09:39:32 +01:00
Andreas Kling
94ff04b536 Kernel: Make CLOCK_MONOTONIC respect the system tick frequency
The time returned by sys$clock_gettime() was not aligned with the delay
calculations in sys$clock_nanosleep(). This patch fixes that by taking
the system's ticks_per_second value into account in both functions.

This patch also removes the need for Thread::sleep_until() and uses
Thread::sleep() for both absolute and relative sleeps.

This was causing the nesalizer emulator port to sleep for a negative
amount of time at the end of each frame, making it run way too fast.
2020-11-22 17:20:58 +01:00
Tom
75f61fe3d9 AK: Make RefPtr, NonnullRefPtr, WeakPtr thread safe
This makes most operations thread safe, especially so that they
can safely be used in the Kernel. This includes obtaining a strong
reference from a weak reference, which now requires an explicit
call to WeakPtr::strong_ref(). Another major change is that
Weakable::make_weak_ref() may require the explicit target type.
Previously we used reinterpret_cast in WeakPtr, assuming that it
can be properly converted. But WeakPtr does not necessarily have
the knowledge to be able to do this. Instead, we now ask the class
itself to deliver a WeakPtr to the type that we want.

Also, WeakLink is no longer specific to a target type. The reason
for this is that we want to be able to safely convert e.g. WeakPtr<T>
to WeakPtr<U>, and before this we just reinterpret_cast the internal
WeakLink<T> to WeakLink<U>, which is a bold assumption that it would
actually produce the correct code. Instead, WeakLink now operates
on just a raw pointer and we only make those constructors/operators
available if we can verify that it can be safely cast.

In order to guarantee thread safety, we now use the least significant
bit in the pointer for locking purposes. This also means that only
properly aligned pointers can be used.
2020-11-10 19:11:52 +01:00
Nico Weber
323e727a4c Kernel+LibC: Add adjtime(2)
Most systems (Linux, OpenBSD) adjust 0.5 ms per second, or 0.5 us per
1 ms tick. That is, the clock is sped up or slowed down by at most
0.05%.  This means adjusting the clock by 1 s takes 2000 s, and the
clock an be adjusted by at most 1.8 s per hour.

FreeBSD adjusts 5 ms per second if the remaining time adjustment is
>= 1 s (0.5%) , else it adjusts by 0.5 ms as well. This allows adjusting
by (almost) 18 s per hour.

Since Serenity OS can lose more than 22 s per hour (#3429), this
picks an adjustment rate up to 1% for now. This allows us to
adjust up to 36s per hour, which should be sufficient to adjust
the clock fast enough to keep up with how much time the clock
currently loses. Once we have a fancier NTP implementation that can
adjust tick rate in addition to offset, we can think about reducing
this.

adjtime is a bit old-school and most current POSIX-y OSs instead
implement adjtimex/ntp_adjtime, but a) we have to start somewhere
b) ntp_adjtime() is a fairly gnarly API. OpenBSD's adjfreq looks
like it might provide similar functionality with a nicer API. But
before worrying about all this, it's probably a good idea to get
to a place where the kernel APIs are (barely) good enough so that
we can write an ntp service, and once we have that we should write
a way to automatically evaluate how well it keeps the time adjusted,
and only then should we add improvements ot the adjustment mechanism.
2020-11-10 19:03:08 +01:00
Jesse Buhagiar
940380c986 Kernel: Prevent unveil returning ENOENT with cpath permissions
This addresses the issue first enountered in #3644. If a path is
first unveiled with "c" permissions, we should NOT return ENOENT
if the node does not exist on the disk, as the program will most
likely be creating it at a later time.
2020-11-10 09:53:18 +01:00