Commit graph

48 commits

Author SHA1 Message Date
Andreas Kling
cf16b2c8e6 Kernel: Wrap process address spaces in SpinlockProtected
This forces anyone who wants to look into and/or manipulate an address
space to lock it. And this replaces the previous, more flimsy, manual
spinlock use.

Note that pointers *into* the address space are not safe to use after
you unlock the space. We've got many issues like this, and we'll have
to track those down as wlel.
2022-08-24 14:57:51 +02:00
Anthony Iacono
f86b671de2 Kernel: Use Process::credentials() and remove user ID/group ID helpers
Move away from using the group ID/user ID helpers in the process to
allow for us to take advantage of the immutable credentials instead.
2022-08-22 12:46:32 +02:00
zzLinus
ca74443012 Kernel/LibC: Implement posix syscall clock_getres() 2022-07-25 15:33:50 +02:00
Andreas Kling
12b612ab14 Kernel: Mark sys$adjtime() as not needing the big lock
This syscall works on global kernel state and so doesn't need protection
from threads in the same process.
2022-04-04 00:42:18 +02:00
Andreas Kling
4306422f29 Kernel: Mark sys$clock_settime() as not needing the big log
This syscall ends up disabling interrupts while changing the time,
and the clock is a global resource anyway, so preventing threads in the
same process from running wouldn't solve anything.
2022-04-04 00:42:18 +02:00
Andreas Kling
858b196c59 Kernel: Unbreak ASLR in the new RegionTree world
Functions that allocate and/or place a Region now take a parameter
that tells it whether to randomize unspecified addresses.
2022-04-03 21:51:58 +02:00
Andreas Kling
07f3d09c55 Kernel: Make VM allocation atomic for userspace regions
This patch move AddressSpace (the per-process memory manager) to using
the new atomic "place" APIs in RegionTree as well, just like we did for
MemoryManager in the previous commit.

This required updating quite a few places where VM allocation and
actually committing a Region object to the AddressSpace were separated
by other code.

All you have to do now is call into AddressSpace once and it'll take
care of everything for you.
2022-04-03 21:51:58 +02:00
Andreas Kling
ffe2e77eba Kernel: Add Memory::RegionTree to share code between AddressSpace and MM
RegionTree holds an IntrusiveRedBlackTree of Region objects and vends a
set of APIs for allocating memory ranges.

It's used by AddressSpace at the moment, and will be used by MM soon.
2022-04-03 21:51:58 +02:00
Andreas Kling
02a95a196f Kernel: Use AddressSpace region tree for range allocation
This patch stops using VirtualRangeAllocator in AddressSpace and instead
looks for holes in the region tree when allocating VM space.

There are many benefits:

- VirtualRangeAllocator is non-intrusive and would call kmalloc/kfree
  when used. This new solution is allocation-free. This was a source
  of unpleasant MM/kmalloc deadlocks.

- We consolidate authority on what the address space looks like in a
  single place. Previously, we had both the range allocator *and* the
  region tree both being used to determine if an address was valid.
  Now there is only the region tree.

- Deallocation of VM when splitting regions is no longer complicated,
  as we don't need to keep two separate trees in sync.
2022-04-03 21:51:58 +02:00
Idan Horowitz
086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Brian Gianforcaro
d05fa14e52 Kernel: Use TRY() when validating clock_id in TimeManagement
Gets rid of a bit of code duplication, and makes the API more consistent
with the style we are moving towards.
2022-02-21 15:47:51 -08:00
Brian Gianforcaro
54b9a4ec1e Kernel: Handle promise violations in the syscall handler
Previously we would crash the process immediately when a promise
violation was found during a syscall. This is error prone, as we
don't unwind the stack. This means that in certain cases we can
leak resources, like an OwnPtr / RefPtr tracked on the stack. Or
even leak a lock acquired in a ScopeLockLocker.

To remedy this situation we move the promise violation handling to
the syscall handler, right before we return to user space. This
allows the code to follow the normal unwind path, and grantees
there is no longer any cleanup that needs to occur.

The Process::require_promise() and Process::require_no_promises()
functions were modified to return ErrorOr<void> so we enforce that
the errors are always propagated by the caller.
2021-12-29 18:08:15 +01:00
Brian Gianforcaro
bad6d50b86 Kernel: Use Process::require_promise() instead of REQUIRE_PROMISE()
This change lays the foundation for making the require_promise return
an error hand handling the process abort outside of the syscall
implementations, to avoid cases where we would leak resources.

It also has the advantage that it makes removes a gs pointer read
to look up the current thread, then process for every syscall. We
can instead go through the Process this pointer in most cases.
2021-12-29 18:08:15 +01:00
Andreas Kling
79fa9765ca Kernel: Replace KResult and KResultOr<T> with Error and ErrorOr<T>
We now use AK::Error and AK::ErrorOr<T> in both kernel and userspace!
This was a slightly tedious refactoring that took a long time, so it's
not unlikely that some bugs crept in.

Nevertheless, it does pass basic functionality testing, and it's just
real nice to finally see the same pattern in all contexts. :^)
2021-11-08 01:10:53 +01:00
Andreas Kling
e6929835d2 Kernel: Make copy_time_from_user() helpers use KResultOr<Time>
...and use TRY() for smooth error propagation everywhere.
2021-09-07 01:18:02 +02:00
Andreas Kling
f4a9a0d561 Kernel: Make VirtualRangeAllocator return KResultOr<VirtualRange>
This achieves two things:
- The allocator can report more specific errors
- Callers can (and now do) use TRY() :^)
2021-09-06 01:55:27 +02:00
Andreas Kling
6dddd500bf Kernel: Use TRY() in sys$map_time_page() 2021-09-05 18:15:05 +02:00
Andreas Kling
789db813d3 Kernel: Use copy_typed_from_user<T> for fetching syscall parameters 2021-09-05 17:51:37 +02:00
Andreas Kling
48a0b31c47 Kernel: Make copy_{from,to}_user() return KResult and use TRY()
This makes EFAULT propagation flow much more naturally. :^)
2021-09-05 17:38:37 +02:00
Andreas Kling
fdfc66db61 Kernel+LibC: Allow clock_gettime() to run without syscalls
This patch adds a vDSO-like mechanism for exposing the current time as
an array of per-clock-source timestamps.

LibC's clock_gettime() calls sys$map_time_page() to map the kernel's
"time page" into the process address space (at a random address, ofc.)
This is only done on first call, and from then on the timestamps are
fetched from the time page.

This first patch only adds support for CLOCK_REALTIME, but eventually
we should be able to support all clock sources this way and get rid of
sys$clock_gettime() in the kernel entirely. :^)

Accesses are synchronized using two atomic integers that are incremented
at the start and finish of the kernel's time page update cycle.
2021-08-10 19:21:16 +02:00
Andreas Kling
fa64ab26a4 Kernel+UserspaceEmulator: Remove unused sys$gettimeofday()
Now that LibC uses clock_gettime() to implement gettimeofday(), we can
get rid of this entire syscall. :^)
2021-08-10 13:01:39 +02:00
Idan Horowitz
d40038a04f Kernel: Disable big process lock for sys$gettimeofday
This syscall doesn't touch any intra-process shared resources and only
accesses the time via the atomic TimeManagement::now so there's no need
to hold the big lock.
2021-08-06 23:36:12 +02:00
Idan Horowitz
3ba2449058 Kernel: Disable big process lock for sys$clock_nanosleep
This syscall doesn't touch any intra-process shared resources and only
accesses the time via the atomic TimeManagement::current_time so there's
no need to hold the big lock.
2021-08-06 23:36:12 +02:00
Idan Horowitz
fbd848e6eb Kernel: Disable big process lock for sys$clock_gettime()
This syscall doesn't touch any intra-process shared resources and
reads the time via the atomic TimeManagement::current_time, so it
doesn't need to hold any lock.
2021-08-06 23:36:12 +02:00
Brian Gianforcaro
9201a06027 Kernel: Annotate all syscalls with VERIFY_PROCESS_BIG_LOCK_ACQUIRED
Before we start disabling acquisition of the big process lock for
specific syscalls, make sure to document and assert that all the
lock is held during all syscalls.
2021-07-20 03:21:14 +02:00
Gunnar Beutner
2a78bf8596 Kernel: Fix the return type for syscalls
The Process::Handler type has KResultOr<FlatPtr> as its return type.
Using a different return type with an equally-sized template parameter
sort of works but breaks once that condition is no longer true, e.g.
for KResultOr<int> on x86_64.

Ideally the syscall handlers would also take FlatPtrs as their args
so we can get rid of the reinterpret_cast for the function pointer
but I didn't quite feel like cleaning that up as well.
2021-06-28 22:29:28 +02:00
Brian Gianforcaro
11306d7121
Kernel: Modify TimeManagement::current_time(..) API so it can't fail. (#6869)
The fact that current_time can "fail" makes its use a bit awkward.
All callers in the Kernel are trusted besides syscalls, so assert
that they never get there, and make sure all current callers perform
validation of the clock_id with TimeManagement::is_valid_clock_id().

I have fuzzed this change locally for a bit to make sure I didn't
miss any obvious regression.
2021-05-05 18:51:06 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Andreas Kling
a166a65eff Kernel: Don't return -EFOO when return type is KResultOr<...> 2021-03-15 09:09:04 +01:00
Ben Wiederhake
336303bda4 Kernel: Make kgettimeofday use AK::Time 2021-03-02 08:36:08 +01:00
Ben Wiederhake
c040e64b7d Kernel: Make TimeManagement use AK::Time internally
I don't dare touch the multi-threading logic and locking mechanism, so it stays
timespec for now. However, this could and should be changed to AK::Time, and I
bet it will simplify the "increment_time_since_boot()" code.
2021-03-02 08:36:08 +01:00
Ben Wiederhake
2b6546c40a Kernel: Make Thread use AK::Time internally
This commit is very invasive, because Thread likes to take a pointer and write
to it. This means that translating between timespec/timeval/Time would have been
more difficult than just changing everything that hands a raw pointer to Thread,
in bulk.
2021-03-02 08:36:08 +01:00
Ben Wiederhake
8598240193 Kernel: Sanitize all user-supplied timeval's/timespec's
This also removes a bunch of unnecessary EINVAL. Most of them weren't even
recommended by POSIX.
2021-03-02 08:36:08 +01:00
Andreas Kling
ac71775de5 Kernel: Make all syscall functions return KResultOr<T>
This makes it a lot easier to return errors since we no longer have to
worry about negating EFOO errors and can just return them flat.
2021-03-01 13:54:32 +01:00
Ben Wiederhake
546cdde776 Kernel: clock_nanosleep's 'flags' is not a bitset
This had the interesting effect that most, but not all, non-zero values
were interpreted as an absolute value.
2021-02-13 00:40:31 +01:00
Tom
5f51d85184 Kernel: Improve time keeping and dramatically reduce interrupt load
This implements a number of changes related to time:
* If a HPET is present, it is now used only as a system timer, unless
  the Local APIC timer is used (in which case the HPET timer will not
  trigger any interrupts at all).
* If a HPET is present, the current time can now be as accurate as the
  chip can be, independently from the system timer. We now query the
  HPET main counter for the current time in CPU #0's system timer
  interrupt, and use that as a base line. If a high precision time is
  queried, that base line is used in combination with quering the HPET
  timer directly, which should give a much more accurate time stamp at
  the expense of more overhead. For faster time stamps, the more coarse
  value based on the last interrupt will be returned. This also means
  that any missed interrupts should not cause the time to drift.
* The default system interrupt rate is reduced to about 250 per second.
* Fix calculation of Thread CPU usage by using the amount of ticks they
  used rather than the number of times a context switch happened.
* Implement CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE and use it
  for most cases where precise timestamps are not needed.
2020-12-21 18:26:12 +01:00
Tom
12cf6f8650 Kernel: Add CLOCK_REALTIME support to the TimerQueue
This allows us to use blocking timeouts with either monotonic or
real time for all blockers. Which means that clock_nanosleep()
now also supports CLOCK_REALTIME.

Also, switch alarm() to use CLOCK_REALTIME as per specification.
2020-12-02 13:02:04 +01:00
Tom
6cb640eeba Kernel: Move some time related code from Scheduler into TimeManagement
Use the TimerQueue to expire blocking operations, which is one less thing
the Scheduler needs to check on every iteration.

Also, add a BlockTimeout class that will automatically handle relative or
absolute timeouts as well as overriding timeouts (e.g. socket timeouts)
more consistently.

Also, rework the TimerQueue class to be able to fire events from
any processor, which requires Timer to be RefCounted. Also allow
creating id-less timers for use by blocking operations.
2020-11-30 13:17:02 +01:00
Andreas Kling
94ff04b536 Kernel: Make CLOCK_MONOTONIC respect the system tick frequency
The time returned by sys$clock_gettime() was not aligned with the delay
calculations in sys$clock_nanosleep(). This patch fixes that by taking
the system's ticks_per_second value into account in both functions.

This patch also removes the need for Thread::sleep_until() and uses
Thread::sleep() for both absolute and relative sleeps.

This was causing the nesalizer emulator port to sleep for a negative
amount of time at the end of each frame, making it run way too fast.
2020-11-22 17:20:58 +01:00
Nico Weber
323e727a4c Kernel+LibC: Add adjtime(2)
Most systems (Linux, OpenBSD) adjust 0.5 ms per second, or 0.5 us per
1 ms tick. That is, the clock is sped up or slowed down by at most
0.05%.  This means adjusting the clock by 1 s takes 2000 s, and the
clock an be adjusted by at most 1.8 s per hour.

FreeBSD adjusts 5 ms per second if the remaining time adjustment is
>= 1 s (0.5%) , else it adjusts by 0.5 ms as well. This allows adjusting
by (almost) 18 s per hour.

Since Serenity OS can lose more than 22 s per hour (#3429), this
picks an adjustment rate up to 1% for now. This allows us to
adjust up to 36s per hour, which should be sufficient to adjust
the clock fast enough to keep up with how much time the clock
currently loses. Once we have a fancier NTP implementation that can
adjust tick rate in addition to offset, we can think about reducing
this.

adjtime is a bit old-school and most current POSIX-y OSs instead
implement adjtimex/ntp_adjtime, but a) we have to start somewhere
b) ntp_adjtime() is a fairly gnarly API. OpenBSD's adjfreq looks
like it might provide similar functionality with a nicer API. But
before worrying about all this, it's probably a good idea to get
to a place where the kernel APIs are (barely) good enough so that
we can write an ntp service, and once we have that we should write
a way to automatically evaluate how well it keeps the time adjusted,
and only then should we add improvements ot the adjustment mechanism.
2020-11-10 19:03:08 +01:00
Tom
c8d9f1b9c9 Kernel: Make copy_to/from_user safe and remove unnecessary checks
Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and we'd
have to account for e.g. pages not being able to be allocated
due to insufficient resources.

So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.

To make it easier to read/write from/to either kernel or user mode
data add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or directly memcpy, or pass the data
through directly using a temporary buffer on the stack.

Last but not least we need to keep syscall params trivial as we
need to copy them from/to user mode using copy_from/to_user.
2020-09-13 21:19:15 +02:00
Nico Weber
e8131f503d Kernel: Let TimeManagement keep epoch time as timespec
Previously, it was kept as just a time_t and the sub-second
offset was inferred from the monotonic clock. This means that
sub-second time adjustments were ignored.

Now that `ntpquery -s` can pass in a time with sub-second
precision, it makes sense to keep time at that granularity
in the kernel.

After this, `ntpquery -s` immediately followed by `ntpquery` shows
an offset of 0.02s (that is, on the order of network roundtrip time)
instead of up to 0.75s previously.
2020-09-07 11:22:48 +02:00
Brian Gianforcaro
b069d757a3 Kernel: Use Userspace<T> for the clock_settime syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
1be6145fdf Kernel: Modifiy clock_settime timespec argument to const
The timeppec paramter is read only, and should be const.
2020-08-10 12:52:15 +02:00
Brian Gianforcaro
b4d04fd8d1 Kernel: Use Userspace<T> for the clock_gettime syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
84035e1035 Kernel: Use Userspace<T> for the clock_nanosleep syscall 2020-08-05 09:36:53 +02:00
Brian Gianforcaro
baa070afb8 Kernel: Use Userspace<T> for the gettimeofday syscall 2020-08-05 09:36:53 +02:00
Andreas Kling
949aef4aef Kernel: Move syscall implementations out of Process.cpp
This is something I've been meaning to do for a long time, and here we
finally go. This patch moves all sys$foo functions out of Process.cpp
and into files in Kernel/Syscalls/.

It's not exactly one syscall per file (although it could be, but I got
a bit tired of the repetitive work here..)

This makes hacking on individual syscalls a lot less painful since you
don't have to rebuild nearly as much code every time. I'm also hopeful
that this makes it easier to understand individual syscalls. :^)
2020-07-30 23:40:57 +02:00