This allows us to use blocking timeouts with either monotonic or
real time for all blockers. Which means that clock_nanosleep()
now also supports CLOCK_REALTIME.
Also, switch alarm() to use CLOCK_REALTIME as per specification.
This changes the Thread::wait_on function to not enable interrupts
upon leaving, which caused some problems with page fault handlers
and in other situations. It may now be called from critical
sections, with interrupts enabled or disabled, and returns to the
same state.
This also requires some fixes to Lock. To aid debugging, a new
define LOCK_DEBUG is added that enables checking for Lock leaks
upon finalization of a Thread.
This makes the Scheduler a lot leaner by not having to evaluate
block conditions every time it is invoked. Instead evaluate them as
the states change, and unblock threads at that point.
This also implements some more waitid/waitpid/wait features and
behavior. For example, WUNTRACED and WNOWAIT are now supported. And
wait will now not return EINTR when SIGCHLD is delivered at the
same time.
This adds the ability to pass a pointer to kernel thread/process.
Also add the ability to use a closure as thread function, which
allows passing information to a kernel thread more easily.
Use the TimerQueue to expire blocking operations, which is one less thing
the Scheduler needs to check on every iteration.
Also, add a BlockTimeout class that will automatically handle relative or
absolute timeouts as well as overriding timeouts (e.g. socket timeouts)
more consistently.
Also, rework the TimerQueue class to be able to fire events from
any processor, which requires Timer to be RefCounted. Also allow
creating id-less timers for use by blocking operations.
We should never resume a thread by directly setting it to Running state.
Instead, if a thread was in Running state when stopped, record the state
as Runnable.
Fixes#4150
The time returned by sys$clock_gettime() was not aligned with the delay
calculations in sys$clock_nanosleep(). This patch fixes that by taking
the system's ticks_per_second value into account in both functions.
This patch also removes the need for Thread::sleep_until() and uses
Thread::sleep() for both absolute and relative sleeps.
This was causing the nesalizer emulator port to sleep for a negative
amount of time at the end of each frame, making it run way too fast.
This makes most operations thread safe, especially so that they
can safely be used in the Kernel. This includes obtaining a strong
reference from a weak reference, which now requires an explicit
call to WeakPtr::strong_ref(). Another major change is that
Weakable::make_weak_ref() may require the explicit target type.
Previously we used reinterpret_cast in WeakPtr, assuming that it
can be properly converted. But WeakPtr does not necessarily have
the knowledge to be able to do this. Instead, we now ask the class
itself to deliver a WeakPtr to the type that we want.
Also, WeakLink is no longer specific to a target type. The reason
for this is that we want to be able to safely convert e.g. WeakPtr<T>
to WeakPtr<U>, and before this we just reinterpret_cast the internal
WeakLink<T> to WeakLink<U>, which is a bold assumption that it would
actually produce the correct code. Instead, WeakLink now operates
on just a raw pointer and we only make those constructors/operators
available if we can verify that it can be safely cast.
In order to guarantee thread safety, we now use the least significant
bit in the pointer for locking purposes. This also means that only
properly aligned pointers can be used.
g_scheduler_lock cannot safely be acquired after Thread::m_lock
because another processor may already hold g_scheduler_lock and wait
for the same Thread::m_lock.
Similar to Process, we need to make Thread refcounted. This will solve
problems that will appear once we schedule threads on more than one
processor. This allows us to hold onto threads without necessarily
holding the scheduler lock for the entire duration.
The thread joining logic hadn't been updated to account for the subtle
differences introduced by software context switching. This fixes several
race conditions related to thread destruction and joining, as well as
finalization which did not properly account for detached state and the
fact that threads can be joined after termination as long as they're not
detached.
Fixes#3596
There are plenty of places in the kernel that aren't
checking if they actually got their allocation.
This fixes some of them, but definitely not all.
Fixes#3390Fixes#3391
Also, let's make find_one_free_page() return nullptr
if it doesn't get a free index. This stops the kernel
crashing when out of memory and allows memory purging
to take place again.
Fixes#3487
Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and we'd
have to account for e.g. pages not being able to be allocated
due to insufficient resources.
So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.
To make it easier to read/write from/to either kernel or user mode
data add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or directly memcpy, or pass the data
through directly using a temporary buffer on the stack.
Last but not least we need to keep syscall params trivial as we
need to copy them from/to user mode using copy_from/to_user.
Since "rings" typically refer to code execution and user processes
can also execute in ring 0, rename these functions to more accurately
describe what they mean: kernel processes and user processes.
In c3d231616c we added the atomic variable
m_have_any_unmasked_pending_signals tracking the state of pending signals.
Add helper functions that automatically update this variable as needed.
We need to wait until a thread is fully set up and ready for running
before attempting to deliver a signal. Otherwise we may not have a
user stack yet.
Also, remove the Skip0SchedulerPasses and Skip1SchedulerPass thread
states that we don't really need anymore with software context switching.
Fixes the kernel crash reported in #3419
Add an ExpandableHeap and switch kmalloc to use it, which allows
for the kmalloc heap to grow as needed.
In order to make heap expansion to work, we keep around a 1 MiB backup
memory region, because creating a region would require space in the
same heap. This means, the heap will grow as soon as the reported
utilization is less than 1 MiB. It will also return memory if an entire
subheap is no longer needed, although that is rarely possible.
Previously we were putting strings at the bottom of the allocated stack
region, and pointer arrays (argv, env, auxv) at the top. There was no
reason for this other than "it seemed easier to do it that way at the
time I wrote it."
This patch packs the strings and pointer vectors into the same area at
the top of the stack.
This reduces the memory footprint of all programs by 4 KiB. :^)
This enables a nice warning in case a function becomes dead code. Also, in case
of signal_trampoline_dummy, marking it external (non-static) prevents it from
being 'optimized away', which would lead to surprising and weird linker errors.
I found these places by using -Wmissing-declarations.
The Kernel still shows these issues, which I think are false-positives,
but don't want to touch:
- Kernel/Arch/i386/CPU.cpp:1081:17: void Kernel::enter_thread_context(Kernel::Thread*, Kernel::Thread*)
- Kernel/Arch/i386/CPU.cpp:1170:17: void Kernel::context_first_init(Kernel::Thread*, Kernel::Thread*, Kernel::TrapFrame*)
- Kernel/Arch/i386/CPU.cpp:1304:16: u32 Kernel::do_init_context(Kernel::Thread*, u32)
- Kernel/Arch/i386/CPU.cpp:1347:17: void Kernel::pre_init_finished()
- Kernel/Arch/i386/CPU.cpp:1360:17: void Kernel::post_init_finished()
No idea, not gonna touch it.
- Kernel/init.cpp:104:30: void Kernel::init()
- Kernel/init.cpp:167:30: void Kernel::init_ap(u32, Kernel::Processor*)
- Kernel/init.cpp:184:17: void Kernel::init_finished(u32)
Called by boot.S.
- Kernel/init.cpp:383:16: int Kernel::__cxa_atexit(void (*)(void*), void*, void*)
- Kernel/StdLib.cpp:285:19: void __cxa_pure_virtual()
- Kernel/StdLib.cpp:300:19: void __stack_chk_fail()
- Kernel/StdLib.cpp:305:19: void __stack_chk_fail_local()
Not sure how to tell the compiler that the compiler is already using them.
Also, maybe __cxa_atexit should go into StdLib.cpp?
- Kernel/Modules/TestModule.cpp:31:17: void module_init()
- Kernel/Modules/TestModule.cpp:40:17: void module_fini()
Could maybe go into a new header. This would also provide type-checking for new modules.
We need to always return from Thread::wait_on, even when a thread
is being killed. This is necessary so that the kernel call stack
can clean up and release references held by it. Then, right before
transitioning back to user mode, we check if the thread is
supposed to die, and at that point change the thread state to
Dying to prevent further scheduling of this thread.
This addresses some possible resource leaks similar to #3073
This compiles, and contains exactly the same bugs as before.
The regex 'FIXME: PID/' should reveal all markers that I left behind, including:
- Incomplete conversion
- Issues or things that look fishy
- Actual bugs that will go wrong during runtime
If a thread is waiting but getting killed, we need to dequeue
the thread from the WaitQueue so that a potential wake before
finalization doesn't happen.
Allow passing in an optional timeout to Thread::block and move
the timeout check out of Thread::Blocker. This way all Blockers
implicitly support timeouts and don't need to implement it
themselves. Do however allow them to override timeouts (e.g.
for sockets).
We need to have a Thread lock to protect threading related
operations, such as Thread::m_blocker which is used in
Thread::block.
Also, if a Thread::Blocker indicates that it should be
unblocking immediately, don't actually block the Thread
and instead return immediately in Thread::block.
This fixes a regression introduced by the new software context
switching where the Kernel would not deliver a signal unless the
process is making system calls. This is because the TSS no longer
updates the CS value, so the scheduler never considered delivery
as the process always appeared to be in kernel mode. With software
context switching we can just set up the signal trampoline at
any time and when the processor returns back to user mode it'll
get executed. This should fix e.g. killing programs that are
stuck in some tight loop that doesn't make any system calls and
is only pre-empted by the timer interrupt.
Fixes#2958
By making the Process class RefCounted we don't really need
ProcessInspectionHandle anymore. This also fixes some race
conditions where a Process may be deleted while still being
used by ProcFS.
Also make sure to acquire the Process' lock when accessing
regions.
Last but not least, there's no reason why a thread can't be
scheduled while being inspected, though in practice it won't
happen anyway because the scheduler lock is held at the same
time.
Because Thread::sleep is an internal interface, it's easy to check that there
are only few callers: Process::sys$sleep, usleep, and nanosleep are happy
with this increased size, because now they support the entire range of their
arguments (assuming small-ish values for ticks_per_second()).
SyncTask doesn't care.
Note that the old behavior wasn't "cap out at 388 days", which would have been
reasonable. Instead, the code resulted in unsigned overflow, meaning that a
very long sleep would "on average" end after about 194 days, sometimes much
quicker.
We now have BlockResult::WokeNormally and BlockResult::NotBlocked,
both of which indicate no error. We can no longer just check for
BlockResult::WokeNormally and assume anything else must be an
interruption.
The AT_* entries are placed after the environment variables, so that
they can be found by iterating until the end of the envp array, and then
going even further beyond :^)
We can now properly initialize all processors without
crashing by sending SMP IPI messages to synchronize memory
between processors.
We now initialize the APs once we have the scheduler running.
This is so that we can process IPI messages from the other
cores.
Also rework interrupt handling a bit so that it's more of a
1:1 mapping. We need to allocate non-sharable interrupts for
IPIs.
This also fixes the occasional hang/crash because all
CPUs now synchronize memory with each other.
The short-circuit path added for waiting on a queue that already had a
pending wake was able to return with interrupts disabled, which breaks
the API contract of wait_on() always returning with IF=1.
Fix this by adding a way to override the restored IF in ScopedCritical.
If WaitQueue::wake_all, WaitQueue::wake_one, or WaitQueue::wake_n
is called but nobody is currently waiting, we should remember that
fact and prevent someone from waiting after such a request. This
solves a race condition where the Finalizer thread is notified
to finalize a thread, but it is not (yet) waiting on this queue.
Fixes#2693
These changes solve a number of problems with the software
context swithcing:
* The scheduler lock really should be held throughout context switches
* Transitioning from the initial (idle) thread to another needs to
hold the scheduler lock
* Transitioning from a dying thread to another also needs to hold
the scheduler lock
* Dying threads cannot necessarily be finalized if they haven't
switched out of it yet, so flag them as active while a processor
is running it (the Running state may be switched to Dying while
it still is actually running)
The Lock class still permits no reason, but for everything else
require a reason to be passed to Thread::wait_on. This makes it
easier to diagnose why a Thread is in Queued state.
If we're trying to walk the stack for another thread, we can
no longer retreive the EBP register from Thread::m_tss. Instead,
we need to look at the top of the kernel stack, because all threads
not currently running were last in kernel mode. Context switches
now always trigger a brief switch to kernel mode, and Thread::m_tss
only is used to save ESP and EIP.
Fixes#2678
When delivering urgent signals to the current thread
we need to check if we should be unblocked, and if not
we need to yield to another process.
We also need to make sure that we suppress context switches
during Process::exec() so that we don't clobber the registers
that it sets up (eip mainly) by a context switch. To be able
to do that we add the concept of a critical section, which are
similar to Process::m_in_irq but different in that they can be
requested at any time. Calls to Scheduler::yield and
Scheduler::donate_to will return instantly without triggering
a context switch, but the processor will then asynchronously
trigger a context switch once the critical section is left.
If these methods get inlined, the compiler is able to statically eliminate most
of the assertions. Alas, it doesn't realize this, and believes inlining them to
be too expensive. So give it a strong hint that it's not the case.
This *decreases* the kernel binary size.