Even if the PIC was disabled it can still generate noise (spurious IRQs)
so we need to register two handlers for handling such cases.
Also, we declare interrupt service routine offset 0x20 to 0x2f as
reserved, so when the PIC is disabled, we can handle spurious IRQs from
the PIC at separate handlers.
At the end of sys$execve(), we perform a context switch from the old
executable into the new executable.
However, the Kernel::Thread object we are switching to is the *same*
thread as the one we are switching from. So we must not assume the
from_thread and to_thread are different threads.
We had a bug caused by this misconception, where the "from" thread would
always get marked as "inactive" when switching to a new thread.
This meant that threads would always get switched into "inactive" mode
on first context switch into them.
If a thread then tried blocking on a kernel mutex within its first time
slice, we'd end up in Thread::block(Mutex&) with an inactive thread.
Once a thread is inactive, the scheduler believes it's okay to
reactivate the thread (by scheduling it.) If a thread got re-scheduled
prematurely while setting up a mutex block, things would fall apart and
we'd crash in Thread::block() due to the thread state being "Runnable"
instead of the expected "Running".
Move this architecture-specific sanity check (IOPL must be 0) out of
Scheduler and into the x86 enter_thread_context(). Also do this for
every thread and not just userspace ones.
It's more accurate to say that we're blocking on a mutex, rather than
blocking on a lock. The previous terminology made sense when this code
was using something called Kernel::Lock, but since it was renamed to
Kernel::Mutex, this updates brings the language back in sync.
It was annoyingly hard to spot these when we were using them with
different amounts of qualification everywhere.
This patch uses Thread::State::Foo everywhere instead of Thread::Foo
or just Foo.
If the blocker is interrupted by a signal, that signal will be delivered
to the process when returning to userspace (at the syscall exit point.)
We don't have to perform the dispatch manually in Thread::block_impl().
Signal dispatch is already taken care of elsewhere, so there appears to
be no need for the hack in enter_current().
This also allows us to remove the Thread::m_in_block flag, simplifying
thread blocking logic somewhat.
Verified with the original repro for #4336 which this was meant to fix.
This function is large and unwieldy and forces Thread.h to #include
a bunch of things. The only reason it was in the header is because we
need to instantiate a blocker based on the templated BlockerType.
We actually keep block<BlockerType>() in the header, but move the
bulk of the function body out of line into Thread::block_impl().
To preserve destructor ordering, we add Blocker::finalize() which is
called where we'd previously destroy the Blocker.
We currently support the left super key. This poses an issue on
keyboards that only have a right super key, such as my Steelseries 6G.
The implementation mirrors the left/right shift key logic and
effectively considers the right super key identical to the left one.
This commit removes the usage of HashMap in Mutex, thereby making Mutex
be allocation-free.
In order to achieve this several simplifications were made to Mutex,
removing unused code-paths and extra VERIFYs:
* We no longer support 'upgrading' a shared lock holder to an
exclusive holder when it is the only shared holder and it did not
unlock the lock before relocking it as exclusive. NOTE: Unlike the
rest of these changes, this scenario is not VERIFY-able in an
allocation-free way, as a result the new LOCK_SHARED_UPGRADE_DEBUG
debug flag was added, this flag lets Mutex allocate in order to
detect such cases when debugging a deadlock.
* We no longer support checking if a Mutex is locked by the current
thread when the Mutex was not locked exclusively, the shared version
of this check was not used anywhere.
* We no longer support force unlocking/relocking a Mutex if the Mutex
was not locked exclusively, the shared version of these functions
was not used anywhere.
Devices such as NVMe can have blocks bigger that 512. Use the
m_block_size variable in read/write_block function instead of the
hardcoded 512 block size.
This is being used by GUID partitions so the first three dash-delimited
fields of the GUID are stored in little endian order but the last two
fields are stored in big endian order, hence it's a representation which
is mixed.
Ideally the x86 fault handler would only do x86 specific things and
delegate the rest of the work to MemoryManager. This patch moves some of
the address checks to a more generic place.
This avoids taking and releasing the MM lock just to reject an address
that we can tell from just looking at it that it won't ever be in the
kernel regions tree.
When the values we're setting are not actually u32s and the size of the
area we're setting is PAGE_SIZE-aligned and a multiple of PAGE_SIZE in
size, there's no point in using fast_u32_fill, as that forces us to use
STOSDs instead of STOSQs.
We were marking the execing thread as Runnable near the end of
Process::do_exec().
This was necessary for exec in processes that had never been scheduled
yet, which is a specific edge case that only applies to the very first
userspace process (normally SystemServer). At this point, such threads
are in the Invalid state.
In the common case (normal userspace-initiated exec), making the current
thread Runnable meant that we switched away from its current state:
Running. As the thread is indeed running, that's a bogus change!
This created a short time window in which the thread state was bogus,
and any attempt to block the thread would panic the kernel (due to a
bogus thread state in Thread::block() leading to VERIFY_NOT_REACHED().)
Fix this by not touching the thread state in Process::do_exec()
and instead make the first userspace thread Runnable directly after
calling Process::exec() on it in try_create_userspace_process().
It's unfortunate that exec() can be called both on the current thread,
and on a new thread that has never been scheduled. It would be good to
not have the latter edge case, but fixing that will require larger
architectural changes outside the scope of this fix.
This makes sure DeviceManagement::try_create_device will call the
static factory function (if available) instead of directly calling the
constructor, which will allow us to move OOM-fallible calls out of
Device constructors.
If we panic the kernel for a storage-related reason, we might as well be
helpful and print out a list of detected storage devices and their
partitions to help with debugging.
Reasons for such a panic include:
- No boot device with the given name found
- No boot device with the given UUID found
- Failing to open the root filesystem after determining a boot device
Before attempting to remove the device while handling an AHCI port
interrupt, check if m_connected_device is even non-null.
This happened during my bare metal run and caused a kernel panic.
A number of crashes in this `VERIFY_NOT_REACHED` case have been
reported on discord. Lets add some tracing to gather more information
and help diagnose what is the cause of these crashes.
There are many assumptions in the stack that argc is not zero, and
argv[0] points to a valid string. The recent pwnkit exploit on Linux
was able to exploit this assumption in the `pkexec` utility
(a SUID-root binary) to escalate from any user to root.
By convention `execve(..)` should always be called with at least one
valid argument, so lets enforce that semantic to harden the system
against vulnerabilities like pwnkit.
Reference: https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt