We were crashing on the VERIFY_INTERRUPTS_DISABLED() in
RecursiveSpinlock::unlock, which was caused by the compiler reordering
instructions in `sys$get_root_session_id`. In this function, a SpinLock
is locked and quickly unlocked again, and since the lock and unlock
functions were inlined into `sys$get_root_session_id` and the DAIF::read
was missing the `volatile` keyword, the compiler was free to reorder the
reads from the DAIF register to the top of this function. This caused
the CPU to read the interrupts state at the beginning of the function,
and storing the result on the stack, which in turn caused the
VERIFY_INTERRUPTS_DISABLED() assertion to fail. By adding the `volatile`
modifier to the inline assembly, the compiler will not reorder the
instructions.
In aa40cef2b7, I mistakenly assumed that the crash was related to the
initial interrupts state of the kernel threads, but it turns out that
the missing `volatile` keyword was the actual problem. This commit also
removes that code again.
We have a problem with the original utimensat syscall because when we
do call LibC futimens function, internally we provide an empty path,
and the Kernel get_syscall_path_argument method will detect this as an
invalid path.
This happens to spit an error for example in the touch utility, so if a
user is running "touch non_existing_file", it will create that file, but
the user will still see an error coming from LibC futimens function.
This new syscall gets an open file description and it provides the same
functionality as utimensat, on the specified open file description.
The new syscall will be used later by LibC to properly implement LibC
futimens function so the situation described with relation to the
"touch" utility could be fixed.
We were detaching from the jail process list too early. To ensure we
detach properly, leverage the remove_from_secondary_lists method
so the possibly jailed parent process can still see the dying process
and therefore clean it properly.
For a very long time, the kernel had only support for basic PS/2 devices
such as the PS2 AT keyboard and regular PS2 mouse (with a scroll wheel).
To adapt to this, we had very simple abstractions in place, essentially,
the PS2 devices were registered as IRQ handlers (IRQ 1 and 12), and when
an interrupt was triggered, we simply had to tell the I8042Controller to
fetch a byte for us, then send it back to the appropriate device for
further processing and queueing of either a key event, or a mouse packet
so userspace can do something meaningful about it.
When we added the VMWare mouse integration feature it was easily adapted
to this paradigm, requiring small changes across the handling code for
these devices.
This patch is a major cleanup for any future advancements in the HID
subsystem.
It ensures we do things in a much more sane manner:
- We stop using LockRefPtrs. Currently, after the initialization of the
i8042 controller, we never have to change RefPtrs in that class, as we
simply don't support PS2 hotplugging currently.
Also, we remove the unnecessary getters for keyboard and mouse devices
which also returned a LockRefPtr.
- There's a clear separation between PS2 devices and the actual device
nodes that normally exist in /dev. PS2 devices are not polled, because
when the user uses these devices, they will trigger an IRQ which when
is handled, could produce either a MousePacket or KeyEvent, depending
on the device state.
The separation is crucial for buses that are polled, for example - USB
is a polled bus and will not generate an IRQ for HID devices.
- There's a clear separation in roles of each structure. The PS2 devices
which are attached to a I8042Controller object are managing the device
state, while the generic MouseDevice and KeyboardDevice manage all
related tasks of a CharacterDevice, as well as interpreting scan code
events and mouse relative/absolute coordinates.
It happens to be that only PS/2 devices that are connected via the i8042
controller can generate interrupt events, so it makes much more sense to
have those devices to implement the enable_interrupts method because of
the I8042Device class and not the HIDDevice class.
Use the new class in HID code, because all other HID device controllers
will be using this class as their parent class.
Hence, we no longer keep a reference to any PS/2 device in HIDManagement
and rely on HIDController derived classes to do this for us.
It also means that we removed another instance of a LockRefPtr, which
is designated to be removed and is replaced by the better pattern of
SpinlockProtected<RefPtr<>> instead.
The definitions were being defined already by `BootInfo.h` and that was
being included here via transitive includes. The extern definitions of
the variables do not have the `READONLY_AFTER_INIT` attribute in
`BootInfo.h`. This causes conflicting definitions of the same variable.
The `READONLY_AFTER_INIT` specifier is not needed for extern variables
as it only effects their linkage, not their actual use, so just use the
versions in `BootInfo.h` instead of re-declaring.
These were easy to pick-up as these pointers are assigned during the
construction point and are never changed afterwards.
This small change to these pointers will ensure that our code will not
accidentally assign these pointers with a new object which is always a
kind of bug we will want to prevent.
When switching to the new address space, we also have to switch the
Process::m_master_tls_* variables as they may refer to a region in
the old address space.
This was causing `su` to not run correctly.
Regression from 65641187ff.
Specifically this commit implements two setters set_userspace_sp and
set_ip in RegisterState.h, and also adds a stack pointer getter (sp) in
ThreadRegisters.h. Contributed by konrad, thanks for that.
Setting the page table base register (ttbr0_el1) is not enough, and will
not flush the TLB caches, in contrary with x86_64 where setting the CR3
register will actually flush the caches. This commit adds the necessary
code to properly flush the TLB caches when context switching. This
commit also changes Processor::flush_tlb_local to use the vmalle1
variant, as previously we would be flushing the tlb's of all the cores
in the inner-shareable domain.
Previously we had a race condition in the page fault handling: We were
relying on the affected Region staying alive while handling the page
fault, but this was not actually guaranteed, as an munmap from another
thread could result in the region being removed concurrently.
This commit closes that hole by extending the lifetime of the region
affected by the page fault until the handling of the page fault is
complete. This is achieved by maintaing a psuedo-reference count on the
region which counts the number of in-progress page faults being handled
on this region, and extending the lifetime of the region while this
counter is non zero.
Since both the increment of the counter by the page fault handler and
the spin loop waiting for it to reach 0 during Region destruction are
serialized using the appropriate AddressSpace spinlock, eventual
progress is guaranteed: As soon as the region is removed from the tree
no more page faults on the region can start.
And similarly correctness is ensured: The counter is incremented under
the same lock, so any page faults that are being handled will have
already incremented the counter before the region is deallocated.
This replaces the previous owning address space pointer. This commit
should not change any of the existing functionality, but it lays down
the groundwork needed to let us properly access the region table under
the address space spinlock during page fault handling.
Instead of setting up the new address space on it's own, and only swap
to the new address space at the end, we now immediately swap to the new
address space (while still keeping the old one alive) and only revert
back to the old one if we fail at any point.
This is done to ensure that the process' active address space (aka the
contents of m_space) always matches actual address space in use by it.
That should allow us to eventually make the page fault handler process-
aware, which will let us properly lock the process address space lock.
The current way we handle sync commands is very ugly and depends on lot
of preconditions. Now that we have an end_io handler for a request, we
can use WaitQueue to do sync commands more elegantly.
This does depend on block layer sending one request at a time but this
change is a step forward towards better IO handling.
There was a private variable named m_current_request which was used to
track a single request at a time. This guarantee is given by the block
layer where we wait on each IO. This design will break down in the
driver once the block layer removes that constraint.
Redesign the IO handling in a completely asynchronous way by maintaining
requests up to queue depth. NVMeIO struct is introduced to track an IO
submitted along with other information such whether the IO is still
being processed and an endio callback which will be called during the
end of a request.
A hashmap private variable is created which will key based on the
command id of a request with a value of NVMeIO. endio handler will come
in handy if we are doing a sync request and we want to wake up the wait
queue during the end.
This change also simplified the code by removing some special condition
in submit_sqe function, etc that were marked as FIXME for a long time.
Using sq_tail as cid makes an inherent assumption that we send only
one IO at a time. Use an atomic variable instead for command id of a
submission queue entry.
As sq_tail is not used as cid anymore, remove m_prev_sq_tail which used
to hold the last used sq_tail value.
The SID was duplicated between the process credentials and protected
data. And to make matters worse, the credentials SID was not updated in
sys$setsid.
This patch fixes this by removing the SID from protected data and
updating the credentials SID everywhere.
This closes two race windows:
- ProcessGroup removed itself from the "all process groups" list in its
destructor. It was possible to walk the list between the last unref()
and the destructor invocation, and grab a pointer to a ProcessGroup
that was about to get deleted.
- sys$setsid() could end up creating a process group that already
existed, as there was a race window between checking if the PGID
is used, and actually creating a ProcessGroup with that PGID.
Now that it's no longer using LockRefPtr, we can actually move it into
protected data. (LockRefPtr couldn't be stored there because protected
data is immutable at times, and LockRefPtr uses some of its own bits
for locking.)