Add polling support to NVMe so that it does not use interrupt to
complete a IO but instead actively polls for completion. This probably
is not very efficient in terms of CPU usage but it does not use
interrupts to complete a IO which is beneficial at the moment as there
is no MSI(X) support and it can reduce the latency of an IO in a very
fast NVMe device.
The NVMeQueue class has been made the base class for NVMeInterruptQueue
and NVMePollQueue. The factory function `NVMeQueue::try_create` will
return the appropriate queue to the controller based on the polling
boot parameter.
The polling mode can be enabled by adding an extra boot parameter:
`nvme_poll`.
There's no need to perform it this early, and until the MemoryManager
is initialized we have very limited kmalloc capacity, so let's try and
keep anything that's not required to be there out of there.
This device will assist userspace to manage hotplug events.
A userspace application reads a DeviceEvent entry until the return value
is zero which indicates no events that are queued and waiting for
processing.
Trying to read with a buffer smaller than sizeof(DeviceEvent) results in
EOVERFLOW.
For now, there's no ioctl mechanism for this device but in the future an
acknowledgement mechanism can be implemented via ioctl(2) interface.
Currently the APIC class is constructed irrespective of whether it
is used or not.
So, move APIC initialization from init to the InterruptManagement
class and construct the APIC class only when it is needed.
The function to protect ksyms after initialization, is only used during
boot of the system, so it can be UNMAP_AFTER_INIT as well.
This requires we switch the order of the init sequence, so we now call
`MM.protect_ksyms_after_init()` before `MM.unmap_text_after_init()`.
The Prekernel's memory is only accessed until MemoryManager has been
initialized. Keeping them around afterwards is both unnecessary and bad,
as it prevents the userland from using the 0x100000-0x155000 virtual
address range.
Co-authored-by: Idan Horowitz <idan.horowitz@gmail.com>
Since this range is mapped in already in the kernel page directory, we
can initialize it before jumping into the first kernel process which
lets us avoid mapping in the range into init_stage2's address space.
This brings us half-way to removing the shared bottom 2 MiB mapping in
every process, leaving only the Prekernel.
We can leave the .ksyms section mapped-but-read-only and then have the
symbols index simply point into it.
Note that we manually insert null-terminators into the symbols section
while parsing it.
This gets rid of ~950 KiB of kmalloc_eternal() at startup. :^)
This small change allows to use the IOAPIC by default without to enable
SMP mode, which emulates Uni-Processor setup with IOAPIC instead of
using the PIC.
This opens the opportunity to utilize other types of interrupts like MSI
and MSI-X interrupts.
The platform independent Processor.h file includes the shared processor
code and includes the specific platform header file.
All references to the Arch/x86/Processor.h file have been replaced with
a reference to Arch/Processor.h.
This singleton simplifies many aspects that we struggled with before:
1. There's no need to make derived classes of Device expose the
constructor as public anymore. The singleton is a friend of them, so he
can call the constructor. This solves the issue with try_create_device
helper neatly, hopefully for good.
2. Getting a reference of the NullDevice is now being done from this
singleton, which means that NullDevice no longer needs to use its own
singleton, and we can apply the try_create_device helper on it too :)
3. We can now defer registration completely after the Device constructor
which means the Device constructor is merely assigning the major and
minor numbers of the Device, and the try_create_device helper ensures it
calls the after_inserting method immediately after construction. This
creates a great opportunity to make registration more OOM-safe.
Both should reside in the SysFS firmware directory which is normally
located in /sys/firmware.
Also, apply some OOM-safety patterns when creating the BIOS and ACPI
directories.
This will somwhat help unify them also under the same SysFS directory in
the commit.
Also, it feels much more like this change reflects the reality that both
ACPI and the BIOS are part of the firmware on x86 computers.
Let's remove the DynamicParser class, as it really did nothing yet in
the Kernel. Instead, when we add support for AML parsing, we can figure
out how to do it properly without the need of a derived class that just
complicates everything for no good reason.
This is really a basic support for AHCI hotplug events, so we know how
to add a node representing the device in /sys/dev/block and removing it
according to the event type (insertion/removal).
This change doesn't take into account what happens if the device was
mounted or a read/write operation is being handled.
For this to work correctly, StorageManagement now uses the Singleton
container, as it might be accessed simultaneously from many CPUs
for hotplug events. DiskPartition holds a WeakPtr instead of a RefPtr,
to allow removal of a StorageDevice object from the heap.
StorageDevices are now stored and being referenced to via an
IntrusiveList to make it easier to remove them on hotplug event.
In future changes, all of the stated above might change, but for now,
this commit represents the least amount of changes to make everything
to work correctly.
These files are not marked as block devices or character devices so they
are not meant to be used as device nodes. The filenames are formatted to
the pattern "major:minor", but a Userland program need to call the parse
these format and inspect the the major and minor numbers and create the
real device nodes in /dev.
Later on, it might be a good idea to ensure we don't create new
SysFSComponents on the heap for each Device, but rather generate
them only when required (and preferably to not create a SysFSComponent
at all if possible).
This function is currently only ever used to create the init process
(SystemServer). It had a few idiosyncratic things about it that this
patch cleans up:
- Errors were returned in an int& out-param.
- It had a path for non-0 process PIDs which was never taken.
Prior to this change, both uid_t and gid_t were typedef'ed to `u32`.
This made it easy to use them interchangeably. Let's not allow that.
This patch adds UserID and GroupID using the AK::DistinctNumeric
mechanism we've already been employing for pid_t/ProcessID.
This has several benefits:
1) We no longer just blindly derefence a null pointer in various places
2) We will get nicer runtime error messages if the current process does
turn out to be null in the call location
3) GCC no longer complains about possible nullptr dereferences when
compiling without KUBSAN
This patch does three things:
- Convert the global thread list from a HashMap to an IntrusiveList
- Combine the thread list and its lock into a SpinLockProtectedValue
- Customize Thread::unref() so it locks the list while unreffing
This closes the same race window for Thread as @sin-ack's recent changes
did for Process.
Note that the HashMap->IntrusiveList conversion means that we lose O(1)
lookups, but the majority of clients of this list are doing traversal,
not lookup. Once we have an intrusive hashing solution, we should port
this to use that, but for now, this gets rid of heap allocations during
a sensitive time.
We are not using this for anything and it's just been sitting there
gathering dust for well over a year, so let's stop carrying all this
complexity around for no good reason.
This removes Pipes dependency on the UHCIController by introducing a
controller base class. This will be used to implement other controllers
such as OHCI.
Additionally, there can be multiple instances of a UHCI controller.
For example, multiple UHCI instances can be required for systems with
EHCI controllers. EHCI relies on using multiple of either UHCI or OHCI
controllers to drive USB 1.x devices.
This means UHCIController can no longer be a singleton. Multiple
instances of it can now be created and passed to the device and then to
the pipe.
To handle finding and creating these instances, USBManagement has been
introduced. It has the same pattern as the other management classes
such as NetworkManagement.
The Clang error message reads like this (`-Wdeprecated-array-compare`):
> error: comparison between two arrays is deprecated; to compare
> array addresses, use unary '+' to decay operands to pointers.
When I laid down the foundation for the start of the big process lock
separation, I added asserts to all system call implementations to
validate we hold the big process lock in the locations we think we
should be.
Adding that assert to sys$profiling_enable broke boot time profiling as
we were never holding the lock on boot. Even though it's not technically
required, lets make sure to hold the lock while enabling to appease the
assert.
This enables further work on implementing KASLR by adding relocation
support to the pre-kernel and updating the kernel to be less dependent
on specific virtual memory layouts.
This allows us to specify virtual addresses for things the kernel should
access via virtual addresses later on. By doing this we can make the
kernel independent from specific physical addresses.
Previously the kernel relied on a fixed offset between virtual and
physical addresses based on the kernel's load address. This allows us
to specify an independent offset.
GCC and Clang allow us to inject a call to a function named
__sanitizer_cov_trace_pc on every edge. This function has to be defined
by us. By noting down the caller in that function we can trace the code
we have encountered during execution. Such information is used by
coverage guided fuzzers like AFL and LibFuzzer to determine if a new
input resulted in a new code path. This makes fuzzing much more
effective.
Additionally this adds a basic KCOV implementation. KCOV is an API that
allows user space to request the kernel to start collecting coverage
information for a given user space thread. Furthermore KCOV then exposes
the collected program counters to user space via a BlockDevice which can
be mmaped from user space.
This work is required to add effective support for fuzzing SerenityOS to
the Syzkaller syscall fuzzer. :^) :^)
Despite what the declaration would have us believe these are not "u8*".
If they were we wouldn't have to use the & operator to get the address
of them and then cast them to "u8*"/FlatPtr afterwards.