Bulk transfers also use Normal TRBs, so move the reusable normal TRB
setup code from submit_async_interrupt_transfer into a new function
prepare_normal_transfer.
submit_bulk_transfer and submit_async_interrupt_transfer use this
function and then either block on the completion or submit it
asynchronously and wrap it into a PeriodicPendingTransfer.
Instead of using a raw `KBuffer` and letting each implementation to
populating the specific flags on its own, we change things so we only
let each FileSystem implementation to validate the flag and its value
but then store it in a HashMap which its key is the flag name and
the value is a special new class called `FileSystemSpecificOption`
which wraps around `AK::Variant<...>`.
This approach has multiple advantages over the previous:
- It allows runtime inspection of what the user has set on a `MountFile`
description for a specific filesystem.
- It ensures accidental overriding of filesystem specific option that
was already set is not possible
- It removes ugly casting of a `KBuffer` contents to a strongly-typed
values. Instead, a strongly-typed `AK::Variant` is used which ensures
we always get a value without doing any casting.
Please note that we have removed support for ASCII string-oriented flags
as there were no actual use cases, and supporting such type would make
`FileSystemSpecificOption` more complicated unnecessarily for now.
The sysret instruction restores the rflags value from the r11 register.
Before, we expected that the value in RegisterState::r11 is still the
rflags value saved by syscall and therefore didn't copy
RegisterState::rflags to r11 before the sysret.
But signal handlers and ptrace can change the value in
RegisterState::r11 while we are handling a syscall, so we shouldn't
assume that it still contains the saved rflags.
While handling a syscall the contents of RegisterState::rflags may also
have been updated by e.g. ptrace in which case we should restore the
updated rflags, not the original state on syscall entry.
This fixes incorrect assumptions about the layout of descriptors and
gets rid of all the pointer arithmetic.
The USB spec doesn't define a strict order for all descriptors to appear
in.
It just says that endpoint descriptors follow its interface descriptor.
We also have to skip all unknown descriptors (which also can appear
anywhere), not just HID descriptors.
The superblock of an ext2 filesystem is always found on the storage
device at offset 1024. This 1024 number was hardcoded in the Ext2FS
code.
This commit:
* adds a constexpr to replace the hardcoded 1024 values
* removes a comment about one of the the hardcoded 1024 values which is
now umnecessary
All USB Devices (including hubs) need to have a configuration set before
you can use them. We already do this in other USB drivers, but forgot to
do it for USB hubs as well.
This adds a minimal (that is, just enough to make USB mouse/keyboard
work) implementation of an xHCI driver, to let us use serenity on
modern baremetal machines.
Both the calculation for the interface descriptor address and the
endpoint descriptors addresses (for second interface and above) were
incorrect, and would read the wrong data (and go out-of-bounds as well)
Previously USB::Pipe would just try to poorly maintain copies of some
of the relevant properties of USB::Device. (address & speed)
Now it just holds a reference to it's owning device and can query them
when needed.
The current USB::Device::enumerate_device() implementation is UHCI
specific, and is not relevant for xHCI controllers.
Move it to a USBController virtual method to allow different
implementations for the other controller types.
This avoids spurious interrupts during APIC timer calibration, as the
timer might otherwise immediately generate an interrupt when enabling
interrupts if the initial count was at a low enough value.
The LibELF validate_program_headers method tried to do too many things
at once, and as a result, we had an awkward return type from it.
To be able to simplify it, we no longer allow passing a StringBuilder*
but instead we require to pass an Optional<Elf_Phdr> by reference so
it could be filled with actual ELF program header that corresponds to
an INTERP header if such found.
As a result, we ensure that only certain implementations that actually
care about the ELF interpreter path will actually try to load it on
their own and if they fail, they can have better diagnostics for an
invalid INTERP header.
This change also fixes a bug that on which we failed to execute an ELF
program if the INTERP header is located outside the first 4KiB page of
the ELF file, as the kernel previously didn't have support for looking
beyond that for that header.
While extremely unlikely, it's possible to change the dynamic loader
to a non regular file, which will result in a kernel panic upon VERIFY
of the `interpreter_description->inode()` statement.
It makes no sense to do all of the loading work just to figure out that
the ELF file is an object file that is a result of compiling and not
an actual executable.
In addition to that, we should disallow running coredumps as well, so
the condition is changed now to only allow ET_DYN or ET_EXEC ELF files.
Plain old VGA text mode functionality was introduced in 1987, and is
obviously still used on some (even modern) x86 machines.
However, it's very limited in what it gives to us, because by using a
80x25 text mode console, it's guaranteed that no desktop functionality
is available during such OS runtime session.
It's also quite complicated to handle access arbitration on the VGA ISA
ports which means that only one VGA card can work in VGA mode, which
makes it very cumbersome to manage multiple cards at once.
Since we never relied on the VGA text mode console for anything serious,
as booting on a QEMU machine always gives a proper framebuffer to work
with, VGA text mode console was used in bare metal sessions due to lack
of drivers.
However, since we "force" multiboot-compatible bootloaders to provide us
a framebuffer, it's basically a non-issue to have a functional console
on bare metal machines even if we don't have the required drivers.
This class will be used in a situation where we simply don't have a
working framebuffer console, but we still want to boot (without having a
screen being attached, or an actual proper GPU driver on bare metal, for
example).
These 2 syscalls are responsible for unsharing resources in the system,
such as hostname, VFS root contexts and process lists.
Together with an appropriate userspace implementation, these syscalls
could be used for creating a sandbox environment (containers) for user
programs.
Similarly to VFSRootContext and ScopedProcessList, this class intends
to form resource isolation as well.
We add this class as an infrastructure preparation of hostname contexts
which should allow processes to obtain different hostnames on the same
machine.
There's no point in constructing an object just for the sake of keeping
a state that can be touched by anything in the kernel code.
Let's reduce everything to be in a C++ namespace called with the
previous name "VirtualFileSystem" and keep a smaller textual-footprint
struct called "VirtualFileSystemDetails".
This change also cleans up old "friend class" statements that were no
longer needed, and move methods from the VirtualFileSystem code to more
appropriate places as well.
Please note that the method of locking all filesystems during shutdown
is removed, as in that place there's no meaning to actually locking all
filesystems because of running in kernel mode entirely.
This new syscall will be used by the upcoming runc (run-container)
utility.
In addition to that, this syscall allows userspace to neatly copy RAMFS
instances to other places, which was not possible in the past.
The whole concept of Jails was far more complicated than I actually want
it to be, so let's reduce the complexity of how it works from now on.
Please note that we always leaked the attach count of a Jail object in
the fork syscall if it failed midway.
Instead, we should have attach to the jail just before registering the
new Process, so we don't need to worry about unsuccessful Process
creation.
The reduction of complexity in regard to jails means that instead of
relying on jails to provide PID isolation, we could simplify the whole
idea of them to be a simple SetOnce, and let the ProcessList (now called
ScopedProcessList) to be responsible for this type of isolation.
Therefore, we apply the following changes to do so:
- We make the Jail concept no longer a class of its own. Instead, we
simplify the idea of being jailed to a simple ProtectedValues boolean
flag. This means that we no longer check of matching jail pointers
anywhere in the Kernel code.
To set a process as jailed, a new prctl option was added to set a
Kernel SetOnce boolean flag (so it cannot change ever again).
- We provide Process & Thread methods to iterate over process lists.
A process can either iterate on the global process list, or if it's
attached to a scoped process list, then only over that list.
This essentially replaces the need of checking the Jail pointer of a
process when iterating over process lists.
Expose some initial interfaces in the mount-related syscalls to select
the desired VFSRootContext, by specifying the VFSRootContext index
number.
For now there's still no way to create a different VFSRootContext, so
the only valid IDs are -1 (for currently attached VFSRootContext) or 1
for the first userspace VFSRootContext.
The VFSRootContext class, as its name suggests, holds a context for a
root directory with its mount table and the root custody/inode in the
same class.
The idea is derived from the Linux mount namespace mechanism.
It mimicks the concept of the ProcessList object, but it is adjusted for
a root directory tree context.
In contrast to the ProcessList concept, processes that share the default
VFSRootContext can't see other VFSRootContext related properties such as
as the mount table and root custody/inode.
To accommodate to this change progressively, we internally create 2 main
VFS root contexts for now - one for kernel processes (as they don't need
to care about VFS root contexts for the most part), and another for all
userspace programs.
This separation allows us to continue pretending for userspace that
everything is "normal" as it is used to be, until we introduce proper
interfaces in the mount-related syscalls as well as in the SysFS.
We make VFSRootContext objects being listed, as another preparation
before we could expose interfaces to userspace.
As a result, the PowerStateSwitchTask now iterates on all contexts
and tear them down one by one.
AnonymousVMObject::try_clone() computed how many shared cow pages to
commit by counting all VMObject pages that were not shared_zero_pages.
This means that lazy_committed_pages were also being included in the
count. This is a problem because the page fault handling code for
lazy_committed_pages does not allocate from
m_shared_committed_cow_pages. So more pages than necessary were being
committed.
This fixes this overcommitting problem by skipping lazy_committed_pages
when counting how many pages to commit.
This commit introduces VMObject::remap_regions_single_page(). This
method remaps a single page in all regions associated with a VMObject.
This is intended to be a more efficient replacement for remap_regions()
in cases where only a single page needs to be remapped.
This commit also updates the cow page fault handling code to use this
new method.
Writes to a MAP_SHARED | MAP_ANONYMOUS mmap region were not visible to
other processes sharing the mmap region. This was happening because the
page fault handler was not remapping the VMObject's m_regions after
allocating a new page.
This commit fixes the problem by calling remap_regions() after assigning
a new page to the VMObject in the page fault handler. This remapping
only occurs for shared Regions.
This commit makes the following minor changes to handle_zero_fault():
* cleans up a call to static_cast(), replacing it with a reference (a
future commit will also use this reference).
* replaces a call to vmobject() with the new reference mentioned above.
* moves the definition of already_handled to inside the block where
already_handled is used.
After a fork(), page faults on anonymous mmaps can cause a redundant
page fault to occur.
This happens because VMObjects for anonymous mmaps are initially filled
with references to the lazy_committed_page or shared_zero_page. If there
is a fork, VMObject::try_clone() is called and all pages of the VMObject
are marked as cow (via the m_cow_map).
Page faults on a zero/lazy page are handled by handle_zero_fault().
handle_zero_fault() does not update m_cow_map, so if the page was marked
cow before the fault, it will still be marked cow after the fault. This
causes a second (redundant) page fault when the CPU retries the write.
This commit removes the redundant page fault by not marking zero/lazy
pages as cow in m_cow_map.
AddressSpace::try_allocate_split_region() was updating the cow map of
new_region based on the cow map of source_region.
The problem is that both new_region and source_region reference the
same vmobject and the same cow map, so these cow map updates didn't
actually change anything.
This commit:
* removes the cow map updates from try_allocate_split_region()
* removes Region::set_should_cow() since it is no longer used
InodeVMObjects now track dirty and clean pages. This tracking of
dirty and clean pages is used by the msync and purge syscalls.
dirty page tracking works using the following rules:
* when a new InodeVMObject is made, all pages are marked clean.
* writes to clean InodeVMObject pages will cause a page fault,
the fault handler will mark the page as dirty.
* writes to dirty InodeVMObject pages do not cause page faults.
* if msync is called, only dirty pages are flushed to storage (and
marked clean).
* if purge syscall is called, only clean pages are discarded.
This device was a short-lived solution to allow userspace (WindowServer)
to easily support hotplugging mouse devices with presumably very small
modifications on userspace side.
Now that we have a proper mechanism to propagate hotplug events from the
DeviceMapper program to any program that needs to get such events, we no
longer need this device, so let's remove it.
After the previous commit, we are able to create a comprehensive list of
all devices' major number allocations.
To help userspace to distinguish between character and block devices, we
expose 2 sysfs nodes so userspace can decide which list it needs to open
in order to iterate on it.