Instead of using a raw `KBuffer` and letting each implementation to
populating the specific flags on its own, we change things so we only
let each FileSystem implementation to validate the flag and its value
but then store it in a HashMap which its key is the flag name and
the value is a special new class called `FileSystemSpecificOption`
which wraps around `AK::Variant<...>`.
This approach has multiple advantages over the previous:
- It allows runtime inspection of what the user has set on a `MountFile`
description for a specific filesystem.
- It ensures accidental overriding of filesystem specific option that
was already set is not possible
- It removes ugly casting of a `KBuffer` contents to a strongly-typed
values. Instead, a strongly-typed `AK::Variant` is used which ensures
we always get a value without doing any casting.
Please note that we have removed support for ASCII string-oriented flags
as there were no actual use cases, and supporting such type would make
`FileSystemSpecificOption` more complicated unnecessarily for now.
The superblock of an ext2 filesystem is always found on the storage
device at offset 1024. This 1024 number was hardcoded in the Ext2FS
code.
This commit:
* adds a constexpr to replace the hardcoded 1024 values
* removes a comment about one of the the hardcoded 1024 values which is
now umnecessary
There's no point in constructing an object just for the sake of keeping
a state that can be touched by anything in the kernel code.
Let's reduce everything to be in a C++ namespace called with the
previous name "VirtualFileSystem" and keep a smaller textual-footprint
struct called "VirtualFileSystemDetails".
This change also cleans up old "friend class" statements that were no
longer needed, and move methods from the VirtualFileSystem code to more
appropriate places as well.
Please note that the method of locking all filesystems during shutdown
is removed, as in that place there's no meaning to actually locking all
filesystems because of running in kernel mode entirely.
This new syscall will be used by the upcoming runc (run-container)
utility.
In addition to that, this syscall allows userspace to neatly copy RAMFS
instances to other places, which was not possible in the past.
The whole concept of Jails was far more complicated than I actually want
it to be, so let's reduce the complexity of how it works from now on.
Please note that we always leaked the attach count of a Jail object in
the fork syscall if it failed midway.
Instead, we should have attach to the jail just before registering the
new Process, so we don't need to worry about unsuccessful Process
creation.
The reduction of complexity in regard to jails means that instead of
relying on jails to provide PID isolation, we could simplify the whole
idea of them to be a simple SetOnce, and let the ProcessList (now called
ScopedProcessList) to be responsible for this type of isolation.
Therefore, we apply the following changes to do so:
- We make the Jail concept no longer a class of its own. Instead, we
simplify the idea of being jailed to a simple ProtectedValues boolean
flag. This means that we no longer check of matching jail pointers
anywhere in the Kernel code.
To set a process as jailed, a new prctl option was added to set a
Kernel SetOnce boolean flag (so it cannot change ever again).
- We provide Process & Thread methods to iterate over process lists.
A process can either iterate on the global process list, or if it's
attached to a scoped process list, then only over that list.
This essentially replaces the need of checking the Jail pointer of a
process when iterating over process lists.
Expose some initial interfaces in the mount-related syscalls to select
the desired VFSRootContext, by specifying the VFSRootContext index
number.
For now there's still no way to create a different VFSRootContext, so
the only valid IDs are -1 (for currently attached VFSRootContext) or 1
for the first userspace VFSRootContext.
The VFSRootContext class, as its name suggests, holds a context for a
root directory with its mount table and the root custody/inode in the
same class.
The idea is derived from the Linux mount namespace mechanism.
It mimicks the concept of the ProcessList object, but it is adjusted for
a root directory tree context.
In contrast to the ProcessList concept, processes that share the default
VFSRootContext can't see other VFSRootContext related properties such as
as the mount table and root custody/inode.
To accommodate to this change progressively, we internally create 2 main
VFS root contexts for now - one for kernel processes (as they don't need
to care about VFS root contexts for the most part), and another for all
userspace programs.
This separation allows us to continue pretending for userspace that
everything is "normal" as it is used to be, until we introduce proper
interfaces in the mount-related syscalls as well as in the SysFS.
We make VFSRootContext objects being listed, as another preparation
before we could expose interfaces to userspace.
As a result, the PowerStateSwitchTask now iterates on all contexts
and tear them down one by one.
After the previous commit, we are able to create a comprehensive list of
all devices' major number allocations.
To help userspace to distinguish between character and block devices, we
expose 2 sysfs nodes so userspace can decide which list it needs to open
in order to iterate on it.
Instead of putting everything in one hash map, let's distinguish between
the devices based on their type.
This change makes the devices semantically separated, and is considered
a preparation before we could expose a comprehensive list of allocations
per major numbers and their purpose.
With this change, we no longer preallocate blocks when an inode's size
is updated, and instead only allocate the minimum amount of blocks when
the inode is actually written to.
This is a large commit, since this is essentially a complete rewrite of
the key low-level functions that handle reading/writing blocks. This is,
however, a necessary prerequisite of being able to write holes.
The previous version of `flush_block_list()` (along with its numerous
helper functions) was entirely reliant on all blocks being sequential.
In contrast to the previous implementation, the new version
of `flush_block_list()` simply writes out the difference between the old
block list and the new block list by calculating the correct indirect
block(s) to update based on the relevant block's logical index.
`compute_block_list()` has also been rewritten, since the estimated
amount of meta blocks was incorrectly calculated for files with holes as
a result of the estimated amount of blocks being a function of the file
size. Since it isn't possible to accurately compute the shape of the
block list without traversing it, we no longer try to perform such a
computation, and instead simply search through all of the allocated
indirect blocks.
`compute_block_list_with_meta_blocks()` has also been removed in favor
of the new `compute_meta_blocks()`, since meta blocks are fundamentally
distinct from data blocks due to there being no mapping between any
logical block index and the physical block index.
Since we now only store blocks that are actually allocated, it is
entirely valid for the block list to be empty, so this commit lifts the
restrictions on accessing inodes with an empty block list.
It can be possible for a request to be blocked on another request, so
this patch allows us to send more requests even when a request is
already pending.
To be able to do this, we add a new class called CustodyBase, which can
be resolved on-demand internally in the VirtualFileSystem resolving path
code.
When being resolved, CustodyBase will return a known custody if it was
constructed with such, if that's not the case it will provide the root
custody if the original path is absolute.
Lastly, if that's not the case as well, it will resolve the given dirfd
to provide a Custody object.
This is a large commit because it implements a lot of stuff to make
add_child simpler to get working. This allows us to create new files
on a FAT partition.
This is the first part of write support, it allows for full file
modification, but no creating or removing files yet.
Co-Authored-By: implicitfield <114500360+implicitfield@users.noreply.github.com>
Caching the cluster list allows us to fill the two fields in the
InodeMetadata. While at it, don't cache the metadata as when we
have write support having to keep both InodeMetadata and FATEntry
correct is going to get very annoying.
dbgln() will always take its arguments by reference when possible, which
causes UB when dealing with packed structs. To avoid this, we now
explicitly copy all members whose alignment requirements aren't met.
These changes are compatible with clang-format 16 and will be mandatory
when we eventually bump clang-format version. So, since there are no
real downsides, let's commit them now.
Either we mount from a loop device or other source, the user might want
to obfuscate the given source for security reasons, so this option will
ensure this will happen.
If passed during a mount, the source will be hidden when reading from
the /sys/kernel/df node.
Similarly to DevPtsFS, this filesystem is about exposing loop device
nodes easily in /dev/loop, so userspace doesn't need to do anything in
order to use new devices immediately.