There's no point in constructing an object just for the sake of keeping
a state that can be touched by anything in the kernel code.
Let's reduce everything to be in a C++ namespace called with the
previous name "VirtualFileSystem" and keep a smaller textual-footprint
struct called "VirtualFileSystemDetails".
This change also cleans up old "friend class" statements that were no
longer needed, and move methods from the VirtualFileSystem code to more
appropriate places as well.
Please note that the method of locking all filesystems during shutdown
is removed, as in that place there's no meaning to actually locking all
filesystems because of running in kernel mode entirely.
The VFSRootContext class, as its name suggests, holds a context for a
root directory with its mount table and the root custody/inode in the
same class.
The idea is derived from the Linux mount namespace mechanism.
It mimicks the concept of the ProcessList object, but it is adjusted for
a root directory tree context.
In contrast to the ProcessList concept, processes that share the default
VFSRootContext can't see other VFSRootContext related properties such as
as the mount table and root custody/inode.
To accommodate to this change progressively, we internally create 2 main
VFS root contexts for now - one for kernel processes (as they don't need
to care about VFS root contexts for the most part), and another for all
userspace programs.
This separation allows us to continue pretending for userspace that
everything is "normal" as it is used to be, until we introduce proper
interfaces in the mount-related syscalls as well as in the SysFS.
We make VFSRootContext objects being listed, as another preparation
before we could expose interfaces to userspace.
As a result, the PowerStateSwitchTask now iterates on all contexts
and tear them down one by one.
To be able to do this, we add a new class called CustodyBase, which can
be resolved on-demand internally in the VirtualFileSystem resolving path
code.
When being resolved, CustodyBase will return a known custody if it was
constructed with such, if that's not the case it will provide the root
custody if the original path is absolute.
Lastly, if that's not the case as well, it will resolve the given dirfd
to provide a Custody object.
Such operation is almost equivalent to writing on an Inode, so lock the
Inode m_inode_lock exclusively.
All FileSystem Inode implementations then override a new method called
truncate_locked which should implement the actual truncating.
"Wherever applicable" = most places, actually :^), especially for
networking and filesystem timestamps.
This includes changes to unzip, which uses DOSPackedTime, since that is
changed for the FAT file systems.
That's what this class really is; in fact that's what the first line of
the comment says it is.
This commit does not rename the main files, since those will contain
other time-related classes in a little bit.
The existing `read_entire` is quite slow due to allocating and copying
multiple times, but it is simultaneously quite hard to get rid of in a
single step. As a replacement, add a new function that reads as much as
possible directly into a user-provided buffer.
This was mostly straightforward, as all the storage locations are
guarded by some related mutex.
The use of old-school associated mutexes instead of MutexProtected
is unfortunate, but the process to modernize such code is ongoing.
There was a bug in which bound Inodes would lose all their references
(because localsocket does not reference them), and they would be
deallocated, and clients would get ECONNREFUSED as a result. now
LocalSocket has a strong reference to inode so that the inode will live
as long as the socket, and Inode has a weak reference to the socket,
because if the socket stops being referenced anywhere it should not be
bound.
This still prevents the reference loop that
220b7dd779 was trying to fix.
This step would ideally not have been necessary (increases amount of
refactoring and templates necessary, which in turn increases build
times), but it gives us a couple of nice properties:
- SpinlockProtected inside Singleton (a very common combination) can now
obtain any lock rank just via the template parameter. It was not
previously possible to do this with SingletonInstanceCreator magic.
- SpinlockProtected's lock rank is now mandatory; this is the majority
of cases and allows us to see where we're still missing proper ranks.
- The type already informs us what lock rank a lock has, which aids code
readability and (possibly, if gdb cooperates) lock mismatch debugging.
- The rank of a lock can no longer be dynamic, which is not something we
wanted in the first place (or made use of). Locks randomly changing
their rank sounds like a disaster waiting to happen.
- In some places, we might be able to statically check that locks are
taken in the right order (with the right lock rank checking
implementation) as rank information is fully statically known.
This refactoring even more exposes the fact that Mutex has no lock rank
capabilites, which is not fixed here.
This commit makes it possible for a process to downgrade a file lock it
holds from a write (exclusive) lock to a read (shared) lock. For this,
the process must point to the exact range of the flock, and must be the
owner of the lock.
Because the ".." entry in a directory is a separate inode, if a
directory is renamed to a new location, then we should update this entry
the point to the new parent directory as well.
Co-authored-by: Liav A <liavalb@gmail.com>
We no longer require to lock the m_inode_lock in the SharedInodeVMObject
code as the methods write_bytes and read_bytes of the Inode class do
this for us now.
We make these methods non-virtual because we want to ensure we properly
enforce locking of the m_inode_lock mutex. Also, for write operations,
we want to call prepare_to_write_data before the actual write. The
previous design required us to ensure the callers do that at various
places which lead to hard-to-find bugs. By moving everything to a place
where we call prepare_to_write_data only once, we eliminate a possibilty
of forgeting to call it on some code path in the kernel.
Instead of having three separate APIs (one for each timestamp),
there's now only Inode::update_timestamps() and it takes 3x optional
timestamps. The non-empty timestamps are updated while holding the inode
mutex, and the outside world no longer has to look at intermediate
timestamp states.
By protecting all the RefPtr<Custody> objects that may be accessed from
multiple threads at the same time (with spinlocks), we remove the need
for using LockRefPtr<Custody> (which is basically a RefPtr with a
built-in spinlock.)
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:
- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable
This patch renames the Kernel classes so that they can coexist with
the original AK classes:
- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable
The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
All users which relied on the default constructor use a None lock rank
for now. This will make it easier to in the future remove LockRank and
actually annotate the ranks by searching for None.
We ensure that when we call SharedInodeVMObject::sync we lock the inode
lock before calling Inode virtual write_bytes method directly to avoid
assertion on the unlocked inode lock, as it was regressed recently. This
is not a complete fix as the need to lock from each path before calling
the write_bytes method should be avoided because it can lead to
hard-to-find bugs, and this commit only fixes the problem temporarily.
Instead of requiring each FileSystem implementation to call this method
when trying to write data, do the calls at 2 points to avoid further
calls (or lack of them due to not remembering to use it) at other files
and locations in the codebase.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
Rename the bound socket accessor from socket() to bound_socket().
Also return RefPtr<LocalSocket> instead of a raw pointer, to make it
harder for callers to mess up.
Previously we were uncaching inodes from TmpFSInode::one_ref_left().
This was not safe, since one_ref_left() was effectively being called
on a raw pointer after decrementing the local ref count and observing
it become 1. There was a race here where someone else could trigger
the destructor by unreffing to 0 before one_ref_left() got called,
causing us to call one_ref_left() on a deleted inode.
We fix this by using the new remove_from_secondary_lists() mechanism
in ListedRefCounted and synchronizing all access to the TmpFS inode
map with the main Inode::all_instances() lock.
There's probably a nicer way to solve this.
We now use AK::Error and AK::ErrorOr<T> in both kernel and userspace!
This was a slightly tedious refactoring that took a long time, so it's
not unlikely that some bugs crept in.
Nevertheless, it does pass basic functionality testing, and it's just
real nice to finally see the same pattern in all contexts. :^)
Because we were holding a strong ref to the OpenFileDescription in
LocalSocket and a strong ref to the LocalSocket in Inode, we were
creating a reference cycle in the event of the socket being cleaned up
after the file description did (i.e. unlinking the file before closing
the socket), because the file description never got destructed.