When ProcFS could no longer allocate KBuffer objects to serve calls to
read, it would just return 0, indicating EOF. This then triggered
parsing errors because code assumed it read the file.
Because read isn't supposed to return ENOMEM, change ProcFS to populate
the file data upon file open or seek to the beginning. This also means
that calls to open can now return ENOMEM if needed. This allows the
caller to either be able to successfully open the file and read it, or
fail to open it in the first place.
Problem:
- `(void)` simply casts the expression to void. This is understood to
indicate that it is ignored, but this is really a compiler trick to
get the compiler to not generate a warning.
Solution:
- Use the `[[maybe_unused]]` attribute to indicate the value is unused.
Note:
- Functions taking a `(void)` argument list have also been changed to
`()` because this is not needed and shows up in the same grep
command.
This makes the Scheduler a lot leaner by not having to evaluate
block conditions every time it is invoked. Instead evaluate them as
the states change, and unblock threads at that point.
This also implements some more waitid/waitpid/wait features and
behavior. For example, WUNTRACED and WNOWAIT are now supported. And
wait will now not return EINTR when SIGCHLD is delivered at the
same time.
Since we're using UserOrKernelBuffers, SMAP will be automatically
disabled when we actually access the buffer later on. There's no need
to disable it wholesale across the entire read/write operations.
Before e06362de94 this was a sneaky buffer
overflow. BufferStream did not do range checking and continued to write
past the allocated buffer (the size of which was controlled by the
user.)
The issue surfaced after my changes because OutputMemoryStream does
range checking.
Not sure how exploitable that bug was, directory entries are somewhat
controllable by the user but the buffer was on the heap, so exploiting
that should be tough.
Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and we'd
have to account for e.g. pages not being able to be allocated
due to insufficient resources.
So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.
To make it easier to read/write from/to either kernel or user mode
data add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or directly memcpy, or pass the data
through directly using a temporary buffer on the stack.
Last but not least we need to keep syscall params trivial as we
need to copy them from/to user mode using copy_from/to_user.
Instead of FileDescriptor branching on the type of File it's wrapping,
add a File::stat() function that can be overridden to provide custom
behavior for the stat syscalls.
A change introduced in 5e01234 made it the resposibility of each
filesystem to have the file types returned from
'traverse_as_directory' match up with the DT_* types.
However, this caused corruption of the Ext2FS file format because
the Ext2FS uses 'traverse_as_directory' internally when manipulating
the file system. The result was a mixture between EXT2_FT_* and DT_*
file types in the internal Ext2FS structures.
Starting with this commit, the conversion from internal filesystem file
types to the user facing DT_* types happens at a later stage,
in the 'FileDescription::get_dir_entries' function which is directly
used by sys$get_dir_entries.
Unlike DirectoryEntry (which is used when constructing directories),
DirectoryEntryView does not manage storage for file names. Names are
just StringViews.
This is much more suited to the directory traversal API and makes
it easier to implement this in file system classes since they no
longer need to create temporary name copies while traversing.
In #3001 I was trying to fix result propagation issues, and
I actually just introduced another one. Luckily Ben spotted
it in the diff after it was in the tree, thanks Ben!
This fixes a bug where the mode of a FIFO was reported as 001000 instead
of 0010000 (you see the difference? me nethier), and hopefully doesn't
introduce new bugs. I've left 0777 and similar in a few places, because
that is *more* readable than its symbolic version.
We're going to make use of it in the next commit. But the idea is we want to
know how this File (more specifically, InodeFile) was opened in order to decide
how chown()/chmod() should behave, in particular whether it should be allowed or
not. Note that many other File operations, such as read(), write(), and ioctl(),
already require the caller to pass a FileDescription.
Allow file system implementation to return meaningful error codes to
callers of the FileDescription::read_entire_file(). This allows both
Process::sys$readlink() and Process::sys$module_load() to return more
detailed errors to the user.
In contrast to the previous patchset that was reverted, this time we use
a "special" method to access a file with block size of 512 bytes (like
a harddrive essentially).
You can now mmap a file as private and writable, and the changes you
make will only be visible to you.
This works because internally a MAP_PRIVATE region is backed by a
unique PrivateInodeVMObject instead of using the globally shared
SharedInodeVMObject like we always did before. :^)
Fixes#1045.
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.
For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.
Going forward, all new source files should include a license header.
Supervisor Mode Access Prevention (SMAP) is an x86 CPU feature that
prevents the kernel from accessing userspace memory. With SMAP enabled,
trying to read/write a userspace memory address while in the kernel
will now generate a page fault.
Since it's sometimes necessary to read/write userspace memory, there
are two new instructions that quickly switch the protection on/off:
STAC (disables protection) and CLAC (enables protection.)
These are exposed in kernel code via the stac() and clac() helpers.
There's also a SmapDisabler RAII object that can be used to ensure
that you don't forget to re-enable protection before returning to
userspace code.
THis patch also adds copy_to_user(), copy_from_user() and memset_user()
which are the "correct" way of doing things. These functions allow us
to briefly disable protection for a specific purpose, and then turn it
back on immediately after it's done. Going forward all kernel code
should be moved to using these and all uses of SmapDisabler are to be
considered FIXME's.
Note that we're not realizing the full potential of this feature since
I've used SmapDisabler quite liberally in this initial bring-up patch.
In order to ensure a specific owner and mode when the local socket
filesystem endpoint is instantiated, we need to be able to call
fchmod() and fchown() on a socket fd between socket() and bind().
This is because until we call bind(), there is no filesystem inode
for the socket yet.
This code never worked, as was never used for anything. We can build
a much better SHM implementation on top of TmpFS or similar when we
get to the point when we need one.
Files opened with O_DIRECT will now bypass the disk cache in read/write
operations (though metadata operations will still hit the disk cache.)
This will allow us to test actual disk performance instead of testing
disk *cache* performance, if that's what we want. :^)
There's room for improvment here, we're very aggressively flushing any
dirty cache entries for the specific block before reading/writing that
block. This is done by walking the entire cache, which may be slow.
I have no idea why this was here. It makes no sense. If you're trying
to find out if something is a directory, why wouldn't you be allowed to
ask that about a FIFO? :^)
Thanks to Brandon for spotting this!
Also, while we're here, cache the directory state in a bool member so
we don't have to keep fetching inode metadata when checking this
repeatedly. This is important since sys$read() now calls it.