This patch implements a simple version of the futex (fast userspace
mutex) API in the kernel and uses it to make the pthread_cond_t API's
block instead of busily sched_yield().
An arbitrary userspace address is passed to the kernel as a "token"
that identifies the futex and you can then FUTEX_WAIT and FUTEX_WAKE
that specific userspace address.
FUTEX_WAIT corresponds to pthread_cond_wait() and FUTEX_WAKE is used
for pthread_cond_signal() and pthread_cond_broadcast().
I'm pretty sure I'm missing something in this implementation, but it's
hopefully okay for a start. :^)
This patch adds a single "kernel info page" that is mappable read-only
by any process and contains the current time of day.
This is then used to implement a version of gettimeofday() that doesn't
have to make a syscall.
To protect against race condition issues, the info page also has a
serial number which is incremented whenever the kernel updates the
contents of the page. Make sure to verify that the serial number is the
same before and after reading the information you want from the page.
The kernel now supports basic profiling of all the threads in a process
by calling profiling_enable(pid_t). You finish the profiling by calling
profiling_disable(pid_t).
This all works by recording thread stacks when the timer interrupt
fires and the current thread is in a process being profiled.
Note that symbolication is deferred until profiling_disable() to avoid
adding more noise than necessary to the profile.
A simple "/bin/profile" command is included here that can be used to
start/stop profiling like so:
$ profile 10 on
... wait ...
$ profile 10 off
After a profile has been recorded, it can be fetched in /proc/profile
There are various limits (or "bugs") on this mechanism at the moment:
- Only one process can be profiled at a time.
- We allocate 8MB for the samples, if you use more space, things will
not work, and probably break a bit.
- Things will probably fall apart if the profiled process dies during
profiling, or while extracing /proc/profile
This patch makes SharedBuffer use a PurgeableVMObject as its underlying
memory object.
A new syscall is added to control the volatile flag of a SharedBuffer.
It's now possible to get purgeable memory by using mmap(MAP_PURGEABLE).
Purgeable memory has a "volatile" flag that can be set using madvise():
- madvise(..., MADV_SET_VOLATILE)
- madvise(..., MADV_SET_NONVOLATILE)
When in the "volatile" state, the kernel may take away the underlying
physical memory pages at any time, without notifying the owner.
This gives you a guilt discount when caching very large things. :^)
Setting a purgeable region to non-volatile will return whether or not
the memory has been taken away by the kernel while being volatile.
Basically, if madvise(..., MADV_SET_NONVOLATILE) returns 1, that means
the memory was purged while volatile, and whatever was in that piece
of memory needs to be reconstructed before use.
The main thread of each kernel/user process will take the name of
the process. Extra threads will get a fancy new name
"ProcessName[<tid>]".
Thread backtraces now list the thread name in addtion to tid.
Add the thread name to /proc/all (should it get its own proc
file?).
Add two new syscalls, set_thread_name and get_thread_name.
It's now possible to load a .o file into the kernel via a syscall.
The kernel will perform all the necessary ELF relocations, and then
call the "module_init" symbol in the loaded module.
Add an initial implementation of pthread attributes for:
* detach state (joinable, detached)
* schedule params (just priority)
* guard page size (as skeleton) (requires kernel support maybe?)
* stack size and user-provided stack location (4 or 8 MB only, must be aligned)
Add some tests too, to the thread test program.
Also, LibC: Move pthread declarations to sys/types.h, where they belong.
This can be implemented entirely in userspace by calling tcgetattr().
To avoid screwing up the syscall indexes, this patch also adds a
mechanism for removing a syscall without shifting the index of other
syscalls.
Note that ports will still have to be rebuilt after this change,
as their LibC code will try to make the isatty() syscall on startup.
It's now possible to block until another thread in the same process has
exited. We can also retrieve its exit value, which is whatever value it
passed to pthread_exit(). :^)
Some syscalls have to pass parameters through a struct, since we can
only fit 3 parameters with our calling convention.
This patch makes use of C++ structured binding to clean up the places
where we expand those parameters structs into local variables.
POSIX's openat() is very similar to open(), except you also provide a
file descriptor referring to a directory from which relative paths
should be resolved.
Passing it the magical fd number AT_FDCWD means "resolve from current
directory" (which is indeed also what open() normally does.)
This fixes libarchive's bsdtar, since it was trying to do something
extremely wrong in the absence of openat() support. The issue has
recently been fixed upstream in libarchive:
https://github.com/libarchive/libarchive/issues/1239
However, we should have openat() support anyway, so I went ahead and
implemented it. :^)
Fixes#748.
Instead of the big ugly switch statement, build a lookup table using
the syscall enumeration macro.
This greatly simplifies the syscall implementation. :^)
The way it gets the entropy and blasts it to the buffer is pretty
ugly IMHO, but it does work for now. (It should be replaced, by
not truncating a u32.)
It implements an (unused for now) flags argument, like Linux but
instead of OpenBSD's. This is in case we want to distinguish
between entropy sources or any other reason and have to implement
a new syscall later. Of course, learn from Linux's struggles with
entropy sourcing too.
The introduction fchdir in the middle of the EUMERATE_SYSCALLS
expression changed other syscall numbers, which broke compatibility.
This commit fixes that by moving it to the end.
The fchdir() function is equivalent to chdir() except that the
directory that is to be the new current working directory is
specified by a file descriptor.
It is now possible to unmount file systems from the VFS via `umount`.
It works via looking up the `fsid` of the filesystem from the `Inode`'s
metatdata so I'm not sure how fragile it is. It seems to work for now
though as something to get us going.
This patch adds the mprotect() syscall to allow changing the protection
flags for memory regions. We don't do any region splitting/merging yet,
so this only works on whole mmap() regions.
Added a "crash -r" flag to verify that we crash when you attempt to
write to read-only memory. :^)
It is now possible to mount ext2 `DiskDevice` devices under Serenity on
any folder in the root filesystem. Currently any user can do this with
any permissions. There's a fair amount of assumptions made here too,
that might not be too good, but can be worked on in the future. This is
a good start to allow more dynamic operation under the OS itself.
It is also currently impossible to unmount and such, and devices will
fail to mount in Linux as the FS 'needs to be cleaned'. I'll work on
getting `umount` done ASAP to rectify this (as well as working on less
assumption-making in the mount syscall. We don't want to just be able
to mount DiskDevices!). This could probably be fixed with some `-t`
flag or something similar.
Processes can now have an icon assigned, which is essentially a 16x16 RGBA32
bitmap exposed as a shared buffer ID.
You set the icon ID by calling set_process_icon(int) and the icon ID will be
exposed through /proc/all.
To make this work, I added a mechanism for making shared buffers globally
accessible. For safety reasons, each app seals the icon buffer before making
it global.
Right now the first call to GWindow::set_icon() is what determines the
process icon. We'll probably change this in the future. :^)
The syscall is quite simple:
int watch_file(const char* path, int path_length);
It returns a file descriptor referring to a "InodeWatcher" object in the
kernel. It becomes readable whenever something changes about the inode.
Currently this is implemented by hooking the "metadata dirty bit" in
Inode which isn't perfect, but it's a start. :^)
The "stddbg" stream was a cute idea but we never ended up using it in
practice, so let's simplify this and implement userspace dbgprintf() on top
of a simple dbgputch() syscall instead.
This makes debugging LibC startup a little bit easier. :^)
This is very simple but already very useful. Now you're able to call to
dump_backtrace() from anywhere userspace to get a nice symbolicated
backtrace in the debugger output. :^)
Rolling with the theme of adding a dialog to shutdown the machine, it is
probably nice to have a way to reboot the machine without performing a full
system powerdown.
A reboot program has been added to `/bin/` as well as a corresponding
`syscall` (SC_reboot). This syscall works by attempting to pulse the 8042
keyboard controller. Note that this is NOT supported on new machines, and
should only be a fallback until we have proper ACPI support.
The implementation causes a triple fault in QEMU, which then restarts the
system. The filesystems are locked and synchronized before this occurs,
so there shouldn't be any corruption etctera.
This allows us to seal a buffer *before* anyone else has access to it
(well, ok, the creating process still does, but you can't win them all).
It also means that a SharedBuffer can be shared with multiple clients:
all you need is to have access to it to share it on again.
Instead of computing the path length inside the syscall handler, let the
caller do that work. This allows us to implement to new variants of open()
and creat(), called open_with_path_length() and creat_with_path_length().
These are suitable for use with e.g StringView.
Right now, we allow anything inside a user to raise or lower any other process's
priority. This feels simple enough to me. Linux disallows raising, but
that's annoying in practice.
Also run it across the whole tree to get everything using the One True Style.
We don't yet run this in an automated fashion as it's a little slow, but
there is a snippet to do so in makeall.sh.
Hook this up in Terminal so that the '\a' character generates a beep.
Finally emit an '\a' character in the shell line editing code when
backspacing at the start of the line.
Calling systrace(pid) gives you a file descriptor with a stream of the
syscalls made by a peer process. The process must be owned by the same
UID who calls systrace(). :^)