Since we know for sure that the virtual memory regions in the new
process being created are not being used on any CPU, there's no need
to do TLB flushes for every mapped page.
Dynamic Vector allocations in sys$select() were showing up in the
full-system profile and since there will never be more than FD_SETSIZE
file descriptors to worry about, we can confidently add enough inline
capacity to this Vector that it never has to kmalloc.
To compensate for the increased stack usage, reduce the size of the
FDInfo struct while we're here. :^)
The full system profiling functionality is useful for profiling the
boot performance of the system. Add a new kernel boot option to start
the system with profiling enabled. This lets you disable and view a
profile once the system is booted.
You can use it by running:
```
$ run.sh qcmd boot_prof
```
Previously all of the CommandLine parsing was spread out around the
Kernel. Instead move it all into the Kernel CommandLine class, and
expose a strongly typed API for querying the state of options.
Previously, the instruction fetch flag of the page fault handler
did not have the currect binary representation, and would always
return false. This aligns these flags.
The perfcore file format was previously limited to a single process
since the pid/executable/regions data was top-level in the JSON.
This patch moves the process-specific data into a top-level array
named "processes" and we now add entries for each process that has
been sampled during the profile run.
This makes it possible to see samples from multiple threads when
viewing a perfcore file with Profiler. This is extremely cool! :^)
The superuser can now call sys$profiling_enable() with PID -1 to enable
profiling of all running threads in the system. The perf events are
collected in a global PerformanceEventBuffer (currently 32 MiB in size.)
The events can be accessed via /proc/profile
If we can't allocate a PerformanceEventBuffer to store the profiling
events, we now fail sys$profiling_enable() and sys$perf_event()
with ENOMEM instead of carrying on with a broken buffer.
This was the original approach before we switched to get_fast_random()
which wasn't fast enough, so we added a buffer.
Unfortunately that buffer is racy and we can actually skid past the end
of it and continue fetching "random" offsets from the adjacent memory
for a while, until we run out of kernel data segment and trip a fault.
Instead of making this even more convoluted, let's just go back to the
pleasantly simple (RDTSC & 0xff) approach. :^)
Fixes#4912.
I don't dare touch the multi-threading logic and locking mechanism, so it stays
timespec for now. However, this could and should be changed to AK::Time, and I
bet it will simplify the "increment_time_since_boot()" code.
This commit is very invasive, because Thread likes to take a pointer and write
to it. This means that translating between timespec/timeval/Time would have been
more difficult than just changing everything that hands a raw pointer to Thread,
in bulk.
These structs can be inconsistent, for example if the amount of microseconds is
negative or larger than 1'000'000. Therefore, they should not be copied as-is.
Use copy_time_from_user instead.
copy_from_user can fail, for example when the user-supplied pointer is just before
the end of mapped address space. In that case, the first few bytes would get copied,
permanently overwriting the internal state of the Socket, potentially leaving it
in an inconsistent or at least difficult-to-predict state.
fuzz-syscalls found a bunch of unaligned accesses into struct sigaction
via this syscall. This patch fixes that issue by porting the syscall
to Userspace<T> which we should have done anyway. :^)
Fixes#5500.
We were calibrating it to 260 instead of 250 ticks per second (being
off by one for the 1/10th second calibration time), resulting in
ticks of only ~3.6 ms instead of ~4ms. This gets us closer to ~4ms,
but because the APIC isn't nearly as precise as e.g. HPET, it will
only be a best effort. Then, use the higher precision reference
timer to more accurately calculate how many ticks we actually get
each second.
Also the frequency calculation was off, causing a "Frequency too slow"
error with VMware.
Fixes some problems observed in #5539
Add a special boot mode for running tests, rather than using the system
as a general purpose OS. We'll use this in SystemServer to specify
only services needed to run tests and exit.
This may seem like a no-op change, however it shrinks down the Kernel by a bit:
.text -432
.unmap_after_init -60
.data -480
.debug_info -673
.debug_aranges 8
.debug_ranges -232
.debug_line -558
.debug_str -308
.debug_frame -40
With '= default', the compiler can do more inlining, hence the savings.
I intentionally omitted some opportunities for '= default', because they
would increase the Kernel size.
Because registering and unregistering interrupt handlers triggers
calls to virtual functions, we can't do this in the constructor
and destructor.
Fixes#5539
We don't need to flush the on-disk inode struct multiple times while
writing out its block list. Just mark the in-memory Inode as having
dirty metadata and the SyncTask will flush it eventually.
Since the inode is the logical owner of its block list, let's move the
code that computes the block list there, and also stop hogging the FS
lock while we compute the block list, as there is no need for it.
There are two locks in the Ext2FS implementation:
* The FS lock (Ext2FS::m_lock)
This governs access to the superblock, block group descriptors,
and the block & inode bitmap blocks. It's held while allocating
or freeing blocks/inodes.
* The inode lock (Ext2FSInode::m_lock)
This governs access to the inode metadata, including the block
list, and to the content data as well. It's held while doing
basically anything with the inode.
Once an on-disk block/inode is allocated, it logically belongs
to the in-memory Inode object, so there's no need for the FS lock
to be taken while manipulating them, the inode lock is all you need.
This dramatically reduces the impact of disk I/O on path resolution
and various other things that look at individual inodes.