The perfcore file format was previously limited to a single process
since the pid/executable/regions data was top-level in the JSON.
This patch moves the process-specific data into a top-level array
named "processes" and we now add entries for each process that has
been sampled during the profile run.
This makes it possible to see samples from multiple threads when
viewing a perfcore file with Profiler. This is extremely cool! :^)
The superuser can now call sys$profiling_enable() with PID -1 to enable
profiling of all running threads in the system. The perf events are
collected in a global PerformanceEventBuffer (currently 32 MiB in size.)
The events can be accessed via /proc/profile
If we can't allocate a PerformanceEventBuffer to store the profiling
events, we now fail sys$profiling_enable() and sys$perf_event()
with ENOMEM instead of carrying on with a broken buffer.
This was the original approach before we switched to get_fast_random()
which wasn't fast enough, so we added a buffer.
Unfortunately that buffer is racy and we can actually skid past the end
of it and continue fetching "random" offsets from the adjacent memory
for a while, until we run out of kernel data segment and trip a fault.
Instead of making this even more convoluted, let's just go back to the
pleasantly simple (RDTSC & 0xff) approach. :^)
Fixes#4912.
I don't dare touch the multi-threading logic and locking mechanism, so it stays
timespec for now. However, this could and should be changed to AK::Time, and I
bet it will simplify the "increment_time_since_boot()" code.
This commit is very invasive, because Thread likes to take a pointer and write
to it. This means that translating between timespec/timeval/Time would have been
more difficult than just changing everything that hands a raw pointer to Thread,
in bulk.
These structs can be inconsistent, for example if the amount of microseconds is
negative or larger than 1'000'000. Therefore, they should not be copied as-is.
Use copy_time_from_user instead.
copy_from_user can fail, for example when the user-supplied pointer is just before
the end of mapped address space. In that case, the first few bytes would get copied,
permanently overwriting the internal state of the Socket, potentially leaving it
in an inconsistent or at least difficult-to-predict state.
fuzz-syscalls found a bunch of unaligned accesses into struct sigaction
via this syscall. This patch fixes that issue by porting the syscall
to Userspace<T> which we should have done anyway. :^)
Fixes#5500.
We were calibrating it to 260 instead of 250 ticks per second (being
off by one for the 1/10th second calibration time), resulting in
ticks of only ~3.6 ms instead of ~4ms. This gets us closer to ~4ms,
but because the APIC isn't nearly as precise as e.g. HPET, it will
only be a best effort. Then, use the higher precision reference
timer to more accurately calculate how many ticks we actually get
each second.
Also the frequency calculation was off, causing a "Frequency too slow"
error with VMware.
Fixes some problems observed in #5539
Add a special boot mode for running tests, rather than using the system
as a general purpose OS. We'll use this in SystemServer to specify
only services needed to run tests and exit.
This may seem like a no-op change, however it shrinks down the Kernel by a bit:
.text -432
.unmap_after_init -60
.data -480
.debug_info -673
.debug_aranges 8
.debug_ranges -232
.debug_line -558
.debug_str -308
.debug_frame -40
With '= default', the compiler can do more inlining, hence the savings.
I intentionally omitted some opportunities for '= default', because they
would increase the Kernel size.
Because registering and unregistering interrupt handlers triggers
calls to virtual functions, we can't do this in the constructor
and destructor.
Fixes#5539
We don't need to flush the on-disk inode struct multiple times while
writing out its block list. Just mark the in-memory Inode as having
dirty metadata and the SyncTask will flush it eventually.
Since the inode is the logical owner of its block list, let's move the
code that computes the block list there, and also stop hogging the FS
lock while we compute the block list, as there is no need for it.
There are two locks in the Ext2FS implementation:
* The FS lock (Ext2FS::m_lock)
This governs access to the superblock, block group descriptors,
and the block & inode bitmap blocks. It's held while allocating
or freeing blocks/inodes.
* The inode lock (Ext2FSInode::m_lock)
This governs access to the inode metadata, including the block
list, and to the content data as well. It's held while doing
basically anything with the inode.
Once an on-disk block/inode is allocated, it logically belongs
to the in-memory Inode object, so there's no need for the FS lock
to be taken while manipulating them, the inode lock is all you need.
This dramatically reduces the impact of disk I/O on path resolution
and various other things that look at individual inodes.
Since these filesystems operate on an underlying file descriptor
and rely on its offset for correctness, let's use the FS lock to
serialize these operations.
This also means that FS subclasses can rely on block-level read/write
operations being atomic.
This is basically just for consistency, it's quite strange to see
multiple AK container types next to each other, some with and some
without the namespace prefix - we're 'using AK::Foo;' a lot and should
leverage that. :^)
This reverts commit 1e737a5c50.
The cached block list does not include meta-blocks, so we'd end up
leaking those. There's definitely a nice way to avoid work here, but it
turns out it wasn't quite this trivial. Reverting for now.
Currently, when a process which has a tracee exits, nothing will happen,
leaving the tracee unable to be attached again. This will call the
stop_tracing function on any process which is traced by the exiting
process and sending the SIGSTOP signal making the traced process wait
for a SIGCONT (just as Linux does)
This patch combines inode the scan for an available inode with the
updating of the bit in the inode bitmap into a single operation.
We also exit the scan immediately when we find an inode, instead of
continuing until we've scanned all the eligible groups(!)
Finally, we stop holding the filesystem lock throughout the entire
operation, and instead only take it while actually necessary
(during inode allocation, flush, and inode cache update.)