This class takes on the duties of CLOCK_MONOTONIC, a time without a
defined reference point that always increases. This informs some
important design decisions about the class API: MonotonicTime cannot be
constructed from external time data, except as a computation based on
other monotonic time, or the current monotonic time. Importantly, there
is no default constructor, since the reference point of monotonic time
is unspecified and therefore without meaning as a default.
The current use of monotonic time (via Duration) includes some potential
problems that may be caught when we move most to all code to
MonotonicTime in the next commit.
The API restrictions have one important relaxation:
Kernel::TimeManagement is allowed to exchange raw time data within
MonotonicTime freely. This is required for the clock-agnostic time
accessors for timeouts and syscalls, as well as creating monotonic time
data from hardware in the first place.
"Wherever applicable" = most places, actually :^), especially for
networking and filesystem timestamps.
This includes changes to unzip, which uses DOSPackedTime, since that is
changed for the FAT file systems.
This is a generic wrapper for a time instant relative to the unix epoch,
and does not account for leap seconds. It should be used in place of
Duration in most current cases.
This is a trivial change, and since this batch of commits will make a
large-scale rebuild necessary anyways, it seems sensible. The feature is
useful for e.g. building compound constant durations at compile time in
a readable way.
That's what this class really is; in fact that's what the first line of
the comment says it is.
This commit does not rename the main files, since those will contain
other time-related classes in a little bit.
This attribute is used for functions in the kernel that are entirely
written in assembly, yet defined in C++ source files.
Without `__attribute__((naked))`, Clang might decide to inline these
functions, making any `ret` instructions within them actually exit the
caller, or discard argument values as they appear "dead". This issue
caused a kernel panic when using the `execve` syscall in AArch64
SerenityOS built by Clang.
While the empty definition so far appears to work fine with GCC, simpler
test cases do similarly suffer from unintended inlining, so define
`NAKED` as a synonym of `NEVER_INLINE` to avert future issues.
Perhaps we should move users of `NAKED` to plain assembly files?
This makes aarch64Clang builds boot :^)
This underlines that we still copy-construct and copy-assign HashMaps.
Primarily, this makes it easier to develop towards OOM-safe(r) internal
data structures, by providing a reminder (the FIXME) and an easy error-
checking switch (just change it to "delete" to see some of the errors).
See identical code in LittleEndianBitStream; even in the bytewise
reading BigEndianBitStream an offset of 8 is not inconsistent state and
handled just fine by read_bits.
Rather than tracking our position in the bit buffer, we can simply shift
away the bits that we read. This is mostly for simplicity, but also does
help performance a bit.
Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), compression time
decreases from:
3.96s to 3.79s on Serenity (cold)
1.08s to 1.04s on Serenity (warm)
0.83s to 0.82s on Linux
We have outln() and out(), warnln() and warn(),
now we have dbgln() and dbg().
This is useful for printing arrays element-by-element while still
only printing one line per array.
This class, in a similar fashion to what has been done with
`InputBufferedStream`, postpones write to the stream until an internal
buffer is full.
This patch also adds the `OutputBufferedFile` alias.
In a similar fashion to what have been done with `fill_from_stream`,
this new method allows to write CircularBuffer's data to a Stream
without additional copies.
Due to 582c55a, both `must_set()` and `set()` should be providing the
same behavior. Not only is that a reason to remove `must_set()`, but
it also performs erroneous behavior since it inserts an element at
a specified index instead of modifying an element at that index.
"The official project language is American English […]."
5d2e915623/CONTRIBUTING.md (L30)
Here's a short statistic of the occurrences of the word "behavio(u)r":
$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
2 BEHAVIOR
24 Behaviour
32 behaviour
407 Behavior
992 behavior
Therefore, it is clear that "behaviour" (56 occurrences) should be
regarded a typo, and "behavior" (1401 occurrences) should be preferred.
Note that The occurrences in LibJS are intentionally NOT changed,
because there are taken verbatim from the specification. Hence:
$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
2 BEHAVIOR
10 behaviour
24 Behaviour
407 Behavior
1014 behavior
Our current `peek_bits` function allows retrieving more bits than we can
actually provide, so whenever someone discards the requested bit count
afterwards we were underflowing the value instead.
Resolves#18618.
8134dcc changed `JsonArray::set()` to insert elements at an index
instead of changing existing elements in-place. Since no behavior
such as `Vector::try_at()` exists yet, it returns nothing.
GCC 13 was released on 2023-04-26. This commit fixes Lagom build errors
when using an updated host toolchain:
- Adds a workaround for a bug in constraint handling, which made LibJS
fail to compile: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109683
- Silences the new `-Wdangling-reference` diagnostic globally. It
produces multiple false positives with no clear way to silence them
without `#pragmas`.
- Silences `-Wself-move` in `RefPtr` tests as GCC 13 adds this
previously Clang-exclusive warning.
This further optimizes floating point parsing (specifically with a large
amount of digits). The commit shaves additional 20% of the run time for
750-digit numbers. No performance degradation is noticeable for small
numbers.
Although it might seem like we've switched to more generic functions,
which must run slower, it is not the case. The time required to parse
"1", for example, decreased by 1%. For numbers with more digits, the
effect is more noticeable: 8-digit numbers are parsed ~5% faster; for
gigantic 750-digit numbers, parsing is 2 times faster.
The later result is achieved by using UFixedBigInt<64>::wide_multiply
instead of u128::operator*(u128).
Additionally, split it into two versions (for IsIntegral<T> -- asking
to place value into register and for !IsIntegral<T> -- asking to place
value into memory with memory clobber), so that Clang is no more
completely confused about `taint_for_optimizer(AK::StringView&)`.
There's no real reason to make this a debug-only formatter, on top of
that, jakt has a optional formatter that prints None/foo instead of
OptionalNone/Optional(foo), which is more concise anyway, so switch to
that.
I found this handy for debugging, and so might others.
This now also adds a formatter for TimeZone::TimeZone. This is needed
for FormatIfSupported<Optional<TimeZone::TimeZone>> to compile. As
FormatIfSupported sees a formatter for Optional exists, but not that
there's not one for TimeZone::TimeZone.
While swap() is available in the global namespace in normal conditions,
!USING_AK_GLOBALLY will make this name unavailable in the global
namespace, making these calls fail to compile.
That pattern seems to show up a lot in code written by people that
aren't intimately familiar with the lifetime model of Error and Strings.
This commit makes the compiler detect it and present a more helpful
diagnostic than "garbage string at runtime".
This brings the function name in line with how we usually name those
functions, which is with a `read_` or `write_` prefix depending on what
they do.
While at it, make the internal `_impl` function private and not-virtual,
since there is no good reason to ever override that function.
This class can be used to run a task in another thread, and allows the
caller to wait for the task to complete to retrieve any error that may
have occurred.
Currently, it doesn't support functions returning a value on success,
but with some template magic that should be possible. :^)
Based on `out()` and `vout()` from AK/Format.h. As opposed to those,
this propagates errors.
As `Core::File` extends `AK::Stream`, this allows formatted
printing directly on `Core::File`s.
We now null out smart pointers *before* calling unref on the pointee.
This ensures that the same smart pointer can't be used to acquire a new
reference to the pointee after its destruction has begun.
I ran into this when destroying a non-empty IntrusiveList of RefPtrs,
but the problem was more general so this fixes it for all of RefPtr,
NonnullRefPtr, OwnPtr and NonnullOwnPtr.
This allows accessing and looping over the path segments in a URL
without necessarily allocating a new vector if you want them percent
decoded too (which path_segment_at_index() has an option for).
This now defaults to serializing the path with percent decoded segments
(which is what all callers expect), but has an option not to. This fixes
`file://` URLs with spaces in their paths.
The name has been changed to serialize_path() path to make it more clear
that this method will generate a new string each call (except for the
cannot_be_a_base_url() case). A few callers have then been updated to
avoid repeatedly calling this function.
The defaults selected for this are based on the behaviour of URL
when it applied percent decoding during parsing. This does mean now
in some cases the getters will allocate, but percent_decode() checks
if there's anything to decode first, so in many cases still won't.
ARCH() uses the AK_IS_ARCH_ macros internally since 349e54d5375a4a,
and all user code uses the ARCH() macro instead of AK_ARCH_.
(Why it's called ARCH() and not AK_ARCH(), I don't know.)
If any ports not in the main repo use AK_ARCH_, they should switch
to using ARCH() instead.
957f89ce4a added some tweaks for serenity-on-aarch64.
It broke anythingelse-on-aarch64 hosts though, so only do these tweaks
when targeting serenity.
(I wonder if AK/Math.h should fall back to the system math routines
when not targeting serenity in general. Would probably help ladybird
performance. On the other hand, the serenity routines would see less
use and hence exposure and love.)
Previously, if we copied the last byte for a length of 100, we'd
recalculate the read span 100 times and memmove one byte 100 times,
which resulted in a lot of overhead.
Now, if we know that we have two consecutive copies of the data, we just
extend the distance to cover both copies, which halves the number of
times that we recalculate the span and actually call memmove.
This takes the running time of the attached benchmark case from 150ms
down to 15ms.
The {sin,cos,tan} functions in AK are used as the implementation of the
same function in libm. We cannot use the __builtin_foo functions as
these would just call the libc functions. This was causing an infinite
loop. Fix this by adding a very naive implementation of
AK::{sin, cos,tan}, that is only valid for small inputs. For the other
functions in this file, I added a TODO() such that we'll crash, instead
of infinite looping.
This is a remnant from when we didn't have a `read_value` and
`write_value` implementation for `AK::Endian` and instead used the
generic functions for reading a span of bytes. Now that we have a more
ergonomic alternative, remove the helper that is no longer needed.
As noted in serval comments doing this goes against the WC3 spec,
and breaks parsing then re-serializing URLs that contain percent
encoded data, that was not encoded using the same character set as
the serializer.
For example, previously if you had a URL like:
https:://foo.com/what%2F%2F (the path is what + '//' percent encoded)
Creating URL("https:://foo.com/what%2F%2F").serialize() would return:
https://foo.com/what//
Which is incorrect and not the same as the URL we passed. This is
because the re-serializing uses the PercentEncodeSet::Path which
does not include '/'.
Only doing the percent encoding in the setters fixes this, which
is required to navigate to Google Street View (which includes a
percent encoded URL in its URL).
Seems to fix#13477 too
`vformat()` can now accept format specifiers of the form
{:'[numeric-type]}. This will output a number with a comma separator
every 3 digits.
For example:
`dbgln("{:'d}", 9999999);` will output 9,999,999.
Binary, octal and hexadecimal numbers can also use this feature, for
example:
`dbgln("{:'x}", 0xffffffff);` will output ff,fff,fff.
When BufferedFile.can_read_line() was invoked on files with no newlines,
t incorrectly returned a false result for this single line that, even
though doesn't finish with a newline character, is still a line. Since
this method is usually used in tandem with read_line(), users would miss
reading this line (and hence all the file contents).
This commit fixes this corner case by adding another check after a
negative result from finding a newline character. This new check does
the same as the check that is done *before* looking for newlines, which
takes care of this problem, but only works for files that have at least
one newline (hence the buffer has already been filled).
A new unit test has been added that shows the use case. Without the
changes in this commit the test fails, which is a testament that this
commit really fixes the underlying issue.
For whatever reason, when CLion does its code indexing thing, it doesn't
define __clang__ despite using Clang. This causes it to run into various
problems that we've solved by checking for Clang.
Since CLion does define __CLION_IDE__ (or sometimes __CLION_IDE_, no
idea why but I have seen this issue locally), let's make that part of
the AK_COMPILER_CLANG check.
This makes CLion stop highlighting various things as errors.
I was originally thinking in the wrong direction when adding this limit,
we can at most read from the buffer until we reach the current write
head. Since that write head is the reference point for the distance,
we need to limit ourselves to that instead of the seekback limit (which
is the maximum of how far back the distance can be).
Rather than the very C-like API we currently have, accepting a void* and
a length, let's take a Bytes object instead. In almost all existing
cases, the compiler figures out the length.
This helper constructor exists on the unspecialized Span<T> class also,
and is convenient for e.g. creating Bytes from:
u8 buffer[64];
Bytes bytes { buffer };
From the getentropy() man page, "The maximum permitted value for the
length argument is 256". Several of our tests were passing lengths of
several thousand bytes, causing getentropy() to fail with EIO, which we
were completely ignoring. This caused these tests to only test long
sequences of 0x00.
We now loop over the provided buffer to fill it 256 bytes at a time. If
getentropy() fails for any reason, we fall back to the default method of
filling it with one random byte at a time.
This is very similar to the LittleEndianInputBitStream bit buffer change
from 8e834d4bb2.
We currently buffer one byte of data for the underlying stream. And when
we put bits onto that buffer, we do so 1 bit at a time.
This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to write the desired number of bits.
Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), compression time
decreases from:
13.62s to 10.9s on Serenity (cold)
13.62s to 9.22s on Serenity (warm)
2.93s to 2.32s on Linux
One caveat is that this requires explicitly flushing any leftover bits
when the caller is done with the stream. The byte buffer implementation
implicitly flushed its data every time the buffer was byte-aligned, as
doing so would always fill the byte. This is no longer the case. But for
now, this should be fine as the one user of this class, DEFLATE, already
has a "flush everything now that we're done" finalizer.
Instead of reading bytes from the output stream into a buffer, just to
immediately write them back out, we can skip the middle-man and copy the
bytes directly into the output buffer.
We currently only fill a buffer when it is empty. So if it has 1 byte
and 16 KB was requested, only that 1 byte would be returned. Instead,
attempt to refill the buffer when it's size is less than the requested
size.
When reading, we currently only fill a BufferedStream's buffer when it
is empty, and only with 1 KB of data. This means that while the buffer
defaults to a size of 16 KB, at least 15 KB is always unused.
The current implementation of `Array<T, 0>` has a zero-length C array as
its storage type. While this is accepted as a GNU extension, when
compiling with Clang 16, an UBSan error is raised every time an object
is accessed whose only field is a zero-length array.
This is likely a bug in Clang 16's implementation of UBSan, which has
been reported here: https://github.com/llvm/llvm-project/issues/61775
Else:
AK/BitStream.h:218:24:
error: inline function '...::lsb_mask<unsigned char>' is not
defined [-Werror,-Wundefined-inline]
static constexpr T lsb_mask(T bits)
^
We current buffer one byte of data from the underlying stream. And when
we pull bits off that buffer, we do so 1 or 8 bits at a time (depending
on whether the buffer is byte aligned). The 1-bit-at-a-time loop is by
far the most common during e.g. GZIP decompression.
This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to extract the desired number of bits.
Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression
time decreases from:
242s to 35s on Serenity
11.125s to 3.527s on Linux
Note that BigEndianInputBitStream can also use the same techniques,
and some of the methods here may make sense to live in an endianness-
agnostic base class. The focus is GZIP right now though, which only
uses the little endian stream.
This mirrors String::from_utf8(StringView).
Jakt will use this to construct strings instead of just assuming the
allocation will succeed, lowering the API difference between
Jakt::String and AK::String by one API :^)
This patch parses enough of GPOS tables to be able to support the
kerning information embedded in Inter.
Since that specific font only applies positioning offsets to the first
glyph in each pair, I was able to get away with not changing our API.
Once we start adding support for more sophisticated positioning, we'll
need to be able to communicate more than a simple "kerning offset" to
the clients of this code.
With Clang, the previous/next pointers in buckets of an
`OrderedHashTable` are not cleared when a bucket is being shifted up as
a result of a removed bucket. As a result, an unfortunate pointer mixup
could lead to an infinite loop in the `HashTable` iterator, which was
exposed in `HashMap::keys()`.
Co-authored-by: Luke Wilde <lukew@serenityos.org>
Similar to POSIX read, the basic read and write functions of AK::Stream
do not have a lower limit of how much data they read or write (apart
from "none at all").
Rename the functions to "read some [data]" and "write some [data]" (with
"data" being omitted, since everything here is reading and writing data)
to make them sufficiently distinct from the functions that ensure to
use the entire buffer (which should be the go-to function for most
usages).
No functional changes, just a lot of new FIXMEs.
We don't need to decode the entire code point to know its length. This
reduces the runtime of decoding a string containing 5 million instances
of U+10FFFF from over 4 seconds to 0.9 seconds.
Let's add FlyString::from_deprecated_fly_string() so we can use it
instead of FlyString::from_utf8(). This will make it easier to detect
potential unncessary allocations as we transfer to FlyString.
We currently fully casefold the left- and right-hand sides to compare
two strings with case-insensitivity. Now, we casefold one code point at
a time, storing the result in a view for comparison, until we exhaust
both strings.
Indented #cmakedefine01 is supported since CMake 3.10:
https://cmake.org/cmake/help/latest/release/3.10.html#commands
We're on 3.16, and the minimum required for Serenity itself is 3.25, so
this should be fine. And it makes CLion's auto-formatter much happier!
For example, the code point U+002F could be encoded as UTF-8 with the
bytes 0x80 0xAF. This trick has historically been used to bypass
security checks.
This is needed to have code for creating an in-memory sRGB profile using
the (floating-ppoint) numbers from the sRGB spec and having the
fixed-point values in the profile match what they are in other software
(such as GIMP).
It has the side effect of making the FixedPoint ctor no longer constexpr
(which seems fine; nothing was currently relying on that).
Some of FixedPoint's member functions don't round yet, which requires
tweaking a test.
`consume_until(foo)` stops before foo, and so does
`ignore_until(Predicate)`, so let's make the other `ignore_until()`
overloads consistent with that so they're less confusing.
This commit moves the implementation of getopt into AK, and converts its
API to understand and use StringView instead of char*.
Everything else is caught in the crossfire of making
Option::accept_value() take a StringView instead of a char const*.
With this, we must now pass a Span<StringView> to ArgsParser::parse(),
applications using LibMain are unaffected, but anything not using that
or taking its own argc/argv has to construct a Vector<StringView> for
this method.