Commit graph

3018 commits

Author SHA1 Message Date
Tim Ledbetter
72ea046b68 AK: Add option to the string formatter to use a digit separator
`vformat()` can now accept format specifiers of the form
{:'[numeric-type]}. This will output a number with a comma separator
every 3 digits.

For example:

`dbgln("{:'d}", 9999999);` will output 9,999,999.

Binary, octal and hexadecimal numbers can also use this feature, for
example:

`dbgln("{:'x}", 0xffffffff);` will output ff,fff,fff.
2023-04-11 13:03:30 +02:00
Rodrigo Tobar
5a8373c6b9 LibCore: Fix corner case for files without newlines
When BufferedFile.can_read_line() was invoked on files with no newlines,
t incorrectly returned a false result for this single line that, even
though doesn't finish with a newline character, is still a line. Since
this method is usually used in tandem with read_line(), users would miss
reading this line (and hence all the file contents).

This commit fixes this corner case by adding another check after a
negative result from finding a newline character. This new check does
the same as the check that is done *before* looking for newlines, which
takes care of this problem, but only works for files that have at least
one newline (hence the buffer has already been filled).

A new unit test has been added that shows the use case. Without the
changes in this commit the test fails, which is a testament that this
commit really fixes the underlying issue.
2023-04-09 18:09:23 -06:00
Andreas Kling
e1f5aae632 AK: Bake CLion IDE check into AK_COMPILER_CLANG
For whatever reason, when CLion does its code indexing thing, it doesn't
define __clang__ despite using Clang. This causes it to run into various
problems that we've solved by checking for Clang.

Since CLion does define __CLION_IDE__ (or sometimes __CLION_IDE_, no
idea why but I have seen this issue locally), let's make that part of
the AK_COMPILER_CLANG check.

This makes CLion stop highlighting various things as errors.
2023-04-08 13:43:25 +02:00
Timothy Flynn
4d9c14ca67 AK: Allow specifying a minimum value for IDs returned by IDAllocator 2023-04-07 16:02:22 +02:00
Kenneth Myhra
d6cf9f5329 AK: Add FlyString::is_one_of for variadic string comparison 2023-04-06 23:49:08 +02:00
stelar7
6163f607c0 AK: Add to_string() for IPv4Address 2023-04-06 09:57:31 +03:30
Tim Schumacher
b88c58b94c AK+LibCompress: Break when seekback copying to a full CircularBuffer
Otherwise, we just end up infinitely looping while waiting for more
space in the destination.
2023-04-05 07:30:38 -04:00
Tim Schumacher
767fb01a8c AK: Report copied bytes when seekback copying from CircularBuffer
Otherwise, we have no way of determining whether our copy was truncated
by accident.
2023-04-05 07:30:38 -04:00
Tim Schumacher
997e745e87 AK: Properly limit the internal seekback span for CircularBuffer
I was originally thinking in the wrong direction when adding this limit,
we can at most read from the buffer until we reach the current write
head. Since that write head is the reference point for the distance,
we need to limit ourselves to that instead of the seekback limit (which
is the maximum of how far back the distance can be).
2023-04-05 07:30:38 -04:00
Timothy Flynn
15532df83d AK+Everywhere: Change AK::fill_with_random to accept a Bytes object
Rather than the very C-like API we currently have, accepting a void* and
a length, let's take a Bytes object instead. In almost all existing
cases, the compiler figures out the length.
2023-04-03 15:53:49 +02:00
Timothy Flynn
5c045b6934 AK: Add templated Span<u8> and Span<u8 const> constructors for C-arrays
This helper constructor exists on the unspecialized Span<T> class also,
and is convenient for e.g. creating Bytes from:

    u8 buffer[64];
    Bytes bytes { buffer };
2023-04-03 15:53:49 +02:00
Timothy Flynn
f7960ffbe3 AK: Prevent passing lengths greater than 256 to getentropy()
From the getentropy() man page, "The maximum permitted value for the
length argument is 256". Several of our tests were passing lengths of
several thousand bytes, causing getentropy() to fail with EIO, which we
were completely ignoring. This caused these tests to only test long
sequences of 0x00.

We now loop over the provided buffer to fill it 256 bytes at a time. If
getentropy() fails for any reason, we fall back to the default method of
filling it with one random byte at a time.
2023-04-03 15:53:49 +02:00
Timothy Flynn
eed956b473 AK: Increase LittleEndianOutputBitStream's buffer size and remove loops
This is very similar to the LittleEndianInputBitStream bit buffer change
from 8e834d4bb2.

We currently buffer one byte of data for the underlying stream. And when
we put bits onto that buffer, we do so 1 bit at a time.

This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to write the desired number of bits.

Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), compression time
decreases from:

    13.62s to 10.9s on Serenity (cold)
    13.62s to 9.22s on Serenity (warm)
    2.93s to 2.32s on Linux

One caveat is that this requires explicitly flushing any leftover bits
when the caller is done with the stream. The byte buffer implementation
implicitly flushed its data every time the buffer was byte-aligned, as
doing so would always fill the byte. This is no longer the case. But for
now, this should be fine as the one user of this class, DEFLATE, already
has a "flush everything now that we're done" finalizer.
2023-04-02 10:54:37 +02:00
Timothy Flynn
8b56d82865 AK+LibCompress: Remove the Deflate back-reference intermediate buffer
Instead of reading bytes from the output stream into a buffer, just to
immediately write them back out, we can skip the middle-man and copy the
bytes directly into the output buffer.
2023-03-31 06:56:11 +02:00
Timothy Flynn
7319deb03d AK: Remove BitStream workaround for now-resolved BufferedStream behavior
The issue was that the buffer would only be filled if it was empty.
2023-03-30 08:47:22 +02:00
Timothy Flynn
13573a6c4b AK: Refill a BufferedStream when it has less than the requested size
We currently only fill a buffer when it is empty. So if it has 1 byte
and 16 KB was requested, only that 1 byte would be returned. Instead,
attempt to refill the buffer when it's size is less than the requested
size.
2023-03-30 08:47:22 +02:00
Timothy Flynn
5c38b14045 AK: Remove arbitrary 1 KB limit when filling a BufferedStream's buffer
When reading, we currently only fill a BufferedStream's buffer when it
is empty, and only with 1 KB of data. This means that while the buffer
defaults to a size of 16 KB, at least 15 KB is always unused.
2023-03-30 08:47:22 +02:00
Daniel Bertalan
0016f63547 AK: Fix Clang 16 UBSan issue with zero-length Array
The current implementation of `Array<T, 0>` has a zero-length C array as
its storage type. While this is accepted as a GNU extension, when
compiling with Clang 16, an UBSan error is raised every time an object
is accessed whose only field is a zero-length array.

This is likely a bug in Clang 16's implementation of UBSan, which has
been reported here: https://github.com/llvm/llvm-project/issues/61775
2023-03-29 21:35:53 -04:00
Nico Weber
e1f8443db0 AK: Fix build with Xcode 14.2's clang
Else:

    AK/BitStream.h:218:24:
        error: inline function '...::lsb_mask<unsigned char>' is not
               defined [-Werror,-Wundefined-inline]
        static constexpr T lsb_mask(T bits)
                           ^
2023-03-29 06:28:55 -04:00
Timothy Flynn
8e834d4bb2 AK: Increase LittleEndianInputBitStream's buffer size and remove loops
We current buffer one byte of data from the underlying stream. And when
we pull bits off that buffer, we do so 1 or 8 bits at a time (depending
on whether the buffer is byte aligned). The 1-bit-at-a-time loop is by
far the most common during e.g. GZIP decompression.

This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to extract the desired number of bits.

Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression
time decreases from:

    242s to 35s on Serenity
    11.125s to 3.527s on Linux

Note that BigEndianInputBitStream can also use the same techniques,
and some of the methods here may make sense to live in an endianness-
agnostic base class. The focus is GZIP right now though, which only
uses the little endian stream.
2023-03-29 07:19:14 +02:00
Timothy Flynn
9423662107 AK: Add NumericLimits::digits to return the number of digits in a type
Analogous to std::numeric_limits<T>::digits.
2023-03-29 07:19:14 +02:00
Ali Mohammad Pur
a28aba7663 AK: Add DeprecatedString::from_utf8(StringView)
This mirrors String::from_utf8(StringView).
Jakt will use this to construct strings instead of just assuming the
allocation will succeed, lowering the API difference between
Jakt::String and AK::String by one API :^)
2023-03-28 15:55:35 +01:00
Sam Atkins
b876e97719 AK: Expose the current position of a Utf8CodePointIterator as a pointer 2023-03-22 19:45:40 +01:00
Tim Schumacher
52d9fc92f1 AK: Expose the seekback limit of CircularBuffer 2023-03-21 10:25:13 +01:00
Tim Schumacher
e62183f0ba AK: Add a Stream wrapper that counts read bytes 2023-03-21 10:25:13 +01:00
Tim Schumacher
d1f6a28ffd AK: Move ConstrainedStream from LibWasm and limit discarding 2023-03-21 10:25:13 +01:00
Timothy Flynn
2671d4280f AK: Export FlyString from the forwarding header 2023-03-18 19:50:45 +01:00
Andreas Kling
d38a3ca9eb LibGfx/OpenType: Add some initial support for GPOS glyph positioning
This patch parses enough of GPOS tables to be able to support the
kerning information embedded in Inter.

Since that specific font only applies positioning offsets to the first
glyph in each pair, I was able to get away with not changing our API.
Once we start adding support for more sophisticated positioning, we'll
need to be able to communicate more than a simple "kerning offset" to
the clients of this code.
2023-03-17 09:36:20 +01:00
Jelle Raaijmakers
954d660094 AK: Clear OrderedHashTable previous/next pointers on removal
With Clang, the previous/next pointers in buckets of an
`OrderedHashTable` are not cleared when a bucket is being shifted up as
a result of a removed bucket. As a result, an unfortunate pointer mixup
could lead to an infinite loop in the `HashTable` iterator, which was
exposed in `HashMap::keys()`.

Co-authored-by: Luke Wilde <lukew@serenityos.org>
2023-03-15 21:43:52 +01:00
gustrb
5141c86587 AK: Rename CaseInsensitiveStringViewTraits to reflect intent
Now it is called `CaseInsensitiveASCIIStringViewTraits`, so we can be
more specific about what data structure does it operate onto. ;)
2023-03-14 21:34:32 +00:00
Andreas Kling
f8c075f2b1 AK: Add Queue::tail()
We already had head(), so let's also have tail().
2023-03-14 16:52:44 +01:00
Tim Schumacher
e007279315 AK: Read and write accumulated BitStream bits directly 2023-03-13 15:16:20 +00:00
Tim Schumacher
b623dda765 AK: Remove unneeded overrides for write_until_depleted from BitStream 2023-03-13 15:16:20 +00:00
Tim Schumacher
ecd1862859 AK: Rename Stream::write_entire_buffer to Stream::write_until_depleted
No functional changes.
2023-03-13 15:16:20 +00:00
Tim Schumacher
a3f73e7d85 AK: Rename Stream::read_entire_buffer to Stream::read_until_filled
No functional changes.
2023-03-13 15:16:20 +00:00
Tim Schumacher
d5871f5717 AK: Rename Stream::{read,write} to Stream::{read_some,write_some}
Similar to POSIX read, the basic read and write functions of AK::Stream
do not have a lower limit of how much data they read or write (apart
from "none at all").

Rename the functions to "read some [data]" and "write some [data]" (with
"data" being omitted, since everything here is reading and writing data)
to make them sufficiently distinct from the functions that ensure to
use the entire buffer (which should be the go-to function for most
usages).

No functional changes, just a lot of new FIXMEs.
2023-03-13 15:16:20 +00:00
Timothy Flynn
1d5b45f7d9 AK: Compute UTF-8 code point lengths using only leading bytes
We don't need to decode the entire code point to know its length. This
reduces the runtime of decoding a string containing 5 million instances
of U+10FFFF from over 4 seconds to 0.9 seconds.
2023-03-13 15:16:02 +00:00
Kenneth Myhra
e28a6d5da4 AK: Add FlyString::from_deprecated_fly_string()
Let's add FlyString::from_deprecated_fly_string() so we can use it
instead of FlyString::from_utf8(). This will make it easier to detect
potential unncessary allocations as we transfer to FlyString.
2023-03-11 18:32:33 +00:00
Sam Atkins
82f58b8af0 AK: Forward-declare LexicalPath
And alphabetically sort the list while I'm at it.
2023-03-11 13:22:57 +00:00
Andreas Kling
a504ac3e2a Everywhere: Rename equals_ignoring_case => equals_ignoring_ascii_case
Let's make it clear that these functions deal with ASCII case only.
2023-03-10 13:15:44 +01:00
kleines Filmröllchen
b38beb106e AK: Add constexpr floor and round 2023-03-10 04:07:14 -07:00
Andreas Kling
d517e7fb3a AK: Make FlyString::hash() use the cached hash in StringData if possible
This avoids rehashing the string every time.
2023-03-09 21:54:59 +01:00
Sam Atkins
067d0689c5 AK: Replace C-style casts 2023-03-09 21:43:54 +01:00
Linus Groh
e76394d96c AK: Remove infallible version of StringBuilder::to_byte_buffer
Also drop the try_ prefix from the fallible function, as it is no longer
needed to distinguish the two.
2023-03-09 15:51:00 +00:00
Karol Baraniecki
b4b283670d AK: Introduce a fallible version of StringBuilder::to_byte_buffer
Name it StringBuilder::try_to_byte_buffer accordingly :^)
2023-03-09 12:59:57 +00:00
Timothy Flynn
1393ed2000 AK+LibUnicode: Implement String::equals_ignoring_case without allocating
We currently fully casefold the left- and right-hand sides to compare
two strings with case-insensitivity. Now, we casefold one code point at
a time, storing the result in a view for comparison, until we exhaust
both strings.
2023-03-08 18:57:53 +00:00
Timothy Flynn
4aee4e80bd AK: Add a Utf32View::substring_view overload to take only an offset
This is for convenience, and matches our other UTF-N views.
2023-03-08 18:57:53 +00:00
Timothy Flynn
515fca4f7a AK: Make String::contains(code_point) handle non-ASCII
We currently only accept a char, instead of a full code point.
2023-03-08 14:16:47 +00:00
Timothy Flynn
f882581e91 AK: Make String::{starts,ends}_with(code_point) handle non-ASCII
We currently pass the code point to StringView::{starts,ends}_with,
which actually accepts a single char, thus cannot handle non-ASCII
code points.
2023-03-08 14:16:47 +00:00
Andreas Kling
629b6462dc AK: Add FlyString::equals_ignoring_ascii_case()
This is similar to equals_ignoring_case() but only cares about ASCII
case insensitivity.
2023-03-08 13:19:15 +01:00