Commit graph

186 commits

Author SHA1 Message Date
Andreas Kling
e1ba881587 AK+LibWeb: Add {Fly,}String::to_ascii_{upper,lower}_case()
These don't have to worry about the input not being valid UTF-8 and
so can be infallible (and can even return self if no changes needed.)

We use this instead of Infra::to_ascii_{upper,lower}_case in LibWeb.

(cherry picked from commit 073bcfd3866852a4c4bcca2bd131bd65ae53541f)
2024-11-18 20:22:45 -05:00
Shannon Booth
bcdb23ba3e AK: Add BOM handling to String::from_utf8_with_replacement_character
(cherry picked from commit b3bf5c4ea84bfbd0ba14fdbc3a86e5d54e053702)
2024-10-16 23:56:40 -04:00
Shannon Booth
6a9222ffd3 AK: Add fast-path in from_utf8_with_replacement_character for utf-8
This ports the same optimization which was made in
1a46d8df5fc81eb2c320d5c8a5597285d3d8fb3a to this function as well.

(cherry picked from commit 1e8cc97b731871409316c121e637c32806135122)
2024-10-16 23:56:40 -04:00
Shannon Booth
997ce4644f AK: Add String::from_utf8_with_replacement_character
This takes a byte sequence and converts it to a UTF-8 string with the
replacement character.

(cherry picked from commit 033ea0e7fb0f72338ae95aa0413da838206440bb)
2024-10-16 23:56:40 -04:00
Andreas Kling
ebe6ec6069 AK: Check for u32 overflow in String::repeated()
I don't know why this was checking for size_t overflow, but it was
tripping up ASAN malloc() checks by passing a way-too-large size.
2024-05-07 09:15:40 +02:00
Jess
ecb7d4b40f LibJS: Throw RangeError in StringPrototype::repeat if OOM
currently crashes with an assertion failure in `String::repeated` if
malloc can't serve a `count * input_size` sized request, so add
`String::repeated_with_error` to propagate the error.
2024-04-20 19:23:46 -04:00
Timothy Flynn
de80f544d8 AK: Disallow calling String methods that return a view on rvalues
This prevents, for example:

    StringView view = "foo"_string.bytes_as_string_view();

This prevents a class of potential UAF.
2024-04-04 11:23:21 +02:00
Dan Klishch
870a947040 AK: Remove StringInternals.h
Since we do not expose memory layout anymore in StringBase, there is no
need to keep StringData public.
2024-01-21 16:16:15 -07:00
Dan Klishch
fa52f68142 AK: Store data in FlyString as StringBase
Unfortunately, it is not clear to me how to split this commit into
several atomic ones.
2024-01-21 16:16:15 -07:00
Dan Klishch
e7700e16ee AK: Forward substring creation with shared superstring to StringBase 2024-01-21 16:16:15 -07:00
Dan Klishch
5d6cd65e29 AK: Simplify String::repeated by leveraging StringBase helpers 2024-01-21 16:16:15 -07:00
Dan Klishch
7dbe357e9f AK: Simplify String::from_stream by leveraging StringBase helpers 2024-01-21 16:16:15 -07:00
Dan Klishch
dcd1fda9c8 AK: Introduce StringBase::replace_with_new_{short_,}string 2024-01-21 16:16:15 -07:00
Dan Klishch
d6290c4684 AK: Move String::hash() and String::String() to StringBase 2024-01-21 16:16:15 -07:00
Dan Klishch
1b09a1851e AK: Move String::~String() and String::destroy_string() to StringBase 2024-01-21 16:16:15 -07:00
Dan Klishch
54d149bc25 AK: Move String::bytes() and String::operator==(String) to StringBase
The idea is to eventually get rid of protected state in StringBase. To
do this, we first need to remove all references to m_data and
m_short_string from String.
2024-01-21 16:16:15 -07:00
Dan Klishch
4364a28d3d AK: Move data fields from AK::String to a newly created AK::StringBase
This starts separating memory management of string data and string
utilities like `String::formatted`. This would also allow to reuse the
same storage in `DeprecatedString` in the future.
2024-01-21 16:16:15 -07:00
Dan Klishch
6e2f627cb3 AK: Move StringData from String.cpp to a newly created StringInternals.h
This is done to allow using it in files other than AK/String.cpp.
2024-01-21 16:16:15 -07:00
Andreas Kling
3c039903fb LibTextCodec+AK: Don't validate UTF-8 strings twice
UTF8Decoder was already converting invalid data into replacement
characters while converting, so we know for sure we have valid UTF-8
by the time conversion is finished.

This patch adds a new StringBuilder::to_string_without_validation()
and uses it to make UTF8Decoder avoid half the work it was doing.
2023-12-30 13:49:50 +01:00
Andreas Kling
a285e36041 LibJS+AK: Make String.prototype.repeat() way faster
Instead of using a StringBuilder, add a String::repeated(String, N)
overload that takes advantage of knowing it's already all UTF-8.

This makes the following microbenchmark go 4x faster:

    "foo".repeat(100_000_000)

And for single character strings, we can even go 10x faster:

    "x".repeat(100_000_000)
2023-12-30 13:49:50 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Timothy Flynn
6aa334767f AK: Ensure assigned-to Strings are dereferenced if needed
If we assign to an existing non-short string, we must dereference its
StringData object to prevent leaking that data.
2023-11-28 16:38:18 +01:00
Andreas Kling
0902f552a3 AK: Bring some missing DeprecatedString API over to String
Specifically, case sensitivity parameters for starts/ends with,
and the equals_ignoring_ascii_case() helper.
2023-11-04 21:28:30 +01:00
Andreas Kling
1e820385d9 AK: Add case-insensitive hashing for the new String classes
Bringing over this functionality from DeprecatedString.
2023-09-06 11:29:03 -04:00
aryanbaburajan
a94c0eea94 AK: Add trim_ascii_whitespace method to String 2023-08-06 22:21:10 +02:00
Tim Schumacher
a3f73e7d85 AK: Rename Stream::read_entire_buffer to Stream::read_until_filled
No functional changes.
2023-03-13 15:16:20 +00:00
Andreas Kling
d517e7fb3a AK: Make FlyString::hash() use the cached hash in StringData if possible
This avoids rehashing the string every time.
2023-03-09 21:54:59 +01:00
Timothy Flynn
515fca4f7a AK: Make String::contains(code_point) handle non-ASCII
We currently only accept a char, instead of a full code point.
2023-03-08 14:16:47 +00:00
Timothy Flynn
f882581e91 AK: Make String::{starts,ends}_with(code_point) handle non-ASCII
We currently pass the code point to StringView::{starts,ends}_with,
which actually accepts a single char, thus cannot handle non-ASCII
code points.
2023-03-08 14:16:47 +00:00
Timothy Flynn
da0d000909 AK: Ensure short String instances are valid UTF-8
We are currently only validating long strings.
2023-03-03 11:46:42 -05:00
Linus Groh
45dc3d8a3e AK: Add String::ends_with{,_bytes}() 2023-03-03 11:02:21 +00:00
Ali Mohammad Pur
79e4027480 AK: Add two starts_with{bytes,}() APIs to String 2023-02-28 15:52:24 +03:30
Tim Schumacher
9096b4d893 AK: Ensure that we fill the whole String when reading from a Stream 2023-02-21 22:28:15 -07:00
Andrew Kaster
0ea697ace5 AK: Add String::from_stream method
The caller is responsible for determining how long the string is that
they want to read.
2023-02-21 10:57:44 +01:00
Andreas Kling
e08c55dd8d AK: Make String const-correct internally 2023-02-21 00:54:04 +01:00
Andreas Kling
d0697d350d AK: Fix 64-bit alignment issue in shared-superstring substrings
Thanks to Timothy Flynn for the test!

Fixes #17141
2023-02-18 09:12:46 -05:00
Timothy Flynn
c59268d15b AK: Add String::trim 2023-01-28 00:13:46 +00:00
Timothy Flynn
cccaa94767 AK: Add String::join 2023-01-28 00:13:46 +00:00
Timothy Flynn
c35b1371a3 AK: Add an overload of String::find_byte_offset for StringView 2023-01-27 18:00:17 +00:00
Timothy Flynn
76fd5f2756 AK: Add convenience substring wrappers to String to exclude a length
These overloads exist on other string classes and are used throughout
the code base.
2023-01-24 16:23:50 -05:00
Timothy Flynn
427b82065c AK: Add a method to create a String with a repeated code point 2023-01-24 16:23:50 -05:00
Timothy Flynn
d50724956e AK: Add a method to find the byte offset of a code point 2023-01-24 16:23:50 -05:00
Timothy Flynn
ef275e25b8 AK: Reduce String's allocated data by one byte
This was copied from allocation_size_for_stringimpl, which had to ensure
the string is null-terminated. String makes no such guarantee.
2023-01-22 20:27:52 +00:00
Timothy Flynn
8aca8e82cb AK: Change String's default constructor to be constant
This allows creating expressions such as:

    constexpr Array<String, 10> {};
2023-01-22 01:03:13 +00:00
martinfalisse
aec2dadfdd AK: Add split() for String 2023-01-21 14:35:00 +01:00
Timothy Flynn
d48266a420 AK: Support creating known short string literals at compile time
In cases where we know a string literal will fit in the short string
storage, we can do so at compile time without needing to handle error
propagation. If the provided string literal is too long, a compilation
error will be emitted due to the failed VERIFY statement being a non-
constant expression.
2023-01-20 14:24:12 -05:00
Timothy Flynn
cf0899f440 AK: Add String::contains 2023-01-15 01:00:20 +00:00
Timothy Flynn
9db9b2f9be AK: Add a somewhat naive implementation of String::reverse
This will reverse the String's code points (i.e. not just its bytes),
but is not aware of grapheme clusters.
2023-01-15 01:00:20 +00:00
Timothy Flynn
1d4f287582 AK: Implement FlyString for the new String class
This implements a FlyString that will de-duplicate String instances. The
FlyString will store the raw encoded data of the String instance: If the
String is a short string, FlyString holds the String::ShortString bytes;
otherwise FlyString holds a pointer to the Detail::StringData.

FlyString itself does not know about String's storage or how to refcount
its Detail::StringData. It defers to String to implement these details.
2023-01-12 11:23:58 +01:00
Ben Wiederhake
65b420f996 Everywhere: Remove unused includes of AK/Memory.h
These instances were detected by searching for files that include
AK/Memory.h, but don't match the regex:

\\b(fast_u32_copy|fast_u32_fill|secure_zero|timing_safe_compare)\\b

This regex is pessimistic, so there might be more files that don't
actually use any memory function.

In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
2023-01-02 20:27:20 -05:00