Commit graph

59 commits

Author SHA1 Message Date
Andreas Kling
a3e82eaad3 AK: Introduce the new String, replacement for DeprecatedString
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.

Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.

Problems in DeprecatedString:

- It assumes string allocation never fails. This makes it impossible
  to use in allocation-sensitive contexts, and is the reason we had to
  ban DeprecatedString from the kernel entirely.

- The awkward null state. DeprecatedString can be null. It's different
  from the empty state, although null strings are considered empty.
  All code is immediately nicer when using Optional<DeprecatedString>
  but DeprecatedString came before Optional, which is how we ended up
  like this.

- The encoding of the underlying data is ambiguous. For the most part,
  we use it as if it's always UTF-8, but there have been cases where
  we pass around strings in other encodings (e.g ISO8859-1)

- operator[] and length() are used to iterate over DeprecatedString one
  byte at a time. This is done all over the codebase, and will *not*
  give the right results unless the string is all ASCII.

How we solve these issues in the new String:

- Functions that may allocate now return ErrorOr<String> so that ENOMEM
  errors can be passed to the caller.

- String has no null state. Use Optional<String> when needed.

- String is always UTF-8. This is validated when constructing a String.
  We may need to add a bypass for this in the future, for cases where
  you have a known-good string, but for now: validate all the things!

- There is no operator[] or length(). You can get the underlying data
  with bytes(), but for iterating over code points, you should be using
  an UTF-8 iterator.

Furthermore, it has two nifty new features:

- String implements a small string optimization (SSO) for strings that
  can fit entirely within a pointer. This means up to 3 bytes on 32-bit
  platforms, and 7 bytes on 64-bit platforms. Such small strings will
  not be heap-allocated.

- String can create substrings without making a deep copy of the
  substring. Instead, the superstring gets +1 refcount from the
  substring, and it acts like a view into the superstring. To make
  substrings like this, use the substring_with_shared_superstring() API.

One caveat:

- String does not guarantee that the underlying data is null-terminated
  like DeprecatedString does today. While this was nifty in a handful of
  places where we were calling C functions, it did stand in the way of
  shared-superstring substrings.
2022-12-06 15:21:26 +01:00
Linus Groh
57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Andreas Kling
ae3ffdd521 AK: Make it possible to not using AK classes into the global namespace
This patch adds the `USING_AK_GLOBALLY` macro which is enabled by
default, but can be overridden by build flags.

This is a step towards integrating Jakt and AK types.
2022-11-26 15:51:34 +01:00
Lucas CHOLLET
62b8ccaffc StringBuilder: Add try_append_repeated() and append_repeated()
This two methods add the character as many times as specified by the
second parameter.
2022-09-15 14:08:21 +01:00
Idan Horowitz
9da8c78133 AK: Add a try variant of StringBuilder::append_escaped_for_json
This will allow us to make a fallible version of the JSON serializers.
2022-02-27 20:37:57 +01:00
Linus Groh
b253bca807 AK: Add optional format string parameter to String{,Builder}::join()
Allow specifying a custom format string that's being used for each item
instead of hardcoding "{}".
2022-02-23 21:53:30 +00:00
Idan Horowitz
8f093e91e0 AK: Exclude StringBuilder String APIs from the Kernel
These APIs are only used by userland, and String is OOM-infallible,
so let's just ifdef it out of the Kernel.
2022-02-16 22:21:37 +01:00
Andreas Kling
8a51f64503 AK: Increase StringBuilder's inline buffer size from 128 to 256 bytes 2021-12-26 01:42:58 +01:00
Andreas Kling
216e21a1fa AK: Convert AK::Format formatting helpers to returning ErrorOr<void>
This isn't a complete conversion to ErrorOr<void>, but a good chunk.
The end goal here is to propagate buffer allocation failures to the
caller, and allow the use of TRY() with formatting functions.
2021-11-17 00:21:13 +01:00
Andreas Kling
008355c222 AK: Add failable try_* functions to StringBuilder
These will allow us to start using TRY() with StringBuilder operations.
2021-11-17 00:21:13 +01:00
Andreas Kling
8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Andreas Kling
a15ed8743d AK: Make ByteBuffer::try_* functions return ErrorOr<void>
Same as Vector, ByteBuffer now also signals allocation failure by
returning an ENOMEM Error instead of a bool, allowing us to use the
TRY() and MUST() patterns.
2021-11-10 21:58:58 +01:00
Ali Mohammad Pur
3a9f00c59b Everywhere: Use OOM-safe ByteBuffer APIs where possible
If we can easily communicate failure, let's avoid asserting and report
failure instead.
2021-09-06 01:53:26 +02:00
Brian Gianforcaro
f0b3aa0331 Everywhere: Pass AK::Format TypeErasedFormatParams by reference
This silences a overeager warning in sonar cloud, warning that
slicing could occur with `VariadicFormatParams` which derives from
`TypeErasedFormatParams`.

Reference:
https://sonarcloud.io/project/issues?id=SerenityOS_serenity&issues=AXuVPBO_k92xXUF3qWsm&open=AXuVPBO_k92xXUF3qWsm
2021-08-30 15:50:00 +04:30
Timothy Flynn
c16aca7abf AK+Kernel: Add StringBuilder::append overload for UTF-16 views
Currently, to append a UTF-16 view to a StringBuilder, callers must
first convert the view to UTF-8 and then append the copy. Add a UTF-16
overload so callers do not need to hold an entire copy in memory.
2021-08-10 23:07:50 +02:00
Timothy Flynn
5978caf96b AK: Convert StringBuilder to use east-const 2021-08-10 23:07:50 +02:00
Ali Mohammad Pur
3829bf115c AK: Make StringBuilder::join() use appendff() instead of append()
`append()` is almost never going to select the overload that is desired.
e.g. it will append chars when you pass it a Vector<size_t>, which is
definitely not the right overload :)
2021-08-06 01:14:03 +02:00
Gunnar Beutner
de0aa44bb6 AK: Remove the m_length member for StringBuilder
Instead we can just use ByteBuffer::size() which already keeps track
of the buffer's size.
2021-05-31 14:49:00 +04:30
Max Wipfli
f51b0729f5 AK: Implement StringBuilder::append_as_lowercase(char ch)
This patch adds a convenience method to AK::StringBuilder which converts
ASCII uppercase characters to lowercase before appending them.
2021-05-18 21:02:07 +02:00
Gunnar Beutner
fcaf98361f AK: Turn ByteBuffer into a value type
Previously ByteBuffer would internally hold a RefPtr to the byte
buffer and would behave like a reference type, i.e. copying a
ByteBuffer would not create a duplicate byte buffer, but rather
two objects which refer to the same internal buffer.

This also changes ByteBuffer so that it has some internal capacity
much like the Vector<T> type. Unlike Vector<T> however a byte
buffer's data may be uninitialized.

With this commit ByteBuffer makes use of the kmalloc_good_size()
API to pick an optimal allocation size for its internal buffer.
2021-05-16 17:49:42 +02:00
Andreas Kling
834b6508d7 AK: Remove StringBuilder::appendf()
All users have been converted to using AK::Format via appendff().
2021-05-07 21:12:09 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Brian Gianforcaro
7db74a6b3e AK: Annotate StringBuilder functions as [[nodiscard]] 2021-04-11 12:50:33 +02:00
AnotherTest
7c2754c3a6 AK+Kernel+Userland: Enable some more compiletime format string checks
This enables format string checks for three more functions:
- String::formatted()
- Builder::appendff()
- KBufferBuilder::appendff()
2021-02-23 13:59:33 +01:00
Lenny Maiorani
e6f907a155 AK: Simplify constructors and conversions from nullptr_t
Problem:
- Many constructors are defined as `{}` rather than using the ` =
  default` compiler-provided constructor.
- Some types provide an implicit conversion operator from `nullptr_t`
  instead of requiring the caller to default construct. This violates
  the C++ Core Guidelines suggestion to declare single-argument
  constructors explicit
  (https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c46-by-default-declare-single-argument-constructors-explicit).

Solution:
- Change default constructors to use the compiler-provided default
  constructor.
- Remove implicit conversion operators from `nullptr_t` and change
  usage to enforce type consistency without conversion.
2021-01-12 09:11:45 +01:00
Sahan Fernando
ccf4368ca5 AK: Enable format string warnings for AK printf wrappers 2021-01-11 21:06:32 +01:00
Andreas Kling
54ade31d84 AK: Add some inline capacity to StringBuilder
This patch adds a 128-byte inline buffer that we use before switching
to using a dynamically growing ByteBuffer.

This allows us to avoid heap allocations in many cases, and totally
incidentally also speeds up @nico's favorite test, "disasm /bin/id"
more than 2x. :^)
2020-11-24 22:06:51 +01:00
Andreas Kling
5e164052f6 AK+Kernel: Escape JSON keys & values
Grab the escaping logic from JSON string value serialization and use
it for serializing all keys and values.

Fixes #3917.
2020-11-02 12:56:36 +01:00
asynts
6351a56d27 AK+Format: Do some housekeeping in the format implementation. 2020-10-02 20:48:19 +02:00
asynts
b7a4c4482f AK: Resolve format related circular dependencies properly.
With this commit, <AK/Format.h> has a more supportive role and isn't
used directly.

Essentially, there now is a public 'vformat' function ('v' for vector)
which takes already type erased parameters. The name is choosen to
indicate that this function behaves similar to C-style functions taking
a va_list equivalent.

The interface for frontend users are now 'String::formatted' and
'StringBuilder::appendff'.
2020-09-23 21:45:28 +02:00
asynts
e5497a326a AK: Add StringBuilder::appendff using the new format.
StringBuilder::appendf was already used, thus this name. If we some day
replace all usages of printf, we could rename this method.
2020-09-22 15:06:40 +02:00
Nico Weber
ce95628b7f Unicode: Try s/codepoint/code_point/g again
This time, without trailing 's'. Ran:

    git grep -l 'codepoint' | xargs sed -ie 's/codepoint/code_point/g
2020-08-05 22:33:42 +02:00
Nico Weber
19ac1f6368 Revert "Unicode: s/codepoint/code_point/g"
This reverts commit ea9ac3155d.
It replaced "codepoint" with "code_points", not "code_point".
2020-08-05 22:33:42 +02:00
Andreas Kling
ea9ac3155d Unicode: s/codepoint/code_point/g
Unicode calls them "code points" so let's follow their style.
2020-08-03 19:06:41 +02:00
Andreas Kling
d4bbc00901 AK: Add StringBuilder::append_codepoint(u32)
Also, implement append(Utf32View) using it.
2020-06-04 21:12:17 +02:00
Andreas Kling
86242f9c18 AK: Add StringBuilder::append(Utf32View)
This encodes the incoming UTF-32 sequence as UTF-8.
2020-05-17 22:35:25 +02:00
Andreas Kling
4eef3e5a09 AK: Add StringBuilder::join() for joining collections with a separator
This patch adds a generic StringBuilder::join(separator, collection):

    Vector<String> strings = { "well", "hello", "friends" };
    StringBuilder builder;
    builder.join("+ ", strings);
    builder.to_string(); // "well + hello + friends"
2020-03-20 14:41:23 +01:00
howar6hill
d8df9c510c AK: Improve the API of StringBuilder 2020-03-08 17:08:18 +01:00
Shannon Booth
ba9313111f AK: Add StringBuilder::is_empty() 2020-02-22 21:36:54 +01:00
Andreas Kling
3bbf4610d2 AK: Add a forward declaration header
You can now #include <AK/Forward.h> to get most of the AK types as
forward declarations.

Header dependency explosion is one of the main contributors to compile
times at the moment, so this is a step towards smaller include graphs.
2020-02-14 23:31:18 +01:00
Andreas Kling
94ca55cefd Meta: Add license header to source files
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.

For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.

Going forward, all new source files should include a license header.
2020-01-18 09:45:54 +01:00
Andreas Kling
6f4c380d95 AK: Use size_t for the length of strings
Using int was a mistake. This patch changes String, StringImpl,
StringView and StringBuilder to use size_t instead of int for lengths.
Obviously a lot of code needs to change as a result of this.
2019-12-09 17:51:21 +01:00
Andreas Kling
99fc7033b5 AK: Add StringBuilder::length() and trim(int)
Sometimes you want to trim a byte or two off the end.
2019-09-29 16:20:09 +02:00
Sergey Bugaev
08d9883306 AK: Add StringBuilder::string_view() and StringBuilder::clear()
The former allows you to inspect the string while it's being built.
It's an explicit method rather than `operator StringView()` because
you must remember you can only look at it in between modifications;
appending to the StringBuilder invalidates the StringView.

The latter lets you clear the state of a StringBuilder explicitly, to
start from an empty string again.
2019-09-28 18:29:42 +02:00
Andreas Kling
73fdbba59c AK: Rename <AK/AKString.h> to <AK/String.h>
This was a workaround to be able to build on case-insensitive file
systems where it might get confused about <string.h> vs <String.h>.

Let's just not support building that way, so String.h can have an
objectively nicer name. :^)
2019-09-06 15:36:54 +02:00
Andreas Kling
f6998b1817 JSON: Templatize the JSON serialization code
This makes it possible to use something other than a StringBuilder for
serialization (and to produce something other than a String.) :^)
2019-08-07 21:29:32 +02:00
Andreas Kling
255c7562ba AK: Massage it into building on my host system without breaking Serenity. 2019-06-14 06:43:56 +02:00
Robin Burchell
7bce096afd Take StringView in more places
We should work towards a pattern where we take StringView as function
arguments, and store String as member, to push the String construction
to the last possible moment.
2019-06-02 12:55:51 +02:00
Robin Burchell
0dc9af5f7e Add clang-format file
Also run it across the whole tree to get everything using the One True Style.
We don't yet run this in an automated fashion as it's a little slow, but
there is a snippet to do so in makeall.sh.
2019-05-28 17:31:20 +02:00