AK: Introduce the new String, replacement for DeprecatedString
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.
Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.
Problems in DeprecatedString:
- It assumes string allocation never fails. This makes it impossible
to use in allocation-sensitive contexts, and is the reason we had to
ban DeprecatedString from the kernel entirely.
- The awkward null state. DeprecatedString can be null. It's different
from the empty state, although null strings are considered empty.
All code is immediately nicer when using Optional<DeprecatedString>
but DeprecatedString came before Optional, which is how we ended up
like this.
- The encoding of the underlying data is ambiguous. For the most part,
we use it as if it's always UTF-8, but there have been cases where
we pass around strings in other encodings (e.g. ISO-8859-1).
- operator[] and length() are used to iterate over DeprecatedString one
byte at a time. This is done all over the codebase, and will *not*
give the right results unless the string is all ASCII.
How we solve these issues in the new String:
- Functions that may allocate now return ErrorOr<String> so that ENOMEM
errors can be passed to the caller.
- String has no null state. Use Optional<String> when needed.
- String is always UTF-8. This is validated when constructing a String.
We may need to add a bypass for this in the future, for cases where
you have a known-good string, but for now: validate all the things!
- There is no operator[] or length(). You can get the underlying data
with bytes(), but for iterating over code points, you should be using
a UTF-8 iterator.
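To make the byte-versus-code-point distinction concrete, here is a minimal sketch in standard C++ rather than AK. The helper `count_code_points` is hypothetical and is not part of the new String; it assumes its input has already been validated as UTF-8, and it only counts code points instead of decoding them.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not an AK API): counts the code points in a string
// that is assumed to be valid UTF-8. Continuation bytes have the bit
// pattern 10xxxxxx, so every byte that does not match starts a code point.
inline std::size_t count_code_points(std::string_view utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the string "h\xC3\xA9llo" the byte count is 6 while the code point count is 5, which is exactly the kind of mismatch that indexing by byte runs into.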
Furthermore, it has two nifty new features:
- String implements a small string optimization (SSO) for strings that
can fit entirely within a pointer. This means up to 3 bytes on 32-bit
platforms, and 7 bytes on 64-bit platforms. Such small strings will
not be heap-allocated.
- String can create substrings without making a deep copy of the
substring. Instead, the superstring gets +1 refcount from the
substring, and it acts like a view into the superstring. To make
substrings like this, use the substring_with_shared_superstring() API.
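The short string idea can be sketched in standard C++ as follows. This is an illustration under stated assumptions, not the actual AK layout: `ShortStringSketch`, `fits_in_short_string` and `make_short_string` are hypothetical names, and the sketch only shows a pointer-sized record with one byte for the length and flag, plus the remaining bytes for the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Illustrative only: a pointer-sized record. One byte stores the byte count
// shifted left by one with a flag in the low bit; the rest stores the text.
struct ShortStringSketch {
    static constexpr std::size_t max_byte_count = sizeof(void*) - 1;
    static constexpr std::uint8_t short_string_flag = 1;

    std::uint8_t byte_count_and_flag { short_string_flag };
    std::uint8_t storage[max_byte_count] {};

    std::size_t byte_count() const { return byte_count_and_flag >> 1; }
    std::string_view view() const
    {
        return { reinterpret_cast<char const*>(storage), byte_count() };
    }
};

// 3 text bytes on 32-bit platforms, 7 on 64-bit platforms.
static_assert(sizeof(ShortStringSketch) == sizeof(void*));

inline bool fits_in_short_string(std::string_view text)
{
    return text.size() <= ShortStringSketch::max_byte_count;
}

inline ShortStringSketch make_short_string(std::string_view text)
{
    assert(fits_in_short_string(text));
    ShortStringSketch result;
    if (!text.empty())
        std::memcpy(result.storage, text.data(), text.size());
    result.byte_count_and_flag = static_cast<std::uint8_t>(
        (text.size() << 1) | ShortStringSketch::short_string_flag);
    return result;
}
```

A caller would check `fits_in_short_string()` first and fall back to a heap-allocated representation for anything longer.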
One caveat:
- String does not guarantee that the underlying data is null-terminated
like DeprecatedString does today. While this was nifty in a handful of
places where we were calling C functions, it did stand in the way of
shared-superstring substrings.
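As a rough illustration of what this caveat means for callers, the sketch below uses standard C++ types in place of AK ones. `parse_long_via_c_api` is a hypothetical helper: it copies the bytes into a null-terminated buffer before handing them to a C function, because a view into a larger string is followed by the rest of that string rather than by a terminator.

```cpp
#include <cstdlib>
#include <string>
#include <string_view>

// Hypothetical helper: strtol() expects a null-terminated string, so the
// (possibly unterminated) view is copied into a std::string first.
inline long parse_long_via_c_api(std::string_view text)
{
    std::string terminated { text }; // the copy is null-terminated
    return std::strtol(terminated.c_str(), nullptr, 10);
}
```

With a view covering only the first two characters of "12345", this parses 12, whereas passing the view's raw data pointer straight to `strtol()` would read on into the rest of the superstring.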
2022-12-01 07:27:43 -05:00
/*
 * Copyright (c) 2018-2022, Andreas Kling <kling@serenityos.org>
 *
 * SPDX-License-Identifier: BSD-2-Clause
 */

2023-01-22 10:17:48 -05:00
#include <AK/Array.h>
2022-12-01 07:27:43 -05:00
#include <AK/Checked.h>
2023-01-11 08:26:49 -05:00
#include <AK/FlyString.h>
2022-12-01 07:27:43 -05:00
#include <AK/Format.h>
2023-01-27 10:17:34 -05:00
#include <AK/MemMem.h>
2023-02-19 20:34:29 -05:00
#include <AK/Stream.h>
2022-12-01 07:27:43 -05:00
#include <AK/String.h>
2023-10-28 15:12:53 -04:00
#include <AK/StringInternals.h>
2023-01-13 11:34:00 -05:00
#include <AK/Vector.h>
2022-12-01 07:27:43 -05:00
#include <stdlib.h>

namespace AK {

namespace Detail {

void StringData::operator delete(void* ptr)
{
    free(ptr);
}

StringData::StringData(size_t byte_count)
    : m_byte_count(byte_count)
{
}

StringData::StringData(StringData const& superstring, size_t start, size_t byte_count)
    : m_byte_count(byte_count)
    , m_substring(true)
{
    auto& data = const_cast<SubstringData&>(substring_data());
    data.start_offset = start;
    data.superstring = &superstring;
    superstring.ref();
}

StringData::~StringData()
{
2023-01-11 08:26:49 -05:00
    if (m_is_fly_string)
        FlyString::did_destroy_fly_string_data({}, bytes_as_string_view());
2022-12-01 07:27:43 -05:00
    if (m_substring)
        substring_data().superstring->unref();
}

constexpr size_t allocation_size_for_string_data(size_t length)
{
2023-01-22 15:04:35 -05:00
    return sizeof(StringData) + (sizeof(char) * length);
2022-12-01 07:27:43 -05:00
}

ErrorOr<NonnullRefPtr<StringData>> StringData::create_uninitialized(size_t byte_count, u8*& buffer)
{
    VERIFY(byte_count);
    void* slot = malloc(allocation_size_for_string_data(byte_count));
    if (!slot) {
        return Error::from_errno(ENOMEM);
    }
    auto new_string_data = adopt_ref(*new (slot) StringData(byte_count));
    buffer = const_cast<u8*>(new_string_data->bytes().data());
    return new_string_data;
}

ErrorOr<NonnullRefPtr<StringData>> StringData::from_utf8(char const* utf8_data, size_t byte_count)
{
    // Strings of MAX_SHORT_STRING_BYTE_COUNT bytes or less should be handled by the String short string optimization.
    VERIFY(byte_count > String::MAX_SHORT_STRING_BYTE_COUNT);

    VERIFY(utf8_data);
    u8* buffer = nullptr;
    auto new_string_data = TRY(create_uninitialized(byte_count, buffer));
    memcpy(buffer, utf8_data, byte_count * sizeof(char));
    return new_string_data;
}

2023-03-03 09:03:45 -05:00
static ErrorOr<void> read_stream_into_buffer(Stream& stream, Bytes buffer)
{
2023-03-01 09:27:35 -05:00
    TRY(stream.read_until_filled(buffer));
2023-03-03 09:03:45 -05:00

    if (!Utf8View { StringView { buffer } }.validate())
        return Error::from_string_literal("String::from_stream: Input was not valid UTF-8");

    return {};
}

2023-02-19 20:34:29 -05:00
ErrorOr<NonnullRefPtr<StringData>> StringData::from_stream(Stream& stream, size_t byte_count)
{
    // Strings of MAX_SHORT_STRING_BYTE_COUNT bytes or less should be handled by the String short string optimization.
    VERIFY(byte_count > String::MAX_SHORT_STRING_BYTE_COUNT);

    u8* buffer = nullptr;
    auto new_string_data = TRY(create_uninitialized(byte_count, buffer));
2023-03-03 09:03:45 -05:00
    TRY(read_stream_into_buffer(stream, { buffer, byte_count }));
2023-02-19 20:34:29 -05:00

    return new_string_data;
}

2022-12-01 07:27:43 -05:00
ErrorOr<NonnullRefPtr<StringData>> StringData::create_substring(StringData const& superstring, size_t start, size_t byte_count)
{
    // Strings of MAX_SHORT_STRING_BYTE_COUNT bytes or less should be handled by the String short string optimization.
    VERIFY(byte_count > String::MAX_SHORT_STRING_BYTE_COUNT);

    void* slot = malloc(sizeof(StringData) + sizeof(StringData::SubstringData));
    if (!slot) {
        return Error::from_errno(ENOMEM);
    }
    return adopt_ref(*new (slot) StringData(superstring, start, byte_count));
}

void StringData::compute_hash() const
{
    auto bytes = this->bytes();
    if (bytes.size() == 0)
        m_hash = 0;
    else
        m_hash = string_hash(reinterpret_cast<char const*>(bytes.data()), bytes.size());
    m_has_hash = true;
}

}

2023-12-29 09:30:15 -05:00
String String::from_utf8_without_validation(ReadonlyBytes bytes)
{
    if (bytes.size() <= MAX_SHORT_STRING_BYTE_COUNT) {
        ShortString short_string;
        if (!bytes.is_empty())
            memcpy(short_string.storage, bytes.data(), bytes.size());
        short_string.byte_count_and_short_string_flag = (bytes.size() << 1) | SHORT_STRING_FLAG;
        return String { short_string };
    }
    auto data = MUST(Detail::StringData::from_utf8(reinterpret_cast<char const*>(bytes.data()), bytes.size()));
    return String { move(data) };
}

2022-12-01 07:27:43 -05:00
ErrorOr<String> String::from_utf8(StringView view)
{
2023-03-03 09:03:45 -05:00
    if (!Utf8View { view }.validate())
        return Error::from_string_literal("String::from_utf8: Input was not valid UTF-8");

2022-12-01 07:27:43 -05:00
    if (view.length() <= MAX_SHORT_STRING_BYTE_COUNT) {
        ShortString short_string;
        if (!view.is_empty())
            memcpy(short_string.storage, view.characters_without_null_termination(), view.length());
        short_string.byte_count_and_short_string_flag = (view.length() << 1) | SHORT_STRING_FLAG;
        return String { short_string };
    }
    auto data = TRY(Detail::StringData::from_utf8(view.characters_without_null_termination(), view.length()));
    return String { move(data) };
}
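The short-string path above packs the byte count and a flag into a single byte, next to the inline bytes. A minimal standalone sketch of that idea follows. It assumes the flag is the lowest bit and that one byte of a pointer-sized slot holds the count; `ShortStringSketch`, `pack_short`, `is_short`, `short_byte_count`, and `short_view` are hypothetical names for illustration, not the definitions used by String.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical stand-ins for the constants in the listing (assumed values).
constexpr std::size_t max_short_bytes = sizeof(void*) - 1; // 3 on 32-bit, 7 on 64-bit
constexpr std::uint8_t short_flag = 1;                     // assumed: lowest bit marks an inline string

struct ShortStringSketch {
    std::uint8_t byte_count_and_flag { 0 };
    std::uint8_t storage[max_short_bytes] {};
};

// Copies `text` into inline storage. Returns false if it does not fit.
inline bool pack_short(std::string_view text, ShortStringSketch& out)
{
    if (text.size() > max_short_bytes)
        return false;
    if (!text.empty())
        std::memcpy(out.storage, text.data(), text.size());
    // The count lives in the upper bits, the flag in bit 0.
    out.byte_count_and_flag = static_cast<std::uint8_t>((text.size() << 1) | short_flag);
    return true;
}

inline bool is_short(ShortStringSketch const& s) { return (s.byte_count_and_flag & short_flag) != 0; }

inline std::size_t short_byte_count(ShortStringSketch const& s) { return s.byte_count_and_flag >> 1; }

inline std::string_view short_view(ShortStringSketch const& s)
{
    assert(is_short(s));
    return { reinterpret_cast<char const*>(s.storage), short_byte_count(s) };
}
```

A flag in the lowest bit is only distinguishable from a real pointer if heap pointers are at least 2-byte aligned, so that bit 0 of a genuine pointer is always clear; the sketch takes that for granted.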
2023-02-19 20:34:29 -05:00
ErrorOr<String> String::from_stream(Stream& stream, size_t byte_count)
{
    if (byte_count <= MAX_SHORT_STRING_BYTE_COUNT) {
        ShortString short_string;
        if (byte_count > 0)
2023-03-03 09:03:45 -05:00
            TRY(Detail::read_stream_into_buffer(stream, { short_string.storage, byte_count }));
2023-02-19 20:34:29 -05:00
        short_string.byte_count_and_short_string_flag = (byte_count << 1) | SHORT_STRING_FLAG;
        return String { short_string };
    }
    auto data = TRY(Detail::StringData::from_stream(stream, byte_count));
    return String { move(data) };
}
2023-01-22 10:17:48 -05:00
ErrorOr<String> String::repeated(u32 code_point, size_t count)
{
    VERIFY(is_unicode(code_point));

    Array<u8, 4> code_point_as_utf8;
    size_t i = 0;

    size_t code_point_byte_length = UnicodeUtils::code_point_to_utf8(code_point, [&](auto byte) {
        code_point_as_utf8[i++] = static_cast<u8>(byte);
    });

    auto copy_to_buffer = [&](u8* buffer) {
        if (code_point_byte_length == 1) {
            memset(buffer, code_point_as_utf8[0], count);
            return;
        }

        for (i = 0; i < count; ++i)
            memcpy(buffer + (i * code_point_byte_length), code_point_as_utf8.data(), code_point_byte_length);
    };

    auto total_byte_count = code_point_byte_length * count;

    if (total_byte_count <= MAX_SHORT_STRING_BYTE_COUNT) {
        ShortString short_string;
        copy_to_buffer(short_string.storage);
        short_string.byte_count_and_short_string_flag = (total_byte_count << 1) | SHORT_STRING_FLAG;

        return String { short_string };
    }

    u8* buffer = nullptr;
    auto new_string_data = TRY(Detail::StringData::create_uninitialized(total_byte_count, buffer));
    copy_to_buffer(buffer);

    return String { move(new_string_data) };
}
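The function above encodes the code point to UTF-8 once and then copies those bytes `count` times. The same idea can be sketched in standalone C++ as below. `encode_utf8_sketch` and `repeated_sketch` are hypothetical names, `std::string` stands in for the String storage, and the sketch assumes the caller passes a valid code point (it does not reject surrogates).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encodes one code point as 1 to 4 UTF-8 bytes.
inline std::string encode_utf8_sketch(char32_t code_point)
{
    assert(code_point <= 0x10FFFF);
    std::string out;
    if (code_point < 0x80) {
        out += static_cast<char>(code_point);
    } else if (code_point < 0x800) {
        out += static_cast<char>(0xC0 | (code_point >> 6));
        out += static_cast<char>(0x80 | (code_point & 0x3F));
    } else if (code_point < 0x10000) {
        out += static_cast<char>(0xE0 | (code_point >> 12));
        out += static_cast<char>(0x80 | ((code_point >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (code_point & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (code_point >> 18));
        out += static_cast<char>(0x80 | ((code_point >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((code_point >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (code_point & 0x3F));
    }
    return out;
}

// Encodes once, then appends the encoded unit `count` times.
inline std::string repeated_sketch(char32_t code_point, std::size_t count)
{
    std::string const unit = encode_utf8_sketch(code_point);
    std::string out;
    out.reserve(unit.size() * count);
    for (std::size_t i = 0; i < count; ++i)
        out += unit;
    return out;
}
```

The listing additionally special-cases one-byte code points with `memset`, which the sketch leaves out for brevity.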
2022-12-01 07:27:43 -05:00
StringView String::bytes_as_string_view() const
{
    return StringView(bytes());
}

bool String::is_empty() const
{
    return bytes().size() == 0;
}

ErrorOr<String> String::vformatted(StringView fmtstr, TypeErasedFormatParams& params)
{
    StringBuilder builder;
    TRY(vformat(builder, fmtstr, params));
    return builder.to_string();
}
2023-01-16 11:12:53 -05:00
ErrorOr<Vector<String>> String::split(u32 separator, SplitBehavior split_behavior) const
{
    return split_limit(separator, 0, split_behavior);
}

ErrorOr<Vector<String>> String::split_limit(u32 separator, size_t limit, SplitBehavior split_behavior) const
{
    Vector<String> result;

    if (is_empty())
        return result;

    bool keep_empty = has_flag(split_behavior, SplitBehavior::KeepEmpty);

    size_t substring_start = 0;
    for (auto it = code_points().begin(); it != code_points().end() && (result.size() + 1) != limit; ++it) {
        u32 code_point = *it;
        if (code_point == separator) {
            size_t substring_length = code_points().iterator_offset(it) - substring_start;
            if (substring_length != 0 || keep_empty)
                TRY(result.try_append(TRY(substring_from_byte_offset_with_shared_superstring(substring_start, substring_length))));
            substring_start = code_points().iterator_offset(it) + it.underlying_code_point_length_in_bytes();
        }
    }
    size_t tail_length = code_points().byte_length() - substring_start;
    if (tail_length != 0 || keep_empty)
        TRY(result.try_append(TRY(substring_from_byte_offset_with_shared_superstring(substring_start, tail_length))));
    return result;
}
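The limit and keep-empty handling in split_limit() is easy to misread, so here is a small standalone sketch of the same control flow. It is simplified to an ASCII separator over `std::string_view` (byte scanning is safe in that case, because UTF-8 continuation bytes never equal an ASCII byte), and `split_limit_sketch` is a hypothetical name rather than part of the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Splits `text` on `separator`. A `limit` of 0 means "no limit"; otherwise at
// most `limit` pieces are produced and the last piece keeps the unsplit rest.
inline std::vector<std::string_view> split_limit_sketch(
    std::string_view text, char separator, std::size_t limit, bool keep_empty)
{
    std::vector<std::string_view> result;
    if (text.empty())
        return result;

    std::size_t start = 0;
    // Stop scanning once only one slot is left; the tail fills that slot.
    for (std::size_t i = 0; i < text.size() && (result.size() + 1) != limit; ++i) {
        if (text[i] != separator)
            continue;
        std::size_t length = i - start;
        if (length != 0 || keep_empty)
            result.push_back(text.substr(start, length));
        start = i + 1;
    }

    std::size_t tail_length = text.size() - start;
    if (tail_length != 0 || keep_empty)
        result.push_back(text.substr(start, tail_length));
    return result;
}
```

For instance, splitting `"a,b,c"` with a limit of 2 yields `"a"` and `"b,c"`, because the loop exits as soon as the next piece would be the last one allowed.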
2023-01-22 09:24:12 -05:00
Optional<size_t> String::find_byte_offset(u32 code_point, size_t from_byte_offset) const
{
    auto code_points = this->code_points();
    if (from_byte_offset >= code_points.byte_length())
        return {};

    for (auto it = code_points.iterator_at_byte_offset(from_byte_offset); it != code_points.end(); ++it) {
        if (*it == code_point)
            return code_points.byte_offset_of(it);
    }

    return {};
}
2023-01-27 10:17:34 -05:00
Optional<size_t> String::find_byte_offset(StringView substring, size_t from_byte_offset) const
{
    auto view = bytes_as_string_view();
    if (from_byte_offset >= view.length())
        return {};

    auto index = memmem_optional(
        view.characters_without_null_termination() + from_byte_offset, view.length() - from_byte_offset,
        substring.characters_without_null_termination(), substring.length());

    if (index.has_value())
        return *index + from_byte_offset;
    return {};
}
2023-01-11 08:26:49 -05:00
bool String::operator==(FlyString const& other) const
{
    if (reinterpret_cast<uintptr_t>(m_data) == other.data({}))
        return true;
    return bytes_as_string_view() == other.bytes_as_string_view();
}
2022-12-01 07:27:43 -05:00
bool String::operator==(StringView other) const
{
    return bytes_as_string_view() == other;
}

ErrorOr<String> String::substring_from_byte_offset(size_t start, size_t byte_count) const
{
    if (!byte_count)
        return String {};
    return String::from_utf8(bytes_as_string_view().substring_view(start, byte_count));
}
2023-01-22 11:40:57 -05:00
ErrorOr<String> String::substring_from_byte_offset(size_t start) const
{
    VERIFY(start <= bytes_as_string_view().length());
    return substring_from_byte_offset(start, bytes_as_string_view().length() - start);
}
2022-12-01 07:27:43 -05:00
ErrorOr<String> String::substring_from_byte_offset_with_shared_superstring(size_t start, size_t byte_count) const
{
    if (!byte_count)
        return String {};
    if (byte_count <= MAX_SHORT_STRING_BYTE_COUNT)
        return String::from_utf8(bytes_as_string_view().substring_view(start, byte_count));
    return String { TRY(Detail::StringData::create_substring(*m_data, start, byte_count)) };
}
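The shared-superstring idea (the substring holds a reference to the original buffer plus an offset and a length, instead of copying bytes) can be illustrated with a small standalone sketch. It uses `std::shared_ptr` as the reference count, which is an assumption made for illustration; `SharedSubstringSketch` and its members are hypothetical names, and the sketch skips the short-string shortcut taken in the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

class SharedSubstringSketch {
public:
    explicit SharedSubstringSketch(std::string text)
        : m_buffer(std::make_shared<std::string const>(std::move(text)))
        , m_start(0)
        , m_length(m_buffer->size())
    {
    }

    // Returns a substring that shares this string's buffer; no bytes are copied.
    SharedSubstringSketch substring(std::size_t start, std::size_t length) const
    {
        assert(start <= m_length && length <= m_length - start);
        return SharedSubstringSketch(m_buffer, m_start + start, length);
    }

    std::string_view view() const { return std::string_view(*m_buffer).substr(m_start, m_length); }

    // Number of strings currently keeping the buffer alive.
    long owner_count() const { return m_buffer.use_count(); }

private:
    SharedSubstringSketch(std::shared_ptr<std::string const> buffer, std::size_t start, std::size_t length)
        : m_buffer(std::move(buffer))
        , m_start(start)
        , m_length(length)
    {
    }

    std::shared_ptr<std::string const> m_buffer;
    std::size_t m_start;
    std::size_t m_length;
};
```

The trade-off is that a small substring keeps the whole original buffer alive, which is one reason copying very short substrings, as the listing does, can be the better choice.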
2023-01-22 11:40:57 -05:00
ErrorOr<String> String::substring_from_byte_offset_with_shared_superstring(size_t start) const
{
    VERIFY(start <= bytes_as_string_view().length());
    return substring_from_byte_offset_with_shared_superstring(start, bytes_as_string_view().length() - start);
}
2022-12-01 07:27:43 -05:00
bool String::operator==(char const* c_string) const
{
    return bytes_as_string_view() == c_string;
}
2023-09-05 13:55:21 -04:00
u32 String::ascii_case_insensitive_hash() const
{
    return case_insensitive_string_hash(reinterpret_cast<char const*>(bytes().data()), bytes().size());
}
2022-12-01 07:27:43 -05:00
Utf8View String::code_points() const
{
    return Utf8View(bytes_as_string_view());
}

ErrorOr<void> Formatter<String>::format(FormatBuilder& builder, String const& utf8_string)
{
    return Formatter<StringView>::format(builder, utf8_string.bytes_as_string_view());
}

ErrorOr<String> String::replace(StringView needle, StringView replacement, ReplaceMode replace_mode) const
{
    return StringUtils::replace(*this, needle, replacement, replace_mode);
}
2023-01-13 11:34:00 -05:00
ErrorOr<String> String::reverse() const
{
    // FIXME: This handles multi-byte code points, but not e.g. grapheme clusters.
    // FIXME: We could avoid allocating a temporary vector if Utf8View supports reverse iteration.
    auto code_point_length = code_points().length();

    Vector<u32> code_points;
    TRY(code_points.try_ensure_capacity(code_point_length));

    for (auto code_point : this->code_points())
        code_points.unchecked_append(code_point);

    auto builder = TRY(StringBuilder::create(code_point_length * sizeof(u32)));
    while (!code_points.is_empty())
        TRY(builder.try_append_code_point(code_points.take_last()));

    return builder.to_string();
}
2023-01-27 14:37:40 -05:00
ErrorOr<String> String::trim(Utf8View const& code_points_to_trim, TrimMode mode) const
{
    auto trimmed = code_points().trim(code_points_to_trim, mode);
    return String::from_utf8(trimmed.as_string());
}

ErrorOr<String> String::trim(StringView code_points_to_trim, TrimMode mode) const
{
    return trim(Utf8View { code_points_to_trim }, mode);
}
2023-07-07 03:52:36 -04:00
ErrorOr<String> String::trim_ascii_whitespace(TrimMode mode) const
{
    return trim(" \n\t\v\f\r"sv, mode);
}
2023-01-14 10:17:32 -05:00
bool String::contains(StringView needle, CaseSensitivity case_sensitivity) const
{
    return StringUtils::contains(bytes_as_string_view(), needle, case_sensitivity);
}
2023-03-08 09:06:59 -05:00
bool String::contains(u32 needle, CaseSensitivity case_sensitivity) const
2023-01-14 10:17:32 -05:00
{
2023-03-08 09:06:59 -05:00
    auto needle_as_string = String::from_code_point(needle);
    return contains(needle_as_string.bytes_as_string_view(), case_sensitivity);
2023-01-14 10:17:32 -05:00
}
2023-03-03 04:27:50 -05:00
bool String::starts_with(u32 code_point) const
{
2023-03-08 08:56:02 -05:00
    if (is_empty())
        return false;

    return *code_points().begin() == code_point;
2023-03-03 04:27:50 -05:00
}
2023-11-04 05:07:01 -04:00
bool String::starts_with_bytes(StringView bytes, CaseSensitivity case_sensitivity) const
2023-02-18 01:34:37 -05:00
{
2023-11-04 05:07:01 -04:00
    return bytes_as_string_view().starts_with(bytes, case_sensitivity);
2023-02-18 01:34:37 -05:00
}
2023-03-03 04:27:50 -05:00
bool String::ends_with(u32 code_point) const
2023-02-18 01:34:37 -05:00
{
2023-03-08 08:56:02 -05:00
    if (is_empty())
        return false;

    u32 last_code_point = 0;
    for (auto it = code_points().begin(); it != code_points().end(); ++it)
        last_code_point = *it;

    return last_code_point == code_point;
2023-03-03 04:27:50 -05:00
}
2023-11-04 05:07:01 -04:00
bool String::ends_with_bytes(StringView bytes, CaseSensitivity case_sensitivity) const
2023-03-03 04:27:50 -05:00
{
2023-11-04 05:07:01 -04:00
    return bytes_as_string_view().ends_with(bytes, case_sensitivity);
2023-02-18 01:34:37 -05:00
}
2022-12-01 07:27:43 -05:00
unsigned Traits<String>::hash(String const& string)
{
    return string.hash();
}
2023-01-11 08:26:49 -05:00
String String::fly_string_data_to_string(Badge<FlyString>, uintptr_t const& data)
{
    if (has_short_string_bit(data))
        return String { *reinterpret_cast<ShortString const*>(&data) };

    auto const* string_data = reinterpret_cast<Detail::StringData const*>(data);
2023-02-19 17:00:24 -05:00
    return String { NonnullRefPtr<Detail::StringData const>(*string_data) };
2023-01-11 08:26:49 -05:00
}

StringView String::fly_string_data_to_string_view(Badge<FlyString>, uintptr_t const& data)
{
    if (has_short_string_bit(data)) {
        auto const* short_string = reinterpret_cast<ShortString const*>(&data);
        return short_string->bytes();
    }

    auto const* string_data = reinterpret_cast<Detail::StringData const*>(data);
    return string_data->bytes_as_string_view();
}
2023-03-08 17:11:59 -05:00
u32 String::fly_string_data_to_hash(Badge<FlyString>, uintptr_t const& data)
{
    if (has_short_string_bit(data)) {
        auto const* short_string = reinterpret_cast<ShortString const*>(&data);
        auto bytes = short_string->bytes();
        return string_hash(reinterpret_cast<char const*>(bytes.data()), bytes.size());
    }

    auto const* string_data = reinterpret_cast<Detail::StringData const*>(data);
    return string_data->hash();
}
2023-01-11 08:26:49 -05:00
uintptr_t String::to_fly_string_data(Badge<FlyString>) const
{
    return reinterpret_cast<uintptr_t>(m_data);
}

void String::ref_fly_string_data(Badge<FlyString>, uintptr_t data)
{
    if (has_short_string_bit(data))
        return;

    auto const* string_data = reinterpret_cast<Detail::StringData const*>(data);
    string_data->ref();
}

void String::unref_fly_string_data(Badge<FlyString>, uintptr_t data)
{
    if (has_short_string_bit(data))
        return;

    auto const* string_data = reinterpret_cast<Detail::StringData const*>(data);
    string_data->unref();
}

void String::did_create_fly_string(Badge<FlyString>) const
{
    VERIFY(!is_short_string());
    m_data->set_fly_string(true);
}
2023-12-16 09:19:34 -05:00
ByteString String::to_byte_string() const
2022-12-01 07:27:43 -05:00
|
|
|
{
|
2023-12-16 09:19:34 -05:00
|
|
|
return ByteString(bytes_as_string_view());
|
AK: Introduce the new String, replacement for DeprecatedString
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.
Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.
Problems in DeprecatedString:
- It assumes string allocation never fails. This makes it impossible
to use in allocation-sensitive contexts, and is the reason we had to
ban DeprecatedString from the kernel entirely.
- The awkward null state. DeprecatedString can be null. It's different
from the empty state, although null strings are considered empty.
All code is immediately nicer when using Optional<DeprecatedString>
but DeprecatedString came before Optional, which is how we ended up
like this.
- The encoding of the underlying data is ambiguous. For the most part,
we use it as if it's always UTF-8, but there have been cases where
we pass around strings in other encodings (e.g ISO8859-1)
- operator[] and length() are used to iterate over DeprecatedString one
byte at a time. This is done all over the codebase, and will *not*
give the right results unless the string is all ASCII.
How we solve these issues in the new String:
- Functions that may allocate now return ErrorOr<String> so that ENOMEM
errors can be passed to the caller.
- String has no null state. Use Optional<String> when needed.
- String is always UTF-8. This is validated when constructing a String.
We may need to add a bypass for this in the future, for cases where
you have a known-good string, but for now: validate all the things!
- There is no operator[] or length(). You can get the underlying data
with bytes(), but for iterating over code points, you should be using
an UTF-8 iterator.
Furthermore, it has two nifty new features:
- String implements a small string optimization (SSO) for strings that
can fit entirely within a pointer. This means up to 3 bytes on 32-bit
platforms, and 7 bytes on 64-bit platforms. Such small strings will
not be heap-allocated.
- String can create substrings without making a deep copy of the
substring. Instead, the superstring gets +1 refcount from the
substring, and it acts like a view into the superstring. To make
substrings like this, use the substring_with_shared_superstring() API.
One caveat:
- String does not guarantee that the underlying data is null-terminated
like DeprecatedString does today. While this was nifty in a handful of
places where we were calling C functions, it did stand in the way of
shared-superstring substrings.
2022-12-01 07:27:43 -05:00
|
|
|
}

ErrorOr<String> String::from_byte_string(ByteString const& byte_string)
{
    return String::from_utf8(byte_string.view());
}

bool String::equals_ignoring_ascii_case(StringView other) const
{
    return StringUtils::equals_ignoring_ascii_case(bytes_as_string_view(), other);
}

String String::repeated(String const& input, size_t count)
{
    VERIFY(!Checked<size_t>::multiplication_would_overflow(count, input.bytes().size()));
    u8* buffer = nullptr;
    auto data = MUST(Detail::StringData::create_uninitialized(count * input.bytes().size(), buffer));

    // Fast path: a single-byte input can be filled in with one memset.
    if (input.bytes().size() == 1) {
        memset(buffer, input.bytes().first(), count);
        return String { move(data) };
    }

    // General case: copy the input's bytes into the buffer `count` times, back to back.
    for (size_t i = 0; i < count; ++i) {
        memcpy(buffer + (i * input.bytes().size()), input.bytes().data(), input.bytes().size());
    }
    return String { move(data) };
}
}