This adds the m_username, m_password, m_paths and m_cannot_be_a_base_url
member variables to the URL class. These are necessary for the upcoming
new URL parser.
The deprecated m_path variable shadows the m_paths variable if it is
non-null. This behavior will be removed once the old URL parser has been
removed.
This removes URLParser, because its two exposed functions, urlencode()
and urldecode(), have been superseded by URL::percent_encode() and
URL::percent_decode(). This is in preparation for the introduction of a
new URL parser.
This replaces all occurrences of those functions with the newly
implemented functions URL::percent_encode() and URL::percent_decode().
The old functions will be removed in a further commit.
This adds a few new functions to percent encode/decode strings according
to the URL specification. The functions allow specifying a
PercentEncodeSet, which is defined by the specification. It will be used
to replace the current urlencode() and urldecode() functions in a
further commit.
This commit adds a few duplicate helper functions in the URL class, such
as is_digit() and is_ascii_digit(). This will be cleaned up as soon as
the upcoming new URL parser will replace the current one.
This adds a peek method for Utf8CodepointIterator, which enables it to
be used in some parsing cases where peeking is necessary.
peek(0) is equivalent to operator*, expect that peek() does not contain
any assertions and will just return an empty Optional<u32>.
This also implements a test case for iterating UTF-8.
This renames all references to protocol to scheme, which is the name
used by the URL standard (https://url.spec.whatwg.org/). Externally, all
methods referencing "protocol" were duplicated with "scheme". The old
methods still exist as compatibility.
This patch removes unnecessary function parameter names in declarations
of the URL class. It also changes parameter types from String to
StringView where applicable.
For non-x86 targets, it's not very nice to define inline functions in
AK/Memory.h with asm volatile implementations. Guard this inline
assembly with ARCH(I386) and provide portable alternatives. Since we
always compile with optimizations, the hand-vectorized memset and
memcpy seem to be of dubious value, but we'll keep them here until
proven one way or another.
This should fix the Lagom build on native M1 macOS that was reported
on Discord the other day.
Previously the StringBuilder class would use memcpy() to write
directly into the ByteBuffer's buffer. Instead we should use the
append() method which ensures we don't overrun the buffer.
This allows us to mark the slow part (i.e. where we copy the buffer) as
NEVER_INLINE because this should almost never get called and therefore
should also not get inlined into callers.
Previously ByteBuffer::grow() behaved like Vector<T>::resize().
However the function name was somewhat ambiguous - and so this patch
updates ByteBuffer to behave more like Vector<T> by replacing grow()
with resize() and adding an ensure_capacity() method.
This also lets the user change the buffer's capacity without affecting
the size which was not previously possible.
Additionally this patch makes the capacity() method public (again).
Previously, we would go crazy and shift things way out of bounds.
Add tests to verify that the decoding algorithm is safe around the
limits of the result type.
printf didn't check whether the additional integer variable belongs to
the field width specifier or to the precision specifier, and always
applied it to the field width instead.
Implement the case distinction that we already use in literal width
and precision specifiers for the variable version as well so that
they are correctly attributed.
These dbgln's caused excessive load in the WebServer process,
accounting for ~67% of the processing time when serving a webpage
with a bunch of resources like serenityos.org/happy/2nd/.
We want to discourage folks from using APIs which lull you into a sense
of false safety in terms of OOM. There are cases where you want to force
allocations to succeed or crash, but those should use a more explicit
API than `AK::adopt_own(.)`.
The ASAN_[UN]POISON_MEMORY_REGION macros can be used to manually notify
the AddressSanitizer runtime about the reachability of instrumented code
accessing a memory region. This is most useful for manually managed
heaps and arenas that do not go directly to malloc or alligned_alloc.
Previously GCC came to the conclusion that we were reading
m_outline_capacity via ByteBuffer(ByteBuffer const&) -> grow()
-> capacity() even though that could never be the case because
m_size is 0 at that point which means we have an inline buffer
and capacity() would return inline_capacity in that case without
reading m_outline_capacity.
This makes GCC inline parts of the grow() function into the
ByteBuffer copy constructor which seems sufficient for GCC to
realize that m_outline_capacity isn't actually being read.
When compiling the Kernel with Og, the compiler complains that
m_outline_capacity might be uninitialized when calling capacity()
Note that this fix is not really what we want. Ideally only outline
buffer and outline capacity would need initialized, not the entire
inline buffer. However, clang considers the class to not be
default-constructible if we make that change, while gcc accepts it.
We had two functions for doing mostly the same thing. Combine both
of them into String::find() and use that everywhere.
Also add some tests to cover basic behavior.
Problem:
- The constructor is defined to be the default constructor.
Solution:
- Let the compiler generate the destructor by setting it to the
default.
We were accidentally calling calculate_base64_decoded_length instead,
which resulted in extra allocations during the StringBuilder::append
calls that can be avoided.
This patch implements a Unicode-safe substring method, which can be used
when offset and length should be specified in actual characters instead
of bytes.
This can be used to mitigate issues where a string is split in the
middle of a UTF-8 multi-byte character, which leads to invalid UTF-8.
Furthermore, it implements to common shorthands for substring methods
which take only an offset and return the substring until the end of the
string.