35 commits

Author SHA1 Message Date
Gingeh
a630c67a32 AK: Avoid returning null StringViews instead of empty views
This was error-prone and most users were just checking the length anyway

(cherry picked from commit 57ba720fb184ddf59fb09d3290e07cd425be450e)
2024-11-25 09:21:14 -05:00
Ali Mohammad Pur
bc9e03ea38 AK: Cache all the line positions in LineTrackingLexer
Also updates a LibWeb text test that used to report the wrong line
number.

(cherry picked from commit 02b50d463b174e5d525c7ab8ce8dd173d550de28;
amended to exclude LineTrackingLexer from KERNEL, since that now uses
make<>)
2024-11-12 04:25:50 -05:00
Timothy Flynn
c56a965126 AK: Make a couple of GenericLexer helper methods protected
We will want to use the exact behavior of these methods in JsonParser.

(cherry picked from commit c39a3fef17e913da944c35d57541298232bcea53)
2024-07-07 18:47:09 +02:00
Ali Mohammad Pur
bc301b6f40 AK+LibXML+JSSpecCompiler: Move LineTrackingLexer to AK
This is a simple extension of GenericLexer, and is used in more than
just LibXML, so let's move it into AK.
The move also resolves a FIXME, which is removed in this commit.
2024-02-16 15:26:43 +01:00
kleines Filmröllchen
eada4f2ee8 AK: Remove ByteString from GenericLexer
A bunch of users used consume_specific with a constant ByteString
literal, which can be replaced by an allocation-free StringView literal.

The generic consume_while overload gains a requires clause so that
consume_specific("abc") causes a more understandable and actionable
error.
2024-01-12 17:03:53 -07:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Dan Klishch
b65d281bbb AK: Add GenericLexer::{consume_decimal_integer,peek_string}
2023-11-04 18:06:30 +01:00
Ali Mohammad Pur
aeee98b3a1 AK+Everywhere: Remove the null state of DeprecatedString
This commit removes DeprecatedString's "null" state, and replaces all
its users with one of the following:
- A normal, empty DeprecatedString
- Optional<DeprecatedString>

Note that null states of DeprecatedFlyString/StringView/etc are *not*
affected by this commit. However, DeprecatedString::empty() is now
considered equal to a null StringView.
2023-10-13 18:33:21 +03:30
Sam Atkins
c06f4ac6f5 AK+Everywhere: Make GenericLexer::ignore_until() stop before the value
`consume_until(foo)` stops before foo, and so does
`ignore_until(Predicate)`, so let's make the other `ignore_until()`
overloads consistent with that so they're less confusing.
2023-02-28 12:55:10 +00:00
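The "stops before the value" behaviour described in c06f4ac6f5 can be sketched roughly as below. This is only a minimal illustration under stated assumptions: `MiniLexer` is a hypothetical stand-in built on `std::string_view`, not AK's GenericLexer, and only the single-character overloads are modelled.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for illustration only; this is not AK::GenericLexer.
class MiniLexer {
public:
    explicit MiniLexer(std::string_view input)
        : m_input(input)
    {
    }

    bool is_eof() const { return m_index >= m_input.size(); }
    char peek() const { return is_eof() ? '\0' : m_input[m_index]; }
    std::size_t tell() const { return m_index; }

    // Returns everything up to, but not including, `stop`.
    std::string_view consume_until(char stop)
    {
        std::size_t start = m_index;
        while (!is_eof() && peek() != stop)
            ++m_index;
        return m_input.substr(start, m_index - start);
    }

    // Skips everything up to, but not including, `stop`,
    // so the next peek() sees `stop` itself.
    void ignore_until(char stop)
    {
        while (!is_eof() && peek() != stop)
            ++m_index;
    }

private:
    std::string_view m_input;
    std::size_t m_index { 0 };
};
```

With both methods leaving the cursor on the stop character, a caller can treat them interchangeably and decide separately whether to consume the delimiter.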
Ali Mohammad Pur
9c61fed37c AK: Add an input() accessor to GenericLexer
It's sometimes useful to get the input from the lexer instead of wiring
it all the way down to where it's needed.
2023-02-18 06:55:46 +03:30
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Linus Groh
d26aabff04 Everywhere: Run clang-format
2022-12-03 23:52:23 +00:00
Andreas Kling
ae3ffdd521 AK: Make it possible to not using AK classes into the global namespace
This patch adds the `USING_AK_GLOBALLY` macro which is enabled by
default, but can be overridden by build flags.

This is a step towards integrating Jakt and AK types.
2022-11-26 15:51:34 +01:00
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
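The difference between a strlen-based constructor and a sized literal, as described in 3f3f45580a, can be illustrated with the standard library. This sketch assumes `std::string_view` and `std::literals` as an analogy; it does not use AK's StringView, and the two helper function names are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

using namespace std::literals;

// Constructing from char const* measures the string with a strlen-style scan,
// so the view ends at the first NUL byte.
inline std::size_t length_via_strlen()
{
    return std::string_view("ab\0cd").size();
}

// The sv literal operator receives the length from the compiler,
// so no scan happens and embedded NUL bytes are kept.
inline std::size_t length_via_literal()
{
    return "ab\0cd"sv.size();
}
```

For ordinary literals without embedded NUL bytes both forms produce the same view; the literal form simply avoids the length computation.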
sin-ack
c70f45ff44 Everywhere: Explicitly specify the size in StringView constructors
This commit moves the length calculations out to be directly on the
StringView users. This is an important step towards the goal of removing
StringView(char const*), as it moves the responsibility of calculating
the size of the string to the user of the StringView (which will prevent
naive uses causing OOB access).
2022-07-12 23:11:35 +02:00
Idan Horowitz
086969277e Everywhere: Run clang-format
2022-04-01 21:24:45 +01:00
Ali Mohammad Pur
b3c18db463 AK: Add a 'is_not_any_of' similar to 'is_any_of' to GenericLexer
It's often useful to have the negated version, so instead of making a
local lambda for it, let's just add the negated form too.
2022-03-28 23:11:48 +02:00
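The idea behind b3c18db463, a negated predicate factory next to the positive one, might look roughly like this. It is a sketch under assumptions: the functions below are hypothetical standard-library stand-ins that borrow the names from the commit message, and `take_while` is an invented helper, not a GenericLexer method.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-ins for illustration only; not AK's implementation.

// Predicate that accepts any character contained in `values`.
inline auto is_any_of(std::string_view values)
{
    return [values](char c) { return values.find(c) != std::string_view::npos; };
}

// Negated form: accepts any character NOT contained in `values`.
inline auto is_not_any_of(std::string_view values)
{
    return [values](char c) { return values.find(c) == std::string_view::npos; };
}

// Invented helper: returns the longest prefix whose characters all satisfy `predicate`.
template<typename Predicate>
std::string_view take_while(std::string_view input, Predicate predicate)
{
    std::size_t length = 0;
    while (length < input.size() && predicate(input[length]))
        ++length;
    return input.substr(0, length);
}
```

Having the negated factory avoids writing a local lambda each time a caller wants "everything up to one of these delimiters".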
Idan Horowitz
b22cb40565 AK: Exclude GenericLexer String APIs from the Kernel
These APIs are only used by userland, and String is OOM-infallible,
so let's just ifdef it out of the Kernel.
2022-02-16 22:21:37 +01:00
Idan Horowitz
d49d2c7ec4 AK: Add a consume_until(StringView) overload to GenericLexer
This allows us to skip a strlen call.
2022-01-25 13:41:09 +03:30
Andreas Kling
8b1108e485 Everywhere: Pass AK::StringView by value
2021-11-11 01:27:46 +01:00
Timothy Flynn
fd8ccedf2b AK: Add GenericLexer API to consume an escaped Unicode code point
This parsing is already duplicated between LibJS and LibRegex, and will
shortly be needed in more places in those libraries. Move it to AK to
prevent further duplication.

This API will consume escaped Unicode code points of the form:
    \\u{code point}
    \\unnnn (where each n is a hexadecimal digit)
    \\unnnn\\unnnn (where the two escaped values are a surrogate pair)
2021-08-19 23:49:25 +02:00
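The three escape forms listed in fd8ccedf2b could be decoded along the following lines. This is a standalone sketch, not the AK API: the function names, the `std::optional` return type, and the choice to read a single backslash before `u` are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helpers for illustration only; not AK's implementation.
static std::optional<std::uint32_t> hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return static_cast<std::uint32_t>(c - '0');
    if (c >= 'a' && c <= 'f')
        return static_cast<std::uint32_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F')
        return static_cast<std::uint32_t>(c - 'A' + 10);
    return std::nullopt;
}

// Reads a backslash, 'u', and exactly four hex digits at `pos`; advances `pos` on success.
static std::optional<std::uint32_t> read_u4(std::string_view s, std::size_t& pos)
{
    if (pos + 6 > s.size() || s[pos] != '\\' || s[pos + 1] != 'u')
        return std::nullopt;
    std::uint32_t value = 0;
    for (std::size_t i = pos + 2; i < pos + 6; ++i) {
        auto digit = hex_value(s[i]);
        if (!digit)
            return std::nullopt;
        value = value * 16 + *digit;
    }
    pos += 6;
    return value;
}

// Decodes one escaped code point starting at `pos`; leaves `pos` untouched on failure.
std::optional<std::uint32_t> consume_escaped_code_point(std::string_view s, std::size_t& pos)
{
    // Braced form: backslash, 'u', '{', hex digits, '}'.
    if (pos + 3 <= s.size() && s[pos] == '\\' && s[pos + 1] == 'u' && s[pos + 2] == '{') {
        std::size_t i = pos + 3;
        std::uint32_t value = 0;
        std::size_t digits = 0;
        while (i < s.size() && s[i] != '}') {
            auto digit = hex_value(s[i]);
            if (!digit)
                return std::nullopt;
            value = value * 16 + *digit;
            if (value > 0x10FFFF) // beyond the Unicode range
                return std::nullopt;
            ++digits;
            ++i;
        }
        if (i >= s.size() || digits == 0)
            return std::nullopt;
        pos = i + 1;
        return value;
    }

    // Four-digit form, optionally followed by a low surrogate.
    std::size_t cursor = pos;
    auto first = read_u4(s, cursor);
    if (!first)
        return std::nullopt;
    if (*first >= 0xD800 && *first <= 0xDBFF) {
        std::size_t after_low = cursor;
        auto low = read_u4(s, after_low);
        if (low && *low >= 0xDC00 && *low <= 0xDFFF) {
            pos = after_low;
            // Combine the surrogate pair into one code point.
            return 0x10000 + ((*first - 0xD800) << 10) + (*low - 0xDC00);
        }
    }
    pos = cursor;
    return first;
}
```

In this sketch an unpaired high surrogate is returned as-is rather than rejected; whether that matches the real API is not stated in the commit message.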
Idan Horowitz
39a9cf4bb4 AK: Add a retreat(count) method to GenericLexer
This method can be used to rewind a constant amount backwards in the
source instead of going back one character at a time with retreat()
2021-07-12 19:05:17 +01:00
Lenny Maiorani
254e010c75 AK/GenericLexer: constexpr where possible
Problem:
- Much of the `GenericLexer` can be `constexpr`, but is not.

Solution:
- Make it `constexpr` and de-duplicate code.
- Extend some of `StringView` with `constexpr` to support.
- Add tests to ensure `constexpr` behavior.

Note:
- Construction of `StringView` from pointer and length is not
  `constexpr`-compatible at the moment because the VERIFY cannot be,
  yet.
2021-04-22 20:27:21 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Linus Groh
1daa5158eb AK: Add GenericLexer::retreat()
This allows going back one character at a time, and then re-consume
previously consumed chars.
The code I need this for looks something like this:

    ASSERT(lexer.consume_specific('\\'));
    if (lexer.next_is("foo"))
        ...
    lexer.retreat();
    lexer.consume_escaped_character();  // This expects lexer.peek() == '\\'
2020-10-29 11:52:31 +01:00
AnotherTest
27040e65eb AK: Add `GenericLexer::consume_escaped_character()'
...and use it in `consume_and_unescape_string()'.
2020-10-22 23:49:51 +02:00
asynts
6351a56d27 AK+Format: Do some housekeeping in the format implementation.
2020-10-02 20:48:19 +02:00
Benoît Lormeau
f0f6b09acb AK: Remove the ctype adapters and use the actual ctype functions instead
This finally takes care of the kind-of excessive boilerplate code that were the
ctype adapters. On the other hand, I had to link `LibC/ctype.cpp` to the Kernel
(for `AK/JsonParser.cpp` and `AK/Format.cpp`). The previous commit actually makes
sense now: the `string.h` includes in `ctype.{h,cpp}` would require to link more LibC
stuff to the Kernel when it only needs the `_ctype_` array of `ctype.cpp`, and there
wasn't any string stuff used in ctype.
Instead of all this I could have put static derivatives of `is_any_of()` in the
concerned AK files, however that would have meant more boilerplate and workarounds;
so I went for the Kernel approach.
2020-09-27 21:15:25 +02:00
Benoit Lormeau
e4da2875c5 AK: Use templates instead of Function for Conditions in the GenericLexer
Since commit 1ec59f28ce turns the ctype macros
into functions we can now feed them directly to a GenericLexer! This will lead to
removing the ctype adapters that were kind-of excessive boilerplate, but needed as
the Kernel doesn't compile with the LibC.
2020-09-27 21:15:25 +02:00
Benoit Lormeau
8f34b493e4 AK: Enhance GenericLexer's string consumption
The `consume_quoted_string()` can now take an escape character. This allows it
(for example) to capture a string's enclosing quotes. The escape character is
optional by default.

You can also consume and unescape a quoted string with the eponymous method
`consume_and_unescape_string()`. It takes an escape character as parameter
(backslash by default). It builds a String in which common escape sequences
get... unescaped :^) (e.g. \n, \r, \t...).
2020-09-26 17:17:53 +02:00
Benoit Lormeau
66481ad279 AK: Added explanatory comments in GenericLexer.h
2020-09-26 17:17:53 +02:00
asynts
84d276dba0 AK: Add GenericLexer::remaining.
This is useful for debugging with printf :^).
2020-09-26 00:00:50 +02:00
AnotherTest
441807f96d AK: Add is_any_of(StringView) to GenericLexer
2020-08-31 23:05:58 +02:00
Nico Weber
064159d215 LibWeb: Use GenericLexer in WrapperGenerator
2020-08-21 16:01:48 +02:00
Benoît Lormeau
7b356c33cb AK: Add a GenericLexer and extend the JsonParser with it (#2696)
2020-08-09 11:34:26 +02:00