Commit graph

15 commits

Author SHA1 Message Date
Max Wipfli
99d5555134 AK: Do not trim away non-ASCII bytes when parsing URL
Because non-ASCII code points have negative byte values, trimming away
control characters requires checking for negative bytes values.

This also adds a test case with a URL containing non-ASCII code points.
2021-06-05 10:53:31 +02:00
Max Wipfli
44937e2dfc AK: Update URLParser.{cpp,h} to use east const 2021-06-05 10:53:31 +02:00
DexesTTP
e01f1c949f AK: Do not VERIFY on invalid code point bytes in UTF8View
The previous behavior was to always VERIFY that the UTF-8 bytes were
valid when iterating over the code points of an UTF8View. This change
makes it so we instead output the 0xFFFD 'REPLACEMENT CHARACTER'
code point when encountering invalid bytes, and keep iterating the
view after skipping one byte.

Leaving the decision to the consumer would break symmetry with the
UTF32View API, which would in turn require heavy refactoring and/or
code duplication in generic code such as the one found in
Gfx::Painter and the Shell.

To make it easier for the consumers to detect the original bytes, we
provide a new method on the iterator that returns a Span over the
data that has been decoded. This method is immediately used in the
TextNode::compute_text_for_rendering method, which previously did
this in a ad-hoc waay.

This also add tests for the new behavior in TestUtf8.cpp, as well
as reinforcements to the existing tests to check if the underlying
bytes match up with their expected values.
2021-06-03 18:28:27 +04:30
Max Wipfli
bc8d16ad28 Everywhere: Replace ctype.h to avoid narrowing conversions
This replaces ctype.h with CharacterType.h everywhere I could find
issues with narrowing conversions. While using it will probably make
sense almost everywhere in the future, the most critical places should
have been addressed.
2021-06-03 13:31:46 +02:00
Andreas Kling
c0d1a75881 AK: Strip leading/trailing C0-control-or-space in URLs correctly
We have to stop scanning once we hit a non-strippable character.
Add some tests to cover this.
2021-06-01 13:22:04 +02:00
Andreas Kling
407d6cd9e4 AK: Rename Utf8CodepointIterator => Utf8CodePointIterator 2021-06-01 09:45:52 +02:00
Max Wipfli
0d0ed4962f AK: Add a new, spec-compliant URLParser
This adds a new URL parser, which aims to be compliant with the URL
specification (https://url.spec.whatwg.org/). It also contains a
rudimentary data URL parser.
2021-06-01 09:28:05 +02:00
Max Wipfli
0d41a7d39a AK: Remove URLParser
This removes URLParser, because its two exposed functions, urlencode()
and urldecode(), have been superseded by URL::percent_encode() and
URL::percent_decode(). This is in preparation for the introduction of a
new URL parser.
2021-06-01 09:28:05 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Linus Groh
50e3b122c7 AK: Add optional parameter for excluding chars to urlencode() 2021-01-31 19:05:55 +01:00
Conrad Pankoff
13f13a9e59 AK: Fix urlencode() with high byte values
Previously urlencode() would encode bytes above 127 incorrectly, printing
them as negative hex values.
2020-12-12 23:50:23 +01:00
Linus Groh
ba020a5907 AK: Fix logic error in urldecode() percent-decoding
We also need to append the raw consumed value if *either* of the two
characters after the % isn't a hex digit, not only if *both* aren't.

Fixes #4257.
2020-11-30 11:35:01 +01:00
asynts
1d96d5eea4 AK: Use new format functions. 2020-10-08 09:59:55 +02:00
Andreas Kling
fdfda6dec2 AK: Make string-to-number conversion helpers return Optional
Get rid of the weird old signature:

- int StringType::to_int(bool& ok) const

And replace it with sensible new signature:

- Optional<int> StringType::to_int() const
2020-06-12 21:28:55 +02:00
Andreas Kling
86eeac86a4 AK: Add basic percent encoder/decoder (urlencode and urldecode) 2020-06-07 21:05:05 +02:00