Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
Previously we might swallow invalid unicode point which would skip valid
ascii characters. This could be dangerous as we might skip a '"' thus
not closing a string where we should.
This might have been exploitable as it would not have been clear what
code gets executed when looking at a script.
Another approach to this would be simply replacing all invalid
characters with the replacement character (this is what v8 does). But
our lexer and parser are currently not set up for such a change.
When we save/load state in the parser, we preserve the lexer state by
simply making a copy of it. This was made extremely heavy by the lexer
keeping a cache of all parsed identifiers.
It keeps the cache to ensure that StringViews into parsed Unicode escape
sequences don't become dangling views when the Token goes out of scope.
This patch solves the problem by replacing the Vector<FlyString> which
was used to cache the identifiers with a ref-counted
HashTable<FlyString> instead.
Since the purpose of the cache is just to keep FlyStrings alive, it's
fine for all Lexer instances to share the cache. And as a bonus, using a
HashTable instead of a Vector replaces the O(n) accesses with O(1) ones.
This makes a 1.9 MiB JavaScript file parse in 0.6s instead of 24s. :^)
Added a test to ensure the behavior stays the same.
We now throw on a direct usage of an escaped keywords with a specific
error to make it more clear to the user.
For example, "property.br\u{64}wn" should resolve to "property.brown".
To support this behavior, this commit changes the Token class to hold
both the evaluated identifier name and a view into the original source
for the unevaluated name. There are some contexts in which identifiers
are not allowed to contain Unicode escape sequences; for example, export
statements of the form "export {} from foo.js" forbid escapes in the
identifier "from".
The test file is added to .prettierignore because prettier will replace
all escaped Unicode sequences with their unescaped value.
Some of the code assumed that chars were always signed while that is
not the case on ARM hosts.
Also, some of the code tried to use EOF (-1) in a way similar to what
fgetc() does, however instead of storing the characters in an int
variable a char was used.
While this seemed to work it also meant that character 0xFF would be
incorrectly seen as an end-of-file.
Careful reading of fgetc() reveals that fgetc() stores character
data in an int where valid characters are in the range of 0-255 and
the EOF value is explicitly outside of that range (usually -1).
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *