Otherwise we'd just loop trying to parse it over and over again, for
instance in `/a{/` or `/a{1,/`.
Unless we're parsing in Annex B mode, which allows `{` as a normal
ExtendedPatternCharacter.
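
A minimal sketch of the idea, not the LibRegex code: the names
BraceQuantifier, try_parse_brace_quantifier and classify_brace are made
up for illustration. The point is that a malformed or unterminated `{`
always produces a definite outcome (error, or literal brace in Annex B
mode), so the caller never retries the same position.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type, only for this sketch.
struct BraceQuantifier {
    unsigned min;
    std::optional<unsigned> max; // empty means "no upper bound", as in {1,}
    std::size_t length;          // characters consumed, including both braces
};

// Tries to read "{n}", "{n,}" or "{n,m}" at the start of `input`.
// Returns std::nullopt for anything malformed or unterminated.
// Overflow of very large numbers is ignored here for brevity.
std::optional<BraceQuantifier> try_parse_brace_quantifier(std::string_view input)
{
    std::size_t i = 0;
    if (i >= input.size() || input[i] != '{')
        return std::nullopt;
    ++i;

    auto read_number = [&]() -> std::optional<unsigned> {
        std::size_t start = i;
        unsigned value = 0;
        while (i < input.size() && input[i] >= '0' && input[i] <= '9') {
            value = value * 10 + (input[i] - '0');
            ++i;
        }
        if (i == start)
            return std::nullopt; // no digits at all
        return value;
    };

    auto min = read_number();
    if (!min)
        return std::nullopt;

    std::optional<unsigned> max = min;
    if (i < input.size() && input[i] == ',') {
        ++i;
        max = read_number(); // stays empty for "{n,}"
    }

    if (i >= input.size() || input[i] != '}')
        return std::nullopt; // unterminated, e.g. "{1,/"
    ++i;
    return BraceQuantifier { *min, max, i };
}

enum class BraceOutcome { Quantifier, LiteralBrace, SyntaxError };

// Decides what a `{` means; every outcome lets the parser move on.
BraceOutcome classify_brace(std::string_view input, bool annex_b)
{
    if (try_parse_brace_quantifier(input))
        return BraceOutcome::Quantifier;
    return annex_b ? BraceOutcome::LiteralBrace : BraceOutcome::SyntaxError;
}
```

With this shape, the `{/` in `/a{/` is a syntax error in strict mode and
a literal brace in Annex B mode.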
This patch finally adds the actual calculation that goes into calc()
expressions. When a Length that is a calculated value is resolved, the
parsed CalculatedStyleValue gets traversed and the appropriate values
get calculated.
This is a bit hackish, but this way the existence of the calc()
becomes transparent to the user who just wants a Length and doesn't
care where it came from.
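
To illustrate the traversal, here is a heavily simplified sketch. It is
not the real CalculatedStyleValue: CalcNode, leaf, binary and
resolve_to_px are hypothetical names, only px and em are modelled, and
no type checking is done (so `1px * 1px` is not rejected).

```cpp
#include <memory>
#include <utility>

// Hypothetical expression tree for a parsed calc() expression.
struct CalcNode {
    enum class Kind { Number, Px, Em, Add, Subtract, Multiply, Divide };
    Kind kind;
    double value = 0;              // used by Number, Px and Em leaves
    std::unique_ptr<CalcNode> lhs; // used by the binary operations
    std::unique_ptr<CalcNode> rhs;
};

std::unique_ptr<CalcNode> leaf(CalcNode::Kind kind, double value)
{
    auto node = std::make_unique<CalcNode>();
    node->kind = kind;
    node->value = value;
    return node;
}

std::unique_ptr<CalcNode> binary(CalcNode::Kind kind,
    std::unique_ptr<CalcNode> lhs, std::unique_ptr<CalcNode> rhs)
{
    auto node = std::make_unique<CalcNode>();
    node->kind = kind;
    node->lhs = std::move(lhs);
    node->rhs = std::move(rhs);
    return node;
}

// Walks the tree and resolves it to a pixel value, given the font size
// that `em` refers to.
double resolve_to_px(CalcNode const& node, double font_size_px)
{
    switch (node.kind) {
    case CalcNode::Kind::Number:
    case CalcNode::Kind::Px:
        return node.value;
    case CalcNode::Kind::Em:
        return node.value * font_size_px;
    case CalcNode::Kind::Add:
        return resolve_to_px(*node.lhs, font_size_px) + resolve_to_px(*node.rhs, font_size_px);
    case CalcNode::Kind::Subtract:
        return resolve_to_px(*node.lhs, font_size_px) - resolve_to_px(*node.rhs, font_size_px);
    case CalcNode::Kind::Multiply:
        return resolve_to_px(*node.lhs, font_size_px) * resolve_to_px(*node.rhs, font_size_px);
    case CalcNode::Kind::Divide:
        return resolve_to_px(*node.lhs, font_size_px) / resolve_to_px(*node.rhs, font_size_px);
    }
    return 0; // not reached; all kinds are handled above
}
```

A caller that only wants a number of pixels calls resolve_to_px() and
never needs to know whether the value came from a plain length or from
something like `calc(10px + 2em * 3)`.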
This patch adds the parsing of previously tokenized calc() expressions
into the CSS-Parser. The tokens are processed into a complete
CalculatedStyleValue.
This also converts the GetSubstitution abstract operation to take its
input strings as UTF-16 now that all callers are UTF-16 capable. This
means String.prototype.replace (and replaceAll) no longer needs UTF-8
and UTF-16 copies of these strings.
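
As a rough illustration of why a single UTF-16 representation is enough,
here is a cut-down substitution routine. get_substitution_sketch is a
hypothetical name, and only `$$`, `$&`, `` $` `` and `$'` are handled;
numbered and named captures are left out. Every index is a UTF-16 code
unit offset, so no conversion between encodings is needed.

```cpp
#include <cstddef>
#include <string>

// Simplified stand-in for GetSubstitution, operating purely on UTF-16.
// `position` is the code unit offset of `matched` within `str`.
std::u16string get_substitution_sketch(std::u16string const& matched,
    std::u16string const& str, std::size_t position,
    std::u16string const& replacement)
{
    std::u16string result;
    std::size_t tail_position = position + matched.size();

    for (std::size_t i = 0; i < replacement.size(); ++i) {
        char16_t unit = replacement[i];
        if (unit != u'$' || i + 1 >= replacement.size()) {
            result.push_back(unit);
            continue;
        }

        char16_t next = replacement[i + 1];
        if (next == u'$') {
            result.push_back(u'$');
            ++i;
        } else if (next == u'&') {
            result += matched;
            ++i;
        } else if (next == u'`') {
            result += str.substr(0, position); // text before the match
            ++i;
        } else if (next == u'\'') {
            if (tail_position < str.size())
                result += str.substr(tail_position); // text after the match
            ++i;
        } else {
            // $n and $<name> are not modelled; keep the '$' as-is.
            result.push_back(unit);
        }
    }
    return result;
}
```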
This converts RegExpExec to perform matching with UTF-16 strings. As a
very temporary stop-gap, this adds overloads to RegExpExec and friends
for both UTF-8 and UTF-16 strings. This is only needed until the rest
of RegExp.prototype is UTF-16 capable.
This also addresses a FIXME regarding code point index correction in
RegExpExec when the Unicode flag is set.
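
For illustration, assuming the correction in question is the usual
mapping from an index counted in code points back to an index counted
in UTF-16 code units, a sketch could look like this.
code_unit_offset_of is a hypothetical helper, not a function from the
codebase.

```cpp
#include <cstddef>
#include <string>

// Converts an index counted in code points into the corresponding index
// counted in UTF-16 code units. A high surrogate followed by a low
// surrogate counts as one code point; a lone surrogate counts as one too.
std::size_t code_unit_offset_of(std::u16string const& string, std::size_t code_point_index)
{
    std::size_t code_unit_index = 0;
    for (std::size_t seen = 0; seen < code_point_index && code_unit_index < string.size(); ++seen) {
        char16_t unit = string[code_unit_index];
        bool is_pair = unit >= 0xD800 && unit <= 0xDBFF
            && code_unit_index + 1 < string.size()
            && string[code_unit_index + 1] >= 0xDC00
            && string[code_unit_index + 1] <= 0xDFFF;
        code_unit_index += is_pair ? 2 : 1;
    }
    return code_unit_index;
}
```

For a string made of "a", U+1F600 and "b", code point index 2 (the "b")
maps to code unit index 3, because the emoji takes two code units.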
RegExpInitialize specifies how the pattern string should be created
before passing it to [[RegExpMatcher]]. Rather than passing it as-is,
the string should be converted to a "List" of code points (if the
Unicode flag is present), or to a "List" of UTF-16 code units otherwise.
Further, the spec requires that we keep both the original pattern string
and this parsed string in the RegExp object.
The caveat is that the LibRegex parser further requires any multi-byte
code units to be escaped (as "\unnnn"). Otherwise, the code unit is
recognized as individual UTF-8 bytes.
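
A sketch of what such escaping might look like, assuming "multi-byte"
means any code unit that is not plain ASCII. escape_non_ascii is a
hypothetical helper and not the function used in the patch.

```cpp
#include <cstdio>
#include <string>

// Produces an ASCII-only pattern by rewriting every non-ASCII UTF-16
// code unit as a "\uXXXX" escape. ASCII code units are copied through.
std::string escape_non_ascii(std::u16string const& pattern)
{
    std::string escaped;
    for (char16_t unit : pattern) {
        if (unit < 0x80) {
            escaped.push_back(static_cast<char>(unit));
            continue;
        }
        char buffer[7]; // backslash, 'u', four hex digits, NUL
        std::snprintf(buffer, sizeof(buffer), "\\u%04X", static_cast<unsigned>(unit));
        escaped += buffer;
    }
    return escaped;
}
```

Note that a code point outside the BMP comes out as two escapes, one
per surrogate, since the input is walked by code unit.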
When the Unicode flag is set, regular expressions may escape code points
by surrounding the hexadecimal code point with curly braces, e.g. \u{41}
is the character "A".
When the Unicode flag is not set, this should be considered a repetition
symbol - \u{41} is the character "u" repeated 41 times. This is left as
a TODO for now.
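
For the Unicode-flag case, reading the braced form can be sketched as
below. parse_braced_code_point is a hypothetical name, and the sketch
deliberately does not cover the non-Unicode repetition reading.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Reads "\u{H...}" at the start of `input` and returns the code point,
// or std::nullopt if the escape is malformed, unterminated, empty, or
// larger than U+10FFFF.
std::optional<std::uint32_t> parse_braced_code_point(std::string_view input)
{
    constexpr std::string_view prefix = "\\u{";
    if (input.substr(0, prefix.size()) != prefix)
        return std::nullopt;

    std::uint32_t value = 0;
    std::size_t digits = 0;
    std::size_t i = prefix.size();
    for (; i < input.size() && input[i] != '}'; ++i, ++digits) {
        char c = input[i];
        std::uint32_t digit = 0;
        if (c >= '0' && c <= '9')
            digit = c - '0';
        else if (c >= 'a' && c <= 'f')
            digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            digit = c - 'A' + 10;
        else
            return std::nullopt; // not a hexadecimal digit

        value = value * 16 + digit;
        if (value > 0x10FFFF)
            return std::nullopt; // beyond the last valid code point
    }

    if (digits == 0 || i >= input.size())
        return std::nullopt; // "\u{}" or a missing closing brace
    return value;
}
```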
When the Unicode option is not set, regular expressions should match
based on code units; when it is set, they should match based on code
points. To do so, the regex parser must combine surrogate pairs when
the Unicode option is set. Further, RegexStringView needs to know if
the flag is set in order to return code point vs. code unit based
string lengths and substrings.
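
The difference between the two views of a string can be sketched as
follows. length_of is a hypothetical free function used only for
illustration; it is not the RegexStringView interface.

```cpp
#include <cstddef>
#include <string>

// Returns the length in code units when `unicode` is false, and the
// length in code points when it is true. A high surrogate immediately
// followed by a low surrogate is combined into one code point; a lone
// surrogate counts as one code point on its own.
std::size_t length_of(std::u16string const& string, bool unicode)
{
    if (!unicode)
        return string.size();

    std::size_t code_points = 0;
    for (std::size_t i = 0; i < string.size(); ++code_points) {
        bool is_pair = string[i] >= 0xD800 && string[i] <= 0xDBFF
            && i + 1 < string.size()
            && string[i + 1] >= 0xDC00 && string[i + 1] <= 0xDFFF;
        i += is_pair ? 2 : 1;
    }
    return code_points;
}
```

So a string holding only U+1F600 has length 2 without the Unicode
option and length 1 with it, which is why `.` matches it once in one
mode and twice in the other.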