This converts RegExpExec to perform matching with UTF-16 strings. As a
very temporary stop-gap, this adds overloads to RegExpExec and friends
for both UTF-8 and UTF-16 strings. This is only needed until the rest
of RegExp.prototype is UTF-16 capable.
This also addresses a FIXME regarding code point index correction in
RegExpExec when the Unicode flag is set.
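
As a rough illustration of the index correction mentioned above, the
sketch below maps an index counted in code points to an offset counted
in UTF-16 code units. The helper name and the use of std::u16string are
assumptions made for this example; this is not the LibJS implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a LibJS API): translate an index counted in code
// points into an offset counted in UTF-16 code units.
size_t code_unit_offset_of(std::u16string const& string, size_t code_point_index)
{
    // A string can never hold more code points than code units.
    assert(code_point_index <= string.size());

    size_t unit = 0;
    for (size_t point = 0; point < code_point_index && unit < string.size(); ++point) {
        bool is_high = string[unit] >= 0xD800 && string[unit] <= 0xDBFF;
        bool has_low = unit + 1 < string.size()
            && string[unit + 1] >= 0xDC00 && string[unit + 1] <= 0xDFFF;
        // A well-formed surrogate pair is one code point but two code units.
        unit += (is_high && has_low) ? 2 : 1;
    }
    return unit;
}
```

Lone surrogates are counted as a single code point here, which is one
reasonable reading of how unpaired code units should behave.
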
RegExpInitialize specifies how the pattern string should be created
before passing it to [[RegExpMatcher]]. Rather than passing it as-is,
the string should be converted to a "List" of code points (if the
Unicode flag is present) or to a "List" of UTF-16 code units.
Further, the spec requires that we keep both the original pattern string
and this parsed string in the RegExp object.
The caveat is that the LibRegex parser further requires any multi-byte
code units to be escaped (as "\unnnn"). Otherwise, such a code unit is
parsed as its individual UTF-8 bytes.
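
A minimal sketch of that escaping step, assuming "multi-byte" means any
code unit above 0x7F. The function name is hypothetical and the real
conversion in LibJS may differ.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: build an ASCII-only pattern by writing every code
// unit above 0x7F as a \uXXXX escape and copying ASCII units unchanged.
std::string escape_non_ascii_code_units(std::u16string const& pattern)
{
    std::string escaped;
    for (char16_t unit : pattern) {
        if (unit < 0x80) {
            escaped.push_back(static_cast<char>(unit));
            continue;
        }
        char buffer[7]; // backslash, 'u', four hex digits, terminating NUL
        int written = std::snprintf(buffer, sizeof(buffer), "\\u%04x", static_cast<unsigned>(unit));
        assert(written == 6);
        (void)written;
        escaped += buffer;
    }
    return escaped;
}
```

A code point outside the BMP arrives as two surrogate code units, so it
comes out as two consecutive escapes.
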
When the Unicode flag is set, regular expressions may escape code points
by surrounding the hexadecimal code point with curly braces, e.g. \u{41}
is the character "A".
When the Unicode flag is not set, this should be considered a repetition
symbol: \u{41} is the character "u" repeated 41 times. This is left as
a TODO for now.
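
The Unicode-mode half of this rule could look roughly like the fragment
below. The function name and signature are assumptions for illustration,
not the LibRegex parser; the non-Unicode repetition case is simply
reported as "not parsed", mirroring the TODO.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical parser fragment: `text` begins right after the "\u".
// Returns the code point for "{h...}" in Unicode mode, or nothing otherwise.
std::optional<uint32_t> parse_braced_code_point(std::string_view text, bool unicode)
{
    // Without the Unicode flag the braces mean repetition (not handled here).
    if (!unicode || text.empty() || text.front() != '{')
        return std::nullopt;

    uint32_t value = 0;
    size_t i = 1;
    for (; i < text.size() && text[i] != '}'; ++i) {
        char c = text[i];
        uint32_t digit = 0;
        if (c >= '0' && c <= '9')
            digit = c - '0';
        else if (c >= 'a' && c <= 'f')
            digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            digit = c - 'A' + 10;
        else
            return std::nullopt; // not a hexadecimal digit
        value = value * 16 + digit;
        if (value > 0x10FFFF)
            return std::nullopt; // beyond the Unicode code space
    }
    // Reject empty braces ("{}") and a missing closing brace.
    if (i == 1 || i >= text.size())
        return std::nullopt;
    assert(value <= 0x10FFFF);
    return value;
}
```
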
When the Unicode option is not set, regular expressions should match
based on code units; when it is set, they should match based on code
points. To do so, the regex parser must combine surrogate pairs when
the Unicode option is set. Further, RegexStringView needs to know if
the flag is set in order to return code point vs. code unit based
string lengths and substrings.
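
To make the distinction concrete, the sketch below produces the elements
a matcher would step over: raw code units without the Unicode option,
and code points (with surrogate pairs combined) with it. The function is
hypothetical and does not reflect the RegexStringView interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: list the elements matching is based on. With
// `unicode` set, each well-formed surrogate pair becomes one code point;
// otherwise every UTF-16 code unit is its own element.
std::vector<uint32_t> matching_elements(std::u16string const& string, bool unicode)
{
    std::vector<uint32_t> elements;
    for (size_t i = 0; i < string.size(); ++i) {
        uint32_t unit = string[i];
        bool is_pair = unicode && unit >= 0xD800 && unit <= 0xDBFF
            && i + 1 < string.size()
            && string[i + 1] >= 0xDC00 && string[i + 1] <= 0xDFFF;
        if (is_pair) {
            uint32_t low = string[i + 1];
            // Standard UTF-16 decoding of a high/low surrogate pair.
            elements.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            ++i; // the low surrogate has been consumed
        } else {
            elements.push_back(unit);
        }
    }
    return elements;
}
```

The size of the returned list is then the code point length or the code
unit length of the string, depending on the option.
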
To be used as a RegexStringView variant, Utf16View must provide a couple
more helper methods. It must also not default its assignment operators,
because that implicitly deletes move/copy constructors.
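
The language rule behind this can be checked in isolation. The types
below are made up for the demonstration and are unrelated to Utf16View;
the sketch assumes the assignment operators in question include a move
assignment operator, which is the case that deletes the implicit copy
constructor and suppresses the implicit move constructor.

```cpp
#include <type_traits>

// Hypothetical type that only defaults its move assignment operator.
struct DefaultsMoveAssignment {
    DefaultsMoveAssignment& operator=(DefaultsMoveAssignment&&) = default;
};

// Hypothetical type that declares no special member functions at all.
struct DeclaresNothing {
};

// Declaring a move assignment operator, even as defaulted, makes the
// implicit copy constructor deleted and suppresses the move constructor.
static_assert(!std::is_copy_constructible<DefaultsMoveAssignment>::value,
    "copy constructor is implicitly deleted");
static_assert(!std::is_move_constructible<DefaultsMoveAssignment>::value,
    "no move constructor is implicitly declared");

// With nothing declared, the compiler provides both constructors.
static_assert(std::is_copy_constructible<DeclaresNothing>::value,
    "copy constructor is implicitly provided");
static_assert(std::is_move_constructible<DeclaresNothing>::value,
    "move constructor is implicitly provided");
```
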
This bug manifests itself when the caller of sys$pledge() passes valid
promises, but invalid execpromises. The code would apply the promises
and then return an error for the execpromises. This leaves the user in
a confusing state, as the promises were silently applied, but we return
an error suggesting the operation has failed.
Avoid this situation by tweaking the implementation to only apply the
promises / execpromises after all validation has occurred.
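
The validate-then-apply ordering can be sketched as follows. Everything
here (the struct, the function names, the three-entry promise table and
its bit values) is an assumption for illustration and is not the kernel
code; other pledge rules, such as only being allowed to reduce promises,
are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-in for the per-process pledge state.
struct ProcessPledges {
    uint32_t promises = 0;
    uint32_t execpromises = 0;
};

// Hypothetical parser: returns a bitmask, or nothing if any name is unknown.
std::optional<uint32_t> parse_promises(std::string const& spec)
{
    uint32_t mask = 0;
    std::istringstream words(spec);
    std::string word;
    while (words >> word) {
        if (word == "stdio")
            mask |= 1u << 0;
        else if (word == "rpath")
            mask |= 1u << 1;
        else if (word == "wpath")
            mask |= 1u << 2;
        else
            return std::nullopt;
    }
    return mask;
}

// Validate both arguments first; only touch the process state once
// neither of them can fail anymore.
bool sketch_pledge(ProcessPledges& process, std::string const& promises, std::string const& execpromises)
{
    auto new_promises = parse_promises(promises);
    auto new_execpromises = parse_promises(execpromises);
    if (!new_promises || !new_execpromises)
        return false; // nothing has been applied yet

    process.promises = *new_promises;
    process.execpromises = *new_execpromises;
    return true;
}
```

With this ordering, a failed call leaves the process state exactly as it
was before the call.
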