This particularly implements these two points:
- "If the adjusted current node is an HTML integration point and the
token is a start tag"
- "If the adjusted current node is an HTML integration point and the
token is a character token"
This also adds spec comments to the tree constructor.
We now fire "pageshow" events at the appropriate time during document
loading (done by the parser.)
Note that there are no corresponding "pagehide" events yet.
This will be used to determine whether "pageshow" and "pagehide" events
are appropriate. We won't actually make use of it until we implement
more of history traversal and document unloading.
The only difference from what we were already doing is that setting the
same ready state twice no longer fires a "readystatechange" event.
I don't think that could happen in practice though.
We will now spin in "the end" until there are no more "things delaying
the load event". Of course, nothing actually uses this yet, and there
are a lot of things that need to.
Instead of firing up a network request and synchronously blocking for it
to finish via a nested event loop, we now start an asynchronous request
when encountering <script src>.
Once the script load finishes (or fails), it gets executed at one of the
synchronization points in the HTML parser.
This solves some long-standing issues with random unexpected events
getting dispatched in the middle of parsing.
The previous implementation was about a half implementation and was
tied to Element::innerHTML. This separates it and puts it into
HTMLDocumentParser, as this is in the parsing section of the spec.
This provides a near finished HTML fragment serialisation algorithm,
bar namespaces in attributes and the `is` value.
This namespace will be used for all interfaces defined in the URL
specification, like URL and URLSearchParams.
This has the unfortunate side-effect of requiring us to use the fully
qualified AK::URL name whenever we want to refer to the AK class, so
this commit also fixes all such references.
Call HTML::EventLoop::spin_until() from the HTML parser when deciding
whether we can run a script yet.
Note that spin_until() actually doesn't do any work yet.
This completely changes how HTMLTokens store their data. Previously,
space was allocated for all token types separately. Now, the HTMLToken's
data is stored in just a String, two booleans and a Variant.
This change reduces sizeof(HTMLToken) from 68 to 32. Also, this reduces
raw tokenization time by around 20 to 50 percent, depending on the page.
Full document parsing time (with HTMLDocumentParser, on a local HTML
page without any dependency files) is reduced by between 4 and 20
percent, depending on the page.
Since tokenizing HTML pages can easily generated 50'000 tokens and more,
the storage has been designed in a way that avoids heap allocations
where possible, while trying to reduce the size of the tokens. The only
tokens which need to allocate on the heap are thus DOCTYPE tokens (max.
1 per document), and tag tokens (but only if they have attributes). This
way, only around 5 percent of all tokens generated need to allocate on
the heap (except for StringImpl allocations).
Since all interaction with the HTMLToken class now happens over getters
and setters, there is no more need for HTMLTokenizer and
HTMLDocumentParser to have direct access to the members.
This is in preparation for an upcoming storage change of HTMLToken. In
contrast to the other token types, the accessor can hand out a mutable
reference to allow users to change parts of the DoctypeData easily.
Previously, HTMLToken would expose the Vector<Attribute> directly to
its users. In preparation for a future change, all users now use
implementation-agnostic APIs which do not expose the Vector directly.
This fixes parsing the following regular expression: /</g;
It also adds a simple script element to the HTMLTokenizer regression
test, which also contains that specific regex.