serenity/Documentation/Browser/BrowsingContextsAndNavigables.md

12 KiB

LibWeb: Browsing Contexts and Navigables

NOTE: This document is a work in progress!

Introduction: How does code execute, really?

Before we can dive into how LibWeb and Ladybird implement the HTML web page navigation operations, we need to dive into some fundamental specification concepts. Starting with, how does code actually execute in a (possibly virtual) machine? Next we'll look at what that means for the ECMAScript Specification (JavaScript), and finally how the ECMAScript code execution model ties into the HTML specification to model how to display web content into a browser tab.

Native Code Execution: A Primer

When modeling the execution of a native program written in a popular systems language like C, C++, or Rust, most systems programmers should be familiar with the concepts of threads and processes. In a "hosted" environment, the execution of one's userspace program generally starts with an underlying operating system creating a process for the application to run in. This process will contain a memory space for program data and code to live in, and an initial, or main thread to start execution on. In order for the operating system to change which thread is executing on a particular CPU core, it needs to save and restore the Execution Context for that thread. The Execution Context for a native thread generally consists of a set of CPU registers, any floating point state, a program counter that tracks which instruction should be loaded next, and a stack pointer that points to the local data the thread was using to track its function call stack and local variables. The programmer can also request additional threads through a system call, providing a set of thread attributes and a function to call as the entry point.

For traditional compiled programs, the concept of accessing variables and functions is split into two phases. At compile time, local variables and arguments are folded into stack slots and allocated into registers. Exported variables and functions are written into the executable object file (ELF, Mach-O, PE, etc.) and are visible to external tools as symbols, as referenced by a symbol table contained within the object file format. Normally local variable and argument names and locations are lost in the compile+link steps, but the compiler can be configured to emit extra debug information to allow debuggers to access and modify them at runtime. In order to support something like the dynamic imports of interpreted languages, the programmer has to call a platform-specific function to load the new module (e.g. dlopen or LoadLibrary). But after the module is opened, in order to actually refer to any exported symbols from that module the programmer has to retrieve the address of each symbol through another platform specific function (e.g. dlsym or GetProcAddress), once per symbol.

ECMAScript Execution Model: Realms and Agents

The ECMAScript specification has analogs for almost all of these concepts in the section on Executable Code and Execution Contexts.

Working in the other direction from the native code explanation, ECMAScript describes the accessibility and scopes of functions, variables, and arguments in terms of Environment Records. Note that these Environment Records are not actually visible to executing code, and are simply a mechanism used by the specification authors to model the language. Every function and module has a type of Environment Record that contains the variables, functions, catch clause bindings, and other language constructs that affect which names are visible at any location in the code. These Environment Records are nested, in a tree-like structure that somewhat matches the Abstract Syntax Tree (AST).

The root of the tree of Environment Records is the Global Environment Record, which corresponds to the Global Object and its properties. In JavaScript, there is always a this value representing the current object context. At global scope, the Global Object normally takes that responsibility. In a REPL, that might be some REPL specific global object that has global functions to call for doing things like loading from the filesystem, or even be as complex as Node or Bun. In a Browser context, the Global object is normally the Window, unless there's a Worker involved. For historical reasons the global this binding for Window contexts is actually a WindowProxy that wraps the Window. This concept is quite different from a native executable, where there's no actual object representing the global scope, simply symbols that the linker and loader make available to each module.

While the Global Object and its Global Environment represent the root of the tree of identifiers visible to the executing JavaScript code, the Global Object isn't sufficient to model all the state around a conceptual thread of execution in ECMAScript. This is where the two concepts of Realms and Execution Contexts come into play. A Realm Record is a container that holds a global object, its associated Global Environment, a set of intrinsic objects, and any host (also called an embedder in some specification documents) defined extra state that needs to be associated with the realm. In LibWeb, the Host Defined slot holds an object that has the HTML Environment Settings Object for each realm, as well as all the prototypes, constructors, and namespaces that need to be exposed on the Global Object for Web APIs. On top of the Realm abstraction, ECMAScript uses the Execution Context to model the state of execution of one particular script or module. Each Execution Context belongs to an execution context stack with the topmost context named the running execution context. An Execution Context has information about the current function, the script or module that the current code block belongs to, additional Environment Records required to access names in the current scope, any running generator state, and most importantly to the thread analogy, the state needed to suspend and resume execution of that script. As with Environment Records, Realms and Execution contexts are not directly accessible to running JavaScript code.

The final missing piece for the JavaScript execution model is how these stacks of Execution Contexts are actually scheduled to run by the ECMAScript implementation. In the most common case, this means directly mapping the ECMAScript model to the earlier native concepts of threads and processes in a way that allows for flexibility in the implementation strategies. The last thing that the specification authors want to do is constrain implementations so much that innovation and experimentation becomes impossible. The method for this mapping is the two related specification mechanisms Agents and Agent Clusters. The Execution Context stack mentioned above actually belongs to an Agent, which holds said stack, a set of metadata about the memory model, and a shared reference to an executing thread. According to ECMAScript, there should always be at least one Execution Context on the stack, to allow concepts such as the running execution context to always refer to the topmost Execution Context of the surrounding agent. However, the HTML specification opts to remove the default execution context from the execution context stack at creation, and instead manually pushes and pops execution contexts for script, module, and callback execution. The relationship between Realms and Agents is not 1-1, but N-1. In the ECMAScript specification, this manifests as a part of the Shadow Realm proposal, while the Web platform requires multiple Realms per Agent to specify the historical behavior of <iframe> and related elements.

An Agent holds a stack of Execution Contexts, with the topmost entry being the running execution context. Each Execution Context holds a Realm and a specific script's context, including the current function and any state required to pause and resume the execution for that context. The Realm holds the Global Object for the Execution Context, and any ECMAScript or host-specific intrinsics required to create the desired environment for code to run in. More loosely, an Agent is a specification artefact that somewhat maps the execution of a JavaScript script or module to a native thread of execution. But the specification does so in a way that allows a host/embedder to choose to switch out which Agent is currently executing its running execution context on that native thread, and which Realm within that Agent owns the running execution context.

SharedArrayBuffers and Atomics add a special kind of wrinkle to the ECMAScript specification. Defining how that work required the formalization of a memory model, similar to what C++11 and C11 and Java 5 had to do before them. The Agent Cluster is the formalism that ties the memory model back to the execution model. As described in the specification, an Agent Cluster is a set of Agents that can communicate via shared memory. The exact mechanism is unspecified, but the hard rule is that all Agents within a particular Agent Cluster must observe the same order of reads and writes to SharedArrayBuffers and as a result of ECMAScript Atomic objects.

The net result of all this memory model and atomic specification language is that loosely, an Agent models a "candidate execution" of some code module that can execute on a thread, and any suspended execution contexts from things like generators or async that are part of that module and its dynamic imports. An Agent Cluster models the interaction of agents that share the ability to communicate via shared memory. The simplest reading of this is that the specification authors had in mind the type of memory sharing that threads within the same process have in native code execution. So an Agent Cluster loosely models a collection of Agents (read: threads) that execute independently of each other within the same implementation defined manner for sharing memory between different threads (read: process).

HTML Execution Model: Global Scopes

The Document Object Model (DOM) specifications are written in such a way that implementers can create language bindings for any language to access the page. However, experience has shown that the most popular way to script web content in modern web browsers is through JavaScript bindings. As such, the HTML specification is specifically tailored to meet the constraints of JavaScript execution in its scripting APIs and related concepts. Great care is taken to ensure that JavaScript written by different authors cannot interfere with each other, and that arbitrary scripts cannot exfiltrate information about the page content to third-party destinations.

The HTML specification therefore has a section on Agents and Agent Clusters at the start of the section on how scripting behaves on the Web platform.

TODO: Finish this section

HTML Navigation: Juggling Origins

Global Scopes, Browsing Contexts, Browsing Context Groups, Navigables, and Traversable Navigables

TODO:

  • Agents defined by the HTML Spec
  • Global Objects (Global Scopes) defined by the HTML Spec
  • Agents and Browsing Context Groups
  • Navigables and their relationship to Browsing Contexts
  • Walk through construction of a browser tab, its traversable navigable, and its navigation both same and cross-origin
  • Walk through construction of a browser tab with a nested browsing context and what happens when the nested context within its navigable container navigates on its own