Emoji sequences in the grapheme segmentation spec are a bit tricky:
\p{Extended_Pictographic} Extend* ZWJ × \p{Extended_Pictographic}
Our current strategy of tracking a boolean to indicate if we are in an
emoji sequence was causing us to break up emoji made of multiple sub-
sequences. For example, in the "family: man, woman, girl, boy" sequence:
U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466
We would break at indices 0 (correctly) and 6 (incorrectly).
Instead of tracking a boolean, it's quite a bit simpler to reason about
emoji sequences by just skipping past them entirely. Note that in cases
like the above emoji, we skip one sub-sequence at a time.
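
A minimal sketch of the skipping approach, not the actual implementation.
The property checks below are hypothetical and only cover the code points
used in the example; all other segmentation rules are ignored, so every
remaining boundary is treated as a break:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified property checks that only cover this example.
// Real code would consult the Unicode property tables.
bool is_extended_pictographic(char32_t code_point)
{
    return code_point >= 0x1F400 && code_point <= 0x1F4FF;
}

bool is_extend(char32_t code_point)
{
    return code_point == 0xFE0F; // VARIATION SELECTOR-16
}

constexpr char32_t zero_width_joiner = 0x200D;

// If "Extended_Pictographic Extend* ZWJ Extended_Pictographic" starts at `index`,
// returns the index of the final pictograph; otherwise returns `index` unchanged.
size_t skip_emoji_subsequence(std::vector<char32_t> const& code_points, size_t index)
{
    if (index >= code_points.size() || !is_extended_pictographic(code_points[index]))
        return index;

    size_t i = index + 1;
    while (i < code_points.size() && is_extend(code_points[i]))
        ++i;

    if (i + 1 < code_points.size() && code_points[i] == zero_width_joiner
        && is_extended_pictographic(code_points[i + 1]))
        return i + 1;
    return index;
}

// Returns the index at which each segment starts.
std::vector<size_t> segment_starts(std::vector<char32_t> const& code_points)
{
    std::vector<size_t> starts;
    size_t i = 0;
    while (i < code_points.size()) {
        starts.push_back(i);
        // Skip one sub-sequence at a time until no further sub-sequence follows.
        for (;;) {
            size_t end = skip_emoji_subsequence(code_points, i);
            if (end == i)
                break;
            i = end;
        }
        ++i;
    }
    return starts;
}
```

With this, the family sequence above yields a single segment starting at
index 0, since each skip lands on the pictograph that begins the next
sub-sequence.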
We briefly discussed this when adding the new String type but couldn't
settle on a name. However, having to use String::from_utf8() on every
literal string is a bit unwieldy, so let's have these options available!
Naming-wise '_string' is not as short as 'sv' but should be relatively
clear; it also matches '_bigint' and '_ubigint' in length.
'_short_string' may be longer than the actual string itself, but it's
still an improvement over the static function :^)
Since our C++ source files are UTF-8 encoded anyway, it should be
impossible to create a string literal with invalid UTF-8, so including
that in the name is not as important as in the function that can receive
arbitrary data.
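
As a rough illustration of the idea, here is a user-defined literal
wrapping a static factory function. The String type below is a
hypothetical stand-in built on std::string, and unlike the real
from_utf8() this one cannot fail:

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the real String type.
struct String {
    std::string bytes;

    // Simplified: assumes the input is valid UTF-8 and never fails.
    static String from_utf8(char const* data, size_t length)
    {
        return String { std::string(data, length) };
    }
};

// Lets callers write "foo"_string instead of spelling out the static function.
String operator""_string(char const* data, size_t length)
{
    return String::from_utf8(data, length);
}
```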
The LibWeb fuzzer build is really slow, so for local builds it is useful
to disable it when you're not interested in running that fuzzer.
Co-authored-by: Andrew Kaster <akaster@serenityos.org>
This is for lossy compression, in which case a WebP file is
a single VP8 key frame.
This only parses the 10-byte frame header, which contains image
dimensions (and some other things).
For now, just dbgln_if() all data. Eventually we'll want to use at
least width and height.
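
For reference, a sketch of how the 10 bytes break down, following the
key frame header layout in RFC 6386. The struct and function names are
made up for this example and are not the names used in the patch:

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Hypothetical struct; field names are illustrative only.
struct VP8FrameHeader {
    bool is_key_frame { false };
    uint8_t version { 0 };
    bool show_frame { false };
    uint32_t size_of_first_partition { 0 };
    uint16_t width { 0 };
    uint8_t horizontal_scale { 0 };
    uint16_t height { 0 };
    uint8_t vertical_scale { 0 };
};

std::optional<VP8FrameHeader> parse_vp8_key_frame_header(std::array<uint8_t, 10> const& data)
{
    // Bytes 0-2: little-endian frame tag.
    uint32_t frame_tag = static_cast<uint32_t>(data[0])
        | (static_cast<uint32_t>(data[1]) << 8)
        | (static_cast<uint32_t>(data[2]) << 16);

    VP8FrameHeader header;
    header.is_key_frame = (frame_tag & 1) == 0; // A cleared bit means "key frame".
    header.version = static_cast<uint8_t>((frame_tag >> 1) & 0x7);
    header.show_frame = ((frame_tag >> 4) & 1) != 0;
    header.size_of_first_partition = (frame_tag >> 5) & 0x7FFFF;
    if (!header.is_key_frame)
        return std::nullopt;

    // Bytes 3-5: key frame start code.
    if (data[3] != 0x9d || data[4] != 0x01 || data[5] != 0x2a)
        return std::nullopt;

    // Bytes 6-7 and 8-9: 14 bits of dimension plus 2 bits of scale, little-endian.
    uint16_t width_and_scale = static_cast<uint16_t>(data[6] | (data[7] << 8));
    uint16_t height_and_scale = static_cast<uint16_t>(data[8] | (data[9] << 8));
    header.width = width_and_scale & 0x3FFF;
    header.horizontal_scale = static_cast<uint8_t>(width_and_scale >> 14);
    header.height = height_and_scale & 0x3FFF;
    header.vertical_scale = static_cast<uint8_t>(height_and_scale >> 14);
    return header;
}
```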
No behavior change.
(Well, technically, this now correctly sets the state to Error
if the first chunk is neither of 'VP8 ', 'VP8L', 'VP8X'. But no
*interesting* behavior change.)
This was a rather easy change, since only parameter names make use of
strings in the first place.
This also improves OOM resistance: If we can't create a parameter name,
we will just set it to the empty string.
try_append() checks if the vector should increase capacity and if
so grows the vector. unchecked_append() verifies the vector already
has enough capacity, and will never grow the vector.
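
To illustrate the difference, here is a toy vector of ints. TinyIntVector
is a hypothetical class written for this example and is far simpler than
the real Vector; assert() stands in for the real verification macro:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical toy container, only meant to contrast the two append flavors.
class TinyIntVector {
public:
    // Grows the storage if needed; reports allocation failure instead of crashing.
    bool try_append(int value)
    {
        if (m_size == m_capacity && !try_ensure_capacity(m_capacity == 0 ? 4 : m_capacity * 2))
            return false;
        m_data[m_size++] = value;
        return true;
    }

    // Never grows: the caller must have reserved enough capacity beforehand.
    void unchecked_append(int value)
    {
        assert(m_size < m_capacity);
        m_data[m_size++] = value;
    }

    bool try_ensure_capacity(size_t needed)
    {
        if (needed <= m_capacity)
            return true;
        std::unique_ptr<int[]> grown(new (std::nothrow) int[needed]);
        if (!grown)
            return false;
        std::copy_n(m_data.get(), m_size, grown.get());
        m_data = std::move(grown);
        m_capacity = needed;
        return true;
    }

    size_t size() const { return m_size; }
    size_t capacity() const { return m_capacity; }
    int at(size_t index) const { return m_data[index]; }

private:
    std::unique_ptr<int[]> m_data;
    size_t m_size { 0 };
    size_t m_capacity { 0 };
};
```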
Instead of using a special case of the annotate_mapping syscall, let's
introduce a new prctl option to disallow further annotations of Regions
as new syscall Region(s).
Since the ProcFS doesn't hold many global objects within it, we no
longer need a fully-structured design of backing components and a
registry like the one the SysFS has.
To accommodate this, let's remove all backing store and components of the
ProcFS, so now it resembles what we had in the early days of ProcFS in
the project - a mostly-static filesystem, with a very small number of
kmalloc allocations needed.
We still use the inode index mechanism to understand the role of each
inode, but this is done in a much "static"ier way than before.
We are currently converting parsed expiry times to local time, whereas
the RFC dictates we parse them as UTC. When expiring cookies, we must
also use the current UTC time to compare against the cookies' expiry
times.
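
A sketch of the intended behavior: turn the parsed date fields directly
into seconds since the Unix epoch without going through local time, then
compare against the current UTC time. The function names are
hypothetical, and days_from_civil() is the well-known civil-date
algorithm rather than anything from this codebase:

```cpp
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date.
constexpr int64_t days_from_civil(int64_t year, unsigned month, unsigned day)
{
    year -= month <= 2;
    int64_t const era = (year >= 0 ? year : year - 399) / 400;
    unsigned const year_of_era = static_cast<unsigned>(year - era * 400);
    unsigned const day_of_year = (153 * (month > 2 ? month - 3 : month + 9) + 2) / 5 + day - 1;
    unsigned const day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    return era * 146097 + static_cast<int64_t>(day_of_era) - 719468;
}

// Interprets the fields as UTC; no time zone offset is ever applied.
constexpr int64_t utc_seconds_since_epoch(int64_t year, unsigned month, unsigned day,
    unsigned hour, unsigned minute, unsigned second)
{
    return days_from_civil(year, month, day) * 86400 + hour * 3600 + minute * 60 + second;
}

// Both arguments must be UTC seconds since the epoch.
constexpr bool is_cookie_expired(int64_t expiry_utc, int64_t now_utc)
{
    return expiry_utc <= now_utc;
}
```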
This reverts commit eb1ef59603c13c43b87c099c43c4d118dc8441f6.
The idea of saving the clip box in order to handle `overflow: hidden`
turned out to break painting if a box is painted before its containing
block (which is possible if the box has a negative z-index).
This makes undoing actions performed on layer masks work as
expected.
did_modify_bitmap() is now also called on redo, to ensure the layer
mask is displayed correctly.
The PDFFont class hierarchy was very simple (a top-level PDFFont class,
followed by all the children classes that derived directly from it).
While this design was good enough for some things, it didn't correctly
model the actual organization of font types:
* PDF fonts are first divided between "simple" and "composite" fonts.
The latter is the Type0 font, while the rest are all simple.
* PDF fonts yield a glyph per "character code". Simple fonts' char codes
are always 1 byte long, while Type0 char codes are of variable size.
To this effect, this commit changes the hierarchy of Font classes,
introducing a new SimpleFont class, deriving from PDFFont, and acting as
the parent of Type1Font and TrueTypeFont, while Type0 still derives from
PDFFont directly. This distinction allows us now to:
* Model string rendering differently for simple and composite fonts:
PDFFont now offers a generic draw_string method that takes a whole
string to be rendered instead of a single char code. SimpleFont
implements this as a loop over individual bytes of the string, with
T1 and TT implementing draw_glyph for drawing a single char code.
* Some common fields between T1 and TT fonts now live under SimpleFont
instead of under PDFFont, where they previously resided.
* Some other interfaces specific to SimpleFont have been cleaned up,
with u16/u32 not appearing on these classes (or in PDFFont) anymore.
* Type0Font's rendering still remains unimplemented.
As part of this exercise I also took the chance to perform the following
cleanups and restructurings:
* Refactored the creation and initialisation of fonts. They are all
centrally created at PDFFont::create, with a virtual "initialize"
method that allows them to initialise their inner members in the
correct order (parent first, child later) after creation.
* Removed duplicated code.
* Cleaned up some public interfaces: receive const refs, removed
unnecessary ctor/dtors, etc.
* Slightly changed how Type1 and TrueType fonts are implemented: if
there's an embedded font, it takes priority; otherwise we always
look for a replacement.
* This means we don't do anything special for the standard fonts. The
only behavior previously associated to standard fonts was choosing an
encoding, and even that was in question.
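
The resulting hierarchy looks roughly like the sketch below. Signatures
are heavily simplified assumptions (the real methods also deal with
painters, positions, and errors), and the recording member exists only
to make the example observable:

```cpp
#include <cstdint>
#include <string>
#include <vector>

class PDFFont {
public:
    virtual ~PDFFont() = default;
    // Takes a whole string rather than a single char code.
    virtual void draw_string(std::string const& bytes) = 0;
};

// Simple fonts: char codes are always 1 byte long.
class SimpleFont : public PDFFont {
public:
    void draw_string(std::string const& bytes) override
    {
        for (unsigned char char_code : bytes)
            draw_glyph(char_code);
    }

protected:
    virtual void draw_glyph(uint8_t char_code) = 0;
};

class Type1Font : public SimpleFont {
public:
    // Illustration only: records what would have been drawn.
    std::vector<uint8_t> drawn_char_codes;

protected:
    void draw_glyph(uint8_t char_code) override { drawn_char_codes.push_back(char_code); }
};

// Composite font: derives from PDFFont directly, char codes are of variable size.
class Type0Font : public PDFFont {
public:
    void draw_string(std::string const&) override
    {
        // Rendering is still unimplemented.
    }
};
```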
Errors can (and do) occur when trying to render text, and so far we've
silently ignored them, making us think that all is well when it isn't.
Letting show_text return errors will allow us to inform the user about
these errors instead of having to hide them.
I drew the two webp files in Photoshop and saved them using the
"Save a Copy..." dialog, with ICC profile and all other boxes checked.
(I also tried saving with all the boxes unchecked, but it still wrote an
extended webp instead of a basic file.)
The lossless file exposed a bug: I didn't handle chunk padding
correctly before this patch.
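
For context, RIFF chunks with an odd payload size are followed by a
single padding byte that is not counted in the chunk's size field. A
minimal sketch of walking chunks with that in mind; the names are
hypothetical and this is not the code from the patch:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one chunk inside a byte buffer.
struct Chunk {
    std::string type;
    size_t payload_offset { 0 };
    uint32_t payload_size { 0 };
};

std::vector<Chunk> split_riff_chunks(std::vector<uint8_t> const& data)
{
    std::vector<Chunk> chunks;
    size_t offset = 0;
    while (offset + 8 <= data.size()) {
        // 4-byte FourCC followed by a 4-byte little-endian payload size.
        std::string type(reinterpret_cast<char const*>(data.data() + offset), 4);
        uint32_t size = static_cast<uint32_t>(data[offset + 4])
            | (static_cast<uint32_t>(data[offset + 5]) << 8)
            | (static_cast<uint32_t>(data[offset + 6]) << 16)
            | (static_cast<uint32_t>(data[offset + 7]) << 24);

        size_t payload_offset = offset + 8;
        if (size > data.size() - payload_offset)
            break; // Truncated chunk; stop here.

        chunks.push_back(Chunk { type, payload_offset, size });

        // Odd-sized payloads are followed by one padding byte.
        offset = payload_offset + size + (size & 1);
    }
    return chunks;
}
```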