"TPGD" is short for "Typical Prediction for Generic Direct coding",
and the "ON" bit turns it on. In this mode, before decoding a line,
we decode a single bit first that controls if the current line is
just a copy of the previous line. If so, the line's pixels aren't
encoded, the decoder just copies the previous line.
I created this by running
jbig2 -i Tests/LibGfx/test-inputs/bmp/bitmap -f bmp \
-o bitmap -F jb2 -ini tpgdon.ini
where tpgdon.ini contained:
-Gen -Seg 1
-Gen -Param -TpGDon 1
See previous commits in this directory for details on the `jbig2` tool.
Sadly, the TPGDON writing path in `jbig2` wasn't implemented yet,
so I had to add this. See the PR that added this commit for my
local diff to `jbig2`.
I'm somewhat confident that my change to `jbig2` (and hence the
image added in this commit) is correct because:
1. `jbig2` succeeds in converting this file to a bmp file,
while it failed without my patch (the decoding codepath in
`jbig2` does have TPGDON support)
2. Other pdf viewers display the output of
`Meta/jbig2_to_pdf.py -o foo.pdf path/to/bitmap-tpgdon.jbig2 399 400`
the same way we do
Most image viewers can't display JBIG2 files.
All PDF viewers can display JBIG2 files.
This is useful for checking that PDF viewers render JBIG2 files the
same way we do.
These are standalone applications meant to be run by the user directly,
as opposed to other libexec processes which are programmatically forked
by the browser. To do this, we simply remove these processes from the
`ladybird_helper_processes` list. We must also explicitly list the
dependencies for these processes.
Text can be rendered in various ways in PDFs: Filled, stroked,
both filled and stroked, set as clipping path, hidden, or
some combinations thereof.
We don't implement any of this at the moment except "filled".
Hidden text is used in scanned documents: The image of the scan is
drawn in the background, and then OCRd text is "drawn" as hidden
on top of the scanned bitmap. That way, the (hidden) text can be
selected and copied, and it looks like you're selecting text from
the scanned bitmap. Find-in-page also works similarly. (We currently
have neither text selection nor find-in-page, but one day we will.)
Now that we have pretty good support for CCITT and are growing some
support for JBIG2, we now draw both the scanned background image
as well as the foreground text. They're not always perfectly aligned.
This change makes it so that we don't render text that's marked as
hidden. (We still do most of the coordinate math, which will probably
come in handy at some point when we implement text selection.)
This makes these scanned documents appear as they're supposed to
appear (at least in documents where we manage to decode the background
bitmap).
This also adds a debug option to force rendering of hidden text.
Before this change, we would wake up on every event loop iteration to
drive animations in single-frame images. This was a complete waste of
time and caused 100% CPU usage on our main GitHub repo page.
With this change, CPU usage is ~1% when idle on the same page. :^)
This commit introduces a WEB_SET_PROTOTYPE_FOR_INTERFACE macro that
caches the interface name in a local static FlyString. This means that
we only pay for FlyString-from-literal lookup once per browser lifetime
instead of every time the interface is instantiated.
Document::navigable() can be unpleasantly slow, since we don't have a
direct link between documents and navigables at the moment. So let's not
call it twice when once is enough.
This allows us to skip evaluating selectors like "[foo=bar]" for any
element that doesn't have a "foo" attribute.
Note that the bucket is case-insensitively keyed on the attribute name
since case sensitivity is depending on evaluation context. This ensures
we may get some false positives but no false negatives.
Reduces the number of selectors evaluated by 36% when loading our GitHub
repo at https://github.com/SerenityOS/serenity
We already grow the "rules to run" vector before appending to it, so we
can actually use unchecked_append() here and avoid the "needs to grow"
checks every time we append to it.
This takes appending from 3% to <1% when loading our GitHub repo.
When matching a CSS attribute selector against an HTML element, the
attribute name is case-insensitive. Before this change, that meant we
had to call equals_ignoring_ascii_case() on all the attribute names.
We now cache the attribute name lowercased on each Attr node, which
allows us to do FlyString-to-FlyString comparison (simple pointer
comparison).
This brings attribute selector matching from 6% to <1% when loading our
GitHub repo at https://github.com/SerenityOS/serenity
It seems to do the right thing already, and nothing in the spec says
not to do this as far as I can tell.
With this, we can finally decode
Tests/LibGfx/test-inputs/jbig2/bitmap.jbig2 and add a test for
decoding simple arithmetic-coded images.
This errors out on many special cases. None of those seem to be hit
in practice (with the exception of TPGDON, which is used in a handful
PDFs. I have an implementation of that locally, but I'll put that
in a separate PR. The code for it is straightforward, but adding a
test for it is a bit involved.)
With this, we can decode about half of the JBIG2 images in my PDF
test dataset.
In practice, everything uses white backgrounds and operators `or`
or `xor` to turn them black, at least for the simple images we're
about to be able to decode.
To make sure we don't forget implementing this for real once needed,
reject other ops, and also reject black backgrounds (because 1 | 0
is 1, not 0 like our overwrite implementation will produce).
This means we have to remove a test, but since this scenario doesn't
seem to happen in practice, that seems ok.
The context can vary for every bit we read.
This does not affect the one use in the test which reuses the same
context for all bits, but it is necessary for future changes.
The API value of a <textarea> element is its raw value with normalized
newlines. This should be used in a couple of places where we currently
use the raw value.