0000440.pdf contains an xref stream object (at offset 3643676) starting:
```
294 0 obj <<
/Type /XRef
/Index [0 295]
/Size 295
```
and an object stream object (at offset 3640121) starting:
```
230 0 obj <<
/Type /ObjStm
/N 73
/First 614
```
In both cases, the `obj` and the `<<` are separated by non-newline
whitespace.
633e1632d0 made parse_indirect_value() tolerate this, but it didn't
update neither parse_xref_stream() (which parses xref streams) nor
parse_compressed_object_with_index() (which parses object streams),
despite all three changes being part of #14873.
Make parse_xref_stream() and parse_compressed_object_with_index()
call parse_indirect_value() to pick up the fix over there. It's a bit
less code too.
(0000440.pdf is the only PDF in my 1000 test PDFs that this helps,
somewhat surprisingly.)
At first I tried implmenting the quirk from PDF 1.7 Appendix H,
3.4.4, "File Trailer": """Acrobat viewers require only that the %%EOF
marker appear somewhere within the last 1024 bytes of the file.""
This would've been like #22548 but at end-of-file instead of at
start-of-file.
This helped a bunch of files, but also broke a bunch of files that
made more than 1024 bytes of stuff at the end, and it wouldn't have
helped 0000059.pdf, which has over 40k of \0 bytes after the %%EOF.
So just tolerate whitespace after the %%EOF line, and keep ignoring
and arbitrary amount of other stuff after that like before.
This helps:
* 0000599.pdf
One trailing \0 byte after %%EOF. Due to that byte, the
is_linearized() check fails and we go down the non-linearized
codepath. But with this fix, that code path succeeds.
* 0000937.pdf
Same.
* 0000055.pdf
Has one space followed by a \n after %%EOF
* 0000059.pdf
Has over 40kB of trailing \0 bytes
The following files keep working with it:
* 0000242.pdf
5586 bytes of trailing HTML
* 0000336.pdf
5586 bytes of trailing HTML fragment
* 0000136.pdf
2054 bytes of trailing space characters
This one kind of only worked by accident before since it found
the %%EOF block before the final %%EOF block. Maybe this is
even an intentional XRefStm compat hack? Anyways, now it
find the final block instead.
* 0000327.pdf
11044 bytes of trailing HTML
It's pretty tricky to do, and also tricky with respect to skipping
trailing bytes after %%EOF: The check requires knowning the full size of
the PDF (which means web servers not sending content lengths are out),
but that size has to be after stripping trailing bytes, which normal
static file servers won't do. So PDF viewers would have to download the
last couple bytes of the PDF unconditionally, then strip trailing bytes
and use the count to figure out the final actual PDF size.
Luckily, we don't incrementally download PDFs from the net but
instead require all data to be available in one chunk, so it's
not currently a problem.
The spec isn't super clear on if this is allowed:
"""Each cross-reference section shall begin with a line containing the
keyword xref. Following this line..."""
"""The two preceding lines shall contain, one per line and in order, the
keyword startxref and..."""
It kind of sounds like anything goes on both lines as long as they
contain `xref` and `startxref`.
In practice, both seem to always occur at the start of their line,
but in 0000780.pdf (and nowhere else), there's one space after each
keyword before the following linebreak, and this makes that file load.
Per "TABLE 5.11 Entries in an encoding dictionary", /Differences is
optional.
(Per "Encodings for TrueType Fonts" in 5.5.5 Character Encoding,
nonsymbolic truetype fonts are even recommended to have "no Differences
array." But in practice, most seem to have it.)
Fixes crashes on:
* 0000001.pdf
* 0000574.pdf
* 0000337.pdf
All three don't render super great, but at least they no longer crash.
Similar to another problem we had in CharacterData, we were assuming
that the offsets were raw utf8 byte offsets into the data, instead of
utf16 code units. Fix this by using the substring helpers in
CharacterData to get the text data from the Range.
There are more instances of this issue around the place that we will
need to track down and add tests for, but this fixes one of them :^)
For the test included in this commit, we were previously returning:
llo💨😮
Instead of the expected:
llo💨😮 Wo
As the name of the Browser app is now titled Ladybird this was resulting in a
double up if installed fresh then rebooted (or likely after an upgrade). This
change corrects this by using the Ladybird title
We previously skipped updating the lookback buffer when copying
uncompressed data, which resulted in a wrong total byte count.
With a wrong total byte count, our decompressor implementation
ended up choosing a wrong offset into the dictionary.
We were previously doing a *lot* of unnecessary memcpy work when
transferring large files.
This patch addresses the issue by introducing a simple segmented buffer
with no additional work when appending new data, or when transfering out
of the buffer.
For solid color fills (with alpha = 255), the rasterizer now tracks
spans of solid colors within a scanline and fills the entire span with
a single call to fast_u32_fill().
This gave up to a 1.5x speedup drawing the Ghostscript Tiger within
SerenityOS.
Hand-written (with offsets fixed up by `mutool clean`).
Uses the default encoding for each font. Manual test for now.
Byte strings generated with:
python3 -c "for i in range(4):
print('<' +
''.join('%02x' % r for r in range(i * 64, (i + 1) * 64)) +
'>')"
For every IPC message sent, we currently prepend the message size to the
IPC message buffer. This incurs the cost of copying the entire message
to its newly allocated position. Instead, reserve the bytes for the size
at the front of the buffer upon creation. Prevent dangerous access to
the buffer with specific public methods.
This large block of code is repeated nearly verbatim in LibWeb. Move it
to a helper function that both LibIPC and LibWeb can defer to. This will
let us make changes to this method in a singular location going forward.
Note this is a bit of a regression for the MessagePort. It now suffers
from the same performance issue that IPC messages face - we prepend the
meessage size to the message buffer. This degredation is very temporary
though, as a fix is imminent, and this change makes that fix easier.
The type of MessageBuffer will be changing, and it was a bit awkward to
look around to find where the forward declaration was. This patch just
moves it to the obvious forwarding header.
We would previously always generate string parameters to pass through
to functions as a `String`. This works fine if the argument is a
`FlyString const&`, but falls apart for optional types where we need to
accept an `Optional<FlyString> const&`.
Support this by implementing a [FlyString] extended attribute which
if present results in the parameter for the function being generated
as a FlyString.
We cannot port over Optional<FlyString> until the IDL generator supports
passing that through as an argument (as opposed to an Optional<String>).
Change to FlyString where possible, and resolve any fallout as a result.
Other readers do this too, and files depend on this.
Fixes opening these four files from the PDFA 0000.zip dataset:
* 0000015.pdf
Starts with `C:\web\webeuncet\_cat\_docs\_publics\` before header
* 0000408.pdf
Starts with UTF-8 BOM
* 0000524.pdf
Starts with 867 bytes of HTML containing a PHP backtrace
* 0000680.pdf
Starts with `C:\web\webeuncet\_cat\_docs\_publics\` too
A local (non-public) PDF I have lying around contains this in
a page's operator stream:
```
[<00b4003e> 3 <002600480051> 3 <005700550044004f0003> -29
<00330044> 3 <0055> -3 <004e0040> 4 <0003> -29 <004c00560003> -31
<0057004b> 4 <00480003> -37 <0050
>] TJ
```
That is, there's a newline in a hexstring after a character.
This led to `Parser error at offset 5184: Unexpected character`.
The spec says in 3.2.3 String Objects, Hexadecimal Strings:
"""Each pair of hexadecimal digits defines one byte of the string.
White-space characters (such as space, tab, carriage return, line feed,
and form feed) are ignored."""
But we didn't ignore whitespace before or after a character, only
in between the bytes.
The spec also says:
"""If the final digit of a hexadecimal string is missing—that is, if
there is an odd number of digits—the final digit is assumed to be 0."""
In that case, we were skipping the closing `>` twice -- or, more
accurately, we ignored the character after it too. This has been
wrong all the way back in #6974.
Add a test that fails if either of the two changes isn't present.
In the UI process, we encode generated HTML as Base64 to avoid having to
deal with things like arbitrarily nested quotes. The HTML is encoded as
UTF-8, and the raw bytes of that encoding are transcoded to Base64.
In the Inspector process, we are decoding the Base64 string using atob,
which has awkward non-Unicode limitations. The resulting string is only
a byte string. We must further decode the bytes as UTF-8, which we do
using TextDecoder.
Instead of implementing stacking context painting order exactly as it
is defined in CSS2.2 "Appendix E. Elaborate description of Stacking
Contexts" we need to account for changes in the latest standards where
a box can establish a stacking context without being positioned, for
example, by having an opacity different from 1.
Fixes https://github.com/SerenityOS/serenity/issues/21137
With this, it's possible to build Ladybird without having Qt installed.
(Previously, the build required `moc` to exist.)
In fact, it's possible to build Ladybird without anything off `brew`
as long as you have `ninja` and `gn` (both of which don't have any
dependencies themselves and are easy to build).
This change fixes the initial tool selection when pixelpaint is started
with a path. Previously an already existing editor was expected when
the default tool was initially propagated - which was not the case if
pixelpaint was launched to directly load an existing image.