- Pre-allocate and reuse sample decompression buffers. In many FLAC
files, the amount of samples per frame is either constant or the
largest frame will be hit within the first couple of frames. Also,
during audio output, we need to move and combine the samples from the
decompression buffers into the final output buffers anyways. Avoiding
the reallocation of these large buffers provides an improvement from
16x to 18x decode speed on strongly compressed but otherwise usual
input.
- Leave a FIXME for a similar improvement that can be made in the
residual decoder.
- Pre-allocate audio chunks if frame size is known.
- Use reasonable inline capacities in several places where we know the
maximum or usual capacity needed.
Since we can have up to 32 bits of input data, multiplications may need
up to 63 bits. This was accounted for in some places, but by far not in
all, and oss-fuzz found multiple integer overflows. We now use i64 in
all of the decoding, since we need to rescale samples to float later on
anyways. If a final sample value ends up out of range (and the range can
be a maximum of 32 bits), we may get samples past 1, but that then is a
non-compliant input file, and using over-range samples (and most likely
clipping audio) is considerably less weird than overflowing and
glitching audio.
This removes a lot of duplicated stream creation code from the plugins,
and also simplifies the way that the appropriate plugin is found. This
mirrors the ImageDecoderPlugin design and necessitates new sniffing
methods on the loaders.
We report a rounded up PCM sample format to the outside, but use the
exact bit depth as specified in header and frames.
This makes the three FLAC spec tests with a a bit depth of 20 pass.
"Improve" is an understatement, since this commit makes all FLAC files
seek without errors, mostly with high accuracy, and partially even fast:
- A new generic seek table type is introduced, which keeps an
always-sorted list of seek points, which allows it to use binary
search and fast insertion.
- Automatic seek points are inserted according to two heuristics
(distance between seek points and minimum seek precision), which not
only builds a seek table for already-played sections of the file, but
improves seek precision even for files with an existing seek table.
- Manual seeking by skipping frames works properly now and is still used
as a last resort.
Before, some loader plugins implemented their own buffering (FLAC&MP3),
some didn't require any (WAV), and some didn't buffer at all (QOA). This
meant that in practice, while you could load arbitrary amounts of
samples from some loader plugins, you couldn't do that with some others.
Also, it was ill-defined how many samples you would actually get back
from a get_more_samples call.
This commit fixes that by introducing a layer of abstraction between the
loader and its plugins (because that's the whole point of having the
extra class!). The plugins now only implement a load_chunks() function,
which is much simpler to implement and allows plugins to play fast and
loose with what they actually return. Basically, they can return many
chunks of samples, where one chunk is simply a convenient block of
samples to load. In fact, some loaders such as FLAC and QOA have
separate internal functions for loading exactly one chunk. The loaders
*should* load as many chunks as necessary for the sample count to be
reached or surpassed (the latter simplifies loading loops in the
implementations, since you don't need to know how large your next chunk
is going to be; a problem for e.g. FLAC). If a plugin has no problems
returning data of arbitrary size (currently WAV), it can return a single
chunk that exactly (or roughly) matches the requested sample count. If a
plugin is at the stream end, it can also return less samples than was
requested! The loader can handle all of these cases and may call into
load_chunk multiple times. If the plugin returns an empty chunk list (or
only empty chunks; again, they can play fast and loose), the loader
takes that as a stream end signal. Otherwise, the loader will always
return exactly as many samples as the user requested. Buffering is
handled by the loader, allowing any underlying plugin to deal with any
weird sample count requirement the user throws at it (looking at you,
SoundPlayer!).
This (not accidentally!) makes QOA work in SoundPlayer.
`Stream` will be qualified as `AK::Stream` until we remove the
`Core::Stream` namespace. `IODevice` now reuses the `SeekMode` that is
defined by `SeekableStream`, since defining its own would require us to
qualify it with `AK::SeekMode` everywhere.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This doesn't have any immediate uses, but this adapts the code a bit
more to `Core::Stream` conventions (as most functions there use
NonnullOwnPtr to handle streams) and it makes it a bit clearer that this
pointer isn't actually supposed to be null. In fact, MP3LoaderPlugin
and FlacLoaderPlugin apparently forgot to check for that completely
before starting to decode data.
This now prepares all the needed (fallible) components before actually
constructing a LoaderPlugin object, so we are no longer filling them in
at an arbitrary later point in time.
The file is now renamed to Queue.h, and the Resampler APIs with
LegacyBuffer are also removed. These changes look large because nobody
actually needs Buffer.h (or Queue.h). It was mostly transitive
dependencies on the massive list of includes in that header, which are
now almost all gone. Instead, we include common things like Sample.h
directly, which should give faster compile times as very few files
actually need Queue.h.
The format of these names is "Full Abbreviation (.fileformat)". For
example: "FLAC (.flac)", "RIFF WAVE (.wav)", "MPEG Layer III (.mp3)",
"Vorbis (.ogg)" The reasoning is that the container and therefore the
file ending may differ significantly from the actual format, and the
format should be given as unambiguously as possible and necessary.
Previously, FlacLoader would read the data for each frame into a
separate vector, which are then combined via extend() in the end. This
incurs an avoidable copy per frame. By having the next_frame() function
write into a given Span, there's only one vector allocated per call to
get_more_samples().
This increases performance by at least 100% realtime, as measured by
abench, from about 1200%-1300% to (usually) 1400% on complex test files.
Previously, a libc-like out-of-line error information was used in the
loader and its plugins. Now, all functions that may fail to do their job
return some sort of Result. The universally-used error type ist the new
LoaderError, which can contain information about the general error
category (such as file format, I/O, unimplemented features), an error
description, and location information, such as file index or sample
index.
Additionally, the loader plugins try to do as little work as possible in
their constructors. Right after being constructed, a user should call
initialize() and check the errors returned from there. (This is done
transparently by Loader itself.) If a constructor caused an error, the
call to initialize should check and return it immediately.
This opportunity was used to rework a lot of the internal error
propagation in both loader classes, especially FlacLoader. Therefore, a
couple of other refactorings may have sneaked in as well.
The adoption of LibAudio users is minimal. Piano's adoption is not
important, as the code will receive major refactoring in the near future
anyways. SoundPlayer's adoption is also less important, as changes to
refactor it are in the works as well. aplay's adoption is the best and
may serve as an example for other users. It also includes new buffering
behavior.
Buffer also gets some attention, making it OOM-safe and thereby also
propagating its errors to the user.
Decoding the residual in FLAC subframes is by far the most I/O-heavy
operation in FLAC decoding, as the residual data makes up the majority
of subframe data in LPC subframes. As the residual consists of many
Rice-encoded numbers with different bit sizes for differently large
numbers, the residual decoder frequently reads only one or two bytes at
a time. As we use a normal FileInputStream, that directly translates to
many calls to the read() syscall. We can see that the I/O overhead while
FLAC decoding is quite large, and much time is spent in the read()
syscall's kernel code.
This is optimized by using a Buffered<FileInputStream> instead, leading
to 4K blocks being read at once and a large reduction in I/O overhead.
Benchmarking with the new abench utility gives a 15-20% speedup on
identical files, usually pushing FLAC decoding to 10-15x realtime speed
on common sample rates.
This fixes all current code smells, bugs and issues reported by
SonarCloud static analysis. Other issues are almost exclusively false
positives. This makes much code clearer, and some minor benefits in
performance or bug evasion may be gained.
"Frame" is an MPEG term, which is not only unintuitive but also
overloaded with different meaning by other codecs (e.g. FLAC).
Therefore, use the standard term Sample for the central audio structure.
The class is also extracted to its own file, because it's becoming quite
large. Bundling these two changes means not distributing similar
modifications (changing names and paths) across commits.
Co-authored-by: kleines Filmröllchen <malu.bertsch@gmail.com>
All audio applications (aplay, Piano, Sound Player) respect the ability
of the system to have theoretically any sample rate. Therefore, they
resample their own audio into the system sample rate.
LibAudio previously had its loaders resample their own audio, even
though they expose their sample rate. This is now changed. The loaders
output audio data in their file's sample rate, which the user has to
query and resample appropriately. Resampling code from Buffer, WavLoader
and FlacLoader is removed.
Note that these applications only check the sample rate at startup,
which is reasonable (the user has to restart applications when changing
the sample rate). Fully dynamic adaptation could both lead to errors and
will require another IPC interface. This seems to be enough for now.
FlacLoader initialized, but never used its resampler; this is now fixed
and all subframes are resampled before decorrelation occurs. FLAC files
with non-44100-Hz sample rates now play properly.
The FlacLoader already has numerous checks for invalid data reads and
for invalid stream states, but it never actually handles the stream
errors on the stream object. By handling them properly we can actually
run FuzzFlacLoader for longer than a few seconds before it hits the
first assertion :^).
This commit adds a loader for the FLAC audio codec, the Free Lossless
Audio codec by the Xiph.Org foundation. LibAudio will automatically
read and parse FLAC files, so users do not need to adjust.
This implementation is bare-bones and needs to be improved upon.
There are many bugs, verbatim subframes and any kind of seeking is
not supported. However, stereo files exported by libavcodec on
highest compression setting seem to work well.