This option is already enabled when building Lagom, so let's enable it
for the main build too. We will no longer be surprised by Lagom Clang
CI builds failing while everything compiles locally.
Furthermore, the stronger `-Wsuggest-override` warning is enabled in
this commit, which enforces the use of the `override` keyword in all
classes, not just those which already have some methods marked as
`override`. This works with both GCC and Clang.
There are 443 number system objects generated, each of which held an
array of number system symbols. Of those 443 arrays, only 39 are unique.
To uniquely store these, this change moves the generated NumericSymbol
enumeration to the public LibUnicode/NumberFormat.h header with a pre-
defined set of symbols that we need. This is to ensure the generated,
unique arrays are created in a known order with known symbols. While it
is unfortunate to no longer discover these symbols at generation time,
it does allow us to ignore unwanted symbols and perform less string-to-
enumeration conversions at lookup time.
The evolution of UniqueStorage has been as follows:
1. It was created as UniqueStringStorage to ensure only one copy of each
unique string is generated. Interested parties stored an index into
a unique string list, rather than the string itself.
Commits: f9e605397c and 04e6b43f05
2. It became apparent that non-string structures could also be de-
duplicated to reduce the size of libunicode.so. UniqueStringStorage
was generalized to UniqueStorage for this purpose.
Commit: d8e6beb14f
It's now also apparent that there's heavy duplication of lists of
structures. For example, the NumberFormat generator stores 4 lists of
NumberFormat objects. In total, we currently generate nearly 2,000 lists
of these objects, of which 275 are unique.
This change updates UniqueStorage to support storing lists. The only
change is how the storage is generated - we generate each stored list
individually, then an array storing spans of those lists.
In the CLDR, there aren't "night" values, there are "night1" & "night2"
values. This is for locales which use a different name for nighttime
depending on the hour. For example, the ja locale uses "夜" between the
hours of 19:00 and 23:00, and "夜中" between the hours of 23:00 and
04:00. Our CLDR parser is currently ignoring "night2", so this rename
is to prepare for that.
We could probably come up with better names, but in the end, the API in
LibUnicode will be such that outside callers won't even see Night1, etc.
Pattern skeletons are more or less the "key" of format patterns. Every
format pattern is assigned a skeleton. Interval patterns (which are not
yet parsed) are also assigned a skeleton - this is used to match them to
an "owning" format pattern. So we will use the skeleton generated here
to match format patterns at runtime with their available interval
patterns.
An alternative approach would be to append interval patterns directly to
their owning format pattern, but this has some draw backs:
1. Skeletons aren't totally unique. A skeleton may appear in both
the "dateFormats" and "availableFormats" objects, in which case
the same interval formats would be generated more than once.
2. Otherwise unique format patterns may only differ by the interval
patterns assigned to them. This would cause the UniqueStorage for
the format patterns to increase in size, impacting both compile
times and libunicode.so size.
The parsing in parse_calendar_symbols() might be a bit more verbose than
it really needs to be, but it is to ensure the symbols are generated in
a known order that we can control with enumerations.
TR-35's Matching Skeleton algorithm dictates how user requests including
fractional second digits should be handled when the CLDR format pattern
does not include that field. When the format pattern contains {second},
but does not contain {fractionalSecondDigits}, generate a second pattern
which appends "{decimal}{fractionalSecondDigits}" to the {second} field.
TR-35 does define lengths for {ampm}, but they are unused by ECMA-402.
To the contrary, defining the day_period length for this segment will
prevent BasicFormatMatcher from ever selecting a pattern that contains
this segment. Instead, ECMA-402 will only use the short length for
{ampm} segments.
TR-35 describes how to combine date, time, and available formats with
date-time format patterns to generate more available format patterns:
https://unicode.org/reports/tr35/tr35-dates.html#Missing_Skeleton_Fields
Use these steps to generate ~400 new patterns for each calendar. These
are required for ECMA-402's BasicFormatMatcher to produce reasonable
results.
Similar to NumberFormat, replace the segments of date-time patterns with
partitions that can be split at runtime. Also generate the pattern style
fields for e.g. era, day, hour, etc.
Add unique storage for parsed CalendarPattern structures to ensure only
one copy of each structure is generated.
This doesn't have any impact on libunicode.so with the current generated
data. Rather, this prevents the amount of generated data from needlessly
growing astronomically once date-time patterns are fully parsed. There
will be 173,459 patterns parsed, of which only 22,495 (about 12%) are
unique. This change will save a few MB, and will also help compilation
times.
Currently, there's only a handful of entries in these arrays, so it is
not a huge deal to generate them inline with the struct that holds them.
But they will each soon contain a few hundred entries. Generate them out
of line for easier viewing in the generated code.
Add unique storage for parsed NumberFormat structures to ensure only one
copy of each structure is generated. Reduces libunicode.so on x86 from
13.2 MB to 11.4 MB.
UniqueStringStorage is used to ensure only one copy of a string will be
generated, and interested parties store just an index into the generated
storage. Generalize this class to allow any* type to be stored uniquely.
* To actually be storable, the type must have both an AK::Format and an
AK::Traits overload available.
The synchronous call returns a NonnullOwnPtr that we don't use, so we
have to cast to prevent a compiler warning once smart pointers become
[[nodiscard]].
This is not a calendar supported by ECMA-402, so let's not waste space
with its data.
Further, don't generate "gregorian" as a valid Unicode locale extension
keyword. It's an invalid type identifier, thus cannot be used in locales
such as "en-u-ca-gregorian".
For example, consider the following adjacent entries in UnicodeData.txt:
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DBF;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
Our current implementation would assign the display name "CJK Ideograph
Extension A" to code points U+3400 & U+4DBF, but not to the code points
in between. Not only should those code points be assigned a name, but
the Unicode spec also has formatting rules on what the names should be
(the names for these ranged code points are not as they appear in
UnicodeData.txt).
The spec also defines names for code point ranges that actually are
listed individually in UnicodeData.txt. For example:
2F800;CJK COMPATIBILITY IDEOGRAPH-2F800;Lo;0;L;4E3D;;;;N;;;;;
2F801;CJK COMPATIBILITY IDEOGRAPH-2F801;Lo;0;L;4E38;;;;N;;;;;
2F802;CJK COMPATIBILITY IDEOGRAPH-2F802;Lo;0;L;4E41;;;;N;;;;;
Code points are only coalesced into a range if all fields after the name
are equivalent. Our parser will insert the range and its name formatting
pattern when it comes across the first code point in that range, then
ignore other code points in that range. This reduces the number of names
we generated by nearly 2,000.
Unlike most data in the CLDR, hour cycles are not stored on a per-locale
basis. Instead, they are keyed by a string that is usually a region, but
sometimes is a locale. Therefore, given a locale, to determine the hour
cycles for that locale, we:
1. Check if the locale itself is assigned hour cycles.
2. If the locale has a region, check if that region is assigned hour
cycles.
3. Otherwise, maximize that locale, and if the maximized locale has
a region, check if that region is assigned hour cycles.
4. If the above all fail, fallback to the "001" region.
Further, each locale's default hour cycle is the first assigned hour
cycle.
This hasn't mattered yet by chance, because the source for all enums
contains names of the same case. But the enum generated for hour cycle
regions will have mixed case. Sort them case-insensitively in order to
traverse these names in the same order in both generate_enum and
generate_mapping.
Similar to number formatting, the data for date-time formatting will be
located in its own generated file. This extracts the cldr-dates package
from the CLDR and sets up the generator plumbing to create the date-time
data files.
Currently, we generate separate data files for locale and number format
related tables/methods, but provide public accessors for all of the data
in one Locale.h file. Rather than continuing this trend for date-time,
relative time, etc. formatting, it's a bit easier to reason about if the
public accessors are also in separate files.
At the moment we just check if we *can* render a simple triangle, we do
not yet actually test if the image is indeed the triangle we wanted.
This test also outputs the rendered image when GL_DEBUG is enabled to a
file called "picture.bmp" for manual verification.
Co-authored-by: sunverwerth <s.unverwerth@serenityos.org>
Previously, a libc-like out-of-line error information was used in the
loader and its plugins. Now, all functions that may fail to do their job
return some sort of Result. The universally-used error type ist the new
LoaderError, which can contain information about the general error
category (such as file format, I/O, unimplemented features), an error
description, and location information, such as file index or sample
index.
Additionally, the loader plugins try to do as little work as possible in
their constructors. Right after being constructed, a user should call
initialize() and check the errors returned from there. (This is done
transparently by Loader itself.) If a constructor caused an error, the
call to initialize should check and return it immediately.
This opportunity was used to rework a lot of the internal error
propagation in both loader classes, especially FlacLoader. Therefore, a
couple of other refactorings may have sneaked in as well.
The adoption of LibAudio users is minimal. Piano's adoption is not
important, as the code will receive major refactoring in the near future
anyways. SoundPlayer's adoption is also less important, as changes to
refactor it are in the works as well. aplay's adoption is the best and
may serve as an example for other users. It also includes new buffering
behavior.
Buffer also gets some attention, making it OOM-safe and thereby also
propagating its errors to the user.