Commit graph

50 commits

Author SHA1 Message Date
Andrew Kaster
941a9846a3 Meta: Assume files already extracted for ENABLE_NETWORK_DOWNLOADS=OFF
This allows external meta build systems to extract a cached archive into
our Cache directory without having to also copy the .tar.gz file.
2023-08-10 20:10:05 -06:00
Timothy Flynn
fd1fbad1d2 LibGfx+LibUnicode: Support specifying the path to search for emoji
Similar to the FontDatabase, this will be needed for Ladybird to find
emoji images. We now generate just the file name of emoji image in
LibUnicode, and look for that file in the specified path (defaulting to
/res/emoji) at runtime.
2023-03-01 14:54:16 +00:00
Timothy Flynn
8c38d46c1a LibUnicode: Generate the path to emoji images alongside emoji data
This will provide for quicker emoji lookups, rather than having to
discover and allocate these paths at runtime before we find out if they
even exist.
2023-02-24 19:48:47 +01:00
Timothy Flynn
8f2589b3b0 LibUnicode: Parse and generate case folding code point data
Case folding rules have a similar mapping style as special casing rules,
where one code point may map to zero or more case folding rules. These
will be used for case-insensitive string comparisons. To see how case
folding can differ from other casing rules, consider "ß" (U+00DF):

    >>> "ß".lower()
    'ß'

    >>> "ß".upper()
    'SS'

    >>> "ß".title()
    'Ss'

    >>> "ß".casefold()
    'ss'
2023-01-18 14:43:40 +00:00
Timothy Flynn
2334b4cebd Meta: Move UCD/CLDR/TZDB downloaded artifacts to Build/caches
They currently reside under Build/<arch>, meaning that they would be
redownloaded for each architecture/toolchain build combo. Move them to a
location that can be re-used for all builds.
2022-12-24 09:46:28 -05:00
Timothy Flynn
ed84a6f6ee LibUnicode: Use www.unicode.org domain to download emoji-test.txt
The non-www domain does not appear to be available now. We use the www
domain for UCD.zip already.

Co-authored-by: Stephan Unverwerth <s.unverwerth@serenityos.org>
2022-12-20 12:09:16 -05:00
Timothy Flynn
bd592480e4 Meta: Replace Bash script for generating emoji.txt with C++ generator
We currently have two build-time parsers for the UCD's emoji-test.txt
file. To prepare for future changes, this removes the Bash parser and
moves its functionality to the newer C++ parser.
2022-10-27 12:59:56 +02:00
Timothy Flynn
9fad23018a Meta: Remove unused "prefix" variable from invoke_generator() helper
This became unused after 1ae0cfd.
2022-10-16 21:16:48 +02:00
Andrew Kaster
1ae0cfd08b CMake+Userland: Use CMakeLists from Userland to build Lagom Libraries
Also do this for Shell.

This greatly simplifies the CMakeLists in Lagom, replacing many glob
patterns with a big list of libraries. There are still a few special
libraries that need some help to conform to the pattern, like LibELF and
LibWebView.

It also lets us remove essentially all of the Serenity or Lagom binary
directory detection logic from code generators, as now both projects
directories enter the generator logic from the same place.
2022-10-16 16:36:39 +02:00
Timothy Flynn
51854e345a LibUnicode: Update to Unicode version 15.0.0
https://unicode.org/versions/Unicode15.0.0/
2022-09-21 14:04:22 +01:00
Timothy Flynn
b7ef36aa36 LibUnicode: Parse and generate custom emoji added for SerenityOS
Parse emoji from emoji-serenity.txt to allow displaying their names and
grouping them together in the EmojiInputDialog.

This also adds an "Unknown" value to the EmojiGroup enum. This will be
useful for emoji that aren't found in the UCD, or for when UCD downloads
are disabled.
2022-09-11 20:33:57 +01:00
Timothy Flynn
b61eca0a1e LibUncode: Parse and generate emoji code point data
According to TR #51, the "best definition of the full set [of emojis] is
in the emoji-test.txt file". This defines not only the emoji themselves,
but the order in which they should be displayed, and what "group" of
emojis they belong to.
2022-09-08 23:12:31 +01:00
Timothy Flynn
89d1813b5d LibUnicode: Move CLDR data generators to a LibLocale subfolder
To prepare for placing all CLDR generated data in a new library,
LibLocale, this moves the code generators for the CLDR data to the
LibLocale subfolder.
2022-09-05 14:37:16 -04:00
Tim Schumacher
854792c340 Meta: Don't generate emoji.txt into the source tree 2022-09-05 09:50:31 -04:00
Timothy Flynn
2e0b20ef01 Meta: Only run the emoji generator for Serenity builds
It is not needed on Lagom, and was incidentally run twice.
2022-08-23 19:03:43 +01:00
Timothy Flynn
6dd8161002 Meta: Ensure the emoji generator depends on its own script
If the script changes, it better be re-run.
2022-08-23 19:03:43 +01:00
Timothy Flynn
d86b25c460 Meta: Move downloading of emoji-test.txt to unicode_data.cmake
The current emoji_txt.cmake does not handle download errors (which were
a common source of issues in the build problems channel) or Unicode
versioning. These are both handled by unicode_data.cmake. Move the
download to unicode_data.cmake so that we can more easily handle next
month's Unicode 15 release.
2022-08-22 16:00:29 +01:00
Timothy Flynn
ea78bac36d LibUnicode: Parse and generate per-locale plural rules from the CLDR
Plural rules in the CLDR are of the form:

"cs": {
    "pluralRule-count-one": "i = 1 and v = 0 @integer 1",
    "pluralRule-count-few": "i = 2..4 and v = 0 @integer 2~4",
    "pluralRule-count-many": "v != 0 @decimal 0.0~1.5, 10.0, 100.0 ...",
    "pluralRule-count-other": "@integer 0, 5~19, 100, 1000, 10000 ..."
}

The syntax is described here:
https://unicode.org/reports/tr35/tr35-numbers.html#Plural_rules_syntax

There are up to 2 sets of rules for each locale, a cardinal set and an
ordinal set. The approach here is to generate a C++ function for each
set of rules. Each condition in the rules (e.g. "i = 1 and v = 0") is
transpiled to a C++ if-statement within its function. Then lookup tables
are generated to match locales to their generated functions.

NOTE: -Wno-parentheses-equality is added to the LibUnicodeData compile
flags because the generated plural rules have lots of extra parentheses
(because e.g. we need to selectively negate and combine rules). The code
to generate only exactly the right number of parentheses is quite hairy,
so this just tells the compiler to ignore the extras.
2022-07-08 11:51:54 +02:00
Timothy Flynn
1f2542247f LibUnicode: Upgrade to CLDR version 41.0.0
Release notes: https://cldr.unicode.org/index/downloads/cldr-41

Note that the HourCycleRegion enum now contains 272 entires, thus needs
to be bumped from u8 to u16.
2022-04-07 08:29:10 -04:00
Timothy Flynn
8a46794ff8 LibUnicode: Replace individual UCD file downloads with single UCD.zip
Instead of downloading nearly 20 files individually, we can download a
single .zip file similar to how we download a single CLDR .zip. This is
to reduce the number of connections/downloads to/from unicode.org.
2022-04-06 17:12:08 -07:00
Brian Gianforcaro
66e7ac1954 Meta: Error out on find_program errors with CMake less than 3.18
We have seen some cases where the build fails for folks, and they are
missing unzip/tar/gzip etc. We can catch some of these in CMake itself,
so lets make sure to handle that uniformly across the build system.

The REQUIRED flag to `find_program` was only added on in CMake 3.18 and
above, so we can't rely on that to actually halt the program execution.
2022-03-19 15:01:22 -07:00
Timothy Flynn
d0fc61e79b LibUnicode: Extract the BCP 47 package from the CLDR
This package was originally meant to be included in CLDR version 40, but
was missed in their release scripts. This has been resolved:
https://unicode-org.atlassian.net/browse/CLDR-15158

Unfortunately, the CLDR was re-released with the same version number. So
to bust the build's CLDR cache, change the "version" used to detect that
we need to redownload the CLDR.
2022-02-16 07:23:07 -05:00
thankyouverycool
0505e031f1 Meta+LibUnicode: Download and parse Unicode block properties
This parses Blocks.txt for CharacterType properties and creates
a global display array for use in apps.
2022-02-15 10:13:19 -05:00
Idan Horowitz
2d50c08f34 LibUnicode: Download and parse {Grapheme,Word,Sentence} break props 2022-01-31 21:05:04 +02:00
Timothy Flynn
27eda77c97 LibUnicode: Create a nearly empty generator for relative-time formatting
This sets up the generator plumbing to create the relative-time data
files. This data could probably be included in the date-time generator,
but that generator is large enough that I'd rather put this tangentially
related data in its own file.
2022-01-27 21:16:44 +00:00
Timothy Flynn
c7ef86f5d9 Meta: Download UCD and CLDR data with fallible download function 2022-01-26 00:22:53 +00:00
Timothy Flynn
c5138f0f2b LibUnicode: Parse number system digits from the CLDR
We had a hard-coded table of number system digits copied from ECMA-402.
Turns out these digits are in the CLDR, so let's parse the digits from
there instead of hard-coding them.
2022-01-12 10:49:07 +01:00
Timothy Flynn
9ba386a7bb Meta: Move invoke_generator to utils.cmake 2022-01-08 12:45:34 +01:00
Timothy Flynn
d5f14b5ff9 Meta: Move remove_unicode_data_if_version_changed to utils.cmake
This function will be used by the time zone database parser. Move it to
the common utilities file, and rename it remove_path_if_version_changed
to be more generic.
2022-01-08 12:45:34 +01:00
Timothy Flynn
71903ea7e1 LibUnicode: Parse and generate calendar (ca) Unicode keywords
Also removes a few fly-by "StringView x = nullptr;" unnecessary
initializers.
2021-11-29 22:48:46 +00:00
Timothy Flynn
48ce72e472 LibUnicode: Parse and generate regional hour cycles
Unlike most data in the CLDR, hour cycles are not stored on a per-locale
basis. Instead, they are keyed by a string that is usually a region, but
sometimes is a locale. Therefore, given a locale, to determine the hour
cycles for that locale, we:

    1. Check if the locale itself is assigned hour cycles.
    2. If the locale has a region, check if that region is assigned hour
       cycles.
    3. Otherwise, maximize that locale, and if the maximized locale has
       a region, check if that region is assigned hour cycles.
    4. If the above all fail, fallback to the "001" region.

Further, each locale's default hour cycle is the first assigned hour
cycle.
2021-11-29 22:48:46 +00:00
Timothy Flynn
5c57341672 LibUnicode: Create a nearly empty generator for date-time formatting
Similar to number formatting, the data for date-time formatting will be
located in its own generated file. This extracts the cldr-dates package
from the CLDR and sets up the generator plumbing to create the date-time
data files.
2021-11-29 22:48:46 +00:00
Timothy Flynn
1539ed12f1 LibUnicode: Functionalize the Unicode generator CMake commands
Makes it a bit easier to add a new generator.
2021-11-23 22:58:05 +01:00
Ben Wiederhake
b06b54772e Meta+LibUnicode: Provide code point names through library 2021-11-20 00:31:55 +01:00
Timothy Flynn
4b535ce1c8 LibUnicode: Stop passing the cldr-core package to UnicodeNumberFormat
This is no longer needed now that this generator isn't parsing the
default-content locales.
2021-11-19 11:45:35 +01:00
Timothy Flynn
cafb717486 LibUnicode: Parse and generate CLDR unit data for Intl.NumberFormat
The units data is in another CLDR package, cldr-units.
2021-11-16 23:14:09 +00:00
Timothy Flynn
e9493a2cd5 LibUnicode: Ensure UnicodeNumberFormat is aware of default content
For example, there isn't a unique set of data for the en-US locale;
rather, it defaults to the data for the en locale. See this commit for
much more detail: 357c97dfa8
2021-11-13 11:52:45 +00:00
Timothy Flynn
1f2ac0ab41 LibUnicode: Move number formatting code generator to UnicodeNumberFormat 2021-11-12 20:46:38 +00:00
Timothy Flynn
03c023d7e9 LibUnicode: Upgrade to CLDR version 40.0.0
Release notes:
https://github.com/unicode-org/cldr-json/releases/tag/40.0.0
2021-11-09 20:44:52 +01:00
Timothy Flynn
becbb0ea97 LibUnicode: Upgrade to Unicode version 14.0.0 2021-09-30 17:37:57 +01:00
Timothy Flynn
a466b78bf3 LibUnicode: Remove cached UCD and CLDR files when their version changes 2021-09-30 17:37:57 +01:00
Timothy Flynn
0ec6b4f132 LibUnicode: Use consistent variable naming in unicode_data.cmake
I kept mixing these up in my head.
2021-09-30 17:37:57 +01:00
Timothy Flynn
583d703f61 LibUnicode: Functionalize extraction of CLDR data files 2021-09-30 17:37:57 +01:00
Timothy Flynn
ead30c26e5 LibUnicode: Functionalize downloading of UCD data files 2021-09-30 17:37:57 +01:00
Timothy Flynn
87bca78ef2 LibUnicode: Extract UCD and CLDR versions to a CMake variable
This also surrounds expansion of affected URL and path variables with
quotes.
2021-09-30 17:37:57 +01:00
Andrew Kaster
fc8d1bf3ce Meta: Allow specifying alternative paths for downloaded Unicode data
This lets us possibly share downloaded artifacts between different
builds without re-downloading them every time you change toolchains.
2021-09-15 19:04:52 +04:30
Timothy Flynn
9cd986d8c0 LibUnicode: Extract cldr-misc dataset from CLDR database 2021-09-06 23:49:56 +01:00
Timothy Flynn
caf5b6fa6f LibUnicode: Extract cldr-core dataset from CLDR database 2021-09-01 14:14:47 +01:00
Brian Gianforcaro
619200774b CMake: Add custom target to build only the generated sources
This is needed so all headers and files exist on disk, so that
the sonar cloud analyzer can find them when executing the compilation
commands contained in compile_commands.json, without actually building.

Co-authored-by: Andrew Kaster <akaster@serenityos.org>
2021-08-30 16:44:16 +02:00
Andrew Kaster
e88761b2b9 Meta+LibUnicode: Move unicode_data helper to Meta/CMake
Moving this helper CMake file to the centralized Meta/CMake folder helps
to get a better grasp on what extra files are required for the build,
and what files are generated.

While we're at it, don't use add_compile_definitions for
ENABLE_UNICODE_DATA, which only needs to be seen by LibUnicode sources.
2021-08-28 08:44:17 +01:00
Renamed from Userland/Libraries/LibUnicode/unicode_data.cmake (Browse further)