We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
The shared parts are now firmly compiled into LibC instead of being
defined as a static library and then being copied over manually.
The non-shared ("local") parts are kept as a static library that is
linked into each binary on demand.
This finally allows us to support linking with the -fstack-protector
flag, which now replaces the `ssp` target being linked into each binary
accidentally via CMake.
There's still a GLOB pattern for the LibC assembly files, but orgaizing
the patterns to use ${SERENITY_ARCH} instead of a big if-else chain
makes the patterns easier to understand.
This lets us remove a glob pattern from LibC, the DynamicLoader, and,
later, Lagom. The Kernel already has its own separate list of AK files
that it wants, which is only a subset of all AK files.
Note that as part of this commit semaphore.cpp is excluded from the
DynamicLoader, as the dynamic loader does not build with pthread.cpp
which semaphore.cpp uses.
Most of the string.h and wchar.h functions are implemented quite naively
at the moment, and GCC's pattern recognition pass might realize what we
are trying to do, and transform them into libcalls. This is usually a
useful optimization, but not when we're implementing the functions
themselves :^)
Relevant discussion from the GCC Bugzilla:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102725
This prevents the infamous recursive `strlen`.
A more proper fix would be writing these functions in assembly. That
would likely give a small performance boost as well ;)
This option sets -fprofile-instr-generate -fcoverage-mapping for Clang
builds only on almost all of Userland. Loader and LibTimeZone are
exempt. This can be used for generating code coverage reports, or even
PGO in the future.
This commit addresses the following shortcomings of our current, simple
and elegant memset function:
- REP STOSB/STOSQ has considerable startup overhead, it's impractical to
use for smaller sizes.
- Up until very recently, AMD CPUs didn't have support for "Enhanced REP
MOVSB/STOSB", so it performed pretty poorly on them.
With this commit applied, I could measure a ~5% decrease in `test-js`'s
runtime when I used qemu's TCG backend. The implementation is based on
the following article from Microsoft:
https://msrc-blog.microsoft.com/2021/01/11/building-faster-amd64-memset-routines
Two versions of the routine are implemented: one that uses the ERMS
extension mentioned above, and one that performs plain SSE stores. The
version appropriate for the CPU is selected at load time using an IFUNC.
LibTimeZone will be needed directly within LibC for functions such as
localtime(). This change adds LibTimeZone directly within LibC, so that
LibTimeZone isn't its own .so library anymore.
LibTimeZone itself is compiled as an object library to make it easier to
give it generator-specific compilation flags.
This commit updates the Clang toolchain's version to 13.0.0, which comes
with better C++20 support and improved handling of new features by
clang-format. Due to the newly enabled `-Bsymbolic-functions` flag, our
Clang binaries will only be 2-4% slower than if we dynamically linked
them, but we save hundreds of megabytes of disk space.
The `BuildClang.sh` script has been reworked to build the entire
toolchain in just three steps: one for the compiler, one for GNU
binutils, and one for the runtime libraries. This reduces the complexity
of the build script, and will allow us to modify the CI configuration to
only rebuild the libraries when our libc headers change.
Most of the compile flags have been moved out to a separate CMake cache
file, similarly to how the Android and Fuchsia toolchains are
implemented within the LLVM repo. This provides a nicer interface than
the heaps of command-line arguments.
We no longer build separate toolchains for each architecture, as the
same Clang binary can compile code for multiple targets.
The horrible mess that `SERENITY_CLANG_ARCH` was, has been removed in
this commit. Clang happily accepts an `i686-pc-serenity` target triple,
which matches what our GCC toolchain accepts.
Replace the old logic where we would start with a host build, and swap
all the CMake compiler and target variables underneath it to trick
CMake into building for Serenity after we configured and built the Lagom
code generators.
The SuperBuild creates two ExternalProjects, one for Lagom and one for
Serenity. The Serenity project depends on the install stage for the
Lagom build. The SuperBuild also generates a CMakeToolchain file for the
Serenity build to use that replaces the old toolchain file that was only
used for Ports.
To ensure that code generators are rebuilt when core libraries such as
AK and LibCore are modified, developers will need to direct their manual
`ninja` invocations to the SuperBuild's binary directory instead of the
Serenity binary directory.
This commit includes warning coalescing and option style cleanup for the
affected CMakeLists in the Kernel, top level, and runtime support
libraries. A large part of the cleanup is replacing USE_CLANG_TOOLCHAIN
with the proper CMAKE_CXX_COMPILER_ID variable, which will no longer be
confused by a host clang compiler.
GCC implements `fputc`, `fputs` and `fwrite` as builtin functions, whose
`FILE*` argument is implicitly marked `__attribute__((nonnull))`. This
causes our `VERIFY(stream)` statements to be removed. This does not
happen with Clang, as they do not use the `nonnull` attribute in this
way.
We have had these for quite a while, but we didn't compile them, and
used GCC's version instead. Clang does not come with these, so we have
to provide our own implementation.
Our implementation follows what `musl` and `FreeBSD` do, so this should
work fine, even if documentation can hardly be found for them.
The System V ABI for both x86 and x86_64 requires that the stack pointer
is 16-byte aligned on entry. Previously we did not align the stack
pointer properly.
As far as "main" was concerned the stack alignment was correct even
without this patch due to how the C++ _start function and the kernel
interacted, i.e. the kernel misaligned the stack as far as the ABI
was concerned but that misalignment (read: it was properly aligned for
a regular function call - but misaligned in terms of what the ABI
dictates) was actually expected by our _start function.
Using LibELF to do the initial relocations doesn't work when building
SerenityOS with Clang. We seem to be accessing a global symbol that
hasn't been relocated yet somewhere along the path to
ELF::DynamicObject::create().
Take Kernel/UBSanitizer.cpp and make a copy in LibSanitizer.
We can use LibSanitizer to hold other sanitizers as people implement
them :^).
To enable UBSAN for LibC, DynamicLoader, and other low level system
libraries, LibUBSanitizer is built as a serenity_libc, and has a static
version for LibCStatic to use. The approach is the same as that taken in
Note that this means now UBSAN is enabled for code generators, Lagom,
Kernel, and Userspace with -DENABLE_UNDEFINED_SANTIZER=ON. In userspace
however, UBSAN is not deadly (yet).
Co-authored-by: ForLoveOfCats <ForLoveOfCats@vivaldi.net>
This links the dynamic linker against libgcc.a instead of having
our own copy of the math functions.
For now we need to specify -fbuilding-libgcc as a hack to work
around a bug with the -nodefaultlibs flag. Once everyone is on
the latest toolchain version this can be removed.
math.cpp: In function 'int64_t __moddi3(int64_t, int64_t)':
math.cpp:168:13: error: 'r' may be used uninitialized
[-Werror=maybe-uninitialized]
168 | return ((int64_t)r ^ s) - s; // negate if s == -1
| ^~~~~~~~~~
In a1720eed2a I added this new test,
but missed that there were already some "unit tests" for LibC over
in Userland/Tests/LibC. So lets unify these two locations.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
GCC will insert various calls to pthread functions when compiling
C++ code with static initializers, even when the user doesn't link
their program against libpthread explicitly.
This is used to make static initializers thread-safe, e.g. when
building a library that does not itself use thread functionality
and thus does not link against libpthread - but is intended to
be used with other code that does use libpthread explicitly.
This makes these symbols available in libc.
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)
Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.
We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
This achieves two things:
- Programs can now intentionally perform arbitrary syscalls by calling
syscall(). This allows us to work on things like syscall fuzzing.
- It restricts the ability of userspace to make syscalls to a single
4KB page of code. In order to call the kernel directly, an attacker
must now locate this page and call through it.