serenity/AK/FloatingPointStringConversions.h

88 lines
3.6 KiB
C
Raw Normal View History

AK: Add an exact and fast floating point parsing algorithm This is based on the paper by Daniel Lemire called "Number parsing at a Gigabyte per second", currently available at https://arxiv.org/abs/2101.11408 An implementation can be found at https://github.com/fastfloat/fast_float To support both strtod like methods and String::to_double we have two different APIs. The parse_first_floating_point gives back both the result, next character to read and the error/out of range status. Out of range here means we rounded to infinity 0. The other API, parse_floating_point_completely, will return a floating point only if the given character range contains just the floating point and nothing else. This can be much faster as we can skip actually computing the value if we notice we did not parse the whole range. Both of these APIs support a very lenient format to be usable in as many places as possible. Also it does not check for "named" values like "nan", "inf", "NAN" etc. Because this can be different for every usage. For integers and small values this new method is not faster and often even a tiny bit slower than the current strtod implementation. However the strtod implementation is wrong for a lot of values and has a much less predictable running time. For correctness this method was tested against known string -> double datasets from https://github.com/nigeltao/parse-number-fxx-test-data This method gives 100% accuracy. The old strtod gave an incorrect value in over 50% of the numbers tested.
2022-10-12 20:18:56 -04:00
/*
* Copyright (c) 2022, David Tuin <davidot@serenityos.org>
*
* SPDX-License-Identifier: BSD-2-Clause
*/
#pragma once
#ifdef KERNEL
# error This file should not be included in the KERNEL as it deals with doubles \
and there is no guraantee does not do any floating point computations.
#endif
#include <AK/StringView.h>
namespace AK {
static constexpr char floating_point_decimal_separator = '.';
enum class FloatingPointError {
None,
NoOrInvalidInput,
OutOfRange,
RoundedDownToZero
};
template<FloatingPoint T>
struct FloatingPointParseResults {
char const* end_ptr { nullptr };
FloatingPointError error = FloatingPointError::None;
T value {};
[[nodiscard]] bool parsed_value() const
{
// All other errors do indicate out of range but did produce a valid value.
return error != FloatingPointError::NoOrInvalidInput;
}
};
/// This function finds the first floating point within [start, end). The accepted format is
/// intentionally as lenient as possible. If your format is stricter you must validate it
/// first. The format accepts:
/// - An optional sign, both + and - are supported
/// - 0 or more decimal digits, with leading zeros allowed [1]
/// - A decimal point '.', which can have no digits after it
/// - 0 or more decimal digits, unless the first digits [1] doesn't have any digits,
/// then this must have at least one
/// - An exponent 'e' or 'E' followed by an optional sign '+' or '-' and at least one digit
/// This function additionally detects out of range values which have been rounded to
/// [-]infinity or 0 and gives the next character to read after the floating point.
template<FloatingPoint T = double>
FloatingPointParseResults<T> parse_first_floating_point(char const* start, char const* end);
/// This function finds the first floating point starting at start up to the first '\0'.
/// The format is identical to parse_first_floating_point above.
template<FloatingPoint T = double>
FloatingPointParseResults<T> parse_first_floating_point_until_zero_character(char const* start);
/// This function will return either a floating point, or an empty optional if the given StringView
/// does not a floating point or contains more characters beyond the floating point. For the format
/// check the comment on parse_first_floating_point.
template<FloatingPoint T = double>
Optional<T> parse_floating_point_completely(char const* start, char const* end);
/// This function finds the first floating point as a hex float within [start, end).
/// The accepted format is intentionally as lenient as possible. If your format is
/// stricter you must validate it first. The format accepts:
/// - An optional sign, both + and - are supported
/// - Optionally either 0x or OX
/// - 0 or more hexadecimal digits, with leading zeros allowed [1]
/// - A decimal point '.', which can have no digits after it
/// - 0 or more hexadecimal digits, unless the first digits [1] doesn't have any digits,
/// then this must have at least one
/// - An exponent 'p' or 'P' followed by an optional sign '+' or '-' and at least one decimal digit
/// NOTE: The exponent is _not_ hexadecimal and gives powers of 2 not 16.
/// This function additionally detects out of range values which have been rounded to
/// [-]infinity or 0 and gives the next character to read after the floating point.
template<FloatingPoint T = double>
FloatingPointParseResults<T> parse_first_hexfloat_until_zero_character(char const* start);
AK: Add an exact and fast floating point parsing algorithm This is based on the paper by Daniel Lemire called "Number parsing at a Gigabyte per second", currently available at https://arxiv.org/abs/2101.11408 An implementation can be found at https://github.com/fastfloat/fast_float To support both strtod like methods and String::to_double we have two different APIs. The parse_first_floating_point gives back both the result, next character to read and the error/out of range status. Out of range here means we rounded to infinity 0. The other API, parse_floating_point_completely, will return a floating point only if the given character range contains just the floating point and nothing else. This can be much faster as we can skip actually computing the value if we notice we did not parse the whole range. Both of these APIs support a very lenient format to be usable in as many places as possible. Also it does not check for "named" values like "nan", "inf", "NAN" etc. Because this can be different for every usage. For integers and small values this new method is not faster and often even a tiny bit slower than the current strtod implementation. However the strtod implementation is wrong for a lot of values and has a much less predictable running time. For correctness this method was tested against known string -> double datasets from https://github.com/nigeltao/parse-number-fxx-test-data This method gives 100% accuracy. The old strtod gave an incorrect value in over 50% of the numbers tested.
2022-10-12 20:18:56 -04:00
}
#if USING_AK_GLOBALLY
AK: Add an exact and fast floating point parsing algorithm This is based on the paper by Daniel Lemire called "Number parsing at a Gigabyte per second", currently available at https://arxiv.org/abs/2101.11408 An implementation can be found at https://github.com/fastfloat/fast_float To support both strtod like methods and String::to_double we have two different APIs. The parse_first_floating_point gives back both the result, next character to read and the error/out of range status. Out of range here means we rounded to infinity 0. The other API, parse_floating_point_completely, will return a floating point only if the given character range contains just the floating point and nothing else. This can be much faster as we can skip actually computing the value if we notice we did not parse the whole range. Both of these APIs support a very lenient format to be usable in as many places as possible. Also it does not check for "named" values like "nan", "inf", "NAN" etc. Because this can be different for every usage. For integers and small values this new method is not faster and often even a tiny bit slower than the current strtod implementation. However the strtod implementation is wrong for a lot of values and has a much less predictable running time. For correctness this method was tested against known string -> double datasets from https://github.com/nigeltao/parse-number-fxx-test-data This method gives 100% accuracy. The old strtod gave an incorrect value in over 50% of the numbers tested.
2022-10-12 20:18:56 -04:00
using AK::parse_first_floating_point;
using AK::parse_first_hexfloat_until_zero_character;
AK: Add an exact and fast floating point parsing algorithm This is based on the paper by Daniel Lemire called "Number parsing at a Gigabyte per second", currently available at https://arxiv.org/abs/2101.11408 An implementation can be found at https://github.com/fastfloat/fast_float To support both strtod like methods and String::to_double we have two different APIs. The parse_first_floating_point gives back both the result, next character to read and the error/out of range status. Out of range here means we rounded to infinity 0. The other API, parse_floating_point_completely, will return a floating point only if the given character range contains just the floating point and nothing else. This can be much faster as we can skip actually computing the value if we notice we did not parse the whole range. Both of these APIs support a very lenient format to be usable in as many places as possible. Also it does not check for "named" values like "nan", "inf", "NAN" etc. Because this can be different for every usage. For integers and small values this new method is not faster and often even a tiny bit slower than the current strtod implementation. However the strtod implementation is wrong for a lot of values and has a much less predictable running time. For correctness this method was tested against known string -> double datasets from https://github.com/nigeltao/parse-number-fxx-test-data This method gives 100% accuracy. The old strtod gave an incorrect value in over 50% of the numbers tested.
2022-10-12 20:18:56 -04:00
using AK::parse_floating_point_completely;
#endif