LibWeb: Fix numeric character reference at EOF leaking its last digit

Previously, if the NumericCharacterReferenceEnd state was reached when
current_input_character was None, then the
DONT_CONSUME_NEXT_INPUT_CHARACTER macro would restore back before the
EOF, and allow the next state (after the SWITCH_TO_RETURN_STATE) to
proceed with the last digit of the numeric character reference.

For example, with something like `&#1111`, before this commit the
output would incorrectly be `<code point with the value 1111>1` instead
of just `<code point with the value 1111>`.
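
The leak can be reproduced in a much smaller model. The sketch below is a
hypothetical toy, not the LibWeb tokenizer: `ToyTokenizer`, `next`,
`dont_consume_next` and `run_toy` are made-up names, the input is assumed to
start right after `&#`, and the previous position is assumed to be recorded
only when a character is actually consumed (so it is left untouched at EOF).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical toy model; the names here are illustrative, not LibWeb's.
struct ToyTokenizer {
    std::string_view input;
    bool guard_restore_at_eof;
    std::size_t pos = 0;
    std::size_t prev_pos = 0;

    // Returns the next character, or nullopt at EOF. The previous position
    // is only updated when a character is actually consumed.
    std::optional<char> next()
    {
        if (pos >= input.size())
            return std::nullopt;
        prev_pos = pos;
        return input[pos++];
    }

    // Stand-in for DONT_CONSUME_NEXT_INPUT_CHARACTER: step back so the next
    // state sees `current` again. With the guard, EOF is never "un-consumed".
    void dont_consume_next(std::optional<char> current)
    {
        if (guard_restore_at_eof && !current.has_value())
            return;
        pos = prev_pos;
    }
};

// Reads decimal digits, emits them as "<digits>", then emits whatever the
// following "data" phase sees as plain characters.
std::string run_toy(std::string_view input, bool guard_restore_at_eof)
{
    ToyTokenizer t { input, guard_restore_at_eof };
    std::string digits;
    std::optional<char> c;
    while ((c = t.next()) && *c >= '0' && *c <= '9')
        digits += *c;

    // End of the reference: the terminating character belongs to the next state.
    t.dont_consume_next(c);

    std::string out = "<" + digits + ">";
    while ((c = t.next()))
        out += *c;
    return out;
}

// Usage check mirroring the `&#1111` example (input starts after `&#`).
inline void check_toy()
{
    assert(run_toy("1111", false) == "<1111>1"); // last digit leaks
    assert(run_toy("1111", true) == "<1111>");   // guarded restore
}
```

In this toy, an input that ends with a non-digit (for example `1111x`) gives
`<1111>x` with or without the guard, which matches the claim that only the
EOF case is affected.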

Rather than putting the `if (current_input_character.has_value())` check
inside NumericCharacterReferenceEnd directly, it was added to
DONT_CONSUME_NEXT_INPUT_CHARACTER, because all usages of the macro
benefit from this check, even though the other existing usage sites
don't exhibit any bugs without it:

- In MarkupDeclarationOpen, if the current_input_character is EOF, then
  the previous character is always `!`, so restoring and then checking
  forward for strings like `--`, `DOCTYPE`, etc. won't match. The
  BogusComment state then runs one extra time (once for `!` and once
  for EOF) with no practical consequences. With the `has_value()` check,
  BogusComment will only run once with EOF.

- In AfterDOCTYPEName, ConsumeNextResult::RanOutOfCharacters can only
  occur when stopping at the insertion point, and because of how
  the code is structured, it is guaranteed that current_input_character
  is either `P` or `S`, so the `has_value()` check is irrelevant.
Ryan Liptak 2024-12-20 06:05:37 -08:00 committed by Jelle Raaijmakers
parent 752deaf6ef
commit df87a9689c
Notes: github-actions[bot] 2025-01-06 23:44:49 +00:00
3 changed files with 15 additions and 6 deletions

@@ -94,9 +94,10 @@ namespace Web::HTML {
         } \
     } while (0)
 
 #define DONT_CONSUME_NEXT_INPUT_CHARACTER \
     do { \
-        restore_to(m_prev_utf8_iterator); \
+        if (current_input_character.has_value()) \
+            restore_to(m_prev_utf8_iterator); \
     } while (0)
 
 #define ON(code_point) \

@@ -199,6 +199,15 @@ TEST_CASE(character_reference_in_attribute)
     END_ENUMERATION();
 }
 
+TEST_CASE(numeric_character_reference)
+{
+    auto tokens = run_tokenizer("&#1111"sv);
+    BEGIN_ENUMERATION(tokens);
+    EXPECT_CHARACTER_TOKEN(1111);
+    EXPECT_END_OF_FILE_TOKEN();
+    END_ENUMERATION();
+}
+
 TEST_CASE(comment)
 {
     auto tokens = run_tokenizer("<p><!-- This is a comment --></p>"sv);

@@ -2,8 +2,7 @@ Harness status: OK
 
 Found 63 tests
 
-62 Pass
-1 Fail
+63 Pass
 Pass html5lib_tests2.html e070301fb578bd639ecbc7ec720fa60222d05826
 Pass html5lib_tests2.html aaf24dabcb42470e447d241a40def0d136c12b93
 Pass html5lib_tests2.html b6c1142484570bb90c36e454ee193cca17bb618a
@@ -27,7 +26,7 @@ Pass html5lib_tests2.html 73b97cd984a62703ec54ec4a876ec32aa5fd3b8c
 Pass html5lib_tests2.html 2db9616ed62fc2a26056f3395459869cf556974d
 Pass html5lib_tests2.html b59aa1c714892618eaccd51696658887fcbd2045
 Pass html5lib_tests2.html 98818e7fda2506603bd208662613edb40297c2d3
-Fail html5lib_tests2.html e0c43080cf61c0696031bdb097bea4f2a647cfc2
+Pass html5lib_tests2.html e0c43080cf61c0696031bdb097bea4f2a647cfc2
 Pass html5lib_tests2.html f7753d80a422c40b5fa04d99e52d8ae83369757a
 Pass html5lib_tests2.html 7cbd584aef9508a90c98f80040078149a92ec869
 Pass html5lib_tests2.html e0f7f130b1e3653dd06f10f3492e4f0bf4cd3cfa