This makes RegExpObject compile and store a Regex<ECMA262>, adds
all flag-related properties, and implements `RegExpPrototype.test()`
(complete with 'lastIndex' support) :^)
It should be noted that this only implements `test()' using the builtin
`exec()'.
To allow storing unicode ranges compactly; this is not utilised at the
moment, but changing this later would've been significantly more
difficult.
Also fixes a few debug logs.
Grep supports only extended regular expressions, and is able to handle one pattern
handed over via -e or directly after the options. Also, multiple files can be
handed over. Recursive mode is outstanding, but no real magic :^)
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
We already have a wrap() in EventWrapperFactory.cpp. It would be nice
to generate that at some point but it will require a lot more work on
the wrapper generator.
Inline layout nodes cannot have block children (except inline-block,
of course.)
When encountering a block box child of an inline, we now hoist the
block up to the inline's containing block, and also wrap any preceding
inline siblings in an anonymous wrapper block.
This improves the ACID2 situation quite a bit (although we still need
floats to really bring it home.)
I also took this opportunity to move all tree building logic into
Layout::TreeBuilder, to continue the theme of absolving our LayoutNode
objects of responsibilities. :^)
Rather than waiting until we get the first mouse packet, enable the
absolute mode immediately. This avoids having to click first to be
able to move the mouse.
CodeQL is a static analysis technology that was purchased by GitHub
and has been tightly integrated into the platform. It's different
from most other static analysis solutions because it's based on a
database built from your codebase, and then language specific rules
can be executed against that database. The rules are fully user
extensible, and are written in a datalog/query language.
The default cpp language rules coming from CodeQL will probably find
some issues, the ability to easily write custom rules/queries will
lend it self nicely to allowing us to validate Serenity specific
semantics are followed throughout the code.
References:
- https://www.youtube.com/watch?v=AMzGorD28Ks
- https://securitylab.github.com/tools/codeql
The Lexer constructor calls consume() once, which initializes m_position
to be > 0 and sets m_character. consume() calls is_line_terminator(),
which wasn't accounting for this state.
When creating a StringImpl for a C string that starts with a null-byte,
we would ignore the explicitly given length and return the empty
StringImpl - presumably to check for "\0", but this leads to false
positives ("\0foo") so let's only care about the length.
This was broken with the JS::Parser::Error position changes, but I don't
actually see a reason to do anything with the parser errors here, so
let's remove it and consider simply not crashing a success. :^)
This patch adds a 128-byte inline buffer that we use before switching
to using a dynamically growing ByteBuffer.
This allows us to avoid heap allocations in many cases, and totally
incidentally also speeds up @nico's favorite test, "disasm /bin/id"
more than 2x. :^)
If the percentage is 100, we were trying to get the heat gradient pixel
at (100, 0), which was one pixel past the end. Fix this by making the
heat gradient 101 pixels wide :^)
The implementation uses atomics and futexes (yay!) and is heavily based on the
implementation I did for my learning project named "Let's write synchronization
primitives" [0].
That project, in fact, started when I tried to implement pthread_once() for
Serenity (because it was needed for another project of mine, stay tuned ;) ) and
was not very sure I got every case right. So now, after learning some more about
code patterns around atomics and futexes, I am reasonably sure, and it's time to
contribute the implementation of pthread_once() to Serenity :^)
[0] To be published at https://github.com/bugaevc/lets-write-sync-primitives
If a receiver is given, e.g. via Reflect.get/set(), forward it to the
target object's get()/put() or use it as last argument of the trap
function. The default value is the Proxy object itself.
We need to not only add a record for a reference, but we need
to copy the reference count on fork as well, because the code
in the fork assumes that it has the same amount of references,
still.
Also, once all references are dropped when a process is disowned,
delete the shared buffer.
Fixes#4076
This makes misses in the BlockBasedFS's LRU block cache faster by
storing the cache entries in one of two doubly-linked list.
Dirty and clean cache entries are kept in two separate lists, and
move between them when their state changes. This can probably be
improved upon further.
If the inode's block list cache is empty, we forgot to assign the
result of computing the block list. The fact that this worked anyway
makes me wonder when we actually don't have a cache..
Thanks to szyszkienty for spotting this! :^)
Instead of doing a linear scan of the entire cache when doing a lookup,
we now have a nice O(1) HashMap in front of the cache.
The cache miss case can still be improved, this patch really only helps
the cache hit case.
This dramatically improves cached filesystem I/O. :^)