The previous solution of "lol whats a UB" was not nice and tripped over
itself when it was run under UBSAN, fix this by doing explicit
byte-by-byte reads where needed.
If set, this callback gets called right after fork() in the child
process.
It can be used by the caller if it wants to perform some logic in the
child process before it starts executing the debuggee program.
The format of the address range section is different between DWARF
version 4 and version 5. This meant that we parsed programs compiled
with `-gdwarf-4` incorrectly.
The parameter of this operator is an unsigned LEB128 integer, so it can
be more than 1 byte in length. If we only read 1 byte, we might mess up
the offsets for the instructions following it.
ProcessInspector is an abstract base class for an object that can
inspect the address space of a process.
Concrete sub classes need to implement methods for peeking & poking
memory and walking the loaded libraries.
It is currently only implemented by DebugSession.
Also add slightly richer parse errors now that we can include a string
literal with returned errors.
This will allow us to use TRY() when working with JSON data.
Our implementation (naively) assumes that there is a one-to-one
correspondence between compilation units and line programs, and that
their orders are identical. This is not the case for embedded resources,
as Clang only creates line programs for it, but not compilation units.
This mismatch caused an assertion failure, which made generating
backtraces for GUI applications impossible. This commit introduces a
hack that skips creating CompilationUnit objects for LinePrograms that
come from embedded resources.
These forms were introduced in DWARF5, and have a fair deal of
advantages over the more traditional encodings: they reduce the size of
the binary and the number of relocations.
GCC does not emit these with `-g1` by default, but Clang does at all
debug levels.
This will be needed when we add `DW_FORM_strx*` and `DW_FORM_addrx*`
support, which requires us to fetch `DW_AT_str_offsets_base` and
`DW_AT_addr_base` attributes from the parent compilation unit. This
can't be done as we read the values, because it would create infinite
recursion (as we might try to parse the compilation unit's
`DW_FORM_strx*` encoded name before we get to the attribute). Having
getters ensures that we will perform lookups if they are needed.
Previously, we only supported DIEs with a contiguous address ranges and
ignored ones with a non-contiguous set of ranges.
We now check if a DIE has the DW_AT_ranges attribute, and if it does we
parse its range list.
This improves the quality of our backtraces - we previously missed many
inlined function calls because their DIEs had non-contigues address
ranges.
This adds support for parsing DWARF "range lists", which are identified
by the DW_AT_ranges form.
They contain code addresses for DIEs whose location is not contiguous.
These API's are used in a variety of ways when building the die cache.
Each AbbreviationEntry has vector and other members, so avoid copying
it at all costs.
This algorithm is both iterative and recursive, so allocating on every
recursion, or when iterating each child is extremely costly.
Instead allow the on stack DIE to be re-initialized so it can be reused.
When parsing the libraries of the debugee process, we previously
assumed that the region that's called `<library name>: .text` was also
the base of the ELF file.
However, since we started linking with `-z separate-code`, this is no
longer the case - our executables have a read-only segment before the
.text segment, and that segment is actually at the base of the ELF.
This broke inserting breakpoints with the debugger since they were
inserted at a wrong offset.
To fix that, we now use the address of the first segment in the memory
map for the ELF's base address (The memory map is sorted by address).
This helps us avoid weird truncation issues and fixes a bug on Clang
builds where truncation while reading caused the DIE offsets following
large LEB128 numbers to be incorrect. This removes the need for the
separate `LongUnsignedNumber` type.
This LineProgram instruction is emitted by Clang. Although we currently
have no use for it (it's mostly a debugger feature), we need to handle
this opcode, as otherwise CrashReporter wouldn't work.
This commit makes LibRegex (mostly) capable of operating on any of
the three main string views:
- StringView for raw strings
- Utf8View for utf-8 encoded strings
- Utf32View for raw unicode strings
As a result, regexps with unicode strings should be able to properly
handle utf-8 and not stop in the middle of a code point.
A future commit will update LibJS to use the correct type of string
depending on the flags.
The LexicalPath instance methods dirname(), basename(), title() and
extension() will be changed to return StringView const& in a further
commit. Due to this, users creating temporary LexicalPath objects just
to call one of those getters will recieve a StringView const& pointing
to a possible freed buffer.
To avoid this, static methods for those APIs have been added, which will
return a String by value to avoid those problems. All cases where
temporary LexicalPath objects have been used as described above haven
been changed to use the static APIs.
This function returns the source position of a given address in the
program. If that address exists in an inline chain, then it also returns
the source positions that are in the chain.
This function returns the die object whose address range intersects
with the given address.
This function will also construct the DIE cache, if it hasn't been
constructed yet.
There is one cache that indexes DIE objects by the start address of
their range, and another cache that indexes by their offset in the
debug_info section.
Both caches are implemented with RedBlackTree, and are optional - they
will only be populated if 'build_cached_dies' is invoked.
In the current implementation, only DIE objects that are created via
DIE::for_each_child() will have parent offsets.
DIE objects that are created with CompilationUnit::get_die_at_offset()
do not currently store a parent offset.
We may improve this in the future, but this is enough for what we
currently need.
In some contexts, it's helpful to also know the "Attribute Form",
in addition to the "Attribute Type".
An example for such context is the interpretation of the
"DW_AT_high_pc" attribute, which has different meaning if the form
is an address or a constant.