Problem:
- `constexpr` functions are decorated with the `inline` specifier
keyword. This is redundant because `constexpr` functions are
implicitly `inline`.
- [dcl.constexpr], §7.1.5/2 in the C++11 standard): "constexpr
functions and constexpr constructors are implicitly inline (7.1.2)".
Solution:
- Remove the redundant `inline` keyword.
This reduces malloc()/free() calls in `disasm /bin/id` by 30%
according to LIBC_DUMP_MALLOC_STATS.
No measurable performance change (the number of empty block hits
remains unchanged, and that's what's slow), but maybe a nice
change regardless?
Don't require clients to templatize modrm().read{8,16,32,64}() with
the ValueWithShadow type when we can figure it out automatically.
The main complication here is that ValueWithShadow is a UE concept
while the MemoryOrRegisterReference inlines exist at the lower LibX86
layer and so doesn't have direct access to those types. But that's
nothing we can't solve with some simple template trickery. :^)
This is useful for reading and writing doubles for #3329.
It is also useful for emulating 64-bit binaries.
MemoryOrRegisterReference assumes that 64-bit values are always
memory references since that's enough for fpu support. If we
ever want to emulate 64-bit binaries, that part will need minor
updating.
Instruction::to_string used to copy a string literal into a String,
and then the String into a StringBuilder. Copy it to the StringBuilder
directly.
No measurable performance benefit, but it's also less code.
From a layering perspective, it's maybe a bit surprising that the
X86::SymbolProvider implementation also lives in LibX86, but since
everything depends on LibELF via LibC, and since all current
LibX86-based disassemblers want to use ELFSymbolProvider, it makes
some amount of sense to put it there.
Some of the remaining instructions have different behavior for
register and non-register ops. Since we already have the
two-level flags tables, model this by setting all handlers in
the two-level table to the register op handler, while the
first-level flags table stores the action for the non-reg handler.
Some of these don't just use the REG bits of the mod/rm byte
as slashes, but also the R/M bits to have up to 9 different
instructions per opcode/slash combination (1 opcode requires
that MOD is != 11, the other 8 have MODE == 11).
This is done by making the slashes table two levels deep for
these cases.
Some of this is cosmetic (e.g "FST st0" has no effect already,
but its bit pattern gets disassembled as "FNOP"), but for
most uses it isn't.
FSTENV and FSTCW have an extraordinary 0x9b prefix. This is
not yet handled in this patch.
This patch introduces the concept of shadow bits. For every byte of
memory there is a corresponding shadow byte that contains metadata
about that memory.
Initially, the only metadata is whether the byte has been initialized
or not. That's represented by the least significant shadow bit.
Shadow bits travel together with regular values throughout the entire
CPU and MMU emulation. There are two main helper classes to facilitate
this: ValueWithShadow and ValueAndShadowReference.
ValueWithShadow<T> is basically a struct { T value; T shadow; } whereas
ValueAndShadowReference<T> is struct { T& value; T& shadow; }.
The latter is used as a wrapper around general-purpose registers, since
they can't use the plain ValueWithShadow memory as we need to be able
to address individual 8-bit and 16-bit subregisters (EAX, AX, AL, AH.)
Whenever a computation is made using uninitialized inputs, the result
is tainted and becomes uninitialized as well. This allows us to track
this state as it propagates throughout memory and registers.
This patch doesn't yet keep track of tainted flags, that will be an
important upcoming improvement to this.
I'm sure I've messed up some things here and there, but it seems to
basically work, so we have a place to start! :^)
The a32 bit tells us whether a memory address is 32-bit or not.
We already have this information in Instruction, so just plumb that
around instead of double-caching the bit.
Use some template hacks to force GCC to inline more of the instruction
decoding stuff into the UserspaceEmulator main execution loop.
This is my last optimization for today, and we've gone from ~60 seconds
when running "UserspaceEmulator UserspaceEmulator id" to ~8 seconds :^)
Since this code is performance-sensitive, let's have the compiler do
whatever it can to help us with the most important files.
This yields a ~8% speedup.
This patch adds a PartAddressableRegister type, which divides a 32-bit
value into separate parts needed for the EAX/AX/AL/AH register splits.
Clean up the code around register access to make it a little less
cumbersome to use.
This patch adds a pure virtual X86::SymbolProvider that can be passed
to Instruction::to_string(). If the instruction contains what appears
to be a program address, stringification will try to symbolicate that
address via the SymbolProvider.
This makes it possible (and very flexible) to add symbolication to
clients of the disassembler. :^)