That's what this class really is; in fact that's what the first line of
the comment says it is.
This commit does not rename the main files, since those will contain
other time-related classes in a little bit.
The Multiboot header stores the framebuffer's pitch in bytes, so
multiplying it by the pixel's size is not necessary. We ended up
allocating 4 times as much memory as needed, which caused us to overlap
the MMIO reserved memory area on the Raspberry Pi.
Rename the initialize method to initialize_virtio_resources so it's
clear what this method is intended for.
To ensure healthier device initialization, we could also return the type
of ErrorOr<void> from this method, so in all overriden instances and in
the original method code, we could leverage TRY() pattern which also
does simplify the code a bit.
This is done mainly by implementing safe locking on the data structure
keeping the pointers to the PerContextState objects. Therefore, this now
eliminates the need for using LockRefPtr, as SpinlockProtected is enough
for the whole list.
The usage of HashMap in this class was questionable, and according to
Sahan Fernando (the original contributor to the VirGL work also known as
ccapitalK) there was no deep research on which data structure to use for
keeping all pointers to PerContextState objects.
Therefore, this structure is changed to IntrusiveList as the main reason
and advantage to use it is that handling OOM conditions is much more
simple, because if we succeeded to create a PerContextState object, we
can be sure now that inserting it to the list will not cause OOM error
condition.
To do this we also need to get rid of LockRefPtrs in the USB code as
well.
Most of the SysFS nodes are statically generated during boot and are not
mutated afterwards.
The same goes for general device code - once we generate the appropriate
SysFS nodes, we almost never mutate the node pointers afterwards, making
locking unnecessary.
It appeared that we sometimes failed to invoke synchronous commands on
the GPU. To temporarily fix this, wait 10 milliseconds for commands to
complete before failing.
According to the specification, modesetting can be invoked with no need
for flushing the framebuffer nor with DMA to transfer the framebuffer
rendering.
Before doing a check if offset_in_region + num_bytes of the transfer
descriptor are together more than NUM_TRANSFER_REGION_PAGES * PAGE_SIZE,
check that addition of both of these parameters will not simply overflow
which could lead to out-of-bounds read/write.
Fixes#17518.
Dealing with the specific details of how to program a PLL should be done
in a separate file to ensure we can easily expand it to support future
generations of the Intel graphics device.
ENODEV better represents the fact that there might be no display device
(e.g. a monitor) connected to the connector, therefore we should return
this error.
Another reason to not use ENOTIMPL is that it's a requirement for all
DisplayConnectors to put a valid EDID in place even for a hardware we
don't currently support mode-setting in runtime.
Instead of doing that on the IntelDisplayPlane class, let's have this in
derived classes so these classes can decide how to use the settings that
were provided before calling the enable method.
It became apparent to me that future generations of the Intel graphics
chipset utilize the same register set as part of the Transcoder register
set. Therefore, it should be included now in the Transcoder class.
In the real world, graphics hardware tend to have multiple display
connectors. However, usually the connectors share one register space but
still keeping different PLL timings and display lanes.
This new class should represent a group of multiple display connectors
working together in the same Intel graphics adapter. This opens an
opportunity to abstract the interface so we could support future Intel
iGPU generations.
This is also a preparation before the driver can support newer devices
and utilize their capabilities.
The mentioned preparation is applied in a these aspects:
1. The code is splitted into more classes to adjust to future expansion.
2 classes are introduced: IntelDisplayPlane and IntelDisplayTranscoder,
so the IntelDisplayPlane controls the plane registers and second class
controls the pipeline (transcoder, encoder) registers. On gen4 it's not
really useful because there are probably one plane and one encoder to
care about, but in future generations, there are likely to be multiple
transcoders and planes to accommodate multi head support.
2. The set_edid_bytes method in the DisplayConnector class can now be
told to not assume the provided EDID bytes are always invalid. Therefore
it can refrain from printing error messages if this flag parameter is
true. This is useful for supporting real hardware situation when on boot
not all ports are connected to a monitor, which can result in floating
bus condition (essentially all the bytes we read are 0xFF).
3. An IntelNativeDisplayConnector could now be set to flag other types
of connections such as eDP (embedded DisplayPort), Analog output, etc.
This is important because on the Intel gen4 graphics we could assume to
have one analog output connector, but on future generations this is very
likely to not be the case, as there might be no VGA outputs, but rather
only an eDP connector which is converted to VGA by a design choice of
the motherboard manufacturer.
4. Add ConnectorIndex to IntelNativeDisplayConnector class - Currently
this is used to verify we always handle the correct connector when doing
modesetting.
Later, it will be used to locate special settings needed when handling
connector requests.
5. Prepare to support more types of display planes. For example, the
Intel Skylake register set for display planes is a bit different, so
let's ensure we can properly support it in the near future.
Returning literal strings is not the proper action here, because we
should always assume that error could be propagated back to userland, so
we need to keep a valid errno when returning an Error.
Splitting the I2C-related code lets the DisplayConnector code to utilize
I2C operations without caring about the specific details of the hardware
and allow future expansion of the driver to other newer generations
sharing the same GMBus code.
We should require a timeout for GMBus operations always, because faulty
hardware could let us just spin forever. Also, if nothing is listening
to the bus (which should result in a NAK), we could also spin forever.
Thanks to Andrew Kaster, which gave a review back in October, about a
big PR I opened (#15502), I managed to figure out why we always had a
problem with the first byte being read into the EDID buffer with the
GMBus code. It turns out that this simple invalid cast was making the
entire problem and using the correct AK::Array::data() method fixed this
notorious long standing problem for good.
There are now 2 separate classes for almost the same object type:
- EnumerableDeviceIdentifier, which is used in the enumeration code for
all PCI host controller classes. This is allowed to be moved and
copied, as it doesn't support ref-counting.
- DeviceIdentifier, which inherits from EnumerableDeviceIdentifier. This
class uses ref-counting, and is not allowed to be copied. It has a
spinlock member in its structure to allow safely executing complicated
IO sequences on a PCI device and its space configuration.
There's a static method that allows a quick conversion from
EnumerableDeviceIdentifier to DeviceIdentifier while creating a
NonnullRefPtr out of it.
The reason for doing this is for the sake of integrity and reliablity of
the system in 2 places:
- Ensure that "complicated" tasks that rely on manipulating PCI device
registers are done in a safe manner. For example, determining a PCI
BAR space size requires multiple read and writes to the same register,
and if another CPU tries to do something else with our selected
register, then the result will be a catastrophe.
- Allow the PCI API to have a united form around a shared object which
actually holds much more data than the PCI::Address structure. This is
fundamental if we want to do certain types of optimizations, and be
able to support more features of the PCI bus in the foreseeable
future.
This patch already has several implications:
- All PCI::Device(s) hold a reference to a DeviceIdentifier structure
being given originally from the PCI::Access singleton. This means that
all instances of DeviceIdentifier structures are located in one place,
and all references are pointing to that location. This ensures that
locking the operation spinlock will take effect in all the appropriate
places.
- We no longer support adding PCI host controllers and then immediately
allow for enumerating it with a lambda function. It was found that
this method is extremely broken and too much complicated to work
reliably with the new paradigm being introduced in this patch. This
means that for Volume Management Devices (Intel VMD devices), we
simply first enumerate the PCI bus for such devices in the storage
code, and if we find a device, we attach it in the PCI::Access method
which will scan for devices behind that bridge and will add new
DeviceIdentifier(s) objects to its internal Vector. Afterwards, we
just continue as usual with scanning for actual storage controllers,
so we will find a corresponding NVMe controllers if there were any
behind that VMD bridge.
This header has always been fundamentally a Kernel API file. Move it
where it belongs. Include it directly in Kernel files, and make
Userland applications include it via sys/ioctl.h rather than directly.
Instead of using a clunky switch-case paradigm, we now have all drivers
being declaring two methods for their adapter class - create and probe.
These methods are linked in each PCIGraphicsDriverInitializer structure,
in a new s_initializers static list of them.
Then, when we probe for a PCI device, we use each probe method and if
there's a match, then the corresponding create method is called.
As a result of this change, it's much more easy to add more drivers and
the initialization code is more readable.
We try our best to ensure a DisplayConnector initialization succeeds,
and this makes the Intel driver to work again, because if we can't
allocate a Region for the whole PCI BAR mapped region, then we will try
to allocate a Region with 16 MiB window size, so it doesn't eat the
entire Kernel-allocated virtual memory space.
Instead of just returning nothing, let's return Error or nothing.
This would help later on with error propagation in case of failure
during this method.
This also makes us more paranoid about failure in this method, so when
initializing a DisplayConnector we safely tear down the internal members
of the object. This applies the same for a StorageDevice object, but its
after_inserting method is much smaller compared to the DisplayConnector
overriden method.
A virtual method named device_name() was added to
Kernel::PCI to support logging the PCI::Device name
and address using dmesgln_pci. Previously, PCI::Device
did not store the device name.
All devices inheriting from PCI::Device now use dmesgln_pci where
they previously used dmesgln.
These instances were detected by searching for files that include
Kernel/Debug.h, but don't match the regex:
\\bdbgln_if\(|_DEBUG\\b
This regex is pessimistic, so there might be more files that don't check
for any real *_DEBUG macro. There seem to be no corner cases anyway.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
This step would ideally not have been necessary (increases amount of
refactoring and templates necessary, which in turn increases build
times), but it gives us a couple of nice properties:
- SpinlockProtected inside Singleton (a very common combination) can now
obtain any lock rank just via the template parameter. It was not
previously possible to do this with SingletonInstanceCreator magic.
- SpinlockProtected's lock rank is now mandatory; this is the majority
of cases and allows us to see where we're still missing proper ranks.
- The type already informs us what lock rank a lock has, which aids code
readability and (possibly, if gdb cooperates) lock mismatch debugging.
- The rank of a lock can no longer be dynamic, which is not something we
wanted in the first place (or made use of). Locks randomly changing
their rank sounds like a disaster waiting to happen.
- In some places, we might be able to statically check that locks are
taken in the right order (with the right lock rank checking
implementation) as rank information is fully statically known.
This refactoring even more exposes the fact that Mutex has no lock rank
capabilites, which is not fixed here.
This has been done in multiple ways:
- Each time we modeset the resolution via the VirtIOGPU DisplayConnector
we ensure that the framebuffer is updated with the new resolution.
- Each time the cursor is updated we ensure that the framebuffer console
is marked dirty so the IO Work Queue task which is scheduled to check
if it is dirty, will flush the surface.
- We only initialize a framebuffer console after we ensure that at the
very least a DisplayConnector has being set with a known resolution.
- We only call GenericFramebufferConsole::enable() when enabling the
console after the important variables of the console (m_width, m_pitch
and m_height) have been set.
This is necessary to allow transferring frame buffers larger than
~500x500 pixels back to user space. Until the buffer management is
improved this allows us to at least test the existing game ports.
This happens to be a sad truth for the VirtIOGPU driver - it lacked any
error propagation measures and generally relied on clunky assumptions
that most operations with the GPU device are infallible, although in
reality much of them could fail, so we do need to handle errors.
To fix this, synchronous GPU commands no longer rely on the wait queue
mechanism anymore, so instead we introduce a timeout-based mechanism,
similar to how other Kernel drivers use a polling based mechanism with
the assumption that hardware could get stuck in an error state and we
could abort gracefully.
Then, we change most of the VirtIOGraphicsAdapter methods to propagate
errors properly to the original callers, to ensure that if a synchronous
GPU command failed, either the Kernel or userspace could do something
meaningful about this situation.
The performance that we achieve from this technique is visually worse
compared to turning off this feature, so let's not use this until we
figure out why it happens.
As is, we never *deallocate* them, so we will run out eventually.
Creating a context, or allocating a context ID, now returns ErrorOr if
there are no available free context IDs.
`number_of_fixmes--;` :^)