Time to get your popcorn out, as it seems NVIDIA are continuing to take steps to improve Linux support for their GPUs.
We've seen NVIDIA make a few more direct steps towards open source lately: the 560 series and above will switch to the open kernel modules by default on Turing and newer GPUs, there was a big patch set for Nouveau, and there have even been contributions to NVK.
Now they're exploring more ways to support an upstream kernel mode driver. As posted on the dri-devel mailing list by NVIDIA's Ben Skeggs, the former Nouveau lead:
NVIDIA has been exploring ways to better support the effort for an upstream kernel mode driver for GPUs that are capable of running GSP-RM firmware, since the introduction[1] to Nova.
Use cases have been identified for which separating the core GPU programming out of the full DRM driver stack is a strong requirement from our key customers.
An upstreamed NVIDIA GPU driver should be able to support current and emerging customer use cases for vGPU hosts. NVIDIA's vGPU deployments to date do not support compute or graphics functionality within the hypervisor host, and have no dependency on the Linux graphics subsystem, instead implementing the minimal functionality required to run vGPU guest VMs.
For security-sensitive environments such as cloud infrastructure, it's important to continue support for running a minimal footprint vGPU host driver in a stripped-down / barebones kernel environment.
This can be achieved by supporting both VFIO and DRM drivers as clients of a core driver, without requiring a full-fledged DRM driver (or the DRM subsystem itself) to be built into the host kernel.
A core driver would be responsible for booting and communicating with GSP-RM, enumeration of HW configuration, shared/partitioned resource management, exception handling, and event dispatch.
The DRM driver would do all the standard things a DRM driver does, and implement GPU memory management (TTM/HMM), KMS, command submission etc, as well as providing UAPI for userspace clients. These features would be implemented using HW resources allocated from a core driver, rather than the DRM driver being directly responsible for HW programming.
As Nouveau's KMD is already split (in the logical sense) along similar lines, we're using it here for the purposes of this RFC to demonstrate the feasibility of such an architecture, and open it up for discussion.
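To picture the split the RFC describes, here is a rough toy model in Python. It is only a sketch under loud assumptions: the names `CoreDriver`, `VfioClient`, `DrmClient` and the VRAM bookkeeping are hypothetical and do not come from the patch series or any real kernel API. It just shows the shape of the idea, where both clients get hardware resources from a shared core and the VFIO side never touches anything DRM-related.

```python
from dataclasses import dataclass, field


@dataclass
class CoreDriver:
    """Hypothetical stand-in for the RFC's core driver: it owns the shared resource."""
    total_vram_mb: int
    allocations: dict = field(default_factory=dict)

    def allocate(self, client: str, vram_mb: int) -> bool:
        """Give a named client a slice of VRAM if it still fits, else refuse."""
        used = sum(self.allocations.values())
        if used + vram_mb > self.total_vram_mb:
            return False
        self.allocations[client] = self.allocations.get(client, 0) + vram_mb
        return True


class VfioClient:
    """Toy vGPU host client: depends on the core driver only (assumption)."""

    def __init__(self, core: CoreDriver):
        self.core = core

    def create_vgpu(self, name: str, vram_mb: int) -> bool:
        return self.core.allocate(f"vfio:{name}", vram_mb)


class DrmClient:
    """Toy DRM client: a real one would add memory management, KMS and UAPI on top."""

    def __init__(self, core: CoreDriver):
        self.core = core

    def create_buffer(self, vram_mb: int) -> bool:
        return self.core.allocate("drm", vram_mb)


core = CoreDriver(total_vram_mb=1024)
vfio = VfioClient(core)   # a stripped-down host would only load this client
drm = DrmClient(core)     # optional, for hosts that also want graphics/compute

first_vgpu = vfio.create_vgpu("guest0", 512)
small_buffer = drm.create_buffer(256)
large_buffer = drm.create_buffer(512)  # refused: it would exceed the 1024 MB total
```

The point of the toy is the dependency direction: neither client programs the hardware itself, and dropping `DrmClient` entirely leaves `VfioClient` working, which is the minimal-footprint vGPU host case the RFC is asking for.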
Hope this is just the beginning.
I think it's valuable that they start looking into proper GPU support for Linux out of the box. They have probably also seen what current projects achieve with NVK and a Red Hat-backed Nouveau on the GSP stack, and that it can be valuable to customers (if you don't need CUDA or similar).
It's not as if AMD or Intel have a "huge" team behind the Linux drivers. And Nvidia certainly would have the resources to sponsor at least the same amount towards proper open source support (especially since that builds upon already implemented infrastructure - Mesa).
The proprietary drivers won't go away, and some will use them for support by Nvidia, especially for commercial use or other complex use cases. AMD still has them too - but I think nobody actually uses them :D.
I am happy with the development we have seen, but it's mostly sponsored by Valve and Red Hat. The patchsets are nice, but not by any means the main effort.
I hope Nvidia steps up the support. Not only since I use Nvidia at the moment and don't plan to replace my gaming rig any time soon, but since it would highly benefit a large user base.
I am glad that NVidia appear to now be slowly opening up their driver stack. Hopefully in future this will force HDMI forum to open up HDMI 2.1.
Unlikely. nVidia has not opened up their driver, they've moved their proprietary code into their firmware & made their firmware open-source consumable. There's no reason for an open HDMI 2.1 implementation on nVidia's side.
I don't get this part:
Use cases have been identified for which separating the core GPU programming out of the full DRM driver stack is a strong requirement from our key customers.
Why does it need separating?
I believe it is because otherwise there would be no valid selling point for their Quadro line of GPUs.
They often share the same architecture as the consumer GTX and RTX cards, but certain features, especially around virtualisation, are simply switched off driver side.
If this were all open, people could "just" enable the Quadro-exclusive features and would not need to pay a fortune for Quadros.
That is one of the reasons why the move towards the GSP co-processor was made, and why the NVIDIA firmware blob is so huge. Instead of switching things on/off driver side, they moved that into the GSP.
I think there is a strong link between these two things and the reason for "Use cases have been identified for which separating the core GPU programming out of the full DRM driver stack is a strong requirement from our key customers."
As a side note: I believe AMD is doing similar things. Certain workstation GPU features are turned on/off by firmware rather than by the driver, which is the reason some industry software strictly requires the closed source AMD drivers, especially around compute stuff.
The flip side of the coin is: this way you can serve open drivers and still lock certain things behind a kind of paywall for enterprise customers.
If I am wrong with any of these statements, I encourage everyone to correct me where I might have picked things up wrong.
Last edited by Vortex_Acherontic on 17 Jun 2024 at 4:47 pm UTC
we usually have a threat model in our minds where the host needs to be protected from malicious guests, but afaik the reverse is also important to cloud clients processing sensitive data
having minimal parts of the gpu driver running host-side seems to help this goal of keeping sensitive info exclusively in the VM, or at least that's what I'm reading into it
Last edited by Marlock on 18 Jun 2024 at 1:10 am UTC
the phoronix post seems to imply cloud VM clients need stuff isolated from cloud VM provider
Sounds plausible to me! Not everybody might fully trust the big cloud hosters...