NVIDIA have produced a brand new stable Linux driver, version 455.28, which adds new GPU support, and there are plenty of fixes for us too.
This is a proper mainline stable driver, so it should be good for anyone to upgrade to. A lot of this is coming over from previous Beta releases.
The new 455.28 driver brings official Linux support for the GeForce RTX 3080, GeForce RTX 3090 and the GeForce MX450. That's not all that was added. This release also hooks up support for a new device-local VkMemoryType which is host-coherent and host-visible, which NVIDIA said may lead to better performance for certain titles running with the DXVK translation layer, like DiRT Rally 2.0, DOOM: Eternal and World of Warcraft. It also adds NVIDIA VDPAU driver support for decoding VP9 10- and 12-bit bitstreams.
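To make the VkMemoryType point a little more concrete, here is a minimal sketch of how an application or translation layer might pick a memory type that is device-local, host-visible and host-coherent all at once. It is written in plain Python rather than against the real Vulkan API: the flag values match the Vulkan specification's VkMemoryPropertyFlagBits, but the `find_memory_type` helper and the `example_types` table are hypothetical and only illustrate the usual bitmask search. It is not taken from the driver or from DXVK.

```python
# Vulkan memory property flag bits (values as defined by the Vulkan spec).
DEVICE_LOCAL = 0x1
HOST_VISIBLE = 0x2
HOST_COHERENT = 0x4
HOST_CACHED = 0x8


def find_memory_type(type_flags, allowed_type_bits, required):
    """Return the index of the first memory type that is allowed for the
    resource and carries every required property flag, or None."""
    for index, flags in enumerate(type_flags):
        allowed = allowed_type_bits & (1 << index)
        if allowed and (flags & required) == required:
            return index
    return None


# Hypothetical memory type table, for illustration only. A real table comes
# from vkGetPhysicalDeviceMemoryProperties and differs per GPU and driver.
example_types = [
    DEVICE_LOCAL,                                 # plain video memory
    HOST_VISIBLE | HOST_COHERENT,                 # system memory
    HOST_VISIBLE | HOST_COHERENT | HOST_CACHED,   # cached system memory
    DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT,  # the kind of type described above
]

wanted = DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT

# All four types are allowed for this (hypothetical) resource.
index = find_memory_type(example_types, 0b1111, wanted)

# If the resource cannot use type 3, no type satisfies all three flags.
fallback = find_memory_type(example_types, 0b0111, wanted)
```

With a type like this available, the CPU can write straight into video memory without a separate staging copy, which is presumably where the possible gain comes from.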
Additionally, they updated Base Mosaic support to allow five simultaneous displays instead of three, added support for the NVIDIA NGX Updater, and gave SLI Mosaic display configuration its own dedicated place in the nvidia-settings app. They also removed multiple SLI modes including "SFR", "AFR", and "AA", but SLI Mosaic, Base Mosaic, GL_NV_gpu_multicast, and GLX_NV_multigpu_context are still supported.
Plenty of bug fixes are included too. Here's a list:
- Fixed a bug that caused X to crash when the NVIDIA RandR provider was disabled while using an NVIDIA-driven display as a PRIME Display Offload sink.
- Fixed a bug that prevented 8K displays from being used in portrait orientation on Pascal and earlier GPUs.
- Fixed a bug which caused excessive CPU usage in Vulkan applications which create a large number of VkFence objects. This was particularly prevalent in the Steam Play title Red Dead Redemption 2.
- Fixed a bug that caused WebKit-based applications to crash when running on Wayland.
- Fixed a bug that led to display corruption at some resolutions when using an NVIDIA-driven display as a PRIME Display Offload sink.
- Fixed a bug in a SPIR-V optimization that may cause conditional blocks to not execute.
- Fixed a bug where calls to vkGetRandROutputDisplayEXT with unexpected inputs would generate X11 protocol errors. (https://bugs.winehq.org/show_bug.cgi?id=49407)
- Fixed a small memory leak during exit of the NVIDIA EGL driver.
- Fixed several synchronization bugs that could momentarily lock up the X server when moving/resizing/focusing OpenGL and Vulkan windows when PRIME Sync was enabled.
- Fixed a bug that could cause dual-link DVI to be driven over a connector that only supports single-link DVI, when "NoMaxPClkCheck" is specified in the "ModeValidation" X configuration option. Note this fix may cause behavioral changes for configurations using this option.
- Fixed a bug where glGetGraphicsResetStatusARB would incorrectly return GL_PURGED_CONTEXT_RESET_NV immediately after application start-up if the system had previously been suspended.
- Fixed a regression that allowed displays to enter DPMS mode even when DPMS is disabled in the X server settings.
The release announcement can be found here.
https://www.tomshardware.com/news/nvidia-linux-basemosaic-ubuntu-parity,24519.html
Nice to see they're adding that functionality back now that everyone's forgotten about that...
See above for the possible cause of the stutter that you are seeing. The FPS drop you are seeing may be related to overhead introduced by PRIME Render Offload and Reverse PRIME. In order for a PRIME Render Offload app to be shown on the iGPU’s desktop, the contents of the window have to be copied across the PCIe bus into system memory, incurring bandwidth overhead. Then, in order for the iGPU’s desktop to be displayed on a dGPU output, Reverse PRIME has to copy that region of the desktop across the PCIe bus again into video memory, incurring more bandwidth overhead. These two combined can result in significant bandwidth usage that could affect performance, especially for laptops and eGPUs that are limited to 2-4 PCIe lanes.
A future driver release will introduce an optimization that avoids the overhead from both PRIME Render Offload and Reverse PRIME for fullscreen, unoccluded, unredirected windows, where the dGPU can just display the app’s contents directly. In this case, the bandwidth overhead should be no more than a native desktop.
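The double copy described above can be put into rough numbers. The following is a back-of-the-envelope sketch, not a measurement: it assumes an uncompressed full-frame copy at 4 bytes per pixel, a fixed frame rate, and a theoretical PCIe 3.0 rate of about 0.985 GB/s per lane. The function names and the 1080p/60 example are made up for illustration, and real overhead depends on the driver, the compositor and the link.

```python
PCIE3_BYTES_PER_S_PER_LANE = 0.985e9  # assumed theoretical PCIe 3.0 rate per lane


def copy_bandwidth(width, height, fps, bytes_per_pixel=4, copies=1):
    """Bytes per second needed to copy full frames across the bus."""
    return width * height * bytes_per_pixel * fps * copies


def link_fraction(bytes_per_s, lanes, per_lane=PCIE3_BYTES_PER_S_PER_LANE):
    """Share of the link's theoretical bandwidth used by the copies."""
    return bytes_per_s / (lanes * per_lane)


# One copy: PRIME Render Offload only (dGPU window -> system memory).
single = copy_bandwidth(1920, 1080, 60)

# Two copies: Render Offload plus Reverse PRIME (back to video memory).
double = copy_bandwidth(1920, 1080, 60, copies=2)

# Share of a hypothetical 2-lane PCIe 3.0 link taken by the double copy.
share_x2 = link_fraction(double, lanes=2)

print(f"single copy: {single / 1e9:.2f} GB/s")
print(f"double copy: {double / 1e9:.2f} GB/s")
print(f"share of a x2 link: {share_x2:.0%}")
```

Higher resolutions or uncapped frame rates scale these figures up linearly, which fits the point about laptops and eGPUs on 2-4 lanes being hit hardest.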
I’m not exactly sure how Windows implements cross-GPU multi-monitor support behind the scenes, but I wouldn’t rule out that the framebuffer is being copied. At the very least, an app rendered on the dGPU would have to be copied to be displayed on an iGPU output, and vice versa. If the dGPU output has the desktop rendered separately, it could at least avoid the double-copy scenario described in #31.
On Linux, the implementation mainly has to do with the historical limitations of X. X requires the desktop to be controlled by a single protocol screen, which is controlled by a single GPU. Generally, apps will run on one X protocol screen at a time. As such, the compositor for example is one X / OpenGL client that renders into the framebuffer on one GPU.
The ability to use a second GPU for additional monitors or for offloaded rendering is a later addition. Since only the protocol screen can receive X protocol and control the desktop, the rendered desktop has to be copied over to the secondary GPU to be displayed on its outputs. Note that the secondary GPU is in control of its own outputs, it just isn’t in control of the desktop. Similarly, rendering offloaded to the secondary GPU needs to be copied to the protocol screen’s GPU in order to be included on the desktop.
The above is true for standard PRIME Display Offload too, where the NVIDIA dGPU is controlling the X protocol screen and the iGPU is the secondary GPU. “Reverse PRIME” is just a special case of PRIME Display Offload where the roles are reversed.
It is possible to set up a separate X screen for each GPU, in which case each GPU will be responsible for all of the rendering on its own monitors. However, apps can’t be dragged between them, since the framebuffers are independent.
So a future driver will enable reverse-PRIME gaming in fullscreen to be at maximum speed. The above details why X11 isn't exactly the best solution too, so it kind of mystifies me as to why they are sticking with it.
As a protocol, Wayland’s compositor-centric approach should be theoretically more capable of a more efficient implementation (do composition separately on each GPU, resulting in a unified desktop and no redundant copies), but it’s up to the compositor to implement something sophisticated.
The Beta driver has a bug with Proton where shader optimization takes a very long time.
So I went back to the official stable driver, and now shader optimization is super fast.
Shader optimization has always been slow for me, 10+ minutes in Borderlands 3.