NVIDIA is currently investigating a bug where their drivers are crashing on modern kernels (6.10+). This appears to happen across drivers 550, 555 and even the latest 560.
From the troubleshooting data on the NVIDIA forums and the crashes I'm facing myself, this seems to be a regression that started with Linux kernel 6.10. It affects users when the suspend mechanism is triggered or when an application is rendering 3D, and it affects both the closed and open NVIDIA drivers.
This happened on my computer recently, and it is really annoying. Example log:
[ 29.168385] ------------[ cut here ]------------
[ 29.168385] WARNING: CPU: 13 PID: 7032 at include/linux/rwsem.h:80 follow_pte+0x1de/0x200
[ 29.168387] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq hid_logitech_hidpp uhid ccm nvidia_drm(OE) nvidia_uvm(OE) nvidia_modeset(OE) nvidia(OE) cmac algif_hash algif_skcipher af_alg xt_TCPMSS xt_tcpudp bnep nft_compat nf_tables libcrc32c crc32c_generic cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii xe drm_gpuvm drm_exec vfat gpu_sched fat drm_suballoc_helper drm_ttm_helper intel_uncore_frequency intel_uncore_frequency_common snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda x86_pkg_temp_thermal snd_sof_pci intel_powerclamp snd_sof_xtensa_dsp coretemp btusb snd_sof uvcvideo btrtl kvm_intel btintel videobuf2_vmalloc snd_sof_utils btbcm uvc btmtk videobuf2_memops snd_usb_audio videobuf2_v4l2 snd_soc_hdac_hda kvm snd_usbmidi_lib snd_soc_acpi_intel_match bluetooth videodev snd_ump soundwire_generic_allocation snd_rawmidi snd_soc_acpi videobuf2_common snd_seq_device soundwire_bus joydev mc
[ 29.168413] usbhid mousedev crc16 snd_soc_avs iwlmvm crct10dif_pclmul crc32_pclmul snd_soc_hda_codec crc32c_intel i915 snd_hda_ext_core polyval_clmulni mac80211 polyval_generic snd_soc_core gf128mul snd_hda_codec_hdmi ghash_clmulni_intel snd_compress sha512_ssse3 ac97_bus sha256_ssse3 libarc4 snd_pcm_dmaengine sha1_ssse3 aesni_intel snd_hda_intel snd_intel_dspcfg crypto_simd snd_intel_sdw_acpi cryptd processor_thermal_device_pci snd_hda_codec processor_thermal_device iTCO_wdt drm_buddy iwlwifi hid_multitouch rapl processor_thermal_wt_hint intel_pmc_bxt snd_hda_core e1000e i2c_algo_bit asus_nb_wmi hid_generic processor_thermal_rfim mei_hdcp mei_pxp spi_nor ttm iTCO_vendor_support asus_wmi intel_cstate processor_thermal_rapl snd_hwdep intel_rapl_msr platform_profile wmi_bmof intel_uncore pcspkr ucsi_acpi mtd cfg80211 snd_pcm intel_rapl_common ptp mei_me drm_display_helper snd_timer intel_lpss_pci typec_ucsi i2c_i801 pps_core processor_thermal_wt_req intel_lpss snd i2c_smbus cec typec processor_thermal_power_floor mei
[ 29.168439] idma64 thunderbolt i2c_mux soundcore rfkill intel_gtt processor_thermal_mbox roles video intel_pmc_core int3403_thermal int340x_thermal_zone intel_vsec i2c_hid_acpi int3400_thermal pmt_telemetry intel_hid wmi pmt_class i2c_hid acpi_thermal_rel sparse_keymap pinctrl_tigerlake acpi_pad mac_hid vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) crypto_user acpi_call(OE) dm_mod loop nfnetlink ip_tables x_tables zfs(POE) spl(OE) nvme nvme_core nvme_auth serio_raw atkbd libps2 vivaldi_fmap xhci_pci spi_intel_pci vmd xhci_pci_renesas spi_intel i8042 serio
[ 29.168456] CPU: 13 PID: 7032 Comm: nv_queue Tainted: P W OE 6.10.6-arch1-1 #1 703d152c24f1971e36f16e505405e456fc9e23f8
[ 29.168457] Hardware name: ASUSTeK COMPUTER INC. ASUS TUF Dash F15 FX517ZR_FX517ZR/FX517ZR, BIOS FX517ZR.317 05/03/2023
[ 29.168457] RIP: 0010:follow_pte+0x1de/0x200
[ 29.168459] Code: cc cc cc 48 81 e2 00 00 00 c0 48 09 c2 48 f7 d2 48 85 fa 75 20 e8 b2 f5 ff ff 48 8b 35 6b f1 5c 01 48 81 e6 00 00 00 c0 eb 8d <0f> 0b 48 3b 1f 0f 83 50 fe ff ff bd ea ff ff ff eb b6 49 8b 3c 24
[ 29.168460] RSP: 0018:ffffae0a06117b48 EFLAGS: 00010246
[ 29.168461] RAX: 0000000000000000 RBX: 000076e7d773e000 RCX: ffffae0a06117b88
[ 29.168462] RDX: ffffae0a06117b80 RSI: 000076e7d773e000 RDI: ffff9bb0c135a7e8
[ 29.168462] RBP: ffffae0a06117bc8 R08: ffffae0a06117d20 R09: 0000000000000000
[ 29.168463] R10: 0000000000000200 R11: 0000000000000003 R12: ffffae0a06117b88
[ 29.168464] R13: ffffae0a06117b80 R14: ffff9bb0c8ea6880 R15: 0000000000000000
[ 29.168465] FS: 0000000000000000(0000) GS:ffff9bb470480000(0000) knlGS:0000000000000000
[ 29.168466] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 29.168466] CR2: 000078eb813de600 CR3: 00000004aec20000 CR4: 0000000000f50ef0
[ 29.168467] PKRU: 55555554
[ 29.168468] Call Trace:
[ 29.168468]
[ 29.168469] ? follow_pte+0x1de/0x200
[ 29.168470] ? __warn.cold+0x8e/0xe8
[ 29.168471] ? follow_pte+0x1de/0x200
[ 29.168473] ? report_bug+0xff/0x140
[ 29.168475] ? handle_bug+0x3c/0x80
[ 29.168476] ? exc_invalid_op+0x17/0x70
[ 29.168477] ? asm_exc_invalid_op+0x1a/0x20
[ 29.168479] ? follow_pte+0x1de/0x200
[ 29.168481] follow_phys+0x49/0x110
[ 29.168484] untrack_pfn+0x55/0x120
[ 29.168485] unmap_single_vma+0xa6/0xe0
[ 29.168487] zap_page_range_single+0x122/0x1d0
[ 29.168490] unmap_mapping_range+0x116/0x140
[ 29.168492] ? __pfx__main_loop+0x10/0x10 [nvidia 6898836e29120618a557bb388a70bcdb9b6600f4]
[ 29.168578] nv_revoke_gpu_mappings+0x67/0xb0 [nvidia 6898836e29120618a557bb388a70bcdb9b6600f4]
[ 29.168657] RmHandleIdleSustained+0x3b/0x140 [nvidia 6898836e29120618a557bb388a70bcdb9b6600f4]
[ 29.168787] ? gpumgrGetGpu+0x69/0xa0 [nvidia 6898836e29120618a557bb388a70bcdb9b6600f4]
[ 29.168918] rm_execute_work_item+0xda/0x150 [nvidia 6898836e29120618a557bb388a70bcdb9b6600f4]
[ 29.169054] _main_loop+0x95/0x150 [nvidia 6898836e29120618a557bb388a70bcdb9b6600f4]
[ 29.169153] kthread+0xcf/0x100
[ 29.169156] ? __pfx_kthread+0x10/0x10
[ 29.169157] ret_from_fork+0x31/0x50
[ 29.169159] ? __pfx_kthread+0x10/0x10
[ 29.169161] ret_from_fork_asm+0x1a/0x30
[ 29.169163]
[ 29.169164] ---[ end trace 0000000000000000 ]---
Symptoms:
- Video gets choppy, and hiccups happen approximately every 45s.
- The game keeps running while the video is frozen, so take extra care if you are playing something on "hardcore mode". I got killed twice in Core Keeper when a freeze hit during a battle.
- After more than 30 minutes playing, the load average of the machine can spike to three digits without any sign of bottleneck. top, iostat, vmstat or free show no clear sign of system performance degradation. dmesg gets flooded with similar stack trace messages.
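To get a rough idea of how badly dmesg is being flooded, the warning headers can be counted. The following is only a minimal Python sketch, assuming every warning header has the same shape as the one in the example log above; the name count_warnings is made up for illustration and is not part of any NVIDIA or kernel tooling.

```python
import re

# Matches a warning header shaped like the one in the example log, e.g.
# "WARNING: CPU: 13 PID: 7032 at include/linux/rwsem.h:80 follow_pte+0x1de/0x200"
# Assumption: other kernels may format this line differently.
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\w+)\+0x")


def count_warnings(dmesg_text):
    """Return a dict mapping (source location, function name) to a count."""
    counts = {}
    for match in WARNING_RE.finditer(dmesg_text):
        key = (match.group(1), match.group(2))
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Feeding it the saved output of dmesg (which may need root to read) gives one entry per distinct warning site, so a large count for follow_pte would match the flooding described above.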
Possible Solution:
- Use linux-lts (6.6) with nvidia 550 and 555 modules.
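If you are not sure whether your running kernel falls in the reported range, a quick check could look like the sketch below. It assumes the release string starts with "major.minor" (as in 6.10.6-arch1-1 from the log above) and treats anything from 6.10 upwards as potentially affected, which is only what the reports suggest, not a confirmed boundary. The function names are hypothetical.

```python
import platform
import re


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '6.10.6-arch1-1'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_in_reported_range(release):
    """True if the kernel is 6.10 or newer (assumed range, based on the reports)."""
    return parse_kernel_release(release) >= (6, 10)


def running_kernel_in_reported_range():
    """Apply the same check to the kernel this script is running on."""
    return is_in_reported_range(platform.release())
```

With linux-lts (6.6) booted, running_kernel_in_reported_range() should come back False.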
NVIDIA is investigating the issue and more information can be found on the following forum threads:
- Nvidia driver kernel random call trace
- Multiple kernel oopses before suspending caused by nvidia-sleep.sh, Linux 6.10 regression? WARNING: CPU: PID: at include/linux/rwsem.h:80 follow_pte
There are links to other threads in the above, with more people seeing the issue too.
From reports, it seems the latest NVIDIA 560 driver does not solve it.
Speculating on the fact that the open and the closed ones are causing the same issues and given that the userspace part of the driver "should" not crash the kernel on its own, maybe there is common code between the open and the closed drivers.
That's not speculation. Nvidia said as much from the beginning.
Though the kernel modules in the two flavors are different, they are based on the same underlying source code.
The open drivers just have any code removed that they could not relicense to MIT/GPL; such functions were rewritten or ported to the GSP.
Last edited by Vash63 on 24 August 2024 at 12:03 pm UTC
Im having problems with the 535.183.01-0ubuntu0.24.04.1 driver that was in the mint update manager today...... Steam seems to play games fine but PCSX2 and Dolphin run slow and choppy with sound glitches....... So I guess Nvidia is just being sh*t as per usual.......
I'm on arch (btw), but are said emus installed as flatpaks? if so, you gotta sudo flatpak update as well and restart otherwise they won't detect ur GPU, at least that's the case with me
Yes they are installed as flatpaks...... While steam is native........
so is the command
sudo flatpak update
If so I get this
Looking for updates…
ID Branch Op Remote Download
1. org.gtk.Gtk3theme.Mint-Y-Aqua 3.22 i flathub < 114.8 kB
Proceed with these changes to the system installation? [Y/n]:
Dunno if that looks right or not........
Sorry im not good at this.....
Hmm, have you tried just restarting your PC and trying again? Seems like there's no GPU updates there (org.freedesktop.Platform.GL32.nvidia-xx, etc) but I can't figure out what else would be your issue.
I need to restart in order for GPU to get detected by flatpaks again after update
Yep ive rebooted several times..... Just tried it again.......
Looking for updates…
Nothing to do.
Thanks for trying though..... Least now I have a good idea of whats going on at least..........
Last edited by StoneColdSpider on 24 August 2024 at 2:01 pm UTC
Well its fixed now........ I had already installed the org.freedesktop.Platform.GL32.nvidia-xx but there was a problem with it and a new one JUST popped up in the mint update manager and now the flatpaks are working......
Thanks for your knowledge mate :)
RTX 3060 Ti with 555 drivers
Fedora with Kernel 6.10.6-cb1 (CachyOS')
i haven't encountered any issues and have been running 6.10 and 560 drivers for a long time.
I also use my PC quite heavily every day and rarely shut it down. But i use X11 not wayland if thats also a difference i guess..
For me nvidia drivers have been incredibly stable since 545 at least... 530 had some regressions with some games iirc but after that its been smooth sailing for me
RTX 3080, Arch Linux, MATE desktop X11
Setting the file /etc/dracut.conf.d/nvidia-dracut-force.conf with
# https://wiki.archlinux.org/title/Dracut#Early_kernel_module_loading
force_drivers+=" nvidia nvidia_modeset nvidia_uvm nvidia_drm "
And running doas/sudo dracut --force --printsize --parallel --verbose makes the system not boot fully: it hangs after the LUKS password and, some time later, throws a bunch of error/warning messages. BUT it also happened even on Fedora 41/rawhide with kernel 6.11.xx.
this seems to be a regression that started with Linux kernel 6.10 and it affects users when suspension mechanism is triggered or some application is processing 3D
What does this mean? Like if I am running a game or maybe Blender and then simply put my system into standby? If so, then I guess I know why I didn't run into this issue, as I never did this before.
Last edited by Vortex_Acherontic on 25 August 2024 at 8:59 am UTC
This appears to happen across drivers 550, 555 and even the latest 560.
Yet again I am grateful for Debian Sid maintainers keeping me on the 535 line
Ditto for the Linux Mint maintainers. Also on the 535 line (at least, for LM 21 -- haven't upgraded to 22 yet).
Last edited by Caldathras on 26 August 2024 at 4:10 pm UTC