NVIDIA is currently investigating a bug where their drivers are crashing on modern kernels (6.10+). This appears to happen across drivers 550, 555 and even the latest 560.
Based on the troubleshooting data on the NVIDIA forums and the crashes I'm seeing myself, this looks like a regression that started with Linux kernel 6.10. It is triggered when the system suspends or when an application is rendering 3D, and it affects both the closed and open NVIDIA drivers.
This happened on my computer recently, and it is really annoying. Example log:
[ 29.168385] ------------[ cut here ]------------
[ 29.168385] WARNING: CPU: 13 PID: 7032 at include/linux/rwsem.h:80 follow_pte+0x1de/0x200
[ 29.168387] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq hid_logitech_hidpp uhid ccm nvidia_drm(OE) nvidia_uvm(OE) nvidia_modeset(OE) nvidia(OE) cmac algif_hash algif_skcipher af_alg xt_TCPMSS xt_tcpudp bnep nft_compat nf_tables libcrc32c crc32c_generic cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii xe drm_gpuvm drm_exec vfat gpu_sched fat drm_suballoc_helper drm_ttm_helper intel_uncore_frequency intel_uncore_frequency_common snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda x86_pkg_temp_thermal snd_sof_pci intel_powerclamp snd_sof_xtensa_dsp coretemp btusb snd_sof uvcvideo btrtl kvm_intel btintel videobuf2_vmalloc snd_sof_utils btbcm uvc btmtk videobuf2_memops snd_usb_audio videobuf2_v4l2 snd_soc_hdac_hda kvm snd_usbmidi_lib snd_soc_acpi_intel_match bluetooth videodev snd_ump soundwire_generic_allocation snd_rawmidi snd_soc_acpi videobuf2_common snd_seq_device soundwire_bus joydev mc
[ 29.168413] usbhid mousedev crc16 snd_soc_avs iwlmvm crct10dif_pclmul crc32_pclmul snd_soc_hda_codec crc32c_intel i915 snd_hda_ext_core polyval_clmulni mac80211 polyval_generic snd_soc_core gf128mul snd_hda_codec_hdmi ghash_clmulni_intel snd_compress sha512_ssse3 ac97_bus sha256_ssse3 libarc4 snd_pcm_dmaengine sha1_ssse3 aesni_intel snd_hda_intel snd_intel_dspcfg crypto_simd snd_intel_sdw_acpi cryptd processor_thermal_device_pci snd_hda_codec processor_thermal_device iTCO_wdt drm_buddy iwlwifi hid_multitouch rapl processor_thermal_wt_hint intel_pmc_bxt snd_hda_core e1000e i2c_algo_bit asus_nb_wmi hid_generic processor_thermal_rfim mei_hdcp mei_pxp spi_nor ttm iTCO_vendor_support asus_wmi intel_cstate processor_thermal_rapl snd_hwdep intel_rapl_msr platform_profile wmi_bmof intel_uncore pcspkr ucsi_acpi mtd cfg80211 snd_pcm intel_rapl_common ptp mei_me drm_display_helper snd_timer intel_lpss_pci typec_ucsi i2c_i801 pps_core processor_thermal_wt_req intel_lpss snd i2c_smbus cec typec processor_thermal_power_floor mei
[ 29.168439] idma64 thunderbolt i2c_mux soundcore rfkill intel_gtt processor_thermal_mbox roles video intel_pmc_core int3403_thermal int340x_thermal_zone intel_vsec i2c_hid_acpi int3400_thermal pmt_telemetry intel_hid wmi pmt_class i2c_hid acpi_thermal_rel sparse_keymap pinctrl_tigerlake acpi_pad mac_hid vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) crypto_user acpi_call(OE) dm_mod loop nfnetlink ip_tables x_tables zfs(POE) spl(OE) nvme nvme_core nvme_auth serio_raw atkbd libps2 vivaldi_fmap xhci_pci spi_intel_pci vmd xhci_pci_renesas spi_intel i8042 serio
[ 29.168456] CPU: 13 PID: 7032 Comm: nv_queue Tainted: P W OE 6.10.6-arch1-1 #1 703d152c24f1971e36f16e505405e456fc9e23f8
[ 29.168457] Hardware name: ASUSTeK COMPUTER INC. ASUS TUF Dash F15 FX517ZR_FX517ZR/FX517ZR, BIOS FX517ZR.317 05/03/2023
[ 29.168457] RIP: 0010:follow_pte+0x1de/0x200
[ 29.168459] Code: cc cc cc 48 81 e2 00 00 00 c0 48 09 c2 48 f7 d2 48 85 fa 75 20 e8 b2 f5 ff ff 48 8b 35 6b f1 5c 01 48 81 e6 00 00 00 c0 eb 8d <0f> 0b 48 3b 1f 0f 83 50 fe ff ff bd ea ff ff ff eb b6 49 8b 3c 24
[ 29.168460] RSP: 0018:ffffae0a06117b48 EFLAGS: 00010246
[ 29.168461] RAX: 0000000000000000 RBX: 000076e7d773e000 RCX: ffffae0a06117b88
[ 29.168462] RDX: ffffae0a06117b80 RSI: 000076e7d773e000 RDI: ffff9bb0c135a7e8
[ 29.168462] RBP: ffffae0a06117bc8 R08: ffffae0a06117d20 R09: 0000000000000000
[ 29.168463] R10: 0000000000000200 R11: 0000000000000003 R12: ffffae0a06117b88
[ 29.168464] R13: ffffae0a06117b80 R14: ffff9bb0c8ea6880 R15: 0000000000000000
[ 29.168465] FS: 0000000000000000(0000) GS:ffff9bb470480000(0000) knlGS:0000000000000000
[ 29.168466] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 29.168466] CR2: 000078eb813de600 CR3: 00000004aec20000 CR4: 0000000000f50ef0
[ 29.168467] PKRU: 55555554
[ 29.168468] Call Trace:
[ 29.168468] <TASK>
[ 29.168469] ? follow_pte+0x1de/0x200
[ 29.168470] ? __warn.cold+0x8e/0xe8
[ 29.168471] ? follow_pte+0x1de/0x200
[ 29.168473] ? report_bug+0xff/0x140
[ 29.168475] ? handle_bug+0x3c/0x80
[ 29.168476] ? exc_invalid_op+0x17/0x70
[ 29.168477] ? asm_exc_invalid_op+0x1a/0x20
[ 29.168479] ? follow_pte+0x1de/0x200
[ 29.168481] follow_phys+0x49/0x110
[ 29.168484] untrack_pfn+0x55/0x120
[ 29.168485] unmap_single_vma+0xa6/0xe0
[ 29.168487] zap_page_range_single+0x122/0x1d0
[ 29.168490] unmap_mapping_range+0x116/0x140
[ 29.168492] ? __pfx__main_loop+0x10/0x10 [nvidia 6898836e29120618a557bb388a70bcdb9b6600f4]
[ 29.168578] nv_revoke_gpu_mappings+0x67/0xb0 [nvidia 6898836e29120618a557bb388a70bcdb9b6600f4]
[ 29.168657] RmHandleIdleSustained+0x3b/0x140 [nvidia 6898836e29120618a557bb388a70bcdb9b6600f4]
[ 29.168787] ? gpumgrGetGpu+0x69/0xa0 [nvidia 6898836e29120618a557bb388a70bcdb9b6600f4]
[ 29.168918] rm_execute_work_item+0xda/0x150 [nvidia 6898836e29120618a557bb388a70bcdb9b6600f4]
[ 29.169054] _main_loop+0x95/0x150 [nvidia 6898836e29120618a557bb388a70bcdb9b6600f4]
[ 29.169153] kthread+0xcf/0x100
[ 29.169156] ? __pfx_kthread+0x10/0x10
[ 29.169157] ret_from_fork+0x31/0x50
[ 29.169159] ? __pfx_kthread+0x10/0x10
[ 29.169161] ret_from_fork_asm+0x1a/0x30
[ 29.169163] </TASK>
[ 29.169164] ---[ end trace 0000000000000000 ]---
Symptoms:
- Video gets choppy, and hiccups happen approximately every 45s.
- The game keeps running while the video is frozen, so take extra care if you are playing something in "hardcore mode". I got killed twice in Core Keeper when a freeze hit during a battle.
- After more than 30 minutes of play, the machine's load average can spike to three digits without any sign of a bottleneck: top, iostat, vmstat or free show no clear sign of system performance degradation. dmesg gets flooded with similar stack trace messages.
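To check whether a machine is hitting the same warning, the kernel log can be scanned for the WARNING line shown in the example log. This is only a minimal sketch: the function name is made up for illustration, and the pattern assumes the warning keeps the same `include/linux/rwsem.h` / `follow_pte` wording as in the trace above.

```python
import re

# Pattern taken from the WARNING line in the example log; the CPU, PID and
# line number vary between systems, so they are matched loosely.
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at include/linux/rwsem\.h:\d+ follow_pte"
)


def count_follow_pte_warnings(log_text):
    """Count log lines that contain the follow_pte rwsem warning."""
    return sum(1 for line in log_text.splitlines() if WARNING_RE.search(line))
```

The text to scan could come from saving the output of dmesg to a file and reading it back; depending on the distribution, reading the kernel log may require root.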
Possible Solution:
- Use linux-lts (6.6) with nvidia 550 and 555 modules.
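A quick way to tell whether the running kernel falls in the affected range (6.10 or newer, going by the reports above) is to compare the release string against that version. This is a rough sketch: `kernel_is_affected` is a hypothetical helper, and the 6.10 threshold is taken from the forum reports, not from an official NVIDIA statement.

```python
import platform
import re


def kernel_is_affected(release, first_bad=(6, 10)):
    """Return True if a release string such as '6.10.6-arch1-1' is >= first_bad."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles 6.6 < 6.10 correctly, unlike string comparison.
    return (major, minor) >= first_bad


def running_kernel_is_affected():
    """Apply the check to the kernel this script is running on."""
    return kernel_is_affected(platform.release())
```

With the kernel from the log above, `kernel_is_affected("6.10.6-arch1-1")` is true, while a 6.6 LTS release string would come back false.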
NVIDIA is investigating the issue and more information can be found on the following forum threads:
- Nvidia driver kernel random call trace
- Multiple kernel oopses before suspending caused by nvidia-sleep.sh, Linux 6.10 regression? WARNING: CPU: PID: at include/linux/rwsem.h:80 follow_pte
The threads above link to others where more people report the same issue.
From reports, it seems the latest NVIDIA 560 driver does not solve it.
Quoting: kokoko3k
Speculating on the fact that the open and the closed ones are causing the same issues and given that the userspace part of the driver "should" not crash the kernel by its own, maybe there is common code between the open and the closed drivers.
That's not speculation. Nvidia said as much from the beginning.
Though the kernel modules in the two flavors are different, they are based on the same underlying source code.
The open drivers just have any code removed that they could not relicense to MIT/GPL; such functions were rewritten or ported to the GSP.
Last edited by Vash63 on 24 August 2024 at 12:03 pm UTC
Quoting: StoneColdSpider
Im having problems with the 535.183.01-0ubuntu0.24.04.1 driver that was in the mint update manager today...... Steam seems to play games fine but PCSX2 and Dolphin run slow and choppy with sound glitches....... So I guess Nvidia is just being sh*t as per usual.......
I'm on arch (btw), but are said emus installed as flatpaks? if so, you gotta sudo flatpak update as well and restart otherwise they won't detect ur GPU, at least that's the case with me
Quoting: based
I'm on arch (btw), but are said emus installed as flatpaks? if so, you gotta sudo flatpak update as well and restart otherwise they won't detect ur GPU, at least that's the case with me
Yes they are installed as flatpaks...... While steam is native........
so is the command
sudo flatpak update
If so I get this
Looking for updates…
    ID                               Branch  Op  Remote   Download
 1. org.gtk.Gtk3theme.Mint-Y-Aqua    3.22    i   flathub  < 114.8 kB
Proceed with these changes to the system installation? [Y/n]:
Dunno if that looks right or not........
Sorry im not good at this.....
Quoting: StoneColdSpider
Yes they are installed as flatpaks...... While steam is native........
so is the command
sudo flatpak update
If so I get this
Looking for updates…
    ID                               Branch  Op  Remote   Download
 1. org.gtk.Gtk3theme.Mint-Y-Aqua    3.22    i   flathub  < 114.8 kB
Proceed with these changes to the system installation? [Y/n]:
Dunno if that looks right or not........
Sorry im not good at this.....
Hmm, have you tried just restarting your PC and trying again? Seems like there's no GPU updates there (org.freedesktop.Platform.GL32.nvidia-xx, etc) but I can't figure out what else would be your issue.
I need to restart in order for GPU to get detected by flatpaks again after update
Quoting: based
Hmm, have you tried just restarting your PC and trying again? Seems like there's no GPU updates there (org.freedesktop.Platform.GL32.nvidia-xx, etc) but I can't figure out what else would be your issue.
I need to restart in order for GPU to get detected by flatpaks again after update
Yep ive rebooted several times..... Just tried it again.......
Looking for updates…
Nothing to do.
Thanks for trying though..... Least now I have a good idea of whats going on at least..........
Last edited by StoneColdSpider on 24 August 2024 at 2:01 pm UTC
Quoting: based
Hmm, have you tried just restarting your PC and trying again? Seems like there's no GPU updates there (org.freedesktop.Platform.GL32.nvidia-xx, etc) but I can't figure out what else would be your issue.
I need to restart in order for GPU to get detected by flatpaks again after update
Well its fixed now........ I had already installed the org.freedesktop.Platform.GL32.nvidia-xx but there was a problem with it and a new one JUST popped up in the mint update manager and now the flatpaks are working......
Thanks for your knowledge mate :)
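For anyone hitting the same flatpak problem, the mismatch discussed in this thread (host driver updated, matching org.freedesktop.Platform.GL32.nvidia-xx runtime not yet installed) can be checked with a small script. This is a sketch under assumptions: the runtime ID scheme, with the driver version's dots replaced by dashes, is how these runtimes appear to be named on Flathub, and both function names are made up for illustration.

```python
import re


def nvidia_gl_runtime_ids(driver_version):
    """Build the Flatpak GL runtime IDs assumed to match a host driver version."""
    # Keep only the leading dotted numbers, dropping distro packaging
    # suffixes such as "-0ubuntu0.24.04.1".
    match = re.match(r"\d+(?:\.\d+)+", driver_version)
    if not match:
        raise ValueError(f"unrecognised driver version: {driver_version!r}")
    # Assumed naming scheme: 535.183.01 -> nvidia-535-183-01
    suffix = "nvidia-" + match.group(0).replace(".", "-")
    return [f"org.freedesktop.Platform.{gl}.{suffix}" for gl in ("GL", "GL32")]


def missing_runtimes(driver_version, installed_ids):
    """Return the expected runtime IDs that are absent from installed_ids."""
    installed = set(installed_ids)
    return [rid for rid in nvidia_gl_runtime_ids(driver_version) if rid not in installed]
```

The list of installed IDs would have to come from flatpak itself, for example from the output of `flatpak list --runtime --columns=application`, one ID per line. An empty result from `missing_runtimes` suggests the runtimes line up with the host driver.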