View PC info
TL;dr - Anyone with Threadripper or Ryzen hardware still seeing stability problems? Specifically with PCIE errors.
Long version - Between motherboards, power supplies, SSDs and video cards I've now spent several thousand upgrading to a 1950X, yet I'm still getting hardware problems.
Example errors:
#
pcieport 0000:00:01.1: AER: Corrected error received: id=0000
pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Transmitter ID)
pcieport 0000:00:01.1: device [1022:1453] error status/mask=00001000/00006000
pcieport 0000:00:01.1: [12] Replay Timer Timeout
pcieport 0000:00:01.1: AER: Corrected error received: id=0000
pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)
pcieport 0000:00:01.1: device [1022:1453] error status/mask=00000080/00006000
pcieport 0000:00:01.1: [ 7] Bad DLLP
#
# dmesg |grep pciep |grep -- '\['|sort | uniq -c
316 pcieport 0000:00:01.1: [12] Replay Timer Timeout
1689 pcieport 0000:00:01.1: [ 6] Bad TLP
17 pcieport 0000:00:01.1: [ 7] Bad DLLP
1652 pcieport 0000:00:01.1: device [1022:1453] error status/mask=00000040/00006000
17 pcieport 0000:00:01.1: device [1022:1453] error status/mask=00000080/00006000
279 pcieport 0000:00:01.1: device [1022:1453] error status/mask=00001000/00006000
37 pcieport 0000:00:01.1: device [1022:1453] error status/mask=00001040/00006000
46 pcieport 0000:01:00.2: [12] Replay Timer Timeout
46 pcieport 0000:01:00.2: device [1022:43b1] error status/mask=00001000/00002000
# lspci -vv | egrep '(1453|43b1)'
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453 (prog-if 00 [Normal decode])
00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453 (prog-if 00 [Normal decode])
01:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b1 (rev 02) (prog-if 00 [Normal decode])
pcilib: sysfs_read_vpd: read failed: Input/output error
40:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453 (prog-if 00 [Normal decode])
40:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453 (prog-if 00 [Normal decode])
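For anyone reading the `error status/mask=` fields above, here is a minimal Python sketch that decodes them into named bits. Bits 6, 7 and 12 are named in the log itself; the other names are my assumption, based on the standard PCIe AER Correctable Error Status register layout, and should be checked against the spec. The function names are made up for illustration.

```python
# Bit positions in the AER Correctable Error Status register.
# Bits 6, 7 and 12 match the bracketed numbers in the dmesg output;
# the remaining entries are assumed from the standard register layout.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(hex_value):
    """Return the names of all bits set in a hex string such as '00001040'."""
    value = int(hex_value, 16)
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(CORRECTABLE_BITS.get(bit, f"bit {bit} (unknown)"))
    return names


def decode_status_mask(field):
    """Split a 'status/mask' field and decode both halves."""
    status, mask = field.split("/")
    return decode_correctable(status), decode_correctable(mask)
```

Under that assumption, `decode_status_mask("00001040/00006000")` reports Bad TLP plus Replay Timer Timeout as the status, with the Advisory Non-Fatal and Corrected Internal Error bits masked.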
My concerns are twofold - Getting to the root cause and warranty should this turn out to be hardware related. Trying to find answers from vendors is an exercise in futility. I don't do Windows, at all. I don't even own a pirated copy. Disabling the AER is not an option either since the hardware is throwing errors for a reason. Neither is increasing
Something else odd is that the errors appear to change in frequency depending on where the hardware is physically located. Maybe RFI / shielding problems?
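To compare error rates between physical arrangements, the per-device counts can be tallied the same way as the `sort | uniq -c` pipeline above. This is only a rough sketch: the regex assumes the detail-line format shown in this post, and `tally` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches AER detail lines such as
#   pcieport 0000:00:01.1: [12] Replay Timer Timeout
# with or without a leading dmesg timestamp.
DETAIL = re.compile(r"pcieport (\S+):\s+\[\s*(\d+)\]\s*(.+?)\s*$")


def tally(lines):
    """Count AER detail lines per (device, error name) pair."""
    counts = Counter()
    for line in lines:
        match = DETAIL.search(line)
        if match:
            device, _bit, name = match.groups()
            counts[(device, name)] += 1
    return counts
```

Saving `dmesg` output to a file after each rearrangement and running `tally(open(path))` on each one gives counts that can be compared side by side, ideally normalised by uptime.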
Hardware:
Ryzen Threadripper 1950x (16 core)
Asus ROG Zenith
2x Samsung SM961 NVMe
2x Samsung Pro 960 SSDs
4x 3TB WD Reds
1x EVGA 1080TI/FTW3
32G DDR4
1x EVGA 1kW Supernova P2
4.13.9 Kernel.
Also tried with, and still freaking own: the AORUS board, different RAM, Pro 950s, an EVGA 850W PSU and an EVGA 1080TI Kingpin (returned due to coil whine. Should have kept it, as the FTW3 had whine too. Though there was nothing else wrong, replacing the power supply fixed it. EVGA wouldn't comment on why).
Basically a beast.
View PC info
xpander@arch ~ $ dmesg |grep pciep |grep -- '\['|sort | uniq -c
1 [ 1.297189] pcieport 0000:00:01.3: AER enabled with IRQ 28
1 [ 1.297204] pcieport 0000:00:03.1: AER enabled with IRQ 29
And what stability issue?
I haven't had any issues since April of this year, after a few BIOS versions that fixed all this.
My longest uptime has only been 15 days, but I reboot to update the kernel, so I really haven't kept the system running for longer periods.
View PC info
View PC info
4.9 doesn't even support Ryzen. 4.10 had initial support and 4.11 added more.
I'm on 4.13 as well.
View PC info
Ryzen 1800x
Asus Rog Crosshair VI Hero
PCIe NVME drive
Nvidia GTX 1070
$ dmesg |grep pciep |grep -- '\['|sort | uniq -c
1 [ 1.182122] pcieport 0000:00:01.1: AER enabled with IRQ 28
1 [ 1.182150] pcieport 0000:00:01.3: AER enabled with IRQ 29
1 [ 1.182164] pcieport 0000:00:03.1: AER enabled with IRQ 30
Take a look here, maybe it's something with Nvidia 1080 cards?
GTX 1080 Throwing Bad TLP PCIe Bus Errors
Good luck
View PC info
Regarding the GeForce URL, it's broken. I spent 20 minutes filling out CAPTCHAs to no avail. NVIDIA uses Incapsula on a lot of their sites, which breaks them all... and although they use Google anyway, it's only THEM.
I'm guessing the link is like everywhere else, telling people to disable PCI memory-mapped config access (pci=nommconf) or to drop back to gen3 PCI. Neither is _really_ an option. FWIW gen2 does make the errors go away, which furthers my suspicion that it's a hardware problem. Given the thousands people have spent, I'm guessing hardware vendors will not be opening that can of worms.
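When testing whether the errors follow link speed, it helps to confirm what the port actually negotiated. Here is a hedged sketch that reads the `current_link_speed` sysfs attribute. I'm assuming the running kernel exposes it for the device and that the value starts with a number followed by `GT/s`. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Transfer rate in GT/s mapped to PCIe generation.
GEN_BY_GTS = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4}


def parse_link_speed(text):
    """Turn a string like '8 GT/s' or '5.0 GT/s PCIe' into a generation number."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*GT/s", text)
    if not match:
        return None
    return GEN_BY_GTS.get(float(match.group(1)))


def link_generation(device, sysfs_root="/sys/bus/pci/devices"):
    """Read the negotiated generation for a device such as '0000:00:01.1'.

    Returns None if the attribute is missing or unreadable.
    """
    path = Path(sysfs_root) / device / "current_link_speed"
    try:
        return parse_link_speed(path.read_text())
    except OSError:
        return None
```

For example, `link_generation("0000:00:01.1")` returning 2 after forcing gen2 in the firmware setup would confirm the setting took effect before comparing error counts.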
As above, with a Gigabyte AORUS K7 and 1700X. No issues since release bar memory speed, which was fixed by the F4 BIOS at the end of May.
dmesg |grep pciep |grep -- '\['|sort | uniq -c
1 [ 1.216868] pcieport 0000:00:01.1: AER enabled with IRQ 28
1 [ 1.216889] pcieport 0000:00:01.3: AER enabled with IRQ 29
1 [ 1.216904] pcieport 0000:00:03.1: AER enabled with IRQ 30
1 [ 1.216910] pcieport 0000:00:01.1: Signaling PME with IRQ 28
1 [ 1.216917] pcieport 0000:00:01.3: Signaling PME with IRQ 29
1 [ 1.216926] pcieport 0000:00:03.1: Signaling PME with IRQ 30
1 [ 1.216940] pcieport 0000:00:07.1: Signaling PME with IRQ 31
1 [ 1.216956] pcieport 0000:00:08.1: Signaling PME with IRQ 33
View PC info
For your case, I think if you add these kernel parameters in GRUB you should be fine :)
pcie_aspm=off
I am using grub Customizer just for easy editing
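After rebooting, it is worth checking that the parameter actually reached the kernel. Here is a small sketch that parses a kernel command line string. The function names are made up for illustration, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def aspm_disabled(cmdline):
    """True if pcie_aspm=off is present on the given command line."""
    return parse_cmdline(cmdline).get("pcie_aspm") == "off"
```

On a running system this would be called as `aspm_disabled(open("/proc/cmdline").read())`.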
View PC info
I dunno about Ryzen, but usually all x86 CPUs are backwards compatible. You can even run 32-bit OSes. I would suppose it is the new Ryzen features that are not supported. So running an older kernel would not be optimal, but it should run, and maybe the bugs would not be triggered either. So 4.9 may work, and may even work without problems. That has happened to me many, many times. On the other hand, some new CPUs really do seem to demand new kernels, so maybe you are right.