[Rant]: RX 5700... a frustrating experience
Shmerl Nov 4, 2019
Quoting: Tuxee
Well, 5.4 should be out in two or three weeks. I'll try it once it becomes stable. After all, I have to work on the machine, too...

Works pretty well for me. I.e., if you already have Navi, it's not going to be stable with 5.3 for sure, so get 5.4 if you need to work on it. Otherwise, don't use Navi until 5.4 comes out.

Last edited by Shmerl on 4 November 2019 at 2:01 pm UTC
Tuxee Nov 4, 2019
Quoting: Shmerl
I.e., if you already have Navi, it's not going to be stable with 5.3 for sure, so get 5.4 if you need to work on it. Otherwise, don't use Navi until 5.4 comes out.

No. Doesn't work. 5.4rc6 can't even finish booting to the desktop.

Nov  4 17:38:04 leia kernel: [   12.814605] amdgpu: [powerplay] failed send message: TransferTableSmu2Dram (18)         param: 0x00000006 response 0xffffffc2
Nov  4 17:38:04 leia kernel: [   12.814607] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 17:38:06 leia kernel: [   15.026804] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)         param: 0x00000080 response 0xffffffc2
Nov  4 17:38:09 leia kernel: [   17.241224] amdgpu: [powerplay] failed send message: TransferTableSmu2Dram (18)         param: 0x00000006 response 0xffffffc2
Nov  4 17:38:09 leia kernel: [   17.241224] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 17:38:11 leia kernel: [   19.747899] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)         param: 0x00000080 response 0xffffffc2
Nov  4 17:38:14 leia kernel: [   22.254581] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)         param: 0x00000080 response 0xffffffc2
Nov  4 17:38:14 leia kernel: [   22.254583] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 17:38:14 leia kernel: [   22.328723] kauditd_printk_skb: 30 callbacks suppressed
Nov  4 17:38:14 leia kernel: [   22.328724] audit: type=1400 audit(1572885494.198:42): apparmor="DENIED" operation="open" profile="/usr/sbin/mysqld" name="/sys/devices/system/node/" pid=2037 comm="mysqld" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Nov  4 17:38:14 leia kernel: [   22.342333] audit: type=1400 audit(1572885494.210:43): apparmor="DENIED" operation="capable" profile="/usr/sbin/mysqld" pid=2037 comm="mysqld" capability=2  capname="dac_read_search"
Nov  4 17:38:14 leia kernel: [   22.362019] audit: type=1400 audit(1572885494.230:44): apparmor="DENIED" operation="open" profile="/usr/sbin/mysqld" name="/sys/devices/system/node/" pid=2052 comm="mysqld" requested_mask="r" denied_mask="r" fsuid=121 ouid=0
Nov  4 17:38:16 leia kernel: [   24.734193] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)         param: 0x00000080 response 0xffffffc2
Nov  4 17:38:19 leia kernel: [   27.211635] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)         param: 0x00000080 response 0xffffffc2
Nov  4 17:38:19 leia kernel: [   27.211637] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 17:38:21 leia kernel: [   29.688247] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)         param: 0x00000080 response 0xffffffc2
Nov  4 17:38:24 leia kernel: [   32.165118] amdgpu: [powerplay] failed send message: SetDriverDramAddrHigh (14)         param: 0x00000080 response 0xffffffc2
Nov  4 17:38:24 leia kernel: [   32.165119] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 17:38:24 leia kernel: [   32.165883] igb 0000:06:00.0 enp6s0: igb: enp6s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Nov  4 17:38:26 leia kernel: [   34.328291] igb 0000:06:00.0: exceed max 2 second
Nov  4 17:38:26 leia kernel: [   34.328468] IPv6: ADDRCONF(NETDEV_CHANGE): enp6s0: link becomes ready
Nov  4 17:38:27 leia kernel: [   35.244024] amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31)         param: 0x00000000 response 0xffffffc2


Then I turned it off.
Maybe next time...
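The rc6 log above prints each failure as a message name, its decimal ID, a parameter and a response code. Below is a minimal Python sketch for tallying such lines. It assumes the exact line layout shown above. It also reinterprets the response as a signed 32-bit value, on the assumption (not stated in the thread) that the driver reports negative errno codes. Read that way, 0xffffffc2 becomes -62 (ETIME on Linux, "Timer expired"), and 0xfffffffb, seen in the Ubuntu 19.10 log later in the thread, becomes -5 (EIO). The names `tally_failures` and `to_signed32` are illustrative, not part of any existing tool.

```python
import errno
import re
from collections import Counter

# Matches lines like the 5.4-rc6 ones above, e.g.
# "failed send message: SetDriverDramAddrHigh (14)   param: 0x00000080 response 0xffffffc2"
PATTERN = re.compile(
    r"failed send message: (?P<name>\w+) \((?P<id>\d+)\)\s+"
    r"param: (?P<param>0x[0-9a-fA-F]+) response (?P<resp>0x[0-9a-fA-F]+)"
)


def to_signed32(value: int) -> int:
    """Reinterpret a 32-bit unsigned value as a two's-complement signed int."""
    return value - (1 << 32) if value & 0x80000000 else value


def tally_failures(log_text: str) -> Counter:
    """Count failed SMU messages by (message name, decoded response)."""
    counts = Counter()
    for match in PATTERN.finditer(log_text):
        code = to_signed32(int(match["resp"], 16))
        # Assumption: a negative response is -errno; fall back to the raw number.
        label = errno.errorcode.get(-code, str(code))
        counts[(match["name"], label)] += 1
    return counts
```

Saving the output of `dmesg | grep powerplay` to a file and passing its contents to `tally_failures` would show at a glance which messages fail most often.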
Shmerl Nov 4, 2019
The breaking powerplay errors went away for me around rc5 or so. Maybe your firmware is not up to date?

It could also be a VBIOS issue with a specific card model. Feel free to comment on this in the bug: https://bugs.freedesktop.org/show_bug.cgi?id=111481

Powerplay issues are one of the subtopics there.

Last edited by Shmerl on 4 November 2019 at 4:51 pm UTC
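Shmerl's firmware question can be checked locally before comparing against upstream. The Python sketch below lists the installed Navi firmware. It assumes the files are uncompressed `navi10_*.bin` blobs in /lib/firmware/amdgpu, the directory shown in Tuxee's shell prompt further down. The function name `list_firmware` is hypothetical. The sizes and dates it prints can then be compared by hand with the radeon_ucode directory linked in the next post. On Debian/Ubuntu the initramfs may carry its own copy of this firmware, so after updating the files, regenerating it (e.g. `update-initramfs -u`) is typically needed as well.

```python
from datetime import datetime, timezone
from pathlib import Path


def list_firmware(fw_dir: str = "/lib/firmware/amdgpu", prefix: str = "navi10_"):
    """Return (name, size in bytes, mtime in UTC) for each <prefix>*.bin file, sorted by name."""
    entries = []
    for path in sorted(Path(fw_dir).glob(f"{prefix}*.bin")):
        st = path.stat()
        mtime = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
        entries.append((path.name, st.st_size, mtime))
    return entries


if __name__ == "__main__":
    # Prints nothing if the directory does not exist or holds no matching files.
    for name, size, mtime in list_firmware():
        print(f"{name:30} {size:>8} {mtime:%Y-%m-%d}")
```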
Tuxee Nov 4, 2019
I doubt that. I've got these:

https://people.freedesktop.org/~agd5f/radeon_ucode/navi10/

And yes, I follow the discussion on freedesktop. I first ended up there when googling for the powerplay issues. The discussion there must be quite amusing for someone not affected. "Try another kernel", "try this Mesa version", "have you applied this patch?", "set these boot parameters", "maybe it's a PCIe 4 issue", "...could be NVMe related", "the crashes went away", "they are back - just not that often", "it definitely happens when something wants to get statistics from the GPU"...

It's always good to learn that you are not the only one affected. But in this case I can't shake the impression that everybody (including me, of course) is quite clueless.

I don't get ANY powerplay issues on my 5.3 kernel on Ubuntu 18.04. I get hundreds of them on my 5.3 kernel on 19.10. In both cases Mesa comes from the Oibaf PPA.

On 18.04

gregor@leia:/lib/firmware/amdgpu$ dmesg | grep amdgpu
[    2.926479] [drm] amdgpu kernel modesetting enabled.
[    2.926599] amdgpu 0000:0c:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
[    2.926600] amdgpu 0000:0c:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xf0000000 -> 0xf01fffff
[    2.926600] amdgpu 0000:0c:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xfcb00000 -> 0xfcb7ffff
[    2.926602] fb0: switching to amdgpudrmfb from EFI VGA
[    2.926663] amdgpu 0000:0c:00.0: vgaarb: deactivate vga console
[    2.952688] amdgpu 0000:0c:00.0: No more image in the PCI ROM
[    2.952738] amdgpu 0000:0c:00.0: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
[    2.952739] amdgpu 0000:0c:00.0: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    2.952806] [drm] amdgpu: 8176M of VRAM memory ready
[    2.952807] [drm] amdgpu: 8176M of GTT memory ready.
[    3.738225] amdgpu: [powerplay] SMU is initialized successfully!
[    3.951684] fbcon: amdgpudrmfb (fb0) is primary device
[    4.037119] amdgpu 0000:0c:00.0: fb0: amdgpudrmfb frame buffer device
[    4.056080] amdgpu 0000:0c:00.0: ring 0(gfx_0.0.0) uses VM inv eng 4 on hub 0
[    4.056081] amdgpu 0000:0c:00.0: ring 1(gfx_0.1.0) uses VM inv eng 5 on hub 0
[    4.056082] amdgpu 0000:0c:00.0: ring 2(comp_1.0.0) uses VM inv eng 6 on hub 0
[    4.056083] amdgpu 0000:0c:00.0: ring 3(comp_1.1.0) uses VM inv eng 7 on hub 0
[    4.056083] amdgpu 0000:0c:00.0: ring 4(comp_1.2.0) uses VM inv eng 8 on hub 0
[    4.056084] amdgpu 0000:0c:00.0: ring 5(comp_1.3.0) uses VM inv eng 9 on hub 0
[    4.056084] amdgpu 0000:0c:00.0: ring 6(comp_1.0.1) uses VM inv eng 10 on hub 0
[    4.056085] amdgpu 0000:0c:00.0: ring 7(comp_1.1.1) uses VM inv eng 11 on hub 0
[    4.056086] amdgpu 0000:0c:00.0: ring 8(comp_1.2.1) uses VM inv eng 12 on hub 0
[    4.056086] amdgpu 0000:0c:00.0: ring 9(comp_1.3.1) uses VM inv eng 13 on hub 0
[    4.056087] amdgpu 0000:0c:00.0: ring 10(kiq_2.1.0) uses VM inv eng 14 on hub 0
[    4.056088] amdgpu 0000:0c:00.0: ring 11(sdma0) uses VM inv eng 15 on hub 0
[    4.056088] amdgpu 0000:0c:00.0: ring 12(sdma1) uses VM inv eng 16 on hub 0
[    4.056089] amdgpu 0000:0c:00.0: ring 13(vcn_dec) uses VM inv eng 4 on hub 1
[    4.056089] amdgpu 0000:0c:00.0: ring 14(vcn_enc0) uses VM inv eng 5 on hub 1
[    4.056090] amdgpu 0000:0c:00.0: ring 15(vcn_enc1) uses VM inv eng 6 on hub 1
[    4.056091] amdgpu 0000:0c:00.0: ring 16(vcn_jpeg) uses VM inv eng 7 on hub 1
[    4.056214] [drm] Initialized amdgpu 3.33.0 20150101 for 0000:0c:00.0 on minor 0


On 19.10 it looks like this...

Nov  4 13:01:15 leia kernel: [    3.769236] [drm] Initialized amdgpu 3.33.0 20150101 for 0000:0c:00.0 on minor 0
Nov  4 13:03:32 leia kernel: [  140.916869] amdgpu: [powerplay] Failed to send message 0xe, response 0xfffffffb param 0x80
Nov  4 13:03:32 leia kernel: [  140.916873] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 13:04:17 leia kernel: [  185.932919] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb param 0x6
Nov  4 13:04:17 leia kernel: [  185.932923] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 13:04:32 leia kernel: [  200.936818] amdgpu: [powerplay] Failed to send message 0xf, response 0xfffffffb, param 0xfd6000
Nov  4 13:04:42 leia kernel: [  210.938902] amdgpu: [powerplay] Failed to send message 0xe, response 0xfffffffb, param 0x80
Nov  4 13:04:42 leia kernel: [  210.938910] amdgpu: [powerplay] Failed to send message 0xf, response 0xfffffffb param 0xfd6000
Nov  4 13:04:42 leia kernel: [  210.938913] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 13:04:47 leia kernel: [  215.940047] amdgpu: [powerplay] Failed to send message 0x12, response 0xfffffffb param 0x6
Nov  4 13:04:47 leia kernel: [  215.940050] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 13:05:07 leia kernel: [  235.943394] amdgpu: [powerplay] Failed to send message 0xe, response 0xfffffffb, param 0x80
Nov  4 13:05:07 leia kernel: [  235.943527] amdgpu 0000:0c:00.0: [mmhub] VMC page fault (src_id:0 ring:174 vmid:0 pasid:0)
Nov  4 13:05:07 leia kernel: [  235.943530] amdgpu 0000:0c:00.0:   at page 0x0000000000fd6000 from 18
Nov  4 13:05:07 leia kernel: [  235.943532] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0004115C
Nov  4 13:05:10 leia kernel: [  238.703395] amdgpu: [powerplay] Failed to send message 0x12, response 0xffffffc2 param 0x6
Nov  4 13:05:10 leia kernel: [  238.703399] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 13:05:15 leia kernel: [  243.695494] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2, param 0x80
Nov  4 13:05:15 leia kernel: [  243.697307] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2, param 0x80
Nov  4 13:05:18 leia kernel: [  246.443323] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2 param 0x80
Nov  4 13:05:18 leia kernel: [  246.443327] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 13:05:18 leia kernel: [  246.443817] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
Nov  4 13:05:18 leia kernel: [  246.443902] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
Nov  4 13:05:18 leia kernel: [  246.448969] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2 param 0x80
Nov  4 13:05:18 leia kernel: [  246.448974] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 13:05:20 leia kernel: [  248.699575] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2, param 0x80
Nov  4 13:05:20 leia kernel: [  248.700054] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2, param 0x80
Nov  4 13:05:23 leia kernel: [  251.451212] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2 param 0x80
Nov  4 13:05:23 leia kernel: [  251.451216] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 13:05:23 leia kernel: [  251.454606] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2 param 0x80
Nov  4 13:05:23 leia kernel: [  251.454609] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 13:05:25 leia kernel: [  253.693905] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2, param 0x80
Nov  4 13:05:25 leia kernel: [  253.699553] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2, param 0x80
Nov  4 13:05:28 leia kernel: [  256.446574] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2 param 0x80
Nov  4 13:05:28 leia kernel: [  256.446577] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 13:05:28 leia kernel: [  256.452207] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2 param 0x80
Nov  4 13:05:28 leia kernel: [  256.452210] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 13:05:30 leia kernel: [  258.700728] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2, param 0x80
Nov  4 13:05:30 leia kernel: [  258.701184] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2, param 0x80
Nov  4 13:05:33 leia kernel: [  261.452346] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2 param 0x80
Nov  4 13:05:33 leia kernel: [  261.452350] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 13:05:33 leia kernel: [  261.455480] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2 param 0x80
Nov  4 13:05:33 leia kernel: [  261.455483] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 13:05:35 leia kernel: [  263.716466] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2, param 0x80
Nov  4 13:05:35 leia kernel: [  263.729459] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2, param 0x80
Nov  4 13:05:38 leia kernel: [  266.491442] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2 param 0x80
Nov  4 13:05:38 leia kernel: [  266.491445] amdgpu: [powerplay] Failed to export SMU metrics table!
Nov  4 13:05:38 leia kernel: [  266.503670] amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2 param 0x80
Nov  4 13:05:38 leia kernel: [  266.503674] amdgpu: [powerplay] Failed to export SMU metrics table!
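Both failing logs can be reduced to a per-message count. The rc6 log prints the failing messages by name with a decimal ID (SetDriverDramAddrHigh is 14, TransferTableSmu2Dram is 18), while the 19.10 kernel prints only hex IDs. Since 0xe and 0x12 are 14 and 18, both kernels appear to fail on the same SMU messages. The Python sketch below accepts either format. It assumes the line layouts shown in this thread, and that the ID-to-name mapping taken from the rc6 log also holds for the 5.3 kernel. `KNOWN_MESSAGES`, `count_smu_failures` and `describe` are hypothetical names. Fed the clean 18.04 output, it returns an empty count.

```python
import re
from collections import Counter

# IDs paired with the names printed by the 5.4-rc6 log in this thread;
# assumption: the 5.3 kernel on 19.10 uses the same numbering.
KNOWN_MESSAGES = {14: "SetDriverDramAddrHigh", 18: "TransferTableSmu2Dram", 31: "GetMaxDpmFreq"}

# 5.4-rc6 style: "failed send message: SetDriverDramAddrHigh (14) ..."
NAMED = re.compile(r"failed send message: \w+ \((\d+)\)")
# 5.3 style on 19.10: "Failed to send message 0xe, response 0xffffffc2, param 0x80"
# (the comma before "param" is present on some lines and missing on others)
NUMBERED = re.compile(r"Failed to send message (0x[0-9a-fA-F]+), response 0x[0-9a-fA-F]+,? param")


def count_smu_failures(log_text: str) -> Counter:
    """Count failed SMU messages per numeric message ID, whichever format is used."""
    counts = Counter()
    for line in log_text.splitlines():
        if "[powerplay]" not in line:
            continue
        m = NAMED.search(line)
        if m:
            counts[int(m.group(1))] += 1
            continue
        m = NUMBERED.search(line)
        if m:
            counts[int(m.group(1), 16)] += 1
    return counts


def describe(counts: Counter) -> list:
    """Human-readable summary, most frequent first; unknown IDs are shown in hex."""
    return [f"{KNOWN_MESSAGES.get(msg_id, hex(msg_id))}: {n}" for msg_id, n in counts.most_common()]
```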
Shmerl Nov 4, 2019
I can't say anything about Ubuntu; I'm using Debian testing. Until a while ago, powerplay caused problems (stalls) when sensors were read concurrently. That has since been fixed. I still see some errors in dmesg when using lm-sensors with amdgpu, but at least they aren't breaking anything.

Last edited by Shmerl on 4 November 2019 at 6:23 pm UTC
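The stalls Shmerl describes, triggered by concurrent sensor reads, suggest a simple stress test. The Python sketch below assumes the standard hwmon sysfs layout, which is also what lm-sensors reads: /sys/class/hwmon/hwmon*/name identifies the driver, and temp1_input holds millidegrees Celsius. The function names, thread count and read count are arbitrary choices for illustration, not a reproducer taken from the bug report. On an affected kernel this may stall the desktop, so only run it on a machine you can afford to lock up.

```python
import glob
import os
from concurrent.futures import ThreadPoolExecutor


def find_amdgpu_hwmon(sysfs_root: str = "/sys/class/hwmon") -> list:
    """Return the hwmon directories whose 'name' file reads 'amdgpu'."""
    found = []
    for hwmon_dir in sorted(glob.glob(os.path.join(sysfs_root, "hwmon*"))):
        try:
            with open(os.path.join(hwmon_dir, "name")) as f:
                if f.read().strip() == "amdgpu":
                    found.append(hwmon_dir)
        except OSError:
            continue  # unreadable or missing 'name' file: skip this device
    return found


def read_temp_c(hwmon_dir: str, sensor: str = "temp1_input") -> float:
    """Read one temperature sensor; hwmon reports millidegrees Celsius."""
    with open(os.path.join(hwmon_dir, sensor)) as f:
        return int(f.read().strip()) / 1000.0


def hammer(hwmon_dir: str, readers: int = 4, reads_each: int = 50) -> int:
    """Read the same sensor from several threads at once.

    Returns the number of reads that succeeded; kernel-side errors
    for the failed ones should show up in dmesg.
    """
    def worker(_worker_index: int) -> int:
        ok = 0
        for _ in range(reads_each):
            try:
                read_temp_c(hwmon_dir)
                ok += 1
            except (OSError, ValueError):
                pass  # I/O error or unparsable value from the driver
        return ok

    with ThreadPoolExecutor(max_workers=readers) as pool:
        return sum(pool.map(worker, range(readers)))
```

A run would look like `hammer(find_amdgpu_hwmon()[0])`, with `dmesg -w` open in another terminal to watch for powerplay errors.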
Tuxee Nov 4, 2019
Anyway, thanks for your input.
Pangaea Nov 4, 2019
The first time I heard about these problems was in the "my PSU fried" thread. No reviews, tests or anything else mention this, which I find odd given how common the problems appear to be. Thanks for making the thread, it's good to get more info about this out there.

For quite a while I've planned to fork out a big pile of money for a new rig based on AMD GPU and CPU. But seriously, huge stability issues with totally basic stuff like using Firefox and Nemo? That's a huge turn-off, and totally unacceptable. Although Nvidia is overpriced, I've started to look at them instead. So far I've not had any problems with them, either on Linux or back in the Windows days. Closed drivers or not, they simply work.

All this talk about needing to build kernels, this or that driver, patches -- it's a problem for people like me who aren't super confident and knowledgeable. And as mentioned already, this is exactly what the Windows people say about Linux. Which incidentally isn't how the experience has been so far. Generally Linux simply works, and that's that. AMD need to get their shit together -- and fast. The products may be great, especially on the CPU side, but when the PC becomes terribly unstable, then it's a no-go.
Shmerl Nov 4, 2019
Quoting: Pangaea
All this talk about needing to build kernels, this or that driver, patches -- it's a problem for people like me who aren't super confident and knowledgeable. And as mentioned already, this is exactly what the Windows people say about Linux. Which incidentally isn't how the experience has been so far. Generally Linux simply works, and that's that. AMD need to get their shit together -- and fast. The products may be great, especially on the CPU side, but when the PC becomes terribly unstable, then it's a no-go.

If I remember correctly, the previous iteration, Vega, was also quite unstable for at least one kernel release cycle. When I switched to Vega, it was already rock solid, so I never went through that period, but others mentioned it.

So this could be a pattern: AMD lands support in kernel a.b. If you want a stable out-of-the-box experience and aren't interested in building the kernel and the like, wait until at least kernel a.b+1 before using that hardware, and stick to older hardware until then. In the case of Navi, initial support is in 5.3, so stabilized support would arrive in 5.4 at the earliest.

To close this gap, AMD would need to beef up its support team.

For me, it's surely no reason to ever go back to Nvidia.

Last edited by Shmerl on 4 November 2019 at 7:00 pm UTC
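Shmerl's rule of thumb (wait until at least one kernel release after the one that introduced support) can be expressed as a version comparison. This sketch compares (major, minor) tuples, so it also copes with series boundaries such as 4.20 being followed by 5.0, where a literal "b+1" would not apply. It only encodes the heuristic. As Tuxee's 5.4-rc6 attempt earlier in the thread shows, being past the initial release is no guarantee of stability. The function names are hypothetical.

```python
import platform
import re


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '5.3.0-19-generic' or '5.4.0-rc6'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def past_initial_support(release: str, initial: str) -> bool:
    """True if `release` belongs to a later series than the one that introduced support."""
    return kernel_version(release) > kernel_version(initial)


if __name__ == "__main__":
    running = platform.release()
    # Navi support first landed in 5.3, per the thread.
    print(running, "is past the initial-support release:", past_initial_support(running, "5.3"))
```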
YoRHa-2B Nov 7, 2019
Quoting: Tuxee
I'm all for Open Source but at this point I can only recommend NVidia graphic cards with their proprietary drivers.
The same is true for Windows. I only use that OS to run game benchmarks these days, but the sheer number of issues I've had with the graphics drivers in the past four months (on an RX 480, not even Navi!) is beyond silly; I hardly ever had problems before.

They managed to break D3D9 to the point where some games that used to work fine now only run with D9VK. They released a driver advertising support for The Outer Worlds which broke The Outer Worlds. Their official Vulkan drivers are still a mess, and Red Dead Redemption 2 apparently doesn't even render correctly on Navi GPUs. Hell, they even managed to break the Windows 10 login screen at some point.

No problems with Polaris on Linux right now (which, by the way, I got a mere two months after launch, and it was usable right away, even on stable kernel and Mesa versions; there were a few bugs, but nothing too terrible), but it's impossible to recommend AMD GPUs at the moment. They are doing the best they can to live up to the memes of their drivers being shit, and if they don't get it together some time next year, I'll have no choice but to jump ship again.

Last edited by YoRHa-2B on 7 November 2019 at 11:42 pm UTC
Shmerl Nov 7, 2019
Quoting: YoRHa-2B
No problems with Polaris on Linux right now, thank god, but it's impossible to recommend AMD GPUs at the moment. They are doing the best they can to live up to the memes of their drivers being shit, and if they don't get it together some time next year, I'll have no choice but to jump ship again.

Didn't AMD report increased profits? I hope at least some of that will translate into better support.