Working around Ryzen CPU freezes
devnull May 8, 2018
Hmm.. I was about to create a new thread, but it might actually be related to this. Has anyone seen the same thing with other Ryzen CPUs? I get this weird latency on one of the four CCXs of a 1950X. It's weird as in: almost exactly double. I can work around it a bit by bumping the clock on those CPUs. Possibly another architecture "quirk" (aka bug): runtimes for processes where inter-CCX thread migrations happen are, in general, almost double when the target CPU's speed is lower (XFR/boost, or intentional).

i.e. (s = source, d = destination):

sCCX | dCCX | sCPU | dCPU | Result
-----|------|------|------|-------------------------
  0  |  0   |  0   |  16  | Fast - expected
  0  |  0   |  0   |   1  | Fast - expected
  0  |  1   |  0   |   8  | Slow - problem
  0  |  1   |  0   |  24  | Slow - problem
  0  |  1   |  0   |  12  | Slower - expected though
  0  |  1   |  0   |  28  | Slower - expected though
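
For reference, you can see the CCX groupings on your own box by checking which CPUs share an L3 (standard sysfs cache topology; index3 is the L3 on Zen, nothing specific to my setup):

# One line per L3 domain, i.e. per CCX on Zen:
cat /sys/devices/system/cpu/cpu[0-9]*/cache/index3/shared_cpu_list | sort -u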

I can go into more detail about the test itself, but it's basically a timed busy loop in bash. The CPU threads are set realtime, nohz, etc.; everything I could do to isolate them. While not scientific (+/- 40ms), it doesn't have to be when one is looking at a 480ms difference.
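
If you just want the bare idea before reading the full script below, here's a minimal sketch (not my exact harness; chrt -f 10 is an arbitrary SCHED_FIFO priority and usually needs root):

# Pin a bash busy loop to CPU 2, give it realtime priority, and time it:
sudo taskset -c 2 chrt -f 10 bash -c 'time for i in {1..50000}; do :; done'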
Shmerl May 8, 2018
I didn't specifically analyze it, so I can't say. I can run your test and compare the results.
wolfyrion May 8, 2018
Hi,
I also own a 1950X, with the ASRock Fatal1ty X399 Professional Gaming motherboard.
I haven't had any problems at all so far with all the things I've thrown at this monster :P

I am running it @ 3.6GHz on all cores, with RAM @ 3200MHz.
(My only problem is that it gets a bit hot when compiling stuff; for example, compiling Unreal Engine it goes up to 85C with a water cooler. Idle ranges from 35-50C.)

Also, I am using the latest BIOS from ASRock:
http://www.asrock.com/mb/AMD/Fatal1ty%20X399%20Professional%20Gaming/index.asp#BIOS

Here is an inxi; in case you need me to test something, let me know.
devnull May 8, 2018
It isn't pretty, but it was created as a working test case.. more like a working draft.


#!/bin/bash

sysp=/sys/bus/cpu/devices

# Pin a CPU's minimum clock to its maximum frequency.
fast() {
  cpu=cpu$1
  read ttfreq < $sysp/$cpu/cpufreq/cpuinfo_max_freq
  echo $ttfreq > $sysp/$cpu/cpufreq/scaling_min_freq
}

# Reset a CPU's minimum clock to its lowest frequency.
slow() {
  cpu=cpu$1
  read ttfreq < $sysp/$cpu/cpufreq/cpuinfo_min_freq
  echo $ttfreq > $sysp/$cpu/cpufreq/scaling_min_freq
}

# $1 = PID, $2 = CPU: pin the process to the CPU and raise its priority.
priority() {
  taskset -pc $2 $1
  renice -20 -p $1   # was "renice -20 -p $2", which reniced the CPU number, not the PID
}

# The timed workload: report the clocks in play, then busy-loop.
doit() {
  read ttcpu < $sysp/cpu${1}/cpufreq/scaling_min_freq
  read tpcpu < $sysp/cpu${2}/cpufreq/scaling_min_freq

  printf "OUT: Parent:%s:%s Target:%s:%s\n" "$pcpu" "$tpcpu" "$tcpu" "$ttcpu"

  # for i in {1..200000}
  for i in {1..50000}
  do
    :
  done
}

usage() {
  printf "%s" "
$@ Required:
        --target
        --source

"
  exit 1
}

##  --target [ fast | slow ]:TargetCPU
##  --source [ fast | slow ]:SourceCPU
get_args() {
  while [[ "$1" ]]; do
    case "$1" in
      "--target") tcpu=${2/*:/}; tcmd=${2/:*/} ;;
      "--source") pcpu=${2/*:/}; pcmd=${2/:*/} ;;
      *) printf "Unknown option: %s\n" "$1"; usage ;;
    esac
    shift 2
  done
}

# Positional variant of get_args; unused, left over from the draft.
zz() {
  tcpu=${1/*:/}
  tcmd=${1/:*/}

  shift
  pcpu=${1/*:/}
  pcmd=${1/:*/}
}

if [ $# -eq 4 ]   # was "-le 4", which also accepted too few arguments
then
  printf "## %s\n" "$*"
  get_args "$@" || exit $?

  $tcmd $tcpu           # set the target CPU's clock (fast/slow)
  priority $$ $tcpu     # pin this script to the target CPU

  $pcmd $pcpu           # set the source CPU's clock
  priority $PPID $pcpu  # pin the parent (e.g. perf) to the source CPU

  doit $tcpu $pcpu

  # Reset both CPUs to their lowest clock when done.
  slow $pcpu
  slow $tcpu

else
  usage
  exit 1
fi


Invocation:
Local
Start on CPU4 @ slowest speed
Test on CPU2 @ slowest speed

perf stat -d -d -d  ./child.sh --target slow:2 --source slow:4  2>&1 | egrep '(OUT:|task-clock)'|xargs

CCX jump
Start on CPU4 @ slowest speed
Test on CPU25 @ slowest speed

perf stat -d -d -d  ./child.sh --target slow:25 --source slow:4  2>&1 | egrep '(OUT:|task-clock)'|xargs
OUT: Parent:4:2200000 Target:25:2200000 278.624873 task-clock (msec) # 0.998 CPUs utilized


Same CCX jump, faster clock
Start on CPU4 @ slowest speed
Test on CPU25 @ fast speed


perf stat -d -d -d  ./child.sh --target fast:25 --source slow:4  2>&1 | egrep '(OUT:|task-clock)'|xargs
OUT: Parent:4:2200000 Target:25:4100000 140.126839 task-clock (msec) # 0.997 CPUs utilized



Example output:

OUT: Parent:4:2200000 Target:2:2200000 130.223193 task-clock (msec) # 0.994 CPUs utilized

Longer version:

# perf stat -d -d -d ./child.sh --target slow:2 --source slow:4 2>&1 | egrep '(OUT:|task-clock)'|xargs
OUT: Parent:4:2200000 Target:2:2200000 130.223193 task-clock (msec) # 0.994 CPUs utilized
# perf stat -d -d -d ./child.sh --target slow:25 --source slow:4 2>&1 | egrep '(OUT:|task-clock)'|xargs
OUT: Parent:4:2200000 Target:25:2200000 278.624873 task-clock (msec) # 0.998 CPUs utilized
# perf stat -d -d -d ./child.sh --target fast:25 --source slow:4 2>&1 | egrep '(OUT:|task-clock)'|xargs
OUT: Parent:4:2200000 Target:25:4100000 140.126839 task-clock (msec) # 0.997 CPUs utilized

## do all the things...
for spd in fast slow; do for target in $spd:{0..31}; do perf stat -d -d -d  ./child.sh --target $target --source slow:4  2>&1 | egrep '(OUT:|task-clock)'|xargs ; done ; done



Explanation:

OUT: Parent:4:2200000 - CPU and clock we're on now
Target:2:2200000 - CPU and clock we're testing
130.223193 task-clock (msec) # 0.994 CPUs utilized - how long it took

High variance in the time is what I'm after; there shouldn't be much.
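
To eyeball the spread from the sweep above, something like this works (field positions taken from the OUT: lines; results.txt is just a hypothetical capture of the loop's output):

# Sort all runs by task-clock msec, slowest last:
awk '/^OUT:/ { print $4, $3 }' results.txt | sort -n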

There are a lot of assumptions made that aren't scripted, since it was a quick test. I'm assuming, for example, that the current cpufreq governor is ondemand, though I've seen the same with conservative. If you test with fast but find the clock drops back down, it could be the governor, or throttling (neither of which is accounted for atm).
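
Checking (and, if needed, forcing) the governor is plain sysfs, nothing specific to this script:

# Show the governor in use on each CPU, collapsed to unique values:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort -u
# Force ondemand everywhere (needs root):
echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor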

Though I try to pin the test as much as I can, some CPUs are isolated on boot. GRUB line:

nohz_full=0,16,1,17,8,24,9,25,10,26,11,27
rcu_nocbs=0,16,1,17,8,24,9,25,10,26,11,27
isolcpus=0,16,1,17,8,24,9,25,10,26,11,27

That is intentional, as I pin VMs to them. It also doesn't really affect the test from what I can tell.
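
If you want to verify what your own kernel actually isolated, these are standard interfaces on any recent kernel:

cat /sys/devices/system/cpu/isolated
tr ' ' '\n' < /proc/cmdline | grep -E 'isolcpus|nohz_full|rcu_nocbs'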

Some other things to note:
- X399 AORUS Gaming 7 board
- There is _zero_ scripted thermal monitoring
- BIOS supports setting custom power states
- C6 is disabled, thus 2.2GHz is the lowest clock for me; ymmv
- I can hit a faster OC, but it's not needed to validate the test

Clocks are intentionally reset to the lowest atm due to the way Ryzen works: not all cores can run full OC/XFR, ymmv. It shouldn't be a problem with most governors unless you've pinned them higher; just something to be aware of. The script should really save what the clocks were before changing them, but I'm lazy.
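
A hypothetical save/restore, if you'd rather not trust the reset-to-lowest behaviour (/tmp/minfreq.save is just a scratch file):

# Save the current scaling_min_freq of every CPU:
for c in /sys/bus/cpu/devices/cpu[0-9]*/cpufreq; do
  echo "$c $(cat $c/scaling_min_freq)"
done > /tmp/minfreq.save
# Restore later (writing needs root):
while read c f; do echo "$f" > "$c/scaling_min_freq"; done < /tmp/minfreq.save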

What kinda makes this worse is that it's running in UMA / "Creator mode", _NOT_ NUMA / "Gaming mode".
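
You can confirm which mode the box is in from userspace; in UMA/Creator mode numactl reports a single node (numactl is the stock NUMA tool, not part of the test):

numactl --hardware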
Shmerl May 9, 2018
Here is what I get with a Ryzen 2700X (CCX jump numbers adjusted):

perf stat -d -d -d ./child.sh --target slow:2 --source slow:4 2>&1 | egrep '(OUT:|task-clock)' | xargs
OUT: Parent:4:2200000 Target:2:2200000 99.580689 task-clock (msec) # 0.740 CPUs utilized


perf stat -d -d -d ./child.sh --target slow:15 --source slow:4 2>&1 | egrep '(OUT:|task-clock)' | xargs
OUT: Parent:4:2200000 Target:15:2200000 102.475220 task-clock (msec) # 0.995 CPUs utilized


perf stat -d -d -d ./child.sh --target fast:15 --source slow:4 2>&1 | egrep '(OUT:|task-clock)' | xargs
OUT: Parent:4:2200000 Target:15:3700000 96.033168 task-clock (msec) # 0.992 CPUs utilized


But maybe your issue is Threadripper-specific.
devnull May 9, 2018
Something is weird with those numbers. If the layout of the 2700X is the same (lstopo or "lscpu --all -y --extended" is awesome for this), you want to test between:

0-> {0..3,8..15}
1-> {4..7,16..23}

It's interesting you still see a few ms gain though.
Shmerl May 9, 2018
2700X has only 8 physical cores (16 virtual).

lstopo is a neat tool - never heard of it before :)
devnull May 9, 2018
Hmm.. brain fart. Aren't you still testing within the same CCX though? The first one is close, but you didn't include fast.

Assuming it's:
    CCX0        CCX1
 0  1  2  3 |  4  5  6  7
-------------------------
 0  1  2  3 |  4  5  6  7
 8  9 10 11 | 12 13 14 15
-------------------------
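
A quick way to confirm that mapping on any box is the standard sysfs topology files (the L3 shared_cpu_list mentioned earlier gives the CCX side):

# Each line shows a CPU and its SMT sibling(s):
grep . /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list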


lstopo/hwloc is quite handy indeed. It misses some things, like identifying NVMe drives (it lists the bus, of course), but you can export it to XML and add whatever you want. I have VMs mapped, for example. Almost like porn on massive servers :P

Don't know why the forum is eating that ASCII. It looks fine in preview, but the post gets garbled.. hm.
devnull May 10, 2018
Possibly related: it appears I wasn't the only one to notice this. There have been some scheduling changes in 4.16. @Shmerl, were your tests above still on 4.15?

I've seen something quite similar with Samsung NVMe drives. The latency remains high because the drive remains in a lower power state.

From ioping; the drop is after starting dd in another terminal.

4 KiB <<< /dev/nvme1n1 (block device 953.9 GiB): request=19 time=5.59 ms
4 KiB <<< /dev/nvme1n1 (block device 953.9 GiB): request=20 time=5.81 ms (slow)
4 KiB <<< /dev/nvme1n1 (block device 953.9 GiB): request=21 time=5.66 ms (slow)
4 KiB <<< /dev/nvme1n1 (block device 953.9 GiB): request=22 time=5.80 ms (slow)
4 KiB <<< /dev/nvme1n1 (block device 953.9 GiB): request=23 time=5.62 ms
4 KiB <<< /dev/nvme1n1 (block device 953.9 GiB): request=24 time=5.78 ms (slow)
4 KiB <<< /dev/nvme1n1 (block device 953.9 GiB): request=25 time=192.7 us (fast)
4 KiB <<< /dev/nvme1n1 (block device 953.9 GiB): request=26 time=180.0 us (fast)
4 KiB <<< /dev/nvme1n1 (block device 953.9 GiB): request=27 time=69.9 us (fast)
4 KiB <<< /dev/nvme1n1 (block device 953.9 GiB): request=28 time=54.1 us (fast)
4 KiB <<< /dev/nvme1n1 (block device 953.9 GiB): request=29 time=56.1 us (fast)
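
If you want to poke at the power-state angle yourself, the relevant knobs are standard nvme-cli and kernel bits, nothing specific to my setup (adjust /dev/nvme1 to taste):

# List the drive's power states; enlat/exlat are entry/exit latencies:
sudo nvme id-ctrl /dev/nvme1 | grep -E '^ps'
# Dump the current APST (autonomous power state transition) settings:
sudo nvme get-feature /dev/nvme1 -f 0x0c -H
# To keep the drive out of deep states entirely, boot with e.g.:
#   nvme_core.default_ps_max_latency_us=0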
Shmerl May 11, 2018
I think I upgraded to 4.16 before running the tests.