[Solved] AMD Ryzen system crashes with 4.10+ series kernels

Posted: Sun Dec 10, 2017 4:03 am
by Tallone55
Solved:
ClixTrix wrote: Mon May 07, 2018 7:32 am Since my last post to you, there is a BIOS fix issued by AMD for the Ryzen idle freeze problem. Since you've updated to a 2700x, you probably updated the BIOS to current release. Look in BIOS settings for "PSU Idle Control" and set it to "Typical Current Idle". It probably is in the same section of BIOS where the C6/Global C-States settings are found.
My system has been crashing regularly without helpful log data on recent kernels. The sequence of events is thus:

1: While running a 4.8 series kernel I replaced my motherboard, memory and CPU.

2: Linux Mint began using 4.10 series kernels.

3: I updated to the 4.10 series kernels

4: I began experiencing regular hard crashes which left behind no consistent information in syslog other than a series of null bytes, which grew longer with each crash if I didn't restart.

5: I downgraded to 4.8 series kernels for several months because I needed a stable system.

I unsuccessfully attempted to troubleshoot the problem before downgrading by:

A: Using 4.11, 4.12 and 4.13 series kernels.
B: Flashing my motherboard BIOS to the most recent version.
C: Not using bluetooth. (The system continued to crash, though significantly less frequently.)
D: Running memtest (clean)
E: Reinstalling Nvidia proprietary drivers, though the problems began before I also replaced my graphics card.
F: Extensive googling for similar issues.

Some months later, I've reinstalled my entire system because I was aware that this bug likely awaited me if I updated to mint 18.3 and I wanted to be able to use more recent kernels. Despite a completely fresh installation running 4.10.0-42-generic, I began to again experience crashes.

Because the crashes persist through a clean system install, I can only assume that the problem is a hardware one.

EDIT: I can confirm that Ubuntu-based distros all suffer the same problem, I just had a live session of Ubuntu GNOME running from a USB drive crash with the same symptoms on my machine.

Relevant System Specs:
AMD Ryzen 5 1600 CPU
MSI Tomahawk B350 Motherboard
Corsair Vengeance LPX 8GB DDR4 2666 MHz RAM

Re: System crashes with 4.10+ series kernels

Posted: Sun Dec 10, 2017 4:05 am
by Tallone55
When I do get useful log data it sometimes looks like this:

https://pastebin.com/qmcziMtt

followed by a large number of null bytes after the system crashes.

This error message seems to correspond to a slower crash where the cursor begins stuttering noticeably for a few seconds prior to total meltdown.
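
The runs of null bytes described above can be located programmatically, which makes it easier to see the last message written before each crash. This is only a rough sketch: the `/var/log/syslog` path is the usual Mint/Ubuntu location (an assumption here), the 16-byte threshold is arbitrary, and `find_nul_runs` is a hypothetical helper name.

```python
import re
from pathlib import Path

# A run of 16 or more NUL bytes is treated as crash residue (arbitrary threshold).
NUL_RUN = re.compile(rb"\x00{16,}")

def find_nul_runs(data: bytes):
    """Return (offset, length, last_line_before) for each long run of NUL bytes."""
    runs = []
    for match in NUL_RUN.finditer(data):
        # The last line before the run is usually the final message that reached disk.
        preceding = data[:match.start()].splitlines()
        last_line = preceding[-1].decode("utf-8", errors="replace") if preceding else ""
        runs.append((match.start(), match.end() - match.start(), last_line))
    return runs

if __name__ == "__main__":
    log_path = Path("/var/log/syslog")  # assumed location; may need elevated permissions
    try:
        data = log_path.read_bytes()
    except OSError as exc:
        print(f"Could not read {log_path}: {exc}")
    else:
        for offset, length, last_line in find_nul_runs(data):
            print(f"{length} NUL bytes at offset {offset}; last entry before: {last_line}")
```

If the same message keeps appearing right before each run, that is a hint worth posting alongside the pastebin.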

Re: System crashes with 4.10+ series kernels

Posted: Sun Dec 10, 2017 7:07 am
by thx-1138
...that part in specific is IOMMU related:
Dec 9 06:42:36 thomas-linux kernel: [ 3493.358058] AMD-Vi: Event logged [
Dec 9 06:42:36 thomas-linux kernel: [ 3493.358063] IO_PAGE_FAULT device=24:00.0 domain=0x000b address=0x00000000fba84804 flags=0x0000]
With some googling around, and elsewhere, it seems that some people were able to get away with: iommu=soft

Post the output of:

inxi -Fxz
So that people around know what they're dealing with...

Better edit the thread's title to 'Ryzen system crashes...', so that people with Ryzen will notice and pop-up to help...

Re: System crashes with 4.10+ series kernels

Posted: Sun Dec 10, 2017 7:18 am
by Pjotr
Sticking to the 4.8 kernel series, even though it's outdated and no longer supported with security updates, is still a reasonable option. Even security updates for the kernel usually have little practical relevance for a desktop user. :)

Re: System crashes with 4.10+ series kernels

Posted: Sun Dec 10, 2017 9:31 am
by Tallone55

System:    Host: thomas-linux Kernel: 4.8.0-58-generic x86_64 (64 bit gcc: 5.4.0)
           Desktop: Cinnamon 3.6.6 (Gtk 3.18.9-1ubuntu3.3)
           Distro: Linux Mint 18.3 Sylvia
Machine:   Mobo: Micro-Star model: B350 TOMAHAWK (MS-7A34) v: 1.0
           Bios: American Megatrends v: 1.80 date: 09/13/2017
CPU:       Hexa core AMD Ryzen 5 1600 Six-Core (-HT-MCP-) cache: 3072 KB
           flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm) bmips: 38395
           clock speeds: max: 3200 MHz 1: 1550 MHz 2: 1550 MHz 3: 1550 MHz
           4: 2800 MHz 5: 1550 MHz 6: 1550 MHz 7: 1550 MHz 8: 1550 MHz
           9: 1550 MHz 10: 1550 MHz 11: 1550 MHz 12: 1550 MHz
Graphics:  Card: NVIDIA Device 1b81 bus-ID: 25:00.0
           Display Server: X.Org 1.18.4 drivers: nvidia (unloaded: fbdev,vesa,nouveau)
           Resolution: 1280x1024@60.02hz
           GLX Renderer: GeForce GTX 1070/PCIe/SSE2
           GLX Version: 4.5.0 NVIDIA 384.90 Direct Rendering: Yes
Audio:     Card-1 Advanced Micro Devices [AMD] Device 1457
           driver: snd_hda_intel bus-ID: 27:00.3
           Card-2 NVIDIA Device 10f0 driver: snd_hda_intel bus-ID: 25:00.1
           Card-3 C-Media driver: USB Audio usb-ID: 003-004
           Sound: Advanced Linux Sound Architecture v: k4.8.0-58-generic
Network:   Card-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           driver: r8169 v: 2.3LK-NAPI port: f000 bus-ID: 21:00.0
           IF: enp33s0 state: down mac: <filter>
           Card-2: Qualcomm Atheros AR93xx Wireless Network Adapter
           driver: ath9k bus-ID: 24:00.0
           IF: wlp36s0 state: up mac: <filter>
Drives:    HDD Total Size: 1000.2GB (4.7% used)
           ID-1: /dev/sda model: ST1000DM003 size: 1000.2GB
Partition: ID-1: / size: 908G used: 37G (5%) fs: ext4 dev: /dev/dm-0
           ID-2: /boot size: 473M used: 68M (15%) fs: ext2 dev: /dev/sda2
           ID-3: swap-1 size: 8.54GB used: 0.00GB (0%) fs: swap dev: /dev/dm-1
RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors:   None detected - is lm-sensors installed and configured?
Info:      Processes: 312 Uptime: 50 min Memory: 1742.0/7998.5MB
           Init: systemd runlevel: 5 Gcc sys: 5.4.0
           Client: Shell (bash 4.3.481) inxi: 2.2.35
System information, as requested. The page fault happens a couple minutes before the system crash, so it's unlikely to be the direct cause, though I'll look into the solution you posted. I'm using the 4.8 kernel series atm because of other projects I'm working on, but I'll be back on 4.10 after I've finished those other projects.

Re: System crashes with 4.10+ series kernels

Posted: Sun Dec 10, 2017 9:45 am
by ClixTrix
I'm running the Gigabyte equivalent of your motherboard with Ryzen 1600. See my specs.

Here are some tips on Ryzen:

1. Update your BIOS to at least the AGESA Code 1.0.0.6b for Ryzen firmware patches. That might also help memory compatibility problems.

2. Ryzen requires a "Haswell Ready" PSU due to the C6 power requirement which is consistent with Haswell C6/C7 requirement. I'd also advise a good quality 500W minimum (which is often stated in Manuals on these boards).

3. Ryzen SMT Scheduler support requires Kernel 4.10 or newer. I'm very stable on 4.10 (with a fix) and am waiting for the 4.14 Kernel to mature for its support of Ryzen Turbo mode. Again, Kernel 4.10 is the oldest Kernel to have Ryzen support. I didn't use the 4.8 Kernel due to the Vector 07 bug with Gigabyte motherboards. I did use the 4.4 Kernel on early installs, but only as a method to load Mint and move to Kernel 4.10.

4. There is a known bug causing freezes on Ryzen. The workaround (the fix I mentioned) is to add rcu_nocbs=0-xx to the Grub boot parameters, the same way you would add nomodeset, where xx is the number of processor threads minus one. This workaround avoided the idle freezes I had. For my Ryzen 1600, the parameter is rcu_nocbs=0-11.

5. There are reported problems with using nVidia video cards with the default nouveau driver on Ryzen systems. Update to the proprietary nVidia driver using Driver Update.

6. There is a segfault problem with early Ryzens produced up to about week 25. I've got one and used Kill-Ryzen test to verify it on mine. It's not something that you would typically encounter with normal use of your system. You can RMA your Ryzen for that problem.
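
The rcu_nocbs value from tip 4 can be worked out from the thread count rather than by hand. The sketch below assumes the boot parameters live in the `GRUB_CMDLINE_LINUX_DEFAULT` line of `/etc/default/grub`, as they normally do on Mint and Ubuntu; `rcu_nocbs_param` and `add_kernel_param` are hypothetical helper names, and the script only transforms text, it does not write the file.

```python
import os
import re

def rcu_nocbs_param(thread_count):
    """Build the rcu_nocbs value covering CPUs 0 through thread_count - 1."""
    if thread_count < 1:
        raise ValueError("thread_count must be at least 1")
    return f"rcu_nocbs=0-{thread_count - 1}"

def add_kernel_param(grub_text, param):
    """Append param to GRUB_CMDLINE_LINUX_DEFAULT unless one with the same name is present."""
    name = param.split("=", 1)[0]

    def patch(match):
        current = match.group(2).split()
        if any(p.split("=", 1)[0] == name for p in current):
            return match.group(0)  # already present, leave the line untouched
        return match.group(1) + " ".join(current + [param]) + '"'

    pattern = r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)"'
    return re.sub(pattern, patch, grub_text, flags=re.MULTILINE)

if __name__ == "__main__":
    # Sample file contents; a real /etc/default/grub will have more lines.
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    threads = os.cpu_count() or 1  # logical CPUs, i.e. threads
    print(add_kernel_param(sample, rcu_nocbs_param(threads)))
```

For a 12-thread Ryzen 1600, `rcu_nocbs_param(12)` gives `rcu_nocbs=0-11`, matching the value above. After editing the real file, run `sudo update-grub` and reboot for the change to take effect.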

Here are threads at Ubuntu and Kernel.org discussing the bugs:

https://bugs.launchpad.net/ubuntu/+sour ... ug/1690085

https://bugzilla.kernel.org/show_bug.cgi?id=196683

Re: AMD Ryzen system crashes with 4.10+ series kernels

Posted: Sun Dec 10, 2017 10:05 am
by thx-1138
Tallone55, your inxi reports:
driver: ath9k bus-ID: 24:00.0
Your original syslog above reports on top:
IO_PAGE_FAULT device=24:00.0
And afterwards, it's exactly ath9k that starts faltering...
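
The same cross-check can be automated for anyone with a longer log. The sketch below uses the two excerpts quoted in this thread as sample input; the regular expressions are written against those excerpts only and are an assumption about the general format, and the function names are hypothetical.

```python
import re

# PCI address as it appears in the AMD-Vi event, e.g. 24:00.0
FAULT = re.compile(r"IO_PAGE_FAULT device=([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])")
# inxi prints "driver: <name> ... bus-ID: <addr>" on one line
DRIVER = re.compile(r"driver:\s*(\S+).*?bus-ID:\s*([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])")

def faulting_devices(syslog_text):
    """Return the set of PCI addresses named in IO_PAGE_FAULT events."""
    return set(FAULT.findall(syslog_text))

def drivers_by_bus_id(inxi_text):
    """Map each bus-ID in inxi output to the driver listed on the same line."""
    return {bus: driver for driver, bus in DRIVER.findall(inxi_text)}

SAMPLE_SYSLOG = (
    "Dec 9 06:42:36 thomas-linux kernel: [ 3493.358063] IO_PAGE_FAULT "
    "device=24:00.0 domain=0x000b address=0x00000000fba84804 flags=0x0000]"
)
SAMPLE_INXI = """\
Network:   Card-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           driver: r8169 v: 2.3LK-NAPI port: f000 bus-ID: 21:00.0
           Card-2: Qualcomm Atheros AR93xx Wireless Network Adapter
           driver: ath9k bus-ID: 24:00.0
"""

if __name__ == "__main__":
    drivers = drivers_by_bus_id(SAMPLE_INXI)
    for device in sorted(faulting_devices(SAMPLE_SYSLOG)):
        print(device, drivers.get(device, "unknown"))
    # prints: 24:00.0 ath9k
```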

Re: AMD Ryzen system crashes with 4.10+ series kernels

Posted: Sun Dec 10, 2017 10:07 am
by Hoser Rob
What pjotr said ... just use a kernel that works.

There's no such thing as backwards/forwards compatibility in Linux really. Some newer kernels are just not going to work. It may fix something and break something else.

Re: AMD Ryzen system crashes with 4.10+ series kernels

Posted: Sun Dec 10, 2017 11:00 am
by Pjotr
Hoser Rob wrote:There's no such thing as backwards/forwards compatibility in Linux really.
In some respects, compatibility is worse than in Windows. But in other respects, it's better than in Windows (e.g. old printers, old Nvidia video cards, etc.).

Re: AMD Ryzen system crashes with 4.10+ series kernels

Posted: Fri Dec 22, 2017 8:28 am
by Tallone55
I figured I shouldn't leave this thread hanging like I did the last time I attempted to address this issue. I read through the list posted above by ClixTrix and followed his links. Judging by the relatively light load I was running when my system crashed each time, it would appear that my problem is the one discussed in the threads linked above.

My Ryzen does not appear to be experiencing segfaults. I'm not ready to pull it off the mobo to find out what week it's from. I ordered it at the end of July, but it could have been sitting on a shelf for weeks.

The issue is *not* the following:

1: "Haswell ready" PSU. I have confirmed that my 750W Corsair PSU is, indeed, "Haswell ready"
2: nouveau driver. I currently have the driver blacklisted and am using the latest Nvidia proprietary drivers.

The likeliest issue would appear to be #4: worked-around by adding rcu_nocbs=0-11 to Grub boot params. This seems to be fairly reliable according to the bug reports on that issue, at least until AMD releases AGESA 1.0.0.7, which was slated for Novemberish and I haven't heard anything about since October. There are rumors it will contain a fix for the idle bug.

I'll test this weekend when I have time and see if I can get some measure of stability on a newer kernel. Do mint kernels come compiled with the CONFIG_RCU_NOCB_CPU=y option?
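
One way to answer the CONFIG_RCU_NOCB_CPU question for whichever kernel is currently running is to read its config file. This sketch assumes the Debian/Ubuntu/Mint convention of shipping the config as `/boot/config-<release>`; `config_enabled` and `cmdline_has` are hypothetical helper names. As far as I understand it, the rcu_nocbs boot parameter only has an effect when this option is compiled in, so it is worth checking both.

```python
import platform
from pathlib import Path

def config_enabled(config_text, option):
    """True if the option is built in (=y) according to a kernel config file's text."""
    return any(line.strip() == f"{option}=y" for line in config_text.splitlines())

def cmdline_has(cmdline, name):
    """True if a boot parameter with this name appears on the kernel command line."""
    return any(param.split("=", 1)[0] == name for param in cmdline.split())

if __name__ == "__main__":
    # Assumed path convention; other distros may expose /proc/config.gz instead.
    config_path = Path("/boot") / f"config-{platform.release()}"
    try:
        config_text = config_path.read_text()
        cmdline = Path("/proc/cmdline").read_text()
    except OSError as exc:
        print(f"Could not read kernel information: {exc}")
    else:
        print("CONFIG_RCU_NOCB_CPU=y:", config_enabled(config_text, "CONFIG_RCU_NOCB_CPU"))
        print("rcu_nocbs on command line:", cmdline_has(cmdline, "rcu_nocbs"))
```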

Re: AMD Ryzen system crashes with 4.10+ series kernels

Posted: Fri Dec 22, 2017 10:08 am
by ClixTrix
Tallone55 wrote:The likeliest issue would appear to be #4: worked-around by adding rcu_nocbs=0-11 to Grub boot params. This seems to be fairly reliable according to the bug reports on that issue, at least until AMD releases AGESA 1.0.0.7, which was slated for Novemberish and I haven't heard anything about since October. There are rumors it will contain a fix for the idle bug.

I'll test this weekend when I have time and see if I can get some measure of stability on a newer kernel. Do mint kernels come compiled with the CONFIG_RCU_NOCB_CPU=y option?
No, there was a change last June that was apparently applied to current and downstream Kernels getting updates.

https://git.kernel.org/pub/scm/linux/ke ... 0602c25764

I've been operating with the rcu_nocbs=0-11 override (adds back what they removed) since early November with no idle freezes. I've only tested it on the 4.10 Kernel (see my sig). If your system hardware is working with 4.10, I'd stick with that Kernel for now. I'm aware that some Ryzen motherboards have a newer audio CODEC, which might require Kernel 4.11 or newer. If I were to select a newer Kernel, I'd go with 4.13 as it is getting updates for now (and 4.11 isn't).

The recently released 4.14 Kernel added Turbo-mode support for Ryzen. It should be available to Mint 18 after the next Ubuntu 16.04.4 LTS release in mid-February. I intend to test 4.14 mainline early in January, when the fix-cycle for early bugs slows.

My board has AGESA 1.0.7.2a available with BIOS F10, since early this month. Not thinking that latest has the idle fix. It hasn't come up at the bugzilla thread, so doubting it does. Haven't flashed my board to that level, so can't say for certain.

https://bugzilla.kernel.org/show_bug.cgi?id=196683

Edit: I checked MSI for your motherboard's latest BIOS. I see it's using AGESA 1.0.0.6b from last September. That's equivalent to my current F5a. So, MSI hasn't yet added 1.0.7.2a to a BIOS. I'm aware of new processors support and memory stability fixes for 1.0.7.2a only. So, not sure you're missing anything.

Re: AMD Ryzen system crashes with 4.10+ series kernels

Posted: Fri Dec 22, 2017 5:24 pm
by mr_raider
I'm using a ryzen 1700x, b350 tomahawk, and corsair LPX 2666 RAM. Virtually identical. I have been rock stable under 4.8, 4.10 and 4.13.

I suggest you troubleshoot your hardware.

Re: AMD Ryzen system crashes with 4.10+ series kernels

Posted: Fri Dec 22, 2017 7:06 pm
by ClixTrix
mr_raider wrote:I'm using a ryzen 1700x, b350 tomahawk, and corsair LPX 2666 RAM. Virtually identical. I have been rock stable under 4.8, 4.10 and 4.13.

I suggest you troubleshoot your hardware.
Any chance you have C6 or Global C States disabled in BIOS? That is one bypass for the problem (and worked for me in testing).

Also, have you tried letting the system sit idle for long periods of 4 or more hours, e.g. overnight or weekend? Some have indicated it can take a day or more of idle before a freeze. It's not a simple bug that's easy/quick to show. However, it has got AMD's attention and appears (from sources) they are working on a code fix. I hope that's the case.
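
Whether the BIOS C-state setting is actually in effect can be sanity-checked from Linux by looking at the idle-state counters the kernel exposes under sysfs. This is a sketch under assumptions: it reads the standard cpuidle directory for cpu0 only, the state names shown depend on the idle driver and firmware, and `read_idle_states` is a hypothetical helper name.

```python
from pathlib import Path

def read_idle_states(cpu_dir):
    """Return (name, usage) for each cpuidle state found under the given CPU directory."""
    state_dirs = sorted(
        Path(cpu_dir, "cpuidle").glob("state[0-9]*"),
        key=lambda path: int(path.name[len("state"):]),  # numeric order: state2 before state10
    )
    states = []
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        usage = int((state_dir / "usage").read_text())  # number of times the state was entered
        states.append((name, usage))
    return states

if __name__ == "__main__":
    cpu0 = Path("/sys/devices/system/cpu/cpu0")
    if (cpu0 / "cpuidle").is_dir():
        for name, usage in read_idle_states(cpu0):
            print(f"{name}: entered {usage} times")
    else:
        print("No cpuidle information exposed for cpu0")
```

If the deepest listed state keeps accumulating entries while the machine sits idle, the deep idle state is probably still being used despite the BIOS setting.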

Re: AMD Ryzen system crashes with 4.10+ series kernels

Posted: Fri Dec 22, 2017 7:19 pm
by mr_raider
My system folds 24/7 as is evident from my signature. If the community is willing to give up 350k ppd, I'm happy to let it go idle.

I have all power saving options enabled.

Re: AMD Ryzen system crashes with 4.10+ series kernels

Posted: Fri Dec 22, 2017 7:35 pm
by ClixTrix
mr_raider wrote:My system folds 24/7 as is evident from my signature. If the community is willing to give up 350k ppd, I'm happy to let it go idle.

I have all power saving options enabled.
No, not suggesting any sacrifice needed. However, it is an "idle freeze" problem, so it would appear you probably won't encounter the bug in the scope of your normal use. Just keep folding ....

Re: AMD Ryzen system crashes with 4.10+ series kernels

Posted: Sat Dec 23, 2017 5:34 am
by joele
Not sure this is helpful but I have had a bit of a journey with my Ryzen box and linux..

I couldn't get into it with 18.2 or 18.1, couldn't even boot the live disc, so ended up switching to Fedora 26 then 27, both worked perfectly with my ryzen 1600x.

I prefer a deb based system so I decided to give 18.3 a try and it installs fine but I was getting freeze ups, kernel level I think as it was completely locked up, it only occurred after the computer was left idle for a decent period.. I updated to 4.13 and made sure to load the latest Nvidia drivers (just in case) and it has been rock solid, no more freeze ups.

Re: AMD Ryzen system crashes with 4.10+ series kernels

Posted: Sat Dec 23, 2017 7:48 am
by ClixTrix
joele wrote:Not sure this is helpful but I have had a bit of a journey with my Ryzen box and linux..

I couldn't get into it with 18.2 or 18.1, couldn't even boot the live disc, so ended up switching to Fedora 26 then 27, both worked perfectly with my ryzen 1600x.

I prefer a deb based system so I decided to give 18.3 a try and it installs fine but I was getting freeze ups, kernel level I think as it was completely locked up, it only occurred after the computer was left idle for a decent period.. I updated to 4.13 and made sure to load the latest Nvidia drivers (just in case) and it has been rock solid, no more freeze ups.
I couldn't install (boot from Live) Mint 18.2 either due to the vector 07 bug associated with Gigabyte motherboards (see my sig). That bug was fixed in kernel 4.10.

Re: AMD Ryzen system crashes with 4.10+ series kernels

Posted: Mon May 07, 2018 3:47 am
by Tallone55
Posting here on this thread again because my problem remains unresolved.

Things I've tried in the intervening months:

1: First, disabling global C-states in the BIOS.

2: Then, compiling a custom kernel with the no_cb option as mentioned above.

Finally, I broke down and bought a new CPU, assuming that the modified architecture of Ryzen second generation would absolutely fix my problem.
I purchased a Ryzen 2700x and installed it on my motherboard.

The crashes in newer kernels persist.

I am at a loss. The symptoms are identical to before; the processor locks up and prints null bytes in the syslog.

At this point, I have no idea which component could be the cause of my problem. The evidence continues to point at the processor, but a new CPU, obviously produced *after* AMD fixed the manufacturing defect, with a different lithography, doesn't seem like it could *possibly* be the root cause. I don't know what my next step should be.

Re: AMD Ryzen system crashes with 4.10+ series kernels

Posted: Mon May 07, 2018 4:26 am
by Pjotr
Did you already try the latest kernel of the 4.15 series?
Update Manager - panel: View - Linux kernels

After installing it: reboot.

Re: AMD Ryzen system crashes with 4.10+ series kernels

Posted: Mon May 07, 2018 7:32 am
by ClixTrix
Since my last post to you, there is a BIOS fix issued by AMD for the Ryzen idle freeze problem. Since you've updated to a 2700x, you probably updated the BIOS to current release. Look in BIOS settings for "PSU Idle Control" and set it to "Typical Current Idle". It probably is in the same section of BIOS where the C6/Global C-States settings are found.