Infrequent GPU Crashes/Freezes During Gaming

Post by SpookyNoodle »

Hi all,

This is kind of a continuation of this thread, although I'm making a new one since we've moved on from the initial question of "force driver restart from keyboard."

On my system, with seemingly little correlation between incidents, my screen will freeze up and become totally unresponsive to keyboard and mouse. I seriously can't find any connections between what causes these issues, other than that it's always during some game. It's happened in the middle of intense fights during League of Legends, as well as standing still in the middle of a previously-loaded room in Binding of Isaac. There's no common factor in terms of graphical quality, resolution, movement, 2D vs 3D, or anything else I can detect. It's not like this is super common either, it happens maybe once every 3-4 days.

I can still SSH into the system from my laptop, and interestingly, the audio still plays. In fact, I was in a Discord call during one occurrence, and I could continue talking with people even after the screen froze. The only way I've found to solve the issue is to force a manual restart, by pressing and holding the power button. Shortcuts like ctrl + alt + delete, or ctrl + alt + backspace don't work, and sending an SSH reboot command didn't do anything either.

I went into the logs, and here's what I found at the time of the crash:

18:19:58 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
18:19:58 kernel: amdgpu 0000:26:00.0: amdgpu: GPU reset(2) succeeded!
18:19:58 kernel: amdgpu 0000:26:00.0: amdgpu: GPU reset(2) succeeded!
18:19:58 kernel: amdgpu 0000:26:00.0: amdgpu: GPU reset succeeded, trying to resume
18:19:58 kernel: amdgpu 0000:26:00.0: amdgpu: GPU BACO reset
18:19:58 kernel: amdgpu: rlc is busy, skip halt rlc
18:19:58 kernel: amdgpu: rlc is busy, skip halt rlc
18:19:57 kernel: amdgpu: cp is busy, skip halt cp
18:19:57 kernel: amdgpu 0000:26:00.0: amdgpu: GPU reset begin!
18:19:57 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process League of Legen pid 31092 thread League of Legen pid 31217
18:19:57 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process League of Legen pid 31092 thread League of Legen pid 31217
18:19:57 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2053311, emitted seq=2053313
18:19:57 kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
I tried searching for phrases like "Waiting for fences timed out" and "ring gfx timeout", and I came across some conflicting information. One thread suggested adding a line to grub.cfg: rcu_nocbs=0-3 idle=nomwait. However, it also clarified that this fix only worked on Linux kernel 5.1.14 or older, and I'm honestly not sure how to change my kernel version. When I looked up tutorials, everything suggested building a custom kernel from scratch or rolling back to a kernel that was previously installed, and I'm not even sure how to do that. I'm not sure I even fully understand what a kernel is.

If anyone has any suggestions, or can point to a straightforward tutorial on how to install and use (?) specific kernel versions, I'd be appreciative. Thanks for your time!

Output from inxi:

System:
  Kernel: 5.8.0-50-generic x86_64 bits: 64 compiler: N/A 
  Desktop: Cinnamon 4.8.6 wm: muffin 4.8.1 dm: LightDM 1.30.0 
  Distro: Linux Mint 20.1 Ulyssa base: Ubuntu 20.04 focal 
Machine:
  Type: Desktop Mobo: ASRock model: AB350 Pro4 serial: <filter> 
  UEFI: American Megatrends v: P5.00 date: 07/05/2018 
CPU:
  Topology: 6-Core model: AMD Ryzen 5 1600 bits: 64 type: MT MCP arch: Zen 
  rev: 1 L2 cache: 3072 KiB 
  flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm 
  bogomips: 76655 
  Speed: 1375 MHz min/max: 1550/3200 MHz boost: enabled Core speeds (MHz): 
  1: 1375 2: 1374 3: 1372 4: 1375 5: 1374 6: 1374 7: 1375 8: 1374 9: 1374 
  10: 1376 11: 1374 12: 1374 
Graphics:
  Device-1: AMD Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] 
  vendor: PC Partner Limited driver: amdgpu v: kernel bus ID: 26:00.0 
  chip ID: 1002:67df 
  Display: x11 server: X.Org 1.20.9 driver: amdgpu,ati 
  unloaded: fbdev,modesetting,vesa tty: N/A 
  OpenGL: renderer: AMD Radeon RX 480 Graphics (POLARIS10 DRM 3.38.0 
  5.8.0-50-generic LLVM 12.0.0) 
  v: 4.6 Mesa 21.0.3 - kisak-mesa PPA direct render: Yes 
Audio:
  Device-1: AMD Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] 
  vendor: PC Partner Limited driver: snd_hda_intel v: kernel bus ID: 26:00.1 
  chip ID: 1002:aaf0 
  Device-2: AMD Family 17h HD Audio vendor: ASRock driver: snd_hda_intel 
  v: kernel bus ID: 28:00.3 chip ID: 1022:1457 
  Device-3: Blue Microphones Yeti Stereo Microphone type: USB 
  driver: hid-generic,snd-usb-audio,usbhid bus ID: 3-3:3 chip ID: b58e:9e84 
  serial: <filter> 
  Device-4: C-Media Audio Adapter (Unitek Y-247A) type: USB 
  driver: hid-generic,snd-usb-audio,usbhid bus ID: 1-7:3 chip ID: 0d8c:0014 
  Sound Server: ALSA v: k5.8.0-50-generic 
Network:
  Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet 
  vendor: ASRock driver: r8169 v: kernel port: d000 bus ID: 25:00.0 
  chip ID: 10ec:8168 
  IF: enp37s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
Drives:
  Local Storage: total: 6.60 TiB used: 662.17 GiB (9.8%) 
  ID-1: /dev/sda vendor: SanDisk model: SD8SBAT256G1122 size: 238.47 GiB 
  speed: 6.0 Gb/s serial: <filter> rev: 3000 scheme: GPT 
  ID-2: /dev/sdb vendor: Western Digital model: WD30EZRZ-00Z5HB0 
  size: 2.73 TiB speed: 6.0 Gb/s rotation: 5400 rpm serial: <filter> 
  rev: 0A80 scheme: GPT 
  ID-3: /dev/sdc type: USB vendor: Seagate model: Backup+ Hub BK 
  size: 3.64 TiB serial: <filter> rev: D781 scheme: GPT 
Partition:
  ID-1: / size: 216.77 GiB used: 12.72 GiB (5.9%) fs: ext4 dev: /dev/sda2 
  ID-2: /home size: 2.69 TiB used: 178.10 GiB (6.5%) fs: ext4 dev: /dev/sdb1 
  ID-3: swap-1 size: 17.14 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/sda3 
Sensors:
  System Temperatures: cpu: 30.4 C mobo: N/A gpu: amdgpu temp: 36 C 
  Fan Speeds (RPM): N/A gpu: amdgpu fan: 799 
Repos:
  No active apt repos in: /etc/apt/sources.list 
  Active apt repos in: /etc/apt/sources.list.d/brave-browser-release.list 
  1: deb [signed-by=/usr/share/keyrings/brave-browser-archive-keyring.gpg arch=amd64] https://brave-browser-apt-release.s3.brave.com/ stable main
  Active apt repos in: /etc/apt/sources.list.d/google-chrome.list 
  1: deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main
  Active apt repos in: /etc/apt/sources.list.d/kisak-kisak-mesa-focal.list 
  1: deb http://ppa.launchpad.net/kisak/kisak-mesa/ubuntu focal main
  Active apt repos in: /etc/apt/sources.list.d/lutris-team-lutris-focal.list 
  1: deb http://ppa.launchpad.net/lutris-team/lutris/ubuntu focal main
  Active apt repos in: /etc/apt/sources.list.d/official-package-repositories.list 
  1: deb http://mirrors.xmission.com/linuxmint ulyssa main upstream import backport
  2: deb http://archive.ubuntu.com/ubuntu focal main restricted universe multiverse
  3: deb http://archive.ubuntu.com/ubuntu focal-updates main restricted universe multiverse
  4: deb http://archive.ubuntu.com/ubuntu focal-backports main restricted universe multiverse
  5: deb http://security.ubuntu.com/ubuntu/ focal-security main restricted universe multiverse
  6: deb http://archive.canonical.com/ubuntu/ focal partner
  Active apt repos in: /etc/apt/sources.list.d/playonlinux.list 
  1: deb http://deb.playonlinux.com/ stretch main
  Active apt repos in: /etc/apt/sources.list.d/slgobinath-gcalendar-focal.list 
  1: deb http://ppa.launchpad.net/slgobinath/gcalendar/ubuntu focal main
  Active apt repos in: /etc/apt/sources.list.d/spotify.list 
  1: deb http://repository.spotify.com stable non-free
  Active apt repos in: /etc/apt/sources.list.d/teams.list 
  1: deb [arch=amd64] https://packages.microsoft.com/repos/ms-teams stable main
  Active apt repos in: /etc/apt/sources.list.d/vscode.list 
  1: deb [arch=amd64,arm64,armhf signed-by=/etc/apt/trusted.gpg.d/packages.microsoft.gpg] https://packages.microsoft.com/repos/code stable main
Info:
  Processes: 343 Uptime: 57m Memory: 15.63 GiB used: 3.06 GiB (19.6%) 
  Init: systemd v: 245 runlevel: 5 Compilers: gcc: 9.3.0 alt: 9 Shell: zsh 
  v: 5.8 running in: terminator inxi: 3.0.38 
  

Post by SMG »

SpookyNoodle wrote: Sun Apr 25, 2021 10:52 pm
On my system, with seemingly little correlation between incidents, my screen will freeze up and become totally unresponsive to keyboard and mouse. I seriously can't find any connections between what causes these issues, other than that it's always during some game. It's happened in the middle of intense fights during League of Legends, as well as standing still in the middle of a previously-loaded room in Binding of Isaac. There's no common factor in terms of graphical quality, resolution, movement, 2D vs 3D, or anything else I can detect. It's not like this is super common either, it happens maybe once every 3-4 days.
That sounds like it happens during times of high load if it always happens during a game.
SpookyNoodle wrote: Sun Apr 25, 2021 10:52 pm
I went into the logs, and here's what I found at the time of the crash:
The graphics driver, amdgpu, is crashing and is not able to recover completely. Because it cannot recover, you cannot get graphics. That is why restarting the X server (Ctrl-Alt-Backspace) does not work.

The first comment on Bug 205089 - amdgpu : drm:amdgpu_cs_ioctl : Failed to initialize parser -125 explains the last message you posted and why the system cannot restart.
"The GPU has reset and so you need to restart your desktop environment to continue. The error messages are because the kernel is rejecting commands from userspace because the application needs to recreate their contexts after a GPU reset. Things like desktop compositors would need to use the OpenGL robustness extensions and recreate their contexts after a GPU reset for this to work smoothly. Unfortunately, no desktop compositors do this at the moment."

Comment 3 then continues the explanation, "The GPU reset succeeded. However, since the GPU has been reset, the contents of the memory (e.g, vram) that the application was using is undefined. So the application needs to use an API level (e.g., OpenGL robustness extensions or vulkan context lost) interface to query whether the GPU was reset and re-initialize it's buffers if so."
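To look at the same kind of messages after a freeze, the kernel log of the previous boot can be filtered for amdgpu hang and reset lines. This is only a sketch: the helper name amdgpu_hang_lines and its pattern are made up for illustration, the unrelated USB line in the demo is invented, and reading the previous boot with journalctl assumes the journal is kept across reboots on your install. Note that the excerpt in the first post reads newest-first, while journalctl prints oldest-first.

```shell
# Keep only the amdgpu lines that mark an error, a timeout, or a GPU reset.
# (Hypothetical helper; the pattern is a guess based on the log in the first post.)
amdgpu_hang_lines() {
    grep -E 'amdgpu.*(\*ERROR\*|GPU reset|timeout)'
}

# Demo: two lines copied from the first post plus one invented, unrelated line.
# Prints the first two lines only.
printf '%s\n' \
    '18:19:57 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2053311, emitted seq=2053313' \
    '18:19:57 kernel: amdgpu 0000:26:00.0: amdgpu: GPU reset begin!' \
    '18:19:57 kernel: usb 3-3: unrelated example line' \
    | amdgpu_hang_lines

# On the affected machine, after the forced restart (or over SSH on the next boot):
#   journalctl -k -b -1 --no-pager | amdgpu_hang_lines
```

Here -k limits the output to kernel messages and -b -1 selects the boot before the current one, which is the boot that froze.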

Other comments in the thread mention Vulkan. I do not know what version you are using or what that game requires, but maybe that is a factor to consider. I notice you have already upgraded your Mesa version. Maybe there are settings in the game you can adjust?
SpookyNoodle wrote: Sun Apr 25, 2021 10:52 pm
I tried searching for phrases like "Waiting for fences timed out" and "ring gfx timeout", and I came across some conflicting information. One thread suggested adding a line to grub.cfg: rcu_nocbs=0-3 idle=nomwait. However, it also clarified that this fix only worked on Linux kernel 5.1.14 or older, and I'm honestly not sure how to change my kernel version. When I looked up tutorials, everything suggested building a custom kernel from scratch or rolling back to a kernel that was previously installed, and I'm not even sure how to do that. I'm not sure I even fully understand what a kernel is.
One uses kernels which are supported for one's operating system. If you wish to "mix and match" then you are on your own for fixing issues.

The LTS kernel for LM20 is the 5.4 kernel. There is no older kernel you can use with LM20, so the kernel you mentioned is not an option.

You are currently using the 5.8 kernel. To go back and try the 5.4 kernel, you would boot into grub and go to the Advanced Options to select a 5.4 kernel. If you do not normally see grub, it's my understanding you will want to hold the Escape key when booting with UEFI in order to see grub. Mint normally boots to the highest-numbered installed kernel unless you tell it to boot with a different kernel (in the Advanced Options of grub).

Have you had more or fewer of these occurrences since switching to the 5.8 kernel? If fewer, then returning to the 5.4 kernel may not help. If more, then after booting into the 5.4 kernel you will want to go to Update Manager and remove the 5.8 kernels you have installed. Then future boots will automatically boot to the 5.4 kernel.
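Before picking a kernel in the grub menu, it helps to know which one is running and which ones are installed. A small sketch, assuming a Debian-based system such as Mint where dpkg-query is available; kernel_series is a hypothetical helper name, not an existing tool.

```shell
# Print the major.minor series of a kernel release string,
# e.g. "5.8" for "5.8.0-50-generic". (Hypothetical helper for illustration.)
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Kernel that is running right now.
running=$(uname -r)
echo "Running kernel: $running (series $(kernel_series "$running"))"

# Installed kernel image packages. dpkg-query only exists on Debian-based
# systems, so the call is guarded; packages that were removed but still have
# leftover config files are filtered out by checking the status field.
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Package} ${Status}\n' 'linux-image-*' 2>/dev/null \
        | awk '$NF == "installed" { print $1 }'
fi
```

If no 5.4 package shows up in the list, there is no 5.4 entry to choose under Advanced Options until one is installed through Update Manager.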

Post by SpookyNoodle »

SMG wrote: Mon Apr 26, 2021 8:38 pm
Have you had more or fewer of these occurrences since switching to the 5.8 kernel?
The number of incidents seems to be largely the same. I tried adding iommu=pt to my boot parameters in /etc/default/grub, followed by an update-grub of course, but I had another fatal crash a few days later, so I guess that isn't solving my problem.
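A boot parameter only takes effect after update-grub and a reboot, and a misspelled one generally does not stop the boot, so it can be worth confirming that the running kernel actually received it. A minimal sketch, assuming the intended parameter is iommu=pt; has_boot_param is a hypothetical helper name.

```shell
# Succeeds when parameter $2 appears as a whole word in the kernel
# command line string $1. (Hypothetical helper for illustration.)
has_boot_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the kernel that is running now. /proc/cmdline only exists on Linux,
# so the read is guarded.
if [ -r /proc/cmdline ]; then
    if has_boot_param "$(cat /proc/cmdline)" "iommu=pt"; then
        echo "iommu=pt is active"
    else
        echo "iommu=pt is NOT on the running kernel's command line"
    fi
fi
```

The whole-word match matters here: a near miss such as a dropped letter would not count as the parameter being set.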

Tbh, I feel kinda discouraged. I'm not sure I'm protective enough of my privacy to want to deal with all these compatibility issues. I'll probably switch my desktop back to Win10 and then try to strip it down to improve speed where I can.

Post by Pjotr »

You might try these safe speed tweaks, notably the section about taking some load off your graphics:
https://easylinuxtipsproject.blogspot.c ... -mint.html

Post by SMG »

SpookyNoodle wrote: Tue Apr 27, 2021 2:05 am
The number of incidents seems to be largely the same.
Did you try any of the suggestions in this post from the other thread which are to specifically help 1st gen Ryzen chipsets?

Post by SpookyNoodle »

SMG wrote: Tue Apr 27, 2021 9:05 am
Did you try any of the suggestions in this post from the other thread which are to specifically help 1st gen Ryzen chipsets?
I did, yes. I adjusted the BIOS settings for the PSU, added idle=nomwait to my boot parameters, and updated the amdgpu firmware with:
apt install git && git clone git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git && sudo cp -v -u linux-firmware/amdgpu/* /lib/firmware/amdgpu && sudo update-initramfs -uk all
Pjotr wrote: Tue Apr 27, 2021 4:38 am
You might try these safe speed tweaks, notably the section about taking some load off your graphics:
https://easylinuxtipsproject.blogspot.c ... -mint.html
Thank you! I've actually gone through and already done all of the (relevant) adjustments on my machine, but I've still been suffering from these crashes.

Post by roblm »

If the setting Typical Current Idle in the BIOS doesn’t help, then try Low Current Idle.

Try using the kernel parameter processor.max_cstate=1.

Update the BIOS/UEFI.

More drastic steps are to disable C6 Mode in the BIOS and, if that doesn't help, disable Global C-state Control.
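Kernel parameters such as processor.max_cstate=1 go inside the quotes of the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub, followed by update-grub and a reboot. The sketch below only shows the text edit on a sample line: add_boot_param is a hypothetical helper, "quiet splash" is an assumed existing value, and the helper does not check whether the parameter is already present.

```shell
# Read a grub defaults file on stdin and write it to stdout with parameter $1
# appended inside the quotes of GRUB_CMDLINE_LINUX_DEFAULT.
# (Hypothetical helper; it does not detect duplicates.)
add_boot_param() {
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $1\"|"
}

# Demo on a sample line (the existing value is an assumption).
# Prints: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash processor.max_cstate=1"
printf '%s\n' 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"' \
    | add_boot_param processor.max_cstate=1

# On the real system, write to a scratch copy and review it first:
#   add_boot_param processor.max_cstate=1 < /etc/default/grub > /tmp/grub.new
#   diff /etc/default/grub /tmp/grub.new
#   sudo cp /tmp/grub.new /etc/default/grub && sudo update-grub
```

Working on a copy and checking the diff before running update-grub keeps a typo in the parameter from going unnoticed.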