Trouble with AMD GPU (again)

DrunkenContender
Level 1
Posts: 13
Joined: Fri May 13, 2022 5:28 pm

Post by DrunkenContender »

I posted about something similar a few days ago and marked it as solved because I had reinstalled the system and everything seemed to work well, but now everything fails again. I decided to start a new thread because the system is different now: some of the packages that seemed to be part of the issue aren't installed anymore, so it's a different situation.

My AMD GPU is giving me trouble. The system doesn't seem to load any driver for it, so I'm stuck with a very low resolution. I can boot into recovery mode, which makes the system "usable" because I get full resolution, but rendering is done in software and I can only use one of my two displays.

After reinstalling the system it worked well, but then it sometimes booted without drivers at random. Now it does so every time.

I dual-boot into Windows 10 almost every day. I disabled Fast Startup in Windows (it was causing trouble with a shared NTFS drive, and disabling it fixed that). There also seems to be a discrepancy in system time, something like Windows and Linux interpreting the clock differently with regard to time zone: when I'm on Windows the time is off by 2 hours, so I toggle automatic time off and on and it's fixed, and then when I boot into Linux the time goes back to normal again. It probably has nothing to do with this problem, but I don't know.
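A 2-hour offset like this is the usual symptom of Windows keeping the hardware clock (RTC) in local time while Linux assumes it is in UTC. A minimal sketch for checking which convention Linux is using, assuming systemd's `timedatectl` (its `status` output contains an `RTC in local TZ:` line); the function name `rtc_mode` is hypothetical:

```shell
# Reads `timedatectl status` output on stdin and reports how Linux is
# interpreting the hardware clock: "localtime" (the Windows convention) or "utc".
rtc_mode() {
  if grep -q 'RTC in local TZ: yes'; then
    echo localtime
  else
    echo utc
  fi
}
```

Run it as `timedatectl status | rtc_mode`. If it prints `utc`, one common workaround is `timedatectl set-local-rtc 1 --adjust-system-clock`, which makes Linux follow the Windows convention; the alternative is to configure Windows to keep the hardware clock in UTC.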

Note that the same thing happens no matter which of these sequences I use:
  • From Mint, reboot to Mint
  • From Mint, shutdown, later boot to Mint
  • From Windows, reboot to Mint
  • From Windows, shutdown, later boot to Mint
If I can't solve this in a few days, I think I will just buy an Nvidia GPU and sell this one, since it works fine on Windows.

I collected all the info below in recovery mode, but I also ran the same commands after a normal boot (with no drivers loaded), and I point out any differences.

HW/SW info:

My GPU is a 5700 XT. XFX RX 5700 XT Thicc III Ultra 8GB Boost Up to 2025MHz GDDR6 3xDP HDMI (Rx-57XT8TBD8)

xrandr in recovery mode:

Code: Select all

$ xrandr

xrandr: Failed to get size of gamma for output default
Screen 0: minimum 2560 x 1440, current 2560 x 1440, maximum 2560 x 1440
default connected primary 2560x1440+0+0 0mm x 0mm
   2560x1440     93.00* 
xrandr when not in recovery mode:

Code: Select all

$ xrandr

xrandr: Failed to get size of gamma for output default
Screen 0: minimum 640 x 480, current 1280 x 768, maximum 1280 x 768
default connected primary 1280x768+0+0 0mm x 0mm
   1280x720       0.00  
   1024x768       0.00  
   800x600        0.00  
   640x480        0.00  
   1280x768       0.00* 
inxi in recovery mode:

Code: Select all

$ inxi -Fxxxrz

System:    Kernel: 5.13.0-41-generic x86_64 bits: 64 compiler: N/A Desktop: Cinnamon 5.2.7 wm: muffin 5.2.1 dm: LightDM 1.30.0 
           Distro: Linux Mint 20.3 Una base: Ubuntu 20.04 focal 
Machine:   Type: Desktop Mobo: Micro-Star model: X570-A PRO (MS-7C37) v: 3.0 serial: <filter> UEFI: American Megatrends 
           v: H.C1 date: 11/16/2020 
CPU:       Topology: 6-Core model: AMD Ryzen 5 3600X bits: 64 type: MT MCP arch: Zen L2 cache: 3072 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 91203 
           Speed: 2200 MHz min/max: 2200/4409 MHz boost: enabled Core speeds (MHz): 1: 2200 2: 1867 3: 2593 4: 1992 5: 3110 
           6: 3600 7: 2338 8: 3591 9: 3007 10: 3425 11: 3557 12: 2181 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] vendor: XFX Pine 
           driver: N/A bus ID: 2f:00.0 chip ID: 1002:731f 
           Display: x11 server: X.Org 1.20.13 driver: ati,fbdev unloaded: modesetting,radeon,vesa resolution: 2560x1440~93Hz 
           OpenGL: renderer: llvmpipe (LLVM 12.0.0 256 bits) v: 4.5 Mesa 21.2.6 compat-v: 3.1 direct render: Yes 
Audio:     Device-1: Advanced Micro Devices [AMD/ATI] Navi 10 HDMI Audio driver: snd_hda_intel v: kernel bus ID: 2f:00.1 
           chip ID: 1002:ab38 
           Device-2: Advanced Micro Devices [AMD] Starship/Matisse HD Audio vendor: Micro-Star MSI X570-A PRO 
           driver: snd_hda_intel v: kernel bus ID: 31:00.4 chip ID: 1022:1487 
           Device-3: Licensed by Sony Entertainment America Rocksmith Guitar Adapter type: USB 
           driver: hid-generic,snd-usb-audio,usbhid bus ID: 1-1:2 chip ID: 12ba:00ff 
           Device-4: Focusrite-Novation type: USB driver: snd-usb-audio bus ID: 5-4.2:3 chip ID: 1235:801c 
           Sound Server: ALSA v: k5.13.0-41-generic 
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Micro-Star MSI X570-A PRO driver: r8169 
           v: kernel port: d000 bus ID: 27:00.0 chip ID: 10ec:8168 
           IF: enp39s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
           IF-ID-1: virbr0 state: down mac: <filter> 
           IF-ID-2: virbr0-nic state: down mac: <filter> 
Drives:    Local Storage: total: 2.75 TiB used: 2.10 TiB (76.4%) 
           ID-1: /dev/nvme0n1 vendor: Intel model: SSDPEKNW010T8 size: 953.87 GiB speed: 31.6 Gb/s lanes: 4 serial: <filter> 
           rev: 002C scheme: GPT 
           ID-2: /dev/sda vendor: Seagate model: ST2000DM008-2FR102 size: 1.82 TiB speed: 6.0 Gb/s rotation: 7200 rpm 
           serial: <filter> rev: 0001 scheme: GPT 
Partition: ID-1: / size: 464.28 GiB used: 256.43 GiB (55.2%) fs: ext4 dev: /dev/nvme0n1p5 
           ID-2: swap-1 size: 14.90 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/nvme0n1p6 
Sensors:   System Temperatures: cpu: 65.9 C mobo: N/A 
           Fan Speeds (RPM): N/A 
Repos:     No active apt repos in: /etc/apt/sources.list 
           Active apt repos in: /etc/apt/sources.list.d/google-chrome.list 
           1: deb [arch=amd64] https://dl.google.com/linux/chrome/deb/ stable main
           Active apt repos in: /etc/apt/sources.list.d/google-earth-pro.list 
           1: deb [arch=amd64] http://dl.google.com/linux/earth/deb/ stable main
           Active apt repos in: /etc/apt/sources.list.d/official-package-repositories.list 
           1: deb http://packages.linuxmint.com una main upstream import backport #id:linuxmint_main
           2: deb http://archive.ubuntu.com/ubuntu focal main restricted universe multiverse
           3: deb http://archive.ubuntu.com/ubuntu focal-updates main restricted universe multiverse
           4: deb http://archive.ubuntu.com/ubuntu focal-backports main restricted universe multiverse
           5: deb http://security.ubuntu.com/ubuntu/ focal-security main restricted universe multiverse
           6: deb http://archive.canonical.com/ubuntu/ focal partner
           Active apt repos in: /etc/apt/sources.list.d/signal-xenial.list 
           1: deb [arch=amd64 signed-by=/usr/share/keyrings/signal-desktop-keyring.gpg] https://updates.signal.org/desktop/apt xenial main
           Active apt repos in: /etc/apt/sources.list.d/spotify.list 
           1: deb http://repository.spotify.com stable non-free
           Active apt repos in: /etc/apt/sources.list.d/vscode.list 
           1: deb [arch=amd64,arm64,armhf] http://packages.microsoft.com/repos/code stable main
Info:      Processes: 365 Uptime: 8m Memory: 31.27 GiB used: 9.22 GiB (29.5%) Init: systemd v: 245 runlevel: 5 Compilers: 
           gcc: 9.4.0 alt: 9 clang: 10.0.0-4ubuntu1 Shell: bash v: 5.0.17 running in: guake inxi: 3.0.38
The only difference when I boot normally (not in recovery mode) is:

Code: Select all

Display: x11 server: X.Org 1.20.13 driver: ati,vesa unloaded: fbdev,modesetting,radeon resolution: 1280x768~N/A 
lshw in recovery mode:

Code: Select all

$ sudo lshw -C video     
	*-display UNCLAIMED       
		 description: VGA compatible controller
		 product: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
		 vendor: Advanced Micro Devices, Inc. [AMD/ATI]
		 physical id: 0
		 bus info: pci@0000:2f:00.0
		 version: c1
		 width: 64 bits
		 clock: 33MHz
		 capabilities: pm pciexpress msi vga_controller bus_master cap_list
		 configuration: latency=0
		 resources: memory:d0000000-dfffffff memory:e0000000-e01fffff ioport:f000(size=256) memory:fce00000-fce7ffff memory:c0000-dffff

The only difference without recovery mode is that capabilities doesn't contain bus_master.
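`*-display UNCLAIMED` means no kernel driver is bound to the card. A quicker check of the same thing is `lspci -k`, which prints a `Kernel driver in use:` line only when a driver has claimed the device. A minimal sketch, assuming the bus ID 2f:00.0 from the inxi output; `gpu_driver_in_use` is a hypothetical helper:

```shell
# Reads `lspci -k` output for one device on stdin and prints the bound
# driver (e.g. "amdgpu"), or "none" when the device is unclaimed.
gpu_driver_in_use() {
  local drv
  drv=$(sed -n 's/^[[:space:]]*Kernel driver in use: //p' | head -n 1)
  echo "${drv:-none}"
}
```

Usage: `lspci -k -s 2f:00.0 | gpu_driver_in_use`. On a bad boot the `Kernel modules: amdgpu` line can still appear, because it only lists modules capable of driving the device, not one that actually bound to it.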

Contents of /etc/modprobe.d/ in recovery mode:

Code: Select all

$ ls /etc/modprobe.d/
alsa-base.conf                  blacklist-ath_pci.conf  blacklist-firewire.conf     blacklist-modem.conf  blacklist-rare-network.conf  intel-microcode-blacklist.conf
amd64-microcode-blacklist.conf  blacklist.conf          blacklist-framebuffer.conf  blacklist-oss.conf    dkms.conf                    iwlwifi.conf
Identical without recovery mode.
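Listing /etc/modprobe.d only shows file names. To check whether any of those files actually blacklists a GPU driver, a sketch like this could be used; it assumes the standard `blacklist <module>` syntax, and the helper name `gpu_blacklists` is hypothetical:

```shell
# Prints "file:line" for every line under a modprobe.d directory (default
# /etc/modprobe.d) that blacklists amdgpu or radeon; exits 1 if there are none.
gpu_blacklists() {
  local dir="${1:-/etc/modprobe.d}"
  grep -HE '^[[:space:]]*blacklist[[:space:]]+(amdgpu|radeon)([[:space:]]|$)' "$dir"/*.conf
}
```

Empty output (exit status 1) means nothing in that directory blacklists either driver.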

$ cat /var/log/Xorg.0.log | nc termbin.com 9999 in recovery mode:
https://termbin.com/qaya


$ cat /var/log/Xorg.0.log | nc termbin.com 9999 without recovery mode:
https://termbin.com/6rq7

---

Any ideas on more diagnostics or things to try to figure out what's wrong? Thanks!
Last edited by LockBot on Wed Dec 28, 2022 7:16 am, edited 1 time in total.
Reason: Topic automatically closed 6 months after creation. New replies are no longer allowed.
SMG
Level 25
Posts: 31988
Joined: Sun Jul 26, 2020 6:15 pm
Location: USA

Re: Trouble with AMD GPU (again)

Post by SMG »

DrunkenContender wrote: Wed May 25, 2022 8:20 am After reinstalling the system, it worked well, but then it randomly booted without drivers some times. Now it does it every time.
Did you happen to update kernels within that time span? If so, try booting an older one. (Actually, 5.13.0-44 has been released, so according to the inxi output your system does not have the latest.)
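To see which older kernels are still installed (and therefore selectable from GRUB's "Advanced options" submenu), a sketch that lists the kernel images in /boot, oldest first; it assumes Ubuntu-style `vmlinuz-<version>` file names, and `installed_kernels` is a hypothetical name:

```shell
# Lists kernel versions that have a vmlinuz-<version> image in the given
# directory (default /boot), sorted oldest to newest.
installed_kernels() {
  local dir="${1:-/boot}" f
  for f in "$dir"/vmlinuz-*; do
    [ -e "$f" ] || continue      # the glob matched nothing
    echo "${f##*/vmlinuz-}"
  done | sort -V
}
```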
DrunkenContender wrote: Wed May 25, 2022 8:20 am There seems to be some discrepancy regarding system time, something like Windows and Linux interpreting the time in different ways with regards to time zone, so when I'm on Windows the time is off by 2h, I unset and set automatic time and it's fixed, then when I boot into Linux the time sets itself back to normal again. It probably doesn't have anything to do with this problem, but I don't know.
That's a very common issue that I do not think is related. I had thought people indicated the time difference is usually one hour, but maybe the difference is location dependent.
DrunkenContender wrote: Wed May 25, 2022 8:20 am For all the info that I paste below, I got it when I'm in recovery mode, but I also did it by just booting normally (with no drivers loaded), and I specify when there are any differences.
I'm not really sure I understand the difference between what happens in the two modes, but I think the journalctl output from a normal (not recovery) boot would hopefully give a clue as to what is happening.

If you can boot normally and get to a terminal, please run:

Code: Select all

journalctl -b | nc termbin.com 9999
It will return a URL that you should post in your next reply.

If you are not able to get to a terminal in a regular boot but can in recovery mode, do them in sequence: boot normally first, then boot into recovery mode and run the following:

Code: Select all

journalctl -b -1 | nc termbin.com 9999
That gets the information from the previous boot. It will return a URL that you should post in your next reply.
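The full journal is long. When only the graphics side matters, a sketch like this keeps just the kernel lines from amdgpu and the DRM core; the pattern is an assumption based on the log lines quoted in this thread, and `gpu_log_lines` is a hypothetical name:

```shell
# Keeps only GPU-related kernel lines (amdgpu driver and DRM core) from a
# journal dump read on stdin.
gpu_log_lines() {
  grep -E 'amdgpu|\[drm'
}
```

Usage: `journalctl -k -b -1 | gpu_log_lines` for the previous boot (`-k` restricts the output to kernel messages).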

Please also check whether you are running the most recent BIOS/UEFI. Generally speaking, I have seen cases where updates helped on the Linux side even though they appeared to change nothing on the Windows side. I do not know the specifics for your motherboard, so I leave it up to you whether to try a newer version if one is available.
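To check the installed BIOS/UEFI version without rebooting, the kernel exposes the DMI fields under /sys/class/dmi/id (the same values inxi shows as `v: H.C1 date: 11/16/2020`). A minimal sketch; `bios_info` is a hypothetical helper, and the result still has to be compared by hand against the vendor's download page for the X570-A PRO:

```shell
# Prints "<version> (<date>)" from a DMI sysfs directory (default
# /sys/class/dmi/id); these two files are readable without root.
bios_info() {
  local dir="${1:-/sys/class/dmi/id}"
  echo "$(cat "$dir/bios_version") ($(cat "$dir/bios_date"))"
}
```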
DrunkenContender
Level 1
Posts: 13
Joined: Fri May 13, 2022 5:28 pm

Re: Trouble with AMD GPU (again)

Post by DrunkenContender »

Hello SMG and thank you for your time and effort.

I updated to the latest kernel, I rebooted and nothing changed.

I updated the BIOS to the latest non-beta version and rebooted, and it went straight to Windows. I had to set the boot order in the BIOS back to "ubuntu" instead of Windows, otherwise it didn't reach GRUB. I rebooted into Mint and nothing changed: still no GPU drivers loaded.

I shut down, waited a couple of minutes, booted again, and now it works: it seems to load the proper drivers.

I'm not celebrating yet, because it might fail again at any moment as it did before. So I grabbed the output of all the commands from my original post, plus journalctl -b as you suggested, and saved all of them to text files.

If I boot without GPU drivers again in the next few days, I will reply with the output of those commands in no-driver mode, plus all of the output I got now that everything's fine.

If everything works consistently for the next few days, I will report back and mark the topic as solved. I hope delaying that is fine, given how finicky my system has been lately. I reboot at least once a day, so the system will get tested. I will also try suspending and resuming, since that is also a common trigger for GPU problems.

Thank you!
DrunkenContender
Level 1
Posts: 13
Joined: Fri May 13, 2022 5:28 pm

Re: Trouble with AMD GPU (again)

Post by DrunkenContender »

Welp, it happened again.

I rebooted so that I could enable SVM in the BIOS, and after booting back it didn't load the GPU drivers. Just in case, I went back and disabled SVM, but there were still no GPU drivers after a couple of reboots, so SVM doesn't seem to be the cause. I'm back in recovery mode again.

Here are the diagnostics with no drivers loaded, outside recovery mode:

Code: Select all

$ xrandr

Screen 0: minimum 640 x 480, current 1280 x 768, maximum 1280 x 768
default connected primary 1280x768+0+0 0mm x 0mm
   1280x720       0.00  
   1024x768       0.00  
   800x600        0.00  
   640x480        0.00  
   1280x768       0.00* 

Code: Select all

$ inxi -Fxxxrz

System:    Kernel: 5.13.0-44-generic x86_64 bits: 64 compiler: N/A Desktop: Cinnamon 5.2.7 wm: muffin 5.2.1 
           dm: LightDM 1.30.0 Distro: Linux Mint 20.3 Una base: Ubuntu 20.04 focal 
Machine:   Type: Desktop Mobo: Micro-Star model: X570-A PRO (MS-7C37) v: 3.0 serial: <filter> 
           UEFI: American Megatrends LLC. v: H.G0 date: 03/16/2022 
CPU:       Topology: 6-Core model: AMD Ryzen 5 3600X bits: 64 type: MT MCP arch: Zen L2 cache: 3072 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 91200 
           Speed: 2199 MHz min/max: 2200/4409 MHz boost: enabled Core speeds (MHz): 1: 2199 2: 2200 3: 2631 4: 2233 
           5: 2049 6: 1867 7: 1865 8: 2797 9: 2057 10: 3597 11: 2057 12: 1866 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] vendor: XFX Pine 
           driver: N/A bus ID: 2f:00.0 chip ID: 1002:731f 
           Display: x11 server: X.Org 1.20.13 driver: ati,vesa unloaded: fbdev,modesetting,radeon resolution: 1280x768~N/A 
           OpenGL: renderer: llvmpipe (LLVM 12.0.0 256 bits) v: 4.5 Mesa 21.2.6 compat-v: 3.1 direct render: Yes 
Audio:     Device-1: Advanced Micro Devices [AMD/ATI] Navi 10 HDMI Audio driver: snd_hda_intel v: kernel bus ID: 2f:00.1 
           chip ID: 1002:ab38 
           Device-2: Advanced Micro Devices [AMD] Starship/Matisse HD Audio vendor: Micro-Star MSI X570-A PRO 
           driver: snd_hda_intel v: kernel bus ID: 31:00.4 chip ID: 1022:1487 
           Device-3: Licensed by Sony Entertainment America Rocksmith Guitar Adapter type: USB 
           driver: hid-generic,snd-usb-audio,usbhid bus ID: 1-1:2 chip ID: 12ba:00ff 
           Device-4: Focusrite-Novation type: USB driver: snd-usb-audio bus ID: 5-4.2:3 chip ID: 1235:801c 
           Sound Server: ALSA v: k5.13.0-44-generic 
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Micro-Star MSI X570-A PRO 
           driver: r8169 v: kernel port: d000 bus ID: 27:00.0 chip ID: 10ec:8168 
           IF: enp39s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
           IF-ID-1: virbr0 state: down mac: <filter> 
           IF-ID-2: virbr0-nic state: down mac: <filter> 
Drives:    Local Storage: total: 2.75 TiB used: 2.13 TiB (77.5%) 
           ID-1: /dev/nvme0n1 vendor: Intel model: SSDPEKNW010T8 size: 953.87 GiB speed: 31.6 Gb/s lanes: 4 
           serial: <filter> rev: 002C scheme: GPT 
           ID-2: /dev/sda vendor: Seagate model: ST2000DM008-2FR102 size: 1.82 TiB speed: 6.0 Gb/s rotation: 7200 rpm 
           serial: <filter> rev: 0001 scheme: GPT 
Partition: ID-1: / size: 464.28 GiB used: 260.45 GiB (56.1%) fs: ext4 dev: /dev/nvme0n1p5 
           ID-2: swap-1 size: 14.90 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/nvme0n1p6 
Sensors:   System Temperatures: cpu: 60.0 C mobo: N/A 
           Fan Speeds (RPM): N/A 
Repos:     No active apt repos in: /etc/apt/sources.list 
           Active apt repos in: /etc/apt/sources.list.d/google-chrome.list 
           1: deb [arch=amd64] https://dl.google.com/linux/chrome/deb/ stable main
           Active apt repos in: /etc/apt/sources.list.d/google-earth-pro.list 
           1: deb [arch=amd64] http://dl.google.com/linux/earth/deb/ stable main
           Active apt repos in: /etc/apt/sources.list.d/official-package-repositories.list 
           1: deb http://packages.linuxmint.com una main upstream import backport #id:linuxmint_main
           2: deb http://archive.ubuntu.com/ubuntu focal main restricted universe multiverse
           3: deb http://archive.ubuntu.com/ubuntu focal-updates main restricted universe multiverse
           4: deb http://archive.ubuntu.com/ubuntu focal-backports main restricted universe multiverse
           5: deb http://security.ubuntu.com/ubuntu/ focal-security main restricted universe multiverse
           6: deb http://archive.canonical.com/ubuntu/ focal partner
           Active apt repos in: /etc/apt/sources.list.d/signal-xenial.list 
           1: deb [arch=amd64 signed-by=/usr/share/keyrings/signal-desktop-keyring.gpg] https://updates.signal.org/desktop/apt xenial main
           Active apt repos in: /etc/apt/sources.list.d/spotify.list 
           1: deb http://repository.spotify.com stable non-free
           Active apt repos in: /etc/apt/sources.list.d/vscode.list 
           1: deb [arch=amd64,arm64,armhf] http://packages.microsoft.com/repos/code stable main
Info:      Processes: 324 Uptime: N/A Memory: 31.27 GiB used: 1.70 GiB (5.4%) Init: systemd v: 245 runlevel: 5 Compilers: 
           gcc: 9.4.0 alt: 9 clang: 10.0.0-4ubuntu1 Shell: bash v: 5.0.17 running in: guake inxi: 3.0.38 

Code: Select all

$ sudo lshw -C video    

*-display UNCLAIMED
	 description: VGA compatible controller
	 product: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
	 vendor: Advanced Micro Devices, Inc. [AMD/ATI]
	 physical id: 0
	 bus info: pci@0000:2f:00.0
	 version: c1
	 width: 64 bits
	 clock: 33MHz
	 capabilities: pm pciexpress msi vga_controller cap_list
	 configuration: latency=0
	 resources: memory:d0000000-dfffffff memory:e0000000-e01fffff ioport:f000(size=256) memory:fce00000-fce7ffff memory:c0000-dffff

Code: Select all

$ ls /etc/modprobe.d/

alsa-base.conf
amd64-microcode-blacklist.conf
blacklist-ath_pci.conf
blacklist.conf
blacklist-firewire.conf
blacklist-framebuffer.conf
blacklist-modem.conf
blacklist-oss.conf
blacklist-rare-network.conf
dkms.conf
intel-microcode-blacklist.conf
iwlwifi.conf

Code: Select all

$ cat /var/log/Xorg.0.log | nc termbin.com 9999
https://termbin.com/xjba

Code: Select all

$ journalctl -b | nc termbin.com 9999
https://termbin.com/xjfg

Diagnostics when everything worked:

Code: Select all

$ xrandr

Screen 0: minimum 320 x 200, current 5120 x 1440, maximum 16384 x 16384
DisplayPort-0 disconnected (normal left inverted right x axis y axis)
DisplayPort-1 connected primary 2560x1440+0+0 (normal left inverted right x axis y axis) 597mm x 336mm
   2560x1440     59.95*+  74.97  
   1920x1200     59.95  
   1920x1080     60.00    50.00    59.94  
   1600x1200     59.95  
   1280x1440     59.91  
   1680x1050     59.95  
   1280x1024     75.02    60.02  
   1440x900      59.89  
   1280x960      60.00  
   1280x800      59.95  
   1280x720      60.00    50.00    59.94  
   1024x768      75.03    70.07    60.00  
   832x624       74.55  
   800x600       72.19    75.00    60.32    56.25  
   720x576       50.00  
   720x480       60.00    59.94  
   640x480       75.00    72.81    66.67    60.00    59.94  
   720x400       70.08  
DisplayPort-2 connected 2560x1440+2560+0 (normal left inverted right x axis y axis) 597mm x 336mm
   2560x1440     59.95*+  74.97  
   1920x1200     59.95  
   1920x1080     60.00    50.00    59.94  
   1600x1200     59.95  
   1280x1440     59.91  
   1680x1050     59.95  
   1280x1024     75.02    60.02  
   1440x900      59.89  
   1280x960      60.00  
   1280x800      59.95  
   1280x720      60.00    50.00    59.94  
   1024x768      75.03    70.07    60.00  
   832x624       74.55  
   800x600       72.19    75.00    60.32    56.25  
   720x576       50.00  
   720x480       60.00    59.94  
   640x480       75.00    72.81    66.67    60.00    59.94  
   720x400       70.08  
HDMI-A-0 disconnected (normal left inverted right x axis y axis)

Code: Select all

$ inxi -Fxxxrz

System:    Kernel: 5.13.0-44-generic x86_64 bits: 64 compiler: N/A Desktop: Cinnamon 5.2.7 wm: muffin 5.2.1 dm: LightDM 1.30.0 
           Distro: Linux Mint 20.3 Una base: Ubuntu 20.04 focal 
Machine:   Type: Desktop Mobo: Micro-Star model: X570-A PRO (MS-7C37) v: 3.0 serial: <filter> UEFI: American Megatrends LLC. 
           v: H.G0 date: 03/16/2022 
CPU:       Topology: 6-Core model: AMD Ryzen 5 3600X bits: 64 type: MT MCP arch: Zen L2 cache: 3072 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 91202 
           Speed: 2203 MHz min/max: 2200/4409 MHz boost: enabled Core speeds (MHz): 1: 2199 2: 2200 3: 1867 4: 2797 5: 2098 
           6: 2200 7: 2200 8: 2199 9: 2201 10: 2200 11: 2197 12: 2198 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] vendor: XFX Pine 
           driver: amdgpu v: kernel bus ID: 2f:00.0 chip ID: 1002:731f 
           Display: x11 server: X.Org 1.20.13 driver: amdgpu,ati unloaded: fbdev,modesetting,radeon,vesa 
           resolution: 2560x1440~60Hz, 2560x1440~60Hz 
           OpenGL: renderer: AMD Radeon RX 5700 XT (NAVI10 DRM 3.41.0 5.13.0-44-generic LLVM 12.0.0) v: 4.6 Mesa 21.2.6 
           direct render: Yes 
Audio:     Device-1: Advanced Micro Devices [AMD/ATI] Navi 10 HDMI Audio driver: snd_hda_intel v: kernel bus ID: 2f:00.1 
           chip ID: 1002:ab38 
           Device-2: Advanced Micro Devices [AMD] Starship/Matisse HD Audio vendor: Micro-Star MSI X570-A PRO 
           driver: snd_hda_intel v: kernel bus ID: 31:00.4 chip ID: 1022:1487 
           Device-3: Licensed by Sony Entertainment America Rocksmith Guitar Adapter type: USB 
           driver: hid-generic,snd-usb-audio,usbhid bus ID: 1-1:2 chip ID: 12ba:00ff 
           Device-4: Focusrite-Novation type: USB driver: snd-usb-audio bus ID: 5-4.2:3 chip ID: 1235:801c 
           Sound Server: ALSA v: k5.13.0-44-generic 
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Micro-Star MSI X570-A PRO driver: r8169 
           v: kernel port: d000 bus ID: 27:00.0 chip ID: 10ec:8168 
           IF: enp39s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
           IF-ID-1: virbr0 state: down mac: <filter> 
           IF-ID-2: virbr0-nic state: down mac: <filter> 
Drives:    Local Storage: total: 2.75 TiB used: 2.13 TiB (77.5%) 
           ID-1: /dev/nvme0n1 vendor: Intel model: SSDPEKNW010T8 size: 953.87 GiB speed: 31.6 Gb/s lanes: 4 serial: <filter> 
           rev: 002C scheme: GPT 
           ID-2: /dev/sda vendor: Seagate model: ST2000DM008-2FR102 size: 1.82 TiB speed: 6.0 Gb/s rotation: 7200 rpm 
           serial: <filter> rev: 0001 scheme: GPT 
Partition: ID-1: / size: 464.28 GiB used: 260.40 GiB (56.1%) fs: ext4 dev: /dev/nvme0n1p5 
           ID-2: swap-1 size: 14.90 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/nvme0n1p6 
Sensors:   System Temperatures: cpu: 43.9 C mobo: N/A gpu: amdgpu temp: 60 C 
           Fan Speeds (RPM): N/A gpu: amdgpu fan: 0 
Repos:     No active apt repos in: /etc/apt/sources.list 
           Active apt repos in: /etc/apt/sources.list.d/google-chrome.list 
           1: deb [arch=amd64] https://dl.google.com/linux/chrome/deb/ stable main
           Active apt repos in: /etc/apt/sources.list.d/google-earth-pro.list 
           1: deb [arch=amd64] http://dl.google.com/linux/earth/deb/ stable main
           Active apt repos in: /etc/apt/sources.list.d/official-package-repositories.list 
           1: deb http://packages.linuxmint.com una main upstream import backport #id:linuxmint_main
           2: deb http://archive.ubuntu.com/ubuntu focal main restricted universe multiverse
           3: deb http://archive.ubuntu.com/ubuntu focal-updates main restricted universe multiverse
           4: deb http://archive.ubuntu.com/ubuntu focal-backports main restricted universe multiverse
           5: deb http://security.ubuntu.com/ubuntu/ focal-security main restricted universe multiverse
           6: deb http://archive.canonical.com/ubuntu/ focal partner
           Active apt repos in: /etc/apt/sources.list.d/signal-xenial.list 
           1: deb [arch=amd64 signed-by=/usr/share/keyrings/signal-desktop-keyring.gpg] https://updates.signal.org/desktop/apt xenial main
           Active apt repos in: /etc/apt/sources.list.d/spotify.list 
           1: deb http://repository.spotify.com stable non-free
           Active apt repos in: /etc/apt/sources.list.d/vscode.list 
           1: deb [arch=amd64,arm64,armhf] http://packages.microsoft.com/repos/code stable main
Info:      Processes: 361 Uptime: 2m Memory: 31.27 GiB used: 2.23 GiB (7.1%) Init: systemd v: 245 runlevel: 5 Compilers: 
           gcc: 9.4.0 alt: 9 clang: 10.0.0-4ubuntu1 Shell: bash v: 5.0.17 running in: guake inxi: 3.0.38 

Code: Select all

$ sudo lshw -C video  
  
*-display
	 description: VGA compatible controller
	 product: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
	 vendor: Advanced Micro Devices, Inc. [AMD/ATI]
	 physical id: 0
	 bus info: pci@0000:2f:00.0
	 version: c1
	 width: 64 bits
	 clock: 33MHz
	 capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
	 configuration: driver=amdgpu latency=0
	 resources: irq:75 memory:d0000000-dfffffff memory:e0000000-e01fffff ioport:f000(size=256) memory:fce00000-fce7ffff memory:c0000-dffff

Code: Select all

$ ls /etc/modprobe.d/

alsa-base.conf
amd64-microcode-blacklist.conf
blacklist-ath_pci.conf
blacklist.conf
blacklist-firewire.conf
blacklist-framebuffer.conf
blacklist-modem.conf
blacklist-oss.conf
blacklist-rare-network.conf
dkms.conf
intel-microcode-blacklist.conf
iwlwifi.conf

Code: Select all

$ cat /var/log/Xorg.0.log | nc termbin.com 9999
https://termbin.com/vqgg

Code: Select all

$ journalctl -b | nc termbin.com 9999
https://termbin.com/uf8s
DrunkenContender
Level 1
Posts: 13
Joined: Fri May 13, 2022 5:28 pm

Re: Trouble with AMD GPU (again)

Post by DrunkenContender »

To add some more info: in the previous weeks I made a couple of changes to external hardware.

I have a second computer connected via HDMI to the same two monitors that this computer uses via DisplayPort. My keyboard and mouse are plugged into a USB KVM switch that is connected to both computers; pressing the button on the KVM toggles the USB connection between them. I also run a program called display-switch, which detects whenever the KVM is connected or disconnected and sends DDC messages to both displays to switch their input between HDMI and DisplayPort accordingly. This lets me run two computers at the same time and share the displays, keyboard and mouse between them. It's pretty cool, when I can get my AMD GPU to work, of course.

Since this setup was new, I removed the HDMI connections and the KVM switch and connected the keyboard and mouse directly to the desktop, just to see whether it had anything to do with the problem. I rebooted several times: sometimes the GPU drivers loaded, sometimes they didn't. So this doesn't seem to be the cause.

I also switched from a wired mouse to a wireless one a few weeks ago. I just tried the old wired mouse and got the same result: the GPU drivers still fail to load at random.

I went into the BIOS and disabled Legacy USB compatibility as a shot in the dark. It had no effect.

I did all of this because I noticed the message "hid-generic 0003:12BA:00FF.0001: No inputs registered, leaving", and comparing the journalctl output from good and bad boots, it seems that USB devices are sometimes set up before and sometimes after the AMD GPU driver loads (or fails to). But I just saw that the hid-generic message appears on both good and bad boots, so I don't know.

I'm just trying random stuff since I'm lost.
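The ordering question can be checked mechanically. Since the journal is chronological, a sketch like this reports whether amdgpu or USB device enumeration logs first in a given boot; the patterns are assumptions (kernel USB device lines look like `usb 1-1: ...`), and `first_to_log` is a hypothetical name:

```shell
# Reads a kernel log on stdin and prints "amdgpu" or "usb" depending on
# which one produces a line first; prints "neither" if no line matches.
first_to_log() {
  awk '
    /amdgpu/       { print "amdgpu"; found = 1; exit }
    / usb [0-9]+-/ { print "usb";    found = 1; exit }
    END { if (!found) print "neither" }'
}
```

Running `journalctl -k -b -1 | first_to_log` after several good and bad boots would show whether the order actually correlates with the failure.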
SMG
Level 25
Posts: 31988
Joined: Sun Jul 26, 2020 6:15 pm
Location: USA

Re: Trouble with AMD GPU (again)

Post by SMG »

DrunkenContender wrote: Thu May 26, 2022 7:44 am I updated the BIOS to the latest non-beta available, I rebooted and it went straight to Windows, I had to re-set booting order in the BIOS to "ubuntu" instead of Windows otherwise it didn't reach grub.
Other people have noticed that BIOS/UEFI settings sometimes reset themselves after an update. It's possible that is what happened here.
DrunkenContender wrote: Thu May 26, 2022 7:44 am If everything's fine consistently in the next few days, I will report back and mark as solved. I hope this delay in marking as solved is fine, given that my system has been a bit finicky as of late.
There is no time requirement for marking a topic solved. Presuming we find a solution, you can test it for a couple of weeks or longer before marking it solved, if you want.
DrunkenContender wrote: Thu May 26, 2022 9:16 am Here's the diagnostics for no drivers, no recovery mode:
When the drivers do not load, there is really no value in checking anything except journalctl. Once you check one method which indicates the drivers did not load, there is no real need to double-check by other methods to verify the same conclusion. We already have plenty of data when they do not load. We need to figure out why they are not loading.
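For comparing a good and a bad boot, the timestamp and host name at the start of each journal line make every line differ. A sketch that strips that prefix so `diff` can line the two logs up, assuming journalctl's default short output format (`May 26 14:31:54 newmint kernel: ...`); `strip_journal_prefix` is a hypothetical helper:

```shell
# Removes the leading "Mon DD HH:MM:SS host " prefix from journalctl's
# default short output, leaving "kernel: <message>".
strip_journal_prefix() {
  sed -E 's/^[A-Z][a-z]{2} +[0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2} [^ ]+ //'
}
```

In bash, `diff <(journalctl -k -b -1 | strip_journal_prefix) <(journalctl -k -b 0 | strip_journal_prefix)` compares the previous boot with the current one. Expect some noise from addresses and timings that legitimately change between boots.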

I compared the two logs, and they diverged at this point. The amdgpu driver is not able to initialize the hardware.

Code: Select all

May 26 14:31:54 newmint kernel: amdgpu 0000:2f:00.0: amdgpu: Failed to get overdrive table!
May 26 14:31:54 newmint kernel: amdgpu 0000:2f:00.0: amdgpu: Failed to setup default OD settings!
May 26 14:31:54 newmint kernel: [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <smu> failed -62
May 26 14:31:54 newmint kernel: amdgpu 0000:2f:00.0: amdgpu: amdgpu_device_ip_late_init failed
May 26 14:31:54 newmint kernel: amdgpu 0000:2f:00.0: amdgpu: Fatal error during GPU init
May 26 14:31:54 newmint kernel: amdgpu 0000:2f:00.0: amdgpu: amdgpu: finishing device.
May 26 14:31:54 newmint kernel: Console: switching to colour dummy device 80x25
May 26 14:31:54 newmint kernel: [drm] free PSP TMR buffer
May 26 14:31:54 newmint kernel: [drm] amdgpu: ttm finalized
May 26 14:31:54 newmint kernel: amdgpu: probe of 0000:2f:00.0 failed with error -62
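Error -62 is the kernel's -ETIME ("timer expired"), which here most likely means the card's power-management firmware (the SMU) did not answer in time. To track this failure across many boots, a minimal sketch that classifies one boot's kernel log; it keys on the probe-failure line in the excerpt above and on the `[drm] Initialized amdgpu ...` line that DRM normally prints when the driver registers successfully. Both patterns are assumptions, and `amdgpu_boot_status` is a hypothetical name:

```shell
# Classifies one boot's kernel log (read on stdin):
#   "failed"  - amdgpu aborted its probe, as in the excerpt above
#   "bound"   - DRM reported that amdgpu initialized
#   "unknown" - neither line was found
amdgpu_boot_status() {
  awk '
    /amdgpu: probe of .* failed with error/ { failed = 1 }
    /\[drm\] Initialized amdgpu/           { bound = 1 }
    END {
      if (failed)     print "failed"
      else if (bound) print "bound"
      else            print "unknown"
    }'
}
```

`journalctl --list-boots` shows which boot offsets are available, so something like `for b in -3 -2 -1 0; do journalctl -k -b "$b" | amdgpu_boot_status; done` gives a rough success rate.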
I searched for that error code and found this bug report: RX6600XT amdgpu suse tumbleweed failed to init hw_init of IP block <smu> failed -62. The original poster ended up closing the report without a resolution because they were also having issues with the GPU in Windows 11 and figured it was a hardware problem.

However, there is a suggestion from a developer to try the amdgpu.aspm=0 kernel parameter. You can use the information in How to add a kernel parameter to add the parameter temporarily or permanently. When you add it temporarily, it applies only to the current boot, so if it seems to cause issues or does not work, all you need to do is reboot to clear it. However, given that the drivers sometimes load and sometimes do not, a true test (should it work) may only be possible by running with it as a permanent change for a while.
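For the permanent route, the usual approach on Mint is to add the parameter to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then run update-grub. The sketch below does that edit with sed so it can be tried on a copy of the file first. add_kernel_param is a hypothetical helper written for this example; it assumes the line uses double quotes and does not escape characters that sed treats specially (such as & or |), which is fine for amdgpu.aspm=0.

```shell
#!/bin/sh
# Hypothetical helper: add_kernel_param FILE PARAM
# Appends PARAM to GRUB_CMDLINE_LINUX_DEFAULT="..." in FILE unless it is already there.
add_kernel_param() {
    file=$1
    param=$2
    # Fail if the (uncommented) GRUB_CMDLINE_LINUX_DEFAULT line is missing.
    line=$(grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file") || return 1
    # Already present as a whole word inside the quotes? Then change nothing.
    case " $line " in
        *" $param "*|*"\"$param "*|*" $param\""*|*"\"$param\""*) return 0 ;;
    esac
    # Insert the parameter just before the closing double quote (GNU sed -i edits in place).
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$file"
}
```

One way to use it: back up the file first (sudo cp /etc/default/grub /etc/default/grub.bak), open a root shell with sudo -i, define the function there, run add_kernel_param /etc/default/grub amdgpu.aspm=0, then run update-grub and reboot. To undo it, restore the backup and run update-grub again.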

ASPM = Active State Power Management. From drm/amdgpu AMDgpu driver:
aspm (int)

To disable ASPM (1 = enable, 0 = disable). The default is -1 (auto, enabled).
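After rebooting, you can confirm the parameter actually reached the kernel by looking at /proc/cmdline and, if amdgpu loaded, at the value the module reports under /sys/module/amdgpu/parameters/aspm. A small sketch of both checks follows. The function names are made up for this example, and the sysfs path assumes the module exposes its aspm parameter there; with amdgpu.aspm=0 I would expect it to read 0.

```shell
#!/bin/sh
# Hypothetical helper: succeed if PARAM appears as a whole word in the
# kernel command line read from stdin (e.g. from /proc/cmdline).
cmdline_has_param() {
    tr ' ' '\n' | grep -qxF -- "$1"
}

# Hypothetical helper: print the aspm value the loaded amdgpu module is using.
show_amdgpu_aspm() {
    f=/sys/module/amdgpu/parameters/aspm
    if [ -r "$f" ]; then
        echo "amdgpu aspm = $(cat "$f")"
    else
        echo "amdgpu module not loaded (or aspm parameter not exposed)"
    fi
}
```

Example: cmdline_has_param amdgpu.aspm=0 < /proc/cmdline && echo "parameter is active"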
I also found this topic on the ArchLinux forum [Solved] amdgpu issue with 5600 XT, poor performance. The error is not the same, but there are a couple of people with 5700 XTs in the topic not having problems. The OP ended up changing the riser cable to clear the issue.
Image
A woman typing on a laptop with LM20.3 Cinnamon.
DrunkenContender
Level 1
Posts: 13
Joined: Fri May 13, 2022 5:28 pm

Re: Trouble with AMD GPU (again)

Post by DrunkenContender »

SMG wrote: Thu May 26, 2022 2:57 pm However, there is a suggestion from a developer to try the amdgpu.aspm=0 kernel parameter.
Thank you, I will try that!
SMG wrote: Thu May 26, 2022 2:57 pm The OP ended up changing the riser cable to clear the issue.
My GPU is plugged straight into the motherboard. That being said, if the kernel parameter doesn't fix it (I don't like trying many things at once; it makes it harder to see which change affected what), I will look into the physical positioning of the GPU in the case. It's one of those three-fan longbois that might droop a bit at the end, and I guess that might affect connectivity.
DrunkenContender
Level 1
Posts: 13
Joined: Fri May 13, 2022 5:28 pm

Re: Trouble with AMD GPU (again)

Post by DrunkenContender »

I tried these kernel parameters, alone and in some combinations: amdgpu.aspm=0, amdgpu.dpm=0, amdgpu.dc=0, iommu=soft, pci=noats. None of them fixed my problem. Some gave me native resolution but still software rendering and only one display (like recovery mode), others added visual glitches to the text before the login page, and others got stuck forever before login. I found some of those parameters by researching the errors I get, and some of them changed which errors showed up in journalctl. But there was no way to get the drivers to load consistently.

I finally tried reverting to the latest 5.4 kernel and now it works consistently. I must have rebooted about 20 times and the GPU drivers were loaded every time. It always spits out a lot of errors before login and takes like two minutes, but once it gets to the login page it just works as expected.

I don't like the fact that I have to stay with an older kernel (even though it's long term supported, right?), and that I will forever be wary of kernel updates, but it works!

These are the errors that are repeating constantly:

Code: Select all

kernel: amdgpu: [powerplay] failed send message: NumOfDisplays (64)         param: 0x00000002 response 0xffffffc2
kernel: amdgpu: [powerplay] failed send message: NumOfDisplays (64)         param: 0x00000002 response 0xfffffffb
kernel: amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31)         param: 0x00000000 response 0xfffffffb
kernel: amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31)         param: 0x00020000 response 0xfffffffb
These errors also keep appearing after boot is done, indefinitely, every 5 or 10 seconds.

Code: Select all

$ journalctl -b | wc -l
9415
$ journalctl -b | grep amd | grep "failed send" | wc -l
5329
$ uptime -p
up 1 hour, 43 minutes
I will look around to see if there are any fixes for those errors. I was thinking of trying other kernel versions, but for Mint anything other than 5.4 or 5.13 is EOL, so I don't know... It's not a big priority, but I'll do some research. I'm assuming anything else regarding these errors should be in a new thread?
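To see which messages dominate and how fast they pile up, the lines can be grouped by the SMU message name. From the numbers above, 5329 lines in 1 hour 43 minutes (103 minutes) works out to about 51.7 lines per minute. A minimal sketch, assuming the message format shown in the log excerpt above; the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count powerplay "failed send message" lines per message name
# (e.g. NumOfDisplays, GetMaxDpmFreq) from kernel log text on stdin, most frequent first.
count_powerplay_failures() {
    grep -o 'failed send message: [A-Za-z]*' \
        | sed 's/^failed send message: //' \
        | sort | uniq -c | sort -rn
}

# Hypothetical helper: rate_per_minute COUNT MINUTES -> average lines per minute, 1 decimal.
rate_per_minute() {
    awk -v n="$1" -v m="$2" 'BEGIN { printf "%.1f\n", n / m }'
}
```

Usage: journalctl -b -k | count_powerplay_failures, and rate_per_minute 5329 103 prints 51.7.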
SMG
Level 25
Posts: 31988
Joined: Sun Jul 26, 2020 6:15 pm
Location: USA

Re: Trouble with AMD GPU (again)

Post by SMG »

DrunkenContender wrote: Sat May 28, 2022 6:56 amI don't like the fact that I have to stay with an older kernel (even though it's long term supported, right?), and that I will forever be wary of kernel updates, but it works!
It receives support until April 2025 which matches the end of support date for Linux Mint 20 versions.

You do have a Ryzen 3000 series so if you can get this to work with a newer kernel than 5.4, I recommend the newer kernel.
DrunkenContender wrote: Sat May 28, 2022 6:56 amThese are the errors that are repeating constantly:

Code: Select all

kernel: amdgpu: [powerplay] failed send message: NumOfDisplays (64)         param: 0x00000002 response 0xffffffc2
kernel: amdgpu: [powerplay] failed send message: NumOfDisplays (64)         param: 0x00000002 response 0xfffffffb
kernel: amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31)         param: 0x00000000 response 0xfffffffb
kernel: amdgpu: [powerplay] failed send message: GetMaxDpmFreq (31)         param: 0x00020000 response 0xfffffffb
I checked the first message and found this bug report on an RX 5700, Bug 204609 - amdgpu: powerplay failed send message, where people seem to indicate it is related to having two displays attached via DisplayPort.

In that report I see a link to this bug report for a 5700 XT: AMD Navi10 GPU powerplay issues when using two DisplayPort connectors. That person mentions that using one DP and one HDMI connection cleared the issue for them. I saw a comment in the first link where one person mentioned that this did not help them, but booting with only one monitor attached and then connecting the second one after boot did help.

Some people mention using an RX 5700 XT and some just an RX 5700 so watch to see what a person is using when they say something works.

I just read through the whole (quite long) page and I see this recommendation that worked for many people.
That said I've reliably worked around this bug completely by setting the KMS video mode on boot with video=DP-1:2560x1440@144 video=DP-2:2560x1440@144
Those are two kernel boot parameters (one for each monitor) and you would modify them to use 60 instead of 144.
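To work out the connector names for those parameters on your own system, you can check which DRM connectors report "connected" under /sys/class/drm. The entries look like card0-DP-1, and the video= parameter uses the part after "card0-". The sketch below lists them and prints matching video= parameters for a mode you pass in. The function names are hypothetical, the mode 2560x1440@60 in the example is only a placeholder (substitute your monitors' actual resolution and refresh rate), and it needs to run while the amdgpu driver is loaded so the connectors exist.

```shell
#!/bin/sh
# Hypothetical helper: list_connected_connectors [DRM_DIR]
# Prints connector names (e.g. DP-1, HDMI-A-1) whose status is "connected".
list_connected_connectors() {
    dir=${1:-/sys/class/drm}
    for status in "$dir"/card*-*/status; do
        [ -r "$status" ] || continue          # skip if the glob matched nothing
        if [ "$(cat "$status")" = "connected" ]; then
            conn=$(basename "$(dirname "$status")")   # e.g. card0-DP-1
            echo "${conn#card*-}"                      # strip "cardN-" -> DP-1
        fi
    done
}

# Hypothetical helper: video_params MODE [DRM_DIR]
# Prints one video=CONNECTOR:MODE parameter per connected connector, space separated.
video_params() {
    mode=$1
    out=""
    for c in $(list_connected_connectors "$2"); do
        out="$out video=$c:$mode"
    done
    echo "${out# }"
}
```

On a system with two connected DisplayPort monitors, video_params 2560x1440@60 would print something like video=DP-1:2560x1440@60 video=DP-2:2560x1440@60, which can then be added to the kernel command line like any other parameter.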

Maybe the 5.15 kernel will have this issue fixed and you would not need kernel parameters. I expect it to soon become available in Update Manager.
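When the newer kernel does show up, a quick way to confirm you are actually running it (for example before deciding whether the video= parameters are still needed) is to compare the output of uname -r against a minimum version. A minimal sketch using sort -V for the comparison; kernel_at_least is a hypothetical name, and it only looks at the part before the first hyphen (so 5.4.0-113-generic is treated as 5.4.0):

```shell
#!/bin/sh
# Hypothetical helper: kernel_at_least MIN [VERSION]
# Succeeds if VERSION (default: the running kernel) is >= MIN.
kernel_at_least() {
    min=$1
    ver=${2:-$(uname -r)}
    ver=${ver%%-*}    # "5.4.0-113-generic" -> "5.4.0"
    # sort -V orders version strings; if MIN sorts first, VERSION is >= MIN.
    [ "$(printf '%s\n%s\n' "$min" "$ver" | sort -V | head -n 1)" = "$min" ]
}
```

Usage: kernel_at_least 5.15 && echo "running 5.15 or newer"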