Random freezes after 6 months [SOLVED]

anthalamus
Level 1
Posts: 26
Joined: Mon Oct 15, 2012 5:59 pm

Post by anthalamus »

Hi all,

I recently put together a desktop computer for heavy data processing, and installed Linux Mint MATE edition on it.

Everything was working smoothly for months until a few weeks ago, when the system started to freeze seemingly at random. It now does so on average every 3 days (+/- 2 days). I hadn't done anything noteworthy around that time, such as an upgrade or anything of the sort.

At first I thought it might be the CPU overheating, so I upgraded the CPU fan to a much better one, but that didn't help. I also started monitoring the temps with the "sensors" applet, but at least based on the readings the cores rarely exceed 60C (even as captured by the display "freeze"). Could it just spike so fast that it freezes the computer before the sensors capture it?
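
One way to test the "too fast to capture" theory is to log temperatures to disk every couple of seconds, so the last readings before a freeze survive the reboot. A minimal sketch, assuming the sensors are exposed through the kernel's hwmon sysfs interface (values in millidegrees Celsius); the function name, the `HWMON_ROOT` variable, the log path, and the interval are all hypothetical choices for illustration:

```shell
#!/bin/sh
# Print one line per hwmon temperature sensor: "<date> <time> <file> <deg C>".
# HWMON_ROOT defaults to the kernel's standard sysfs location; it is a
# variable here only so the function can be pointed at another directory.
log_temps() {
    root="${HWMON_ROOT:-/sys/class/hwmon}"
    now=$(date '+%F %T')
    for f in "$root"/hwmon*/temp*_input; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        milli=$(cat "$f")                # sysfs reports millidegrees Celsius
        echo "$now $f $((milli / 1000))"
    done
}

# Example loop (hypothetical log path and interval), left commented out:
# while true; do log_temps >> "$HOME/temps.log"; sync; sleep 2; done
```

After a freeze and reboot, the tail of the log file shows the last temperatures that were written before the hang.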

I also ran

sudo apt-get install linux-image-generic
to see if it would help, but to no avail.

When I look at /var/log/syslog or /var/log/dmesg, I don't see any obvious error message (though the latter is a bit too cryptic for me)
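
If the raw logs are too noisy to read, filtering them for words that commonly accompany a hang can help. This is only a sketch: the keyword list is a guess rather than an exhaustive set, and `suspicious_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only log lines containing keywords that often accompany hangs.
# The keyword list is an illustrative assumption, not an exhaustive set.
suspicious_lines() {
    grep -iE 'error|fail|fault|killed|timeout|hung|oops|panic'
}

# Typical inputs on a systemd-based install (commented out; need a real system):
# journalctl -k -b -1 --no-pager | suspicious_lines   # kernel log, previous boot
# sudo dmesg --level=err,warn                          # current boot, by severity
```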

The puzzling part for me is that it was working perfectly for months... Any help in debugging this would be appreciated!

PS1: the only thing that keeps on working after freezing is the music, for about a minute or so and then it stops
PS2: magic key + REISUB does seem to work and restarts successfully
PS3: output for inxi -Fxxxz:

System:
  Kernel: 5.4.0-73-generic x86_64 bits: 64 compiler: gcc v: 9.3.0 
  Desktop: MATE 1.24.0 info: mate-panel wm: marco 1.24.0 dm: LightDM 1.30.0 
  Distro: Linux Mint 20 Ulyana base: Ubuntu 20.04 focal 
Machine:
  Type: Desktop Mobo: ASUSTeK model: TUF GAMING X570-PLUS (WI-FI) 
  v: Rev X.0x serial: <filter> UEFI: American Megatrends v: 1407 
  date: 04/01/2020 
CPU:
  Topology: 16-Core (2-Die) model: AMD Ryzen 9 3950X bits: 64 
  type: MT MCP MCM arch: Zen L2 cache: 8192 KiB 
  flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm 
  bogomips: 223582 
  Speed: 2114 MHz min/max: 2200/3500 MHz boost: enabled Core speeds (MHz): 
  1: 1875 2: 2793 3: 2196 4: 2193 5: 2195 6: 2196 7: 2196 8: 2195 9: 1977 
  10: 2421 11: 2051 12: 2075 13: 1923 14: 2195 15: 2196 16: 2190 17: 2189 
  18: 2186 19: 2191 20: 2194 21: 2195 22: 2196 23: 2190 24: 1908 25: 2197 
  26: 2192 27: 2194 28: 2192 29: 2192 30: 2195 31: 2196 32: 2196 
Graphics:
  Device-1: NVIDIA GK208B [GeForce GT 710] vendor: Gigabyte driver: nouveau 
  v: kernel bus ID: 08:00.0 chip ID: 10de:128b 
  Display: x11 server: X.Org 1.20.8 driver: modesetting unloaded: fbdev,vesa 
  compositor: marco v: 1.24.0 resolution: 1920x1080~60Hz 
  OpenGL: renderer: NV106 v: 4.3 Mesa 20.0.4 direct render: Yes 
Audio:
  Device-1: NVIDIA GK208 HDMI/DP Audio vendor: Gigabyte 
  driver: snd_hda_intel v: kernel bus ID: 08:00.1 chip ID: 10de:0e0f 
  Device-2: AMD Starship/Matisse HD Audio vendor: ASUSTeK 
  driver: snd_hda_intel v: kernel bus ID: 0a:00.4 chip ID: 1022:1487 
  Sound Server: ALSA v: k5.4.0-73-generic 
Network:
  Device-1: Intel Wireless-AC 9260 driver: iwlwifi v: kernel bus ID: 03:00.0 
  chip ID: 8086:2526 
  IF: wlp3s0 state: up mac: <filter> 
  Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet 
  vendor: ASUSTeK driver: r8169 v: kernel port: f000 bus ID: 04:00.0 
  chip ID: 10ec:8168 
  IF: enp4s0 state: down mac: <filter> 
Drives:
  Local Storage: total: 1.86 TiB used: 1.38 TiB (74.1%) 
  ID-1: /dev/sda vendor: Samsung model: SSD 860 PRO 2TB size: 1.86 TiB 
  speed: 6.0 Gb/s serial: <filter> rev: 2B6Q scheme: GPT 
Partition:
  ID-1: / size: 1.83 TiB used: 1.38 TiB (75.3%) fs: ext4 dev: /dev/sda2 
Sensors:
  System Temperatures: cpu: 49.2 C mobo: N/A gpu: nouveau temp: 45 C 
  Fan Speeds (RPM): N/A gpu: nouveau fan: 2760 
Info:
  Processes: 465 Uptime: 35m Memory: 62.79 GiB used: 2.06 GiB (3.3%) 
  Init: systemd v: 245 runlevel: 5 Compilers: gcc: 9.3.0 alt: 9 Shell: bash 
  v: 5.0.16 running in: mate-terminal inxi: 3.0.38 
SMG
Level 25
Posts: 32007
Joined: Sun Jul 26, 2020 6:15 pm
Location: USA

Re: Random freezes after 6 months

Post by SMG »

A number of people with 3rd gen Ryzens have found that upgrading to the 5.8 kernel has cleared freeze issues they were having. It's available in Update Manager.

Open Update Manager. Select View > Linux Kernels and click Continue. Make sure 5.8 is selected on the left panel and then click the top-most option on the right panel. An "Install" button will appear. Install the kernel and then reboot for it to become active.

The kernel and nouveau driver are updated on a regular basis, so it's possible there was a change which affected your system even though the change may not have been "noteworthy".

Once your root partition starts getting full, that can affect performance. You are currently at 75%, so depending upon what you are running, your system may be impacted. Here are some tips on How to Clean Linux Mint Safely.

Partition:
ID-1: / size: 1.83 TiB used: 1.38 TiB (75.3%) fs: ext4 dev: /dev/sda2

Re: Random freezes after 6 months

Post by anthalamus »

Thanks a lot for the prompt answer SMG!
SMG wrote: Thu May 20, 2021 2:48 pm A number of people with 3rd gen Ryzens have found that upgrading to the 5.8 kernel has cleared freeze issues they were having. It's available in Update Manager.
I saw some answers mentioning that as well, but the reason I'm hesitant is that I was working fine before. See next item.
SMG wrote: Thu May 20, 2021 2:48 pm The kernel and nouveau driver are updated on a regular basis, so it's possible there was a change which affected your system even though the change may not have been "noteworthy".
But would they update without my initiating it? I really don't remember updating either manually, and to my knowledge I didn't set it up to do so automatically.
SMG wrote: Thu May 20, 2021 2:48 pm Once your root partition starts getting full, that can affect performance. You are currently at 75%, so depending upon what you are running, your system may be impacted. Here are some tips on How to Clean Linux Mint Safely.
Good to know! I wasn't aware of that, or at least I would have put the bar much closer to 100%.

Re: Random freezes after 6 months

Post by SMG »

anthalamus wrote: Thu May 20, 2021 2:56 pm
SMG wrote: Thu May 20, 2021 2:48 pm The kernel and nouveau driver are updated on a regular basis, so it's possible there was a change which affected your system even though the change may not have been "noteworthy".
But would they update without my initiating it? I really don't remember updating either manually, and to my knowledge I didn't set it up to do so automatically.
You are running the latest 5.4 kernel, so you are obviously updating your kernel. There are driver-related files in the kernel and I believe nouveau also updates through Update Manager just like the kernel.
anthalamus wrote: Thu May 20, 2021 2:56 pm
SMG wrote: Thu May 20, 2021 2:48 pm Once your root partition starts getting full, that can affect performance. You are currently at 75%, so depending upon what you are running, your system may be impacted. Here are some tips on How to Clean Linux Mint Safely.
Good to know! I wasn't aware of that, or at least I would have put the bar much closer to 100%.
I do not know the exact cut-off, but once you start getting into the low 90's, you will get a message from Mint letting you know your system is getting full. People have posted here letting us know they ignored that message and they are posting because they can no longer log into Mint. ;)

If you run an SSD, recommendations are to keep it less than 75% full or the system will start slowing. One can go higher with an HDD, but there are still temporary files needed while running an operating system, and they have to go somewhere.
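
A quick way to see where the root partition stands against that rule of thumb is sketched below. It assumes a POSIX `df`; the helper names are hypothetical and the 75% default simply mirrors the figure above.

```shell
#!/bin/sh
# Print the percentage of the given mount point that is in use (digits only).
# Uses POSIX `df -P`, whose 5th column is the capacity percentage.
usage_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches a threshold; 75 is a rule of thumb, not a hard limit.
check_usage() {
    pct=$(usage_pct "${1:-/}")
    if [ "$pct" -ge "${2:-75}" ]; then
        echo "WARNING: ${1:-/} is ${pct}% full"
    else
        echo "OK: ${1:-/} is ${pct}% full"
    fi
}

check_usage / 75
```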

Re: Random freezes after 6 months

Post by anthalamus »

SMG wrote: Thu May 20, 2021 3:33 pm
anthalamus wrote: Thu May 20, 2021 2:56 pm
SMG wrote: Thu May 20, 2021 2:48 pm The kernel and nouveau driver are updated on a regular basis, so it's possible there was a change which affected your system even though the change may not have been "noteworthy".
But would they update without my initiating it? I really don't remember updating either manually, and to my knowledge I didn't set it up to do so automatically.
You are running the latest 5.4 kernel, so you are obviously updating your kernel. There are driver-related files in the kernel and I believe nouveau also updates through Update Manager just like the kernel.
I updated it once after the issue first arose, but not before that actually.
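
To check when kernel or nouveau packages actually changed, the dpkg log records every install and upgrade with a timestamp. A minimal sketch, assuming a Debian/Ubuntu-style `/var/log/dpkg.log`; the helper name and the package-name prefixes are illustrative choices:

```shell
#!/bin/sh
# List install/upgrade events for kernel and nouveau packages from a dpkg log.
# The nouveau kernel module ships inside the kernel packages, so both matter.
# Reads the log on stdin so it works with rotated or decompressed files too.
kernel_driver_updates() {
    awk '($3 == "install" || $3 == "upgrade") && $4 ~ /^(linux-image|linux-modules|xserver-xorg-video-nouveau|libdrm-nouveau)/'
}

# On a Debian/Ubuntu-based system (commented out; needs the real log files):
# kernel_driver_updates < /var/log/dpkg.log
# zcat /var/log/dpkg.log.*.gz | kernel_driver_updates
```

Comparing the timestamps it prints with the date the freezes began shows whether an update landed shortly beforehand.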

Anyway, it looks like trying 5.8 is my best shot at this point. I just wish there had been some log somewhere I could find to better understand what happened!

Re: Random freezes after 6 months

Post by SMG »

anthalamus wrote: Thu May 20, 2021 5:23 pm I updated it once after the issue first arose, but not before that actually.
Updates include bug fixes and patches which is why we recommend updating one's install.
anthalamus wrote: Thu May 20, 2021 5:23 pm I just wish there had been some log somewhere I could find to better understand what happened!
If you have an idea of the time it happened, you could check journalctl: Using journalctl

journalctl contains a LOT of information. Without knowing why the freezes were happening or specifically when, finding out what happened can often be a needle-in-a-haystack situation. If you have an idea of when, that narrows down the messages where the issue might be. If you have an idea of what might have been causing them, it might be possible to grep for strings in the journalctl output, which would also help narrow down the total number of messages.
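
As a rough sketch of both approaches (narrowing by time and grepping for a string), assuming a persistent journal so that the previous boot is still available; the function names are hypothetical wrappers, and the timestamps and search string in the usage lines are only placeholders:

```shell
#!/bin/sh
# Kernel messages from the previous boot (-b -1) inside a time window.
# $1 and $2 are handed unchanged to journalctl's --since and --until.
journal_window() {
    journalctl -k -b -1 --no-pager --since "$1" --until "$2"
}

# All previous-boot messages that mention a string, case-insensitively.
journal_grep() {
    journalctl -b -1 --no-pager | grep -i -- "$1"
}

# Example calls (timestamps and search string are placeholders):
# journal_window "2021-05-20 13:45" "2021-05-20 13:51"
# journal_grep nouveau
```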

Based on feedback I've received from others on other threads, I would suggest trying the 5.8 kernel first. Then if there are still issues, you can check journalctl.

Presuming a freeze happens and you have to power off the computer to get it unstuck, I usually recommend running
journalctl -rb -1 | nc termbin.com 9999
That command prints the journal log information of the prior boot cycle (the one which you stopped with the power button) in reverse order (so the reason for the freeze should be near the top) and sends it to termbin. It will return with a url address that you should post here if you want us to help you analyze it.

Re: Random freezes after 6 months

Post by anthalamus »

SMG wrote: Thu May 20, 2021 5:48 pm
anthalamus wrote: Thu May 20, 2021 5:23 pm I updated it once after the issue first arose, but not before that actually.
Updates include bug fixes and patches which is why we recommend updating one's install.
anthalamus wrote: Thu May 20, 2021 5:23 pm I just wish there had been some log somewhere I could find to better understand what happened!
If you have an idea of the time it happened, you could check journalctl: Using journalctl

journalctl contains a LOT of information. Without knowing why the freezes were happening or specifically when, finding out what happened can often be a needle-in-a-haystack situation. If you have an idea of when, that narrows down the messages where the issue might be. If you have an idea of what might have been causing them, it might be possible to grep for strings in the journalctl output, which would also help narrow down the total number of messages.
Oh I think that did the trick as far as pinpointing the problem actually, the first lines returned by the command are:

May 20 13:50:30 XXX kernel: sysrq: Emergency Sync
May 20 13:50:29 XXX kernel: sysrq: This sysrq operation is disabled.
May 20 13:50:29 XXX kernel: sysrq: This sysrq operation is disabled.
May 20 13:50:29 XXX kernel: sysrq: This sysrq operation is disabled.
May 20 13:50:02 XXX kernel: nouveau 0000:08:00.0: Xorg[1310]: channel 2 killed!
May 20 13:50:02 XXX kernel: nouveau 0000:08:00.0: fifo: engine 6: scheduled for recovery
May 20 13:50:02 XXX kernel: nouveau 0000:08:00.0: fifo: engine 0: scheduled for recovery
May 20 13:50:02 XXX kernel: nouveau 0000:08:00.0: fifo: runlist 0: scheduled for recovery
May 20 13:50:02 XXX kernel: nouveau 0000:08:00.0: fifo: channel 2: killed
May 20 13:50:02 XXX kernel: nouveau 0000:08:00.0: fifo: fault 00 [READ] at 0000000002761000 engine 00 [GR] client 01 [GPC0/T1_0] reason 02 [PTE] on channel 2 [007fb1a000 Xorg[1310]]
May 20 13:50:02 XXX kernel: nouveau 0000:08:00.0: gr: GPC0/TPC0/TEX: 80000049
May 20 13:50:02 XXX kernel: nouveau 0000:08:00.0: gr: TRAP ch 2 [007fb1a000 Xorg[1310]]
And I suspect "Xorg[1310]: channel 2 killed!" is the culprit (the following sysrq entries are my doing REISUB). So that would definitely point to nouveau being the issue here, although again I don't think I actually changed anything pertaining to nouveau past the initial setup.
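
The "This sysrq operation is disabled" lines suggest that only part of the REISUB sequence is permitted on this install. Which operations are allowed is controlled by the `kernel.sysrq` bitmask. A sketch that decodes it, using the bit values from the kernel's sysrq documentation; `decode_sysrq` is a hypothetical helper name:

```shell
#!/bin/sh
# Decode a kernel.sysrq value into the REISUB-related groups it enables.
# Bit meanings follow the kernel's sysrq documentation; 0 disables
# everything and 1 enables everything.
decode_sysrq() {
    v="$1"
    if [ "$v" -eq 1 ]; then echo "all operations enabled"; return 0; fi
    if [ "$v" -eq 0 ]; then echo "all operations disabled"; return 0; fi
    if [ $((v & 4))   -ne 0 ]; then echo "keyboard control (R)"; fi
    if [ $((v & 16))  -ne 0 ]; then echo "sync (S)"; fi
    if [ $((v & 32))  -ne 0 ]; then echo "remount read-only (U)"; fi
    if [ $((v & 64))  -ne 0 ]; then echo "signal processes (E, I)"; fi
    if [ $((v & 128)) -ne 0 ]; then echo "reboot/poweroff (B)"; fi
    return 0
}

# Decode the running system's setting when the file exists.
if [ -r /proc/sys/kernel/sysrq ]; then
    decode_sysrq "$(cat /proc/sys/kernel/sysrq)"
fi
```

If all six keys are wanted, `sudo sysctl kernel.sysrq=1` enables every operation until the next reboot, at the cost of making them available to anyone at the keyboard.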

Any clue as to how to address this specific problem? Updating nouveau itself? Or shall I still just go with the kernel update altogether?

Re: Random freezes after 6 months

Post by SMG »

anthalamus wrote: Thu May 20, 2021 8:01 pm Oh I think that did the trick as far as pinpointing the problem actually, the first lines returned by the command are:
I would probably go back further in time to see if there was a reason the nouveau driver tripped. Messages are usually sequential, so you want to go back to the point where the error messages started rather than looking only at the last message.
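
Since the `-r` output is newest-first, dropping `-r` and stopping at the first fault is one way to find where the trouble began. A sketch assuming GNU grep; the match pattern is based only on the messages quoted in this thread, and the helper name and 20-line default are hypothetical:

```shell
#!/bin/sh
# Print the first nouveau fault in a chronological log, plus the lines
# leading up to it. $1 is the number of context lines (default 20).
first_nouveau_fault() {
    grep -m 1 -B "${1:-20}" -E 'nouveau.*(fault|TRAP|killed|timeout)'
}

# Previous boot, oldest first (no -r), on a systemd-based install:
# journalctl -b -1 --no-pager | first_nouveau_fault 20
```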
anthalamus wrote: Thu May 20, 2021 8:01 pm Any clue as to how to address this specific problem? Updating nouveau itself? Or shall I still just go with the kernel update altogether?
You indicated you are doing heavy data processing. That means at a minimum using the kernel best suited for your hardware. It's possible upgrading to 5.8 may help clear the nouveau issue.

I checked nouveau code names and the Nvidia GK208B is listed as GT720. It can use the latest available Nvidia drivers, which means your system would be using the newer Mesa available in Mint, which may provide better stability. If the newer kernel does not help, I would suggest taking a Timeshift snapshot and then trying the Nvidia-450 driver. (Some people have had issues with the Nvidia-460 while others have not. Since you were not even using an Nvidia driver, I figured I'd recommend the Nvidia-450 since it seems to produce fewer overall issues.)

Re: Random freezes after 6 months

Post by anthalamus »

Just an update that I ended up upgrading to `5.4.0-74-generic` (via `linux-image-generic`) and that it seems to have solved the problem. It's been running smoothly for a few weeks now.

A big thanks to you, SMG, for all the suggestions!

Re: Random freezes after 6 months

Post by SMG »

If you feel the issue is resolved, please go to your first post in this thread, click the pencil icon, and add [SOLVED] to the title so others know you are no longer seeking help on this issue.

Re: Random freezes after 6 months [SOLVED]

Post by anthalamus »

Done, thanks
motoryzen
Level 10
Posts: 3497
Joined: Sun Dec 08, 2019 12:25 am

Re: Random freezes after 6 months [SOLVED]

Post by motoryzen »

Wow. It's amazing and odd at the same time that a SUB-model driver version can make all the difference. *scratches chin curiously*

Happy you got it resolved, regardless :D