[SOLVED] High to critical CPU temp, despite low usage

GreenIsBest
Level 1
Posts: 48
Joined: Sun Sep 19, 2021 11:54 am

Post by GreenIsBest »

[Conclusion TL;DR] The problem was with the physical hardware itself (chips, fans, etc.), which makes sense given the age of the machine. The advice and tools suggested here allowed me to gradually rule out other possible causes. As one final test, I mounted my old leftover Windows partition, performed a factory reset to Win7 (on that partition alone), and confirmed that it still generated more heat than it should. If the shipped OS doesn't count as "optimized conditions", then nothing does, which cleared all remaining doubt about the cause of the problem.

[Update 25 Oct 2021] I'm happy to report the heat problems are now solved, after a deep cleaning with compressed air and new thermal paste, followed by a semi-improvised hardware upgrade. The cleaning and new paste alone brought the temperature down by ~10-15 degrees, even though it wasn't that dirty in there. Then I decided to experiment a little. Because officially supported parts are no longer sold, I replaced the old fan with a newer one I had lying around in my scrap box, and it worked just fine. Apparently, fans don't have firmware or compatibility issues, so you can just slap in whatever random fan you want and it will work, as long as it fits in there.
Now, according to the sensors command, the laptop runs at a very comfy 35-50 degrees on idle, and it takes full stress to push it above 70.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Greetings.
I'm a Windows power(ish) user (given how dumbed-down the concept of "normal user" has become) and I'm no stranger to command terminals (grew up with ZX Spectrum and MS-DOS). On the opposite corner, I can't code anything "from the ground up" and I'm (almost) a complete dummy when it comes to Linux.
I recently installed Mint 20.1 Cinnamon on an old machine, read the entire easylinuxguide (thank you Pjotr ^^), followed its advice, always install every update as soon as it arrives, and when in doubt I follow the "trust the defaults" approach.
I'm loving it and the machine is running like a charm, except for 1 problem.

The device occasionally crashes with a "CPU critical temp" warning, and the sensors command reports that the CPU temp is unusually high (50-60 °C), even on idle. Said device is a Samsung RF 511 laptop from 2012, with a Quad Core Intel Core i7-2630QM; sure it is old, but it still packed a punch even on the latest Win10.
The full specs, as shown by running inxi -FxZ, are as follows.

Code: Select all

System:
  Kernel: 5.4.0-84-generic x86_64 bits: 64 compiler: gcc v: 9.3.0 
  Desktop: Cinnamon 4.8.6 Distro: Linux Mint 20.1 Ulyssa 
  base: Ubuntu 20.04 focal 
Machine:
  Type: Laptop System: SAMSUNG product: RF511/RF411/RF711 v: 11HX 
  serial: <filter> 
  Mobo: SAMSUNG model: RF511/RF411/RF711 v: 11HX serial: <filter> 
  BIOS: American Megatrends v: 11HX.M036.20110601.SSH date: 06/01/2011 
Battery:
  ID-1: BAT1 charge: 37.7 Wh condition: 37.7/57.7 Wh (65%) 
  model: SAMSUNG Electronics status: Full 
CPU:
  Topology: Quad Core model: Intel Core i7-2630QM bits: 64 type: MT MCP 
  arch: Sandy Bridge rev: 7 L2 cache: 6144 KiB 
  flags: avx lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 31929 
  Speed: 798 MHz min/max: 800/2900 MHz Core speeds (MHz): 1: 798 2: 798 
  3: 798 4: 798 5: 798 6: 798 7: 798 8: 798 
Graphics:
  Device-1: Intel 2nd Generation Core Processor Family Integrated Graphics 
  vendor: Samsung Co driver: i915 v: kernel bus ID: 00:02.0 
  Device-2: NVIDIA GF108M [GeForce GT 540M] vendor: Samsung Co 
  driver: nvidia v: 390.144 bus ID: 01:00.0 
  Display: x11 server: X.Org 1.20.11 driver: modesetting,nvidia 
  unloaded: fbdev,nouveau,vesa resolution: 1366x768~60Hz 
  OpenGL: renderer: GeForce GT 540M/PCIe/SSE2 v: 4.6.0 NVIDIA 390.144 
  direct render: Yes 
Audio:
  Device-1: Intel 6 Series/C200 Series Family High Definition Audio 
  vendor: Samsung Co driver: snd_hda_intel v: kernel bus ID: 00:1b.0 
  Device-2: NVIDIA GF108 High Definition Audio driver: snd_hda_intel 
  v: kernel bus ID: 01:00.1 
  Sound Server: ALSA v: k5.4.0-84-generic 
Network:
  Device-1: Broadcom and subsidiaries BCM4313 802.11bgn Wireless Network 
  Adapter 
  vendor: Askey driver: bcma-pci-bridge v: N/A port: d000 bus ID: 02:00.0 
  Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet 
  vendor: Samsung Co driver: r8169 v: kernel port: b000 bus ID: 03:00.0 
  IF: enp3s0 state: down mac: <filter> 
  IF-ID-1: wlp2s0b1 state: down mac: <filter> 
Drives:
  Local Storage: total: 465.76 GiB used: 22.34 GiB (4.8%) 
  ID-1: /dev/sda vendor: Toshiba model: HDWK105 size: 465.76 GiB 
Partition:
  ID-1: / size: 92.80 GiB used: 22.34 GiB (24.1%) fs: ext4 dev: /dev/sda6 
Sensors:
  System Temperatures: cpu: 58.0 C mobo: 58.0 C gpu: nvidia temp: 50 C 
  Fan Speeds (RPM): N/A 
Info:
  Processes: 236 Uptime: 30m Memory: 5.72 GiB used: 926.9 MiB (15.8%) 
  Init: systemd runlevel: 5 Compilers: gcc: 9.3.0 Shell: bash v: 5.0.17 
  inxi: 3.0.38

In the System Monitor app, the usual CPU load is around 2-5% on any given core and very rarely goes above 10-15%. It occasionally spikes to 80-100% when doing heavy tasks, and usually only 1-2 of the cores reach that much at any given time. Despite this, the lowest temperature ever reported by the sensors command is 42 °C.
These are the temps I measured while doing different "test" tasks (the tested browser is Firefox in incognito mode):
*Idle (with airplane mode) 55-47-42-44
*Browsing updates 63-61-65-58
*Writing in this forum (now) 58-48-45-46
*Listening to music on YouTube (static video) 64-57-54-54
*Watching a heavy video on YouTube in full 1080p 76-71-67-66
*Installing an app (GIMP) 83-84-70-65
*Using said app (GIMP) in its default installed settings 63-56-54-54
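
For anyone who wants to repeat these measurements, here is a minimal sketch that turns sensors output into the same dash-separated form used in the list above. It assumes the four numbers in each row are one reading per physical core, taken from the "Core N:" lines that lm-sensors prints for Intel CPUs; the function name core_temps is made up for illustration.

```shell
# Hypothetical helper: read `sensors` output on stdin and print the
# per-core temperatures as whole degrees joined by dashes.
core_temps() {
    awk '/^Core [0-9]+:/ {
        t = $3                       # third field looks like "+55.0°C"
        gsub(/[^0-9.]/, "", t)       # strip the sign, degree symbol and "C"
        sep = (out == "") ? "" : "-"
        out = out sep int(t)
    }
    END { print out }'
}
```

Typical use would be `sensors | core_temps`, run once per test situation so the readings are taken the same way each time.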

Additional info/details that might be useful:
*This CPU has had new thermal paste applied less than 12 months ago
*This laptop is using a HDD less than 12 months old
*The insides and vents are clean and free of lint or dust
*The swappiness is set to the value recommended in Pjotr's guide; the System Monitor however reports a swap usage of zero (should it?).
*This device is technically kinda-dual-boot, as its HDD still contains the "carcass" of an old Win10 which became damaged and no longer boots past the error screen. I'm still in the process of cleaning up after that mess. The partition containing it is never mounted, apart from occasional manual copy/recovery of files.
*Only the system partition is mounted (100 GB, ext4).
*This laptop came with both integrated Intel graphics and an Nvidia GeForce GT 540M; it is currently using the closed-source Nvidia driver recommended by the Driver Manager.
*My Cinnamon desktop has no widgets, only the built-in system applets, and all animations are turned off.
*Don't know if this is relevant, but in System Monitor it shows "Pulse Audio" with VeryHigh priority (nice = -11)
*The process referred to as "ROD" (I don't know what it is) sometimes shows a CPU usage >100% (???), ranging from 111% to 203%. I'm still searching for what this is/means, but it causes no slowness and no increase in noise/temperature.

These temperatures seem unreasonably high, especially on idle. I read in this forum that switching to an older kernel could improve results on older machines, so I tried running the kernel that originally came with my USB install (5.4.0.58). I measured the temps in the same "test situations", with the same app versions, and the difference was negligible (2-5 °C less).
My next attempt will be trying the Xfce desktop, as I hear it is much lighter, and if all else fails, try a lighter distro altogether.
I'd rather not have to resort to another distro, as Mint really fits my wants/needs like a silk glove.

PS: This is my 1st post here, so I apologize in advance for any blunders or shortcomings; I did read the rules and guidelines but still...
rene
Level 20
Posts: 12212
Joined: Sun Mar 27, 2016 6:58 pm

Re: High to critical CPU temp, despite low usage

Post by rene »

Does adding the "intel_pstate=disable" kernel parameter help? Please see viewtopic.php?f=42&t=349669 for how to do that.
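
The linked topic is not reproduced here, so what follows is only a sketch of the usual way a kernel parameter is made permanent on Mint/Ubuntu, not necessarily the method that topic describes. The "quiet splash" part is the common default and is an assumption; keep whatever your own file already contains and only append the new parameter.

```shell
# /etc/default/grub (fragment). "quiet splash" is assumed to be the
# existing value; only intel_pstate=disable is being added.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_pstate=disable"

# Then regenerate the GRUB configuration and reboot:
#   sudo update-grub
# After the reboot, check which frequency driver is active:
#   cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
```

If the last command no longer reports intel_pstate, the parameter took effect. Removing it from the file and running update-grub again undoes the test.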
senjoz
Level 5
Posts: 902
Joined: Tue Jun 09, 2020 3:55 am
Location: Kamnik

Re: High to critical CPU temp, despite low usage

Post by senjoz »

Welcome to the Forum, GreenIsBest!

Although you did not specify the temperature of the environment in which your laptop is working, the CPU temperatures you mentioned in your post do not look very unusual to me. The Core i7-2630QM has a thermal design power of 45 W and a maximum junction temperature of 100 °C. If the processor is working, a lot of heat must be dissipated.

GreenIsBest wrote: Wed Sep 22, 2021 9:47 am The device occasionally crashes with a "CPU critical temp" warning
Please describe in more detail what happens on those occasions, where you get that warning, and what you do to get Mint working again.

If you wish to test how your laptop behaves under heavy load, I suggest the following. Install the applications powerstat and stress; you can do this with Synaptic Package Manager. After that, close all open applications and open four terminals.

In the first terminal, run the command watch sensors.

In the second terminal, run the command sudo powerstat 5 150 -RD (your Sandy Bridge processor should support RAPL statistics, read man powerstat). Check temperatures in sensors and Watts in powerstat.

In the third terminal, run the command watch -n0.1 "cat /proc/cpuinfo | grep \"^[c]pu MHz\"". Check temperatures in sensors, Watts in powerstat and cpu MHz in the third terminal.

In the fourth terminal, run the command stress -c 8 -t 600 (stress 8 processor threads for 600 seconds, read man stress). The processor frequency should go above 2.0 GHz. When the processor temperature becomes very high (probably close to 90 °C), the frequency should drop to a lower value. If the machine crashes during this test, something is wrong. If the processor frequency drops to a very low value and the temperature stays high, cooling is not adequate.
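
The procedure above can be summarized as a short sketch. The four commands are copied from the post and listed as comments, one per terminal; the max_cpu_mhz helper is a hypothetical addition for taking a single frequency reading instead of watching the list scroll.

```shell
# One command per terminal, as described above (not run by this sketch):
#   watch sensors
#   sudo powerstat 5 150 -RD
#   watch -n0.1 "cat /proc/cpuinfo | grep \"^[c]pu MHz\""
#   stress -c 8 -t 600

# Hypothetical helper: print the highest "cpu MHz" value (rounded) found
# in /proc/cpuinfo-style text on stdin.
max_cpu_mhz() {
    awk -F': *' '/^cpu MHz/ { if ($2 + 0 > max) max = $2 + 0 }
                 END { printf "%.0f\n", max }'
}
```

For example, `max_cpu_mhz < /proc/cpuinfo` run during the stress test should show a value above 2000 if the processor is clocking up as expected.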

Regards, Jože

Screenshot from 2021-09-22 21-36-41.png
frank84
Level 2
Posts: 67
Joined: Sun Jul 15, 2018 6:56 pm

Re: High to critical CPU temp, despite low usage

Post by frank84 »

Hi GreenIsBest,

I'm no computer expert and just a beginner with Linux Mint. However, I'll try to help by at least mentioning an idea or two. But please remember that due to my lack of computer expertise, I am only making guesses -- in an attempt to be helpful.

I would think that, if you are experiencing unexpectedly high temperatures, one of the first things to consider is whether your CPU heatsink/cooler fan (or any other fans) is/are working properly -- or working at all. The quick search I did regarding the computer model you listed suggests that your device should have a fan. And if that fan has been in there since 2012, I imagine it has made quite a few revolutions and may be near or at its end of life. In your inxi output I noticed this:
Fan Speeds (RPM): N/A
I assume that could mean your fan is not in communication with the system, is simply not working at all, or is maybe disconnected. I believe a non-working fan could explain the unexpectedly high temperatures you are seeing. Or, if the fan is not communicating properly, then it may not be adjusting its speed properly as your CPU heats up.

Maybe you can use the LM sensors command or an applet to check if the fan speed is adjusting as your CPU works harder. Or you may be able to simply listen closely and hear if the fan is working and adjusting properly.
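
As a minimal sketch of that check, the helper below filters sensors output down to the fan lines. The name fan_lines is made up, and it assumes the fan readings are printed with "fan" in the label and "RPM" as the unit, which is the usual lm-sensors layout. On a machine where inxi reports "Fan Speeds (RPM): N/A", it will most likely find nothing.

```shell
# Hypothetical helper: keep only fan readings from `sensors` output on stdin,
# or say so when the hardware exposes none.
fan_lines() {
    grep -i 'fan.*RPM' || echo "no fan readings reported"
}
```

Running `sensors | fan_lines` at idle and again under load shows whether the reported speed changes at all.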

Just in case any recent modifications were made to fan connections: my understanding is that fans connected with 4 pins can have their spin rate/cooling ability adjusted as the computer heats up, because the fourth pin carries a speed-control (PWM) signal, whereas one (e.g., a secondary fan) that is only connected with a 3-pin connector reports its speed back to the system but has no dedicated control line, so it may not adjust (at least not in the same way) as the computer heats up.

I am also wondering if your computer worked OK initially after the new thermal paste was added. If it was not applied correctly, or if the CPU cooler was not re-connected/put back in place properly, that could also explain unexpectedly high temperatures.

I hope you find a solution soon -- Good Luck!
GreenIsBest
Level 1
Posts: 48
Joined: Sun Sep 19, 2021 11:54 am

Re: High to critical CPU temp, despite low usage

Post by GreenIsBest »

Thank you everyone for the suggestions; I'll hopefully try them all this weekend when I have some free time to dedicate.
In the meanwhile, I stumbled across this discussion on the Ubuntu forums: https://askubuntu.com/questions/247033/ ... ith-ubuntu
It is a very old post about an old version of Ubuntu, but it involves the exact same hardware and complaints (though on a different brand/machine).
They talk about this "Optimus" and "Bumblebee" thing, and that the 2 GPUs might be conflicting with each other.
Indeed, my machine states that the Nvidia is permanently on "Performance mode" (it was the default setting), even when there is nothing going on to demand GPU power/work.
Do you think this may be the problem I'm having too? Or some variation of it?
I think I'll give it a shot either way, in addition to what you fellas already suggested.
senjoz wrote: Wed Sep 22, 2021 3:58 pm you did not specify the temperature of the environment in which your laptop is working
Depends on the season. The summers are always >27 °C, up to 35-40 °C on unusually hot days. Right now, however, it's autumn and the temp is only 23 °C.
senjoz wrote: Wed Sep 22, 2021 3:58 pm Processor Core i7-2630QM has thermal design power 45 W and junction temperature 100 ºC. If the processor is working, a lot of heat must be dissipated.
Agreed, but doing what? The i7-2630QM is (or rather, was) a workhorse CPU, so obviously it makes a lot of heat to do a lot of work. But doing what? How much work can it possibly be performing if even on idle the temps are around 50ºC?
senjoz wrote: Wed Sep 22, 2021 3:58 pm Please describe more in detail what happens at those occasions, where you get that warning and what you do to get Mint working again.
Hard to tell, it is all very quick. The screen just goes into a black terminal (like the Grub menu) for barely 2-3 seconds, with a few lines of text describing the problem; saying something along the lines of "CPU core temp critical". Then the machine shuts down by itself. Nothing special to get it back to work; just boot it up again after a few seconds. Seems to just be a shut-down-to-prevent-damage.
frank84 wrote: Wed Sep 22, 2021 6:14 pm either your fan is is not in communication with the system, is simply not working at all or, maybe disconnected?
Whether it's working "properly" is anyone's guess, but the fan does kick in when performing more demanding tasks, and slows down or stops when the device goes back to idle. For example, when I did the "Browse updates" test the fan kicked in, and when I did the "install GIMP" test it vented even harder. So the fan does react to demand and temperature. I might as well check whether it is reacting "properly".
frank84 wrote: Wed Sep 22, 2021 6:14 pm wondering if your computer worked OK initially after the new thermal paste was added
It sure did, although it turned out that the problem was the HDD, not the CPU. Long story short, my computer was getting super-hot and I took it in for maintenance suspecting that the thermal paste was worn out, but the tech guy diagnosed bad sectors on the HDD.
The extreme heat was because the HDD had to spin non-stop to "find" the data scattered all over the disk due to the bad sectors.
Then again, the HDD was ~9 years old at that point, so it's not surprising it was so badly degraded. The thermal paste was also 9 years old, so we figured it was best to just "refresh" everything.
Before these repairs, the fan would sometimes blast so hard that it could actually blow a loose post-it off my desk if placed next to it. After the repairs, it went back to normal, and would only kick in when performing high-demand tasks (like running Minecraft with 200 mods).
famouslastwords

Re: High to critical CPU temp, despite low usage

Post by famouslastwords »

+1 everything @senjoz said.

Temps look fine to me, outside of getting critical temp warnings, which as senjoz already stated people would need more info on what's going on with that. Could be a lot of things, eg: A short in the processor's cooling fan wire, so the fan periodically stops for X-amount of time, causing critical temps and TONS of other scenarios. Are you sitting there using it with a pillow in your lap blocking all the vents when it happens ? That type of thing ... Anyway don't know what else to suggest. CPU's are tough buggers meant to take high temps and a bigtime beating for sustained periods of time.

Oops, saw what frank84 posted and yep, a fan that has been in there since 2012 is old and subject to wearing out. My laptop is 12yrs old or so too, dual-core, and gets worse in terms of temp range than what you're seeing, though I suspect what's in my beloved ole beastie can no longer be described as "thermal paste"; it's got to be more like fossilized by now, and I'm scared to even mess with it. :D

More afterthoughts: although it's clear @rene is a knowledgeable techie, I wouldn't bother switching power governors if you're using p_state. Yeah, they may cap/lower your processor's freqs and thus somewhat lower temps, but you'd also be losing out on performance. What's the point of having a chip that can hit 2900 MHz if the governor caps it or keeps it at 800 99.8% of the time, when the temps you're seeing look fine? I personally prefer the ondemand governor but have learned how to tune it to my tastes. I think the defaults for it stink. There's another 2 cents. :)
luca72
Level 2
Posts: 97
Joined: Sat Apr 17, 2021 3:10 pm
Location: Milano, Italy

Re: High to critical CPU temp, despite low usage

Post by luca72 »

Hi, did you try blowing compressed air into the vents (gently, with care)? I had a similar problem that I posted here on the forum, and I solved it like this:

viewtopic.php?f=49&t=356980
rene
Level 20
Posts: 12212
Joined: Sun Mar 27, 2016 6:58 pm

Re: High to critical CPU temp, despite low usage

Post by rene »

famouslastwords wrote: Fri Sep 24, 2021 8:46 am More afterthoughts, although it's clear @rene is a knowledgeable techie, wouldn't bother switching power governors if you're using p_state.
Personally I definitely would, even if only as a test; not meant as a solution as such. Thing is that this very much mirrors a situation known from the time of Mint 18 -> Mint 19. I.e., with a Samsung RF 411 rather than 511:

https://bugs.launchpad.net/ubuntu/+sour ... ug/1768976

and specifically, the link in https://bugs.launchpad.net/ubuntu/+sour ... omments/66 to

https://askubuntu.com/questions/1063363 ... 34#1064534

One would definitely hope that the bug has been fixed by now, but I'd first of all check whether that pstate setting normalizes things before diving in deeper.
famouslastwords

Re: High to critical CPU temp, despite low usage

Post by famouslastwords »

Too much coffee, too little sleep.

Anyway it goes, hey ... gnu/Linux is amazing, someone can switch power governor on-the-fly or assign a different one to each core if they want to. So if you think it's worth a shot why the heck not, can be switched back quick enough if it doesn't pan out.
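
A minimal sketch of what switching on the fly looks like through the kernel's cpufreq sysfs files. The show_governors name and its optional directory argument are made up for illustration; the sysfs paths themselves are the standard ones. Note that with the intel_pstate driver active, typically only the performance and powersave governors are offered, so ondemand generally needs the acpi-cpufreq driver.

```shell
# Hypothetical helper: print the current governor of every CPU found under
# the given directory (defaults to the real sysfs location).
show_governors() {
    root=${1:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        cpu=${f#"$root"/}            # drop the leading directory
        cpu=${cpu%%/*}               # keep just "cpuN"
        printf '%s: %s\n' "$cpu" "$(cat "$f")"
    done
}

# Governors this machine offers:
#   cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
# Switch one core (lasts until reboot):
#   echo powersave | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```

Calling `show_governors` before and after the switch confirms that only the chosen core changed.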

Money would still be on something like a bum fan etc. Funny, look at this thread: someone pointed out his CPU was at 95.8 C and the reply is ...
a wire was blocking my cpu fan . i noticed it too .its back to normal 40 degrees idle
Now am planning to do what any reasonable person would in my present condition, ... drink more coffee and probably fall asleep watching a movie. :P
rene
Level 20
Posts: 12212
Joined: Sun Mar 27, 2016 6:58 pm

Re: High to critical CPU temp, despite low usage

Post by rene »

Reviewing, you might be right; the poster sounds relatively competent, but it can certainly still be a simple matter of e.g. a failing fan. Oh well; as you said, just try...
frank84
Level 2
Posts: 67
Joined: Sun Jul 15, 2018 6:56 pm

Re: High to critical CPU temp, despite low usage

Post by frank84 »

I don't have the expertise to understand the advanced stuff, but I have a rough understanding of what a fan is supposed to do, in general. So here's some more simple thoughts from a non-expert.

If I understood you correctly, it sounds like, at idle, you feel your computer is running unusually hot but that the fan(s) is/are not running at an elevated speed yet. And apparently, the fans slow substantially back down again when you return to an idle state.

If you feel your computer is running unusually hot at idle, shouldn't the fan(s) already be running at a fairly good clip then at idle -- if they are responding to internal temperatures properly and otherwise working properly? Maybe this suggests there is an issue with your fan(s).

My CPU core temperatures are currently at 30 degrees C or below now but I can hear my fans when I put my ear near the computer. [Note: I am using a desktop computer so I don't know how relevant this comparison is.]

In response to your question:
Indeed, my machine states that the Nvidia is permanently on "Performance mode" (it was the default setting), even when there is nothing going on to demand GPU power/work.
Do you think this may be the problem I'm having too? Or some variation of it?
I installed the recommended Nvidia proprietary driver and have a Nvidia icon in my panel (I can't remember if I added it as an applet or if it installed automatically.) When I right-click on the icon, it always indicates that "Performance Mode" is my Active profile. The only thing of note I have going on at the moment is a couple of tabs open to the LM Forum in Firefox. And as I am typing this, my CPU core temperatures range from about 27 - 30 degrees C. My current room temperature is 21 degrees C. My fan speeds are about 800 and 1000 rpm.

When right clicking on that icon, the options for Power Saving mode and On Demand mode are also available. I assume this icon's interface would thus allow us to easily and quickly switch back and forth between these modes for testing, etc. Selecting the "About" option for the icon displays a popup entitled "Nvidia Optimus".

For reference -- since my system is obviously different than yours: I believe the highest temperatures I have encountered with my current heatsink and fans have been less than 60 degrees C. Before upgrading them, I was hitting the low 90s doing the same things. I am running LM 20.2 Edge Cinnamon on a Dell XPS 8940 desktop with an i7 11700 CPU. My computer came new with a terrible CPU cooler and one tiny case fan. I upgraded the CPU heatsink & fan (cooler) and added a second, better case fan. Based on what I read from other people's experiences, I would have barely been able to get the CPU out of first gear without upgrading the cooler and fans because thermal throttling (of the CPU) would kick in right away with any substantial load on the CPU. I seem to have much better cooling now so I'm looking forward to possibly trying games, etc. now.
SMG
Level 25
Posts: 31988
Joined: Sun Jul 26, 2020 6:15 pm
Location: USA

Re: High to critical CPU temp, despite low usage

Post by SMG »

GreenIsBest wrote: Fri Sep 24, 2021 7:39 am In the meanwhile, I stumbled across this discussion on the Ubuntu forums: https://askubuntu.com/questions/247033/ ... ith-ubuntu
It is a very old post about an old version of Ubuntu, but it involves the exact same hardware and complaints (though on a different brand/machine).
They talk about this "Optimus" and "Bumblebee" thing, and that the 2 GPUs might be conflicting with each other.
Indeed, my machine states that the Nvidia is permanently on "Performance mode" (it was the default setting), even when there is nothing going on to demand GPU power/work.
Do you think this may be the problem I'm having too? Or some variation of it?
Do not install Bumblebee. It is not supported.

You should have a little icon in the panel in the lower right of your computer. If you want to switch to using Intel, you use that nvidia-prime-applet to switch to Intel Power-save mode and then you reboot your computer for that to take effect.
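
For those who prefer a terminal, here is a sketch of what is assumed to be the command-line counterpart of that applet: the prime-select tool from Ubuntu's nvidia-prime package. Whether it is installed depends on the driver setup, so the hypothetical gpu_profile wrapper below checks for it first.

```shell
# Hypothetical wrapper: report the current PRIME profile, or say so if the
# prime-select tool is not installed on this machine.
gpu_profile() {
    if command -v prime-select >/dev/null 2>&1; then
        prime-select query
    else
        echo "prime-select not found"
    fi
}

# To switch to the Intel GPU, then reboot as described above:
#   sudo prime-select intel
```

Running `gpu_profile` after the reboot should confirm that the Intel profile is the active one.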
motoryzen
Level 10
Posts: 3497
Joined: Sun Dec 08, 2019 12:25 am

Re: High to critical CPU temp, despite low usage

Post by motoryzen »

the System Monitor however reports a swap usage of zero (should it?)
That's a good thing. Any time your system has to use swap space, you get less speed and responsiveness, either overall or in whatever your system is focused on at the moment, because permanent storage (even the best consumer tech available as of 2021) is always worlds slower than RAM, and RAM in turn is still worlds slower than your CPU.

This applies even if a PCI Express Gen 4 B.A. NVMe SSD houses your Linux Mint installation. Best case, that app or the system overall gets slightly slower. If it's an HDD, as in your case, then when RAM consumption is maxed out (or stays maxed out) and swap space is being used, you'll DEFINITELY notice a slowdown for as long as whatever is keeping the RAM maxed keeps forcing the system to use swap.
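
For anyone who wants to double-check that zero from a terminal, here is a minimal sketch. It assumes the usual SwapTotal and SwapFree lines in /proc/meminfo; the function name swap_used_kb and its optional file argument are made up for illustration.

```shell
# Print the amount of swap currently in use, in kB.
# Reads a meminfo-style file; defaults to /proc/meminfo (assumption: Linux).
# swap_used_kb is a hypothetical helper name, not an existing tool.
swap_used_kb() {
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {free = $2}
         END           {print total - free}' "${1:-/proc/meminfo}"
}

# Usage on a running system:
#   swap_used_kb
```

A result of 0 matches what System Monitor shows: nothing has been pushed out of RAM into swap.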
*Don't know if this is relevant, but in System Monitor it shows "Pulse Audio" with VeryHigh priority (nice = -11)
Per xenopeak years ago (viewtopic.php?t=197463), Pulse Audio is supposed to run at a higher than normal "priority" to ensure no sound stutters occur. I have no reason to assume this information isn't still accurate, and so far this user has given plenty of accurate and helpful information to a LOT of users here. Also, I've noticed that since at least Linux Mint 15, the main audio "server", as it's referred to, has always run at a higher than normal priority.

If everything else suggested by others here fails to resolve the issue, I don't see why trying a decent CPU performance control tool (such as cpupower-gui, found in Software Manager, which works with Intel CPUs) wouldn't help. You could set it to "on demand" or even "power saver" mode to see whether keeping your CPU at its lowest speed, whenever your programs don't ask much of it, makes a difference: the system uses less power, produces less heat, and has less need for the fans to run faster (or at all), depending on how your PC's firmware controls the fans. *shrugs*
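
Before changing anything, it may help to see which governor each core is using right now. A rough sketch, assuming the kernel exposes the standard cpufreq files under /sys; show_governors and its optional sysfs-root argument are hypothetical names, and which governors are on offer depends on the CPU driver in use.

```shell
# Print "cpuN governor" for every CPU that exposes cpufreq in sysfs.
# The optional argument is an alternative sysfs root (default /sys);
# it exists only so the function can be tried against a fake tree.
show_governors() {
    root="${1:-/sys}"
    for f in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue              # skip if the glob matched nothing
        cpu=${f%/cpufreq/scaling_governor}   # strip the trailing path
        cpu=${cpu##*/}                       # keep only "cpuN"
        printf '%s %s\n' "$cpu" "$(cat "$f")"
    done
}

# Usage on a running system:
#   show_governors
```

If every core already reports a power-saving governor, a governor tool is unlikely to change much and the cooling hardware becomes the more likely suspect.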

Beyond that, I apologize that I haven't much help to offer.
Mint 21.2 Cinnamon 5.8.4
asrock x570 taichi ...bios p5.00
ryzen 5900x
128GB Kingston Fury @ 3600mhz
Corsair mp600 pro xt NVME ssd 4TB
three 4TB ssds
dual 1TB ssds
Two 16TB Toshiba hdd's
24GB amd 7900xtx vid card
Viewsonic Elite UHD 32" 144hz monitor
senjoz
Level 5
Posts: 902
Joined: Tue Jun 09, 2020 3:55 am
Location: Kamnik

Re: High to critical CPU temp, despite low usage

Post by senjoz »

You can check the logs for any usable information about what is going on at an unexpected shutdown. When you start Mint after an unexpected shutdown, run the command journalctl -r -b -1 in a terminal. This will display the logs from the previous boot (the one that ended with the unexpected shutdown) in reverse order. You can send that log to termbin.com with the command journalctl -rb -1 | nc termbin.com 9999. Post the URL you get back in your reply. Maybe someone will see where the problem is.
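
The previous-boot log can be long, so a filter may help before reading or posting it. A minimal sketch: filter_thermal is a hypothetical helper, and its keyword list is only a guess at what an overheating shutdown might leave behind, not an exhaustive list of kernel messages.

```shell
# Keep only log lines (read from stdin) that hint at a thermal problem.
# The keyword list is an assumption; extend it as needed.
filter_thermal() {
    grep -i -E 'thermal|temperature|overheat|throttl'
}

# Usage on a real system (previous boot, newest entries first):
#   journalctl -r -b -1 --no-pager | filter_thermal
```

An empty result does not prove the shutdown was unrelated to heat, since a hard power-off may happen before anything is written to the journal.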

Also check the BIOS options. Maybe there are settings for fan management, like silent, normal and speed mode. And do run a stress test.

Regards, Jože
GreenIsBest
Level 1
Posts: 48
Joined: Sun Sep 19, 2021 11:54 am

Re: High to critical CPU temp, despite low usage

Post by GreenIsBest »

Once again, thank you all for the ever increasing suggestions and help.
I decided to start with senjoz's suggestion, as it seems like the closest thing to a diagnosis. So I intentionally refrained from attempting any solution (yet), so as not to introduce any variables/changes. Besides, it's about time I ran a checkup on my old fan anyway, as I didn't have it checked when I took the device in for the thermal paste and HDD repairs.

Ran into 2 road bumps before being able to do so. First, the Synaptic manager gave an error (“Unable to get exclusive lock”). I noticed the Update Manager had new updates, and after a quick search about the aforementioned error, I figured it was probably due to the Update Manager. So I installed the updates, which may or may not have “accidentally” fixed some problems.
Then the Synaptic manager gave another error (“W:Download is performed unsandboxed as root”), which turned out to be a harmless, well-known bug (viewtopic.php?t=280054).
No idea why these 2 errors happened, nor whether they are related to anything relevant or are the random “it just happens” kind of error; but I might as well mention them here, if only to warn other newbies like me who run into them. Either way, the downloads went smoothly, and I installed both PowerStat (0.02.22-1) and Stress (1.0.4-6).
Ran the commands as instructed. It went as such.

On idle (in airplane mode) immediately before triggering the Stress.
Temp: 59-50-46-47
CPU: ~800 MHz on all
Watts: 5,49
Fan: running but not audible, unless I put my ear next to it.

As soon as Stress was triggered (dispatching hogs).
Temp: 89-90-83-77
CPU: It kept rising to ~2200 MHz, dropped to ~2000 upon reaching it, rinse and repeat.
Watts: 39,88
Fan: Blasting and loud

From the 5-6 minute mark to the 10 minute mark.
Temp: 88-79
CPU: exactly 1795 MHz on all, and didn’t move so much as a decimal.
Watts: 28,34
Fan: Running and audible, but not as loud

The laptop didn't crash, and as soon as the Stress ended (600 sec) the CPU immediately dropped to 850-798 MHz, and the temperature to 64-54 ºC.
PowerStat reported an average of 23,30 W, with a standard deviation of 12,88 (probably because I started it before the other commands, so it also measured some non-stress time). After a minute or two, the temperature dropped further to 62-59-50-54, presumably because it finished venting the heat still lingering from the stress test.
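
For anyone repeating this test, the readings can be logged instead of noted by hand. A rough sketch, assuming the kernel exposes thermal zones under /sys/class/thermal with values in millidegrees Celsius; sample_temps and to_celsius are hypothetical names, and the zones may not line up with the per-core values that the sensors command prints.

```shell
# Convert a sysfs reading in millidegrees Celsius to whole degrees.
to_celsius() {
    echo $(( $1 / 1000 ))
}

# Print one line holding the temperature of every thermal zone.
# The optional argument is an alternative sysfs root (default /sys),
# present only so the function can be tried against a fake tree.
sample_temps() {
    root="${1:-/sys}"
    line=""
    for f in "$root"/class/thermal/thermal_zone*/temp; do
        [ -r "$f" ] || continue      # skip if the glob matched nothing
        line="$line $(to_celsius "$(cat "$f")")"
    done
    printf '%s\n' "${line# }"        # drop the leading space
}

# Usage, in a second terminal while the stress test runs:
#   while sleep 5; do sample_temps; done
```

Sampling every few seconds makes the pattern described above easier to see: the initial spike, the throttled plateau, and the cool-down after the test ends.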

With these results in hand, and based on senjoz's instructions, I can safely conclude that the CPU itself is still running and reacting as normal, as 2 GHz is the advertised non-turbo speed of this CPU; precisely the upper limit reached during the stress test. No idea what caused it to ever crash in the first place, since a genuine intentional stress test didn't push the heat to that point.
Oh well, guess that leaves inadequate cooling as the most likely culprit, and the test numbers (temp vs CPU) seem to agree. And as others have pointed out, this should come as no surprise after 10 years of backbreaking computing.
And just to exclude any doubt, I then visually checked the fan and exhausts, and confirmed that everything is still clean and free from lint and dust (damn clean actually). Yep… worn-out fan it seems then.

I find it ironic though, that this inadequate cooling would reveal itself on a mediumweight no-gaming Linux desktop, instead of the fully-updated Win10 it was running before the installation. Maybe Mint is just better at detecting/warning that your hardware is beginning to fail?

Should I just have the old fan replaced with a new one right away, or is it worth trying to diagnose/correct this through software or commands?
motoryzen
Level 10
Posts: 3497
Joined: Sun Dec 08, 2019 12:25 am

Re: High to critical CPU temp, despite low usage

Post by motoryzen »

Given that your laptop is running a CPU from the Sandy Bridge generation (easily around 9 years old), it might be time to open it up, clean off the original thermal grease if you haven't done so (or don't know whether it has ever been done), and put some good fresh thermal paste on the CPU itself.

If this has not yet been done on that laptop, I'll bet you anything you'll see better temps: worst case, between 5 and 13 °C better on a consistent basis.

Here's a video of complete disassembly to lead you there if you're tech savvy or mechanically inclined enough https://www.youtube.com/watch?v=E0l-rxc0boU

Watch this video https://www.youtube.com/watch?v=mkomW_dQGHI up until 5:06; that is where you get access to the CPU. Experts usually recommend cleaning the old thermal paste off any laptop CPU within 7 years, or maybe less depending on typical use and temperatures. Sometimes OEMs use cheap, crappy thermal paste that doesn't even last 5 years before its performance really begins to degrade.
senjoz
Level 5
Posts: 902
Joined: Tue Jun 09, 2020 3:55 am
Location: Kamnik

Re: High to critical CPU temp, despite low usage

Post by senjoz »

I would say that the CPU stress test was not perfect, but it was also not bad: no unexpected shutdown, some throttling at the end of the test. But that was only a CPU stress test. Your machine also has an Nvidia GeForce GT 540M graphics card, which was not involved in the stress test. Does your machine have only one fan for both devices, CPU and graphics card? If both devices were stressed, cooling could be a problem. I do not know how to stress test both devices, CPU and graphics card, with one test in Linux.

Have you checked bios settings and journalctl logs?

Maybe this will be interesting for you: https://www.notebookcheck.net/Review-Sa ... 575.0.html

Regards, Jože
famouslastwords

Re: High to critical CPU temp, despite low usage

Post by famouslastwords »

^ Woah, hats off to senjoz. Hadn't even thought of any gpu on the thing ... Maybe swap out the vidcard for something else ? Have the critical temp warnings happened again since ? If so were you doing something that was graphics-intensive ? I mean if heat builds up, faster than it can be evacuated, then yeah ...

Sheesh, that's a really good point senjoz and think yet another reason to boycott Nvidia. As Linus T so publicly pointed out (with his middle finger) they don't seem to care overmuch about the gnu/Linux platform and its users. Don't get me wrong if someone gave me a pc with Nvidia, I'd try my luck or if found some hardware dirt cheap but given any kind of choice personally I don't want to deal with them.

Also @GreenIsBest think that was masterfully done (testing and monitoring) and phrased in your post. Wouldn't advise just changing components for the sake of it. It's kind of like working on a car. Oh it might be the plugs, change em, nah ... it's the wires, switch em out, errrr maybe it's the alternator or battery ? Lol ... someone winds up with a mostly new car, lots of time, trouble and expense and never figures out what the problem even really was. Honestly think @senjoz made a dang good observation and that the problem could very well be Nvidia.

Not like that's new, seen so many Nvidia related horror stories from Linux users over the years, my plan is to avoid them like the plague. :D Anyway it goes, that sounds like a nice laptop, 12yrs old or not. Better than my crusty old thing, lol ... I couldn't play a game of pacman on this to save my life. :P

Either don't know or don't remember but there's got to be some tools to stress test both cpu/gpu. If it can be done some Nix geek has already done and developed software and/or methods to do it. Never got into gaming on Linux but do vaguely recall people discussing it. The same stuff they use/do to measure the FPS they can get on their systems.

Google would know.
Mintmann
Level 3
Posts: 161
Joined: Mon Aug 09, 2021 6:36 am

Re: High to critical CPU temp, despite low usage

Post by Mintmann »

If the cpu fan is working make sure it is clean and also check the cooler fins, and check any case filters.
famouslastwords

Re: High to critical CPU temp, despite low usage

Post by famouslastwords »

Facepalm !!!

Wouldn't just doing some decent gaming on the thing accomplish this ? Someone really wants to be thorough, set the performance governor (which tons of gamers apparently tend to do anyway.) and then play some decent games on it to hit up the gpu ?

Alrighty ... was just a random thought. Also DAM U NVIDIA ... errrr, nother random thought. :P