CPU temperature problems on Asus laptop UX31A

Post by fabien85 » Sat Sep 29, 2018 5:42 am

I've run into a problem with the Mint 19 Cinnamon install on my wife's laptop. The inxi output is at the end of the post.

The machine is 5 years old. We originally installed Mint 17 Cinnamon and it worked well. Over time problems started to appear, and upgrading through 17.1 up to 17.3 didn't fix them (they didn't get worse right after an upgrade, only gradually). Specifically, these were Cinnamon problems, e.g. icons reverting to rougher ones (I guess from the underlying GNOME), the desktop disappearing together with the menu bar...
This got annoying, so when Mint 19 came out I decided to do a fresh install on the laptop. That didn't work as well as expected. I got ACPI errors at boot (I tried various acpi kernel boot parameters, and went into the firmware interface to set the disk to either acpi or ahci; nothing changed) and the system was unstable. Since these problems didn't appear with Mint 17 on kernel 4.4, I tried installing 4.4 (first manually using the .deb from Mint 18, then via ukuu). This makes the system stable and usable. I still had some freezes, with errors in the log pointing to the mei module. I had these errors in Mint 17 too, where blacklisting mei and mei_me worked, so I blacklisted them again. That makes the system more stable.
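
For reference, the blacklisting mentioned above can be sketched roughly as follows. The file name `blacklist-mei.conf` and the helper name `write_mei_blacklist` are assumptions chosen for illustration; modprobe reads any `.conf` file placed in `/etc/modprobe.d/`.

```shell
#!/bin/sh
# Write a modprobe blacklist file for the two mei modules.
# The default file name is an arbitrary choice (assumption);
# any *.conf name under /etc/modprobe.d/ is read by modprobe.
write_mei_blacklist() {
    target="${1:-/etc/modprobe.d/blacklist-mei.conf}"
    # printf repeats the format once per argument: one line per module.
    printf 'blacklist %s\n' mei_me mei > "$target"
}
```

Writing to the default path needs root, and the change only takes effect once the modules are no longer loaded, typically after a reboot.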

The problem: I sometimes get erratic behaviour (the mouse moves and clicks randomly by itself) and system freezes under normal conditions (e.g. just running Thunderbird and Firefox and watching videos on YouTube).
When the system freezes, I usually cannot restart Cinnamon (Alt+F2, then r, then Enter), switch to another virtual terminal (e.g. Ctrl+Alt+F2), or kill the X server. The only thing that works is REISUB.
When I reboot and check /var/log/kern.log (or syslog), I see this kind of error just before the freeze:

Code: Select all

[ 6686.190526] CPU1: Core temperature above threshold, cpu clock throttled (total events = 49)
[ 6686.190527] CPU3: Core temperature above threshold, cpu clock throttled (total events = 49)
[ 6686.190529] CPU2: Package temperature above threshold, cpu clock throttled (total events = 49)
[ 6686.190530] CPU0: Package temperature above threshold, cpu clock throttled (total events = 49)
[ 6686.190533] CPU3: Package temperature above threshold, cpu clock throttled (total events = 49)
[ 6686.190536] mce_notify_irq: 1 callbacks suppressed
[ 6686.190538] mce: [Hardware Error]: Machine check events logged
[ 6686.190542] CPU1: Package temperature above threshold, cpu clock throttled (total events = 49)
[ 6686.190545] mce: [Hardware Error]: Machine check events logged
[ 6686.192551] CPU3: Core temperature/speed normal
[ 6686.192552] CPU1: Core temperature/speed normal
[ 6686.192553] CPU0: Package temperature/speed normal
[ 6686.192554] CPU2: Package temperature/speed normal
[ 6686.192555] CPU1: Package temperature/speed normal
[ 6686.192562] CPU3: Package temperature/speed normal

These errors also appear at other points in the logs, but they didn't lead to a freeze those times.
My understanding is that the CPU overheats and that this causes hardware errors. Maybe these errors are sometimes benign, and the system freezes when a critical one appears (or enough of them accumulate).
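
To compare how often the throttling messages show up in sessions that froze and sessions that did not, the log can be summarised per CPU. This is a minimal sketch, assuming the messages have the exact wording shown above; the function name `count_throttle_events` is made up for illustration.

```shell
#!/bin/sh
# Count "temperature above threshold" messages per CPU in a kernel log.
# Prints one line per CPU as "<cpu> <count>", sorted by CPU name.
# Defaults to /var/log/kern.log; pass another file as the first argument.
count_throttle_events() {
    logfile="${1:-/var/log/kern.log}"
    grep -oE 'CPU[0-9]+: (Core|Package) temperature above threshold' "$logfile" \
        | cut -d: -f1 \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

Lines reporting "temperature/speed normal" are deliberately not counted, so the numbers reflect throttle events only.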
What I don't understand is that the fans are working: you can see that in the inxi output, and I checked that I can manually make them turn slower or faster. I also added a Cinnamon applet to display the CPU temperature, and I got freezes where the applet only showed 70-80°C, which is well below the maximum temperature (105°C according to Intel).
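
To cross-check the applet, the temperatures can also be read straight from sysfs, which reports them in millidegrees Celsius. This is a minimal sketch, assuming the kernel exposes zones under `/sys/class/thermal`; the function name `print_zone_temps` is made up for illustration, and which zones exist depends on the machine.

```shell
#!/bin/sh
# Print each thermal zone's type and its temperature in degrees Celsius.
# sysfs stores the value in millidegrees, hence the division by 1000.
# The base directory can be overridden with the first argument.
print_zone_temps() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        # Skip zones without a readable temp file (or an unmatched glob).
        [ -r "$zone/temp" ] || continue
        LC_ALL=C awk -v t="$(cat "$zone/type" 2>/dev/null)" \
            '{ printf "%s %.1f\n", t, $1 / 1000 }' "$zone/temp"
    done
}
```

Calling it in a loop and appending the output to a file would keep the last readings before a freeze on disk. Note that in the log above the throttle and "normal" messages are only about 2 ms apart, so any polling approach, this one included, may simply miss such short spikes.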
Also, mcelog is not available in the repositories (apparently it has been removed from Ubuntu bionic), so I cannot check the machine check event log.

Any ideas?

Code: Select all

$ inxi -Fxz
System:    Host: KeyLimePie Kernel: 4.4.156-0404156-generic x86_64
           bits: 64 gcc: 5.4.0
           Desktop: Cinnamon 3.8.9 (Gtk 3.22.30-1ubuntu1)
           Distro: Linux Mint 19 Tara
Machine:   Device: laptop System: ASUSTeK product: UX31A v: 1.0 serial: N/A
           Mobo: ASUSTeK model: UX31A v: 1.0 serial: N/A
           UEFI: American Megatrends v: UX31A.219 date: 06/14/2013
Battery    BAT0: charge: 36.2 Wh 89.8% condition: 40.3/50.6 Wh (80%)
           model: ASUSTeK UX31-35 status: Discharging
CPU:       Dual core Intel Core i7-3537U (-MT-MCP-) 
           arch: Ivy Bridge rev.9 cache: 4096 KB
           flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 9976
           clock speeds: max: 3100 MHz 1: 1446 MHz 2: 1038 MHz 3: 1224 MHz
           4: 860 MHz
Graphics:  Card: Intel 3rd Gen Core processor Graphics Controller
           bus-ID: 00:02.0
           Display Server: x11 (X.Org 1.19.6 )
           drivers: modesetting (unloaded: fbdev,vesa)
           Resolution: 1920x1080@60.00hz
           OpenGL: renderer: Mesa DRI Intel Ivybridge Mobile
           version: 4.2 Mesa 18.0.5 Direct Render: Yes
Audio:     Card Intel 7 Series/C216 Family High Def. Audio Controller
           driver: snd_hda_intel bus-ID: 00:1b.0
           Sound: ALSA v: k4.4.156-0404156-generic
Network:   Card: Intel Centrino Advanced-N 6235
           driver: iwlwifi bus-ID: 02:00.0
           IF: wlp2s0 state: up mac: <filter>
Drives:    HDD Total Size: 256.1GB (65.7% used)
           ID-1: /dev/sda model: ADATA_XM11_256GB size: 256.1GB
Partition: ID-1: / size: 24G used: 16G (67%) fs: ext4 dev: /dev/sda2
           ID-2: /home size: 210G used: 138G (70%) fs: ext4 dev: /dev/sda3
           ID-3: swap-1 size: 4.29GB used: 0.00GB (0%)
           fs: swap dev: /dev/dm-0
RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors:   System Temperatures: cpu: 60.0C mobo: N/A
           Fan Speeds (in rpm): cpu: 3500
Info:      Processes: 241 Uptime: 15:14 Memory: 2513.1/3837.6MB
           Init: systemd runlevel: 5 Gcc sys: 7.3.0
           Client: Shell (bash 4.4.191) inxi: 2.3.56

Re: CPU temperature problems on Asus laptop UX31A

Post by fabien85 » Wed Oct 17, 2018 3:42 pm

I found a possible culprit: the encrypted swap (the install has home directory encryption).
I deactivated it and replaced it with normal swap. With kernel 4.4 from ukuu it seems OK now. I still see some "Core temperature above threshold, cpu clock throttled" messages in the logs, but they seem less frequent, and there has been no system freeze so far. I would prefer it if the warnings disappeared completely.
Then I realised that using a kernel from ukuu is not so good. For instance I don't have any man pages, and I guess that is just a symptom of other important things missing because the kernel doesn't have all the Ubuntu patches.
I reinstalled the official kernel 4.15 (version 0-36 as I'm writing) with the Mint Update Manager, and we are going to test whether the system is stable with it.
The thing is that booting with this kernel throws the following ACPI error early in the boot process:

Code: Select all

[    0.064309] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
[    0.064462] ACPI Error: Needed type [Reference], found [Integer]         (ptrval) (20170831/exresop-103)
[    0.064462] ACPI Exception: AE_AML_OPERAND_TYPE, While resolving operands for [Store] (20170831/dswexec-461)
[    0.064462] No Local Variables are initialized for Method [_PDC]
[    0.064462] Initialized Arguments for Method [_PDC]:  (1 arguments defined for method invocation)
[    0.064462]   Arg0:           (ptrval) <Obj>           Buffer(12) 01 00 00 00 01 00 00 00
[    0.064462] ACPI Error: Method parse/execution failed \_PR.CPU0._PDC, AE_AML_OPERAND_TYPE (20170831/psparse-550)

This doesn't happen with kernel 4.4. Any ideas?

