My system hangs repeatedly for no apparent reason. What to do?

einpoklum
Level 1
Posts: 26
Joined: Sat Dec 24, 2016 5:14 am
Location: Amsterdam

Post by einpoklum » Mon Nov 27, 2017 3:48 pm

I use Linux Mint 18.2 with a 4.10.0-40 kernel on an x86_64 i5-7600K.

For the past few months, the system has occasionally hung on me for unclear reasons. Maybe it has something to do with the 64-bit Skype for Linux I installed, or the kernel version I'm using, or the hardware; I don't know.

On reboot, I can't find anything wrong that I can put my finger on. I've skimmed the logs a bit, but either there's nothing there or I don't know what exactly to look for. (Specifically, there are no suspicious log messages in `/var/log/syslog` close to the time of the hangs.)

What can I do to figure out what's triggering the hang?
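
One place to start, sketched under the assumption that /var/log/kern.log survives the reboot and that every boot begins with the kernel's "Linux version" banner: print the few lines logged just before each banner, since those are the last kernel messages written before the machine went down. The helper name last_before_boot and the default of 20 lines are made up for illustration.

```shell
#!/bin/sh
# Print the N lines logged immediately before each kernel boot banner.
# Assumption: each boot writes a line containing "Linux version" early on.
# last_before_boot is a hypothetical helper name, not a standard tool.
last_before_boot() {
    log_file=$1
    context=${2:-20}
    # -B prints that many lines of context before every matching line
    grep -B "$context" 'Linux version' "$log_file"
}
```

It could be called as last_before_boot /var/log/kern.log 30 (reading the log may need sudo or membership in the adm group). If the lines before a banner look unremarkable, that would hint at a hard lock-up in which the kernel never got to log anything.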

------------

Additional information:

nVIDIA driver version:
390.25 (comes with CUDA 9.1 which I installed using nVIDIA's "manual" installer, not via apt or a DEB)

inxi -G:

Code:

Graphics:  Card-1: Intel Device 5912
           Card-2: NVIDIA GK106 [GeForce GTX 650 Ti Boost]
           Display Server: X.org 1.18.4 drivers: nvidia,nouveau,intel (unloaded: fbdev,vesa)
           tty size: 154x36 Advanced Data: N/A for root
inxi -Fxz:

Code:

System:    Host: mymachine Kernel: 4.13.0-36-generic x86_64 (64 bit gcc: 5.4.0) Desktop: Xfce 4.12.3 (Gtk 2.24.28)
           Distro: Linux Mint 18.3 Sylvia
Machine:   Mobo: ASUSTeK model: Z170 PRO GAMING v: Rev X.0x Bios: American Megatrends v: 3016 date: 12/27/2016
CPU:       Quad core Intel Core i5-7600K (-MCP-) cache: 6144 KB
           flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 30336
           clock speeds: max: 4200 MHz 1: 3800 MHz 2: 3800 MHz 3: 3800 MHz 4: 3800 MHz
Graphics:  Card-1: Intel Device 5912 bus-ID: 00:02.0
           Card-2: NVIDIA GK106 [GeForce GTX 650 Ti Boost] bus-ID: 01:00.0
           Display Server: X.org 1.18.4 drivers: nvidia,nouveau,intel (unloaded: fbdev,vesa)
           tty size: 154x36 Advanced Data: N/A for root
Audio:     Card-1 NVIDIA GK106 HDMI Audio Controller driver: snd_hda_intel bus-ID: 01:00.1
           Card-2 Intel Sunrise Point-H HD Audio driver: snd_hda_intel bus-ID: 00:1f.3
           Card-3 Micronas driver: USB Audio usb-ID: 001-002
           Sound: Advanced Linux Sound Architecture v: k4.13.0-36-generic
Network:   Card-1: Intel Ethernet Connection (2) I219-V driver: e1000e v: 3.2.6-k bus-ID: 00:1f.6
           IF: enp0s31f6 state: down mac: <filter>
           Card-2: Qualcomm Atheros AR93xx Wireless Network Adapter driver: ath9k bus-ID: 04:00.0
           IF: wlp4s0 state: up mac: <filter>
Drives:    HDD Total Size: 3128.6GB (37.5% used) ID-1: /dev/sda model: Samsung_SSD_840 size: 128.0GB temp: 0C
           ID-2: /dev/sdb model: ST1000DM003 size: 1000.2GB temp: 36C
           ID-3: /dev/sdc model: WDC_WD20EZRX size: 2000.4GB temp: 32C
Partition: ID-1: / size: 94G used: 60G (67%) fs: ext4 dev: /dev/sda3
           ID-2: /boot size: 361M used: 270M (80%) fs: ext4 dev: /dev/sda1
           ID-3: /var size: 23G used: 2.4G (11%) fs: ext4 dev: /dev/sda2
           ID-4: swap-1 size: 30.48GB used: 0.00GB (0%) fs: swap dev: /dev/sdb4
RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors:   System Temperatures: cpu: 29.8C mobo: 27.8C
           Fan Speeds (in rpm): cpu: 0
Info:      Processes: 279 Uptime: 6 days Memory: 5377.8/15654.1MB Init: systemd runlevel: 5 Gcc sys: 5.4.1
           Client: Shell (bash 4.3.481) inxi: 2.2.35 
Last edited by einpoklum on Sun Mar 04, 2018 12:02 pm, edited 6 times in total.

Re: How do I troubleshoot a hang?

Post by einpoklum » Sun Jan 07, 2018 2:39 pm

Moving this thread back up. A month and a half has passed (I now use Mint 18.3, with kernel 4.10.0-42-generic), the hangs still occur occasionally, and for the life of me I can't figure out what's going wrong.

Jim Hauser
Level 5
Posts: 506
Joined: Mon Jun 30, 2014 10:08 pm
Location: Pascagoula, Mississippi

Re: My system hangs repeatedly for no apparent reason. What to do?

Post by Jim Hauser » Mon Jan 08, 2018 3:01 am

It might help if you could provide more information.

Try pasting the output of

Code:

inxi -Fxz
entered in the terminal.

Re: My system hangs repeatedly for no apparent reason. What to do?

Post by einpoklum » Sun Mar 04, 2018 6:43 am

Jim Hauser wrote:
Mon Jan 08, 2018 3:01 am
It might help if you could provide more information etc.
Done. Sorry, I missed your reply a few months back; and the issue has not gone away.

Pjotr
Level 21
Posts: 13714
Joined: Mon Mar 07, 2011 10:18 am
Location: The Netherlands (Holland)

Re: My system hangs repeatedly for no apparent reason. What to do?

Post by Pjotr » Sun Mar 04, 2018 7:53 am

Launch Driver Manager: which Nvidia driver are you using?

Please also post the output of:

Code:

inxi -G

Cosmo.
Level 23
Posts: 17830
Joined: Sat Dec 06, 2014 7:34 am

Re: My system hangs repeatedly for no apparent reason. What to do?

Post by Cosmo. » Sun Mar 04, 2018 8:02 am

A relatively quick step you might try is to revert to the latest 4.4 kernel. Note that you have to load it manually (at first) at boot time from the grub menu. If the grub menu does not appear automatically, press and hold the Shift key immediately after powering on and look for this kernel in the submenu.
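
Before rebooting into the grub menu, it may help to check which kernel images are actually installed. The following is only a sketch: installed_kernels is a hypothetical helper that filters dpkg's package listing, and it assumes the usual Ubuntu/Mint naming scheme linux-image-VERSION.

```shell
#!/bin/sh
# Read `dpkg -l` output on stdin and print the version part of every
# fully installed (status "ii") versioned kernel image package.
# installed_kernels is a hypothetical helper name used for illustration.
installed_kernels() {
    awk '$1 == "ii" && $2 ~ /^linux-image-[0-9]/ {
        sub(/^linux-image-/, "", $2)   # strip the package name prefix
        print $2
    }'
}
```

Usage would be dpkg -l 'linux-image-*' | installed_kernels, and uname -r shows the kernel that is currently running, so the two can be compared after the reboot.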

kc1di
Level 14
Posts: 5409
Joined: Mon Sep 08, 2008 8:44 pm
Location: Maine USA

Re: My system hangs repeatedly for no apparent reason. What to do?

Post by kc1di » Sun Mar 04, 2018 10:24 am

Like Pjotr, I would suspect the video driver. Please give us the

Code:

inxi -G
output.

Re: My system hangs repeatedly for no apparent reason. What to do?

Post by einpoklum » Sun Mar 04, 2018 11:58 am

kc1di wrote:
Sun Mar 04, 2018 10:24 am
please give us the

Code:

inxi -G
output.
You got it.

Re: My system hangs repeatedly for no apparent reason. What to do?

Post by einpoklum » Sun Mar 04, 2018 12:00 pm

Cosmo. wrote:
Sun Mar 04, 2018 8:02 am
A relatively quick step you might try is to revert to kernel 4.4.latest.
I'm a bit skeptical this would work since I (vaguely) remember trying this myself 4 months ago when I first experienced these hangs. But it's fair advice, I'll give it a proper try (again?) to be thorough and will report here.

Re: My system hangs repeatedly for no apparent reason. What to do?

Post by Cosmo. » Sun Mar 04, 2018 4:36 pm

einpoklum wrote:
Sun Mar 04, 2018 11:58 am
You got it.
Where? How?

Well, I found it in the starting post. Please don't do this; it is confusing. Put new information in a new post. Nobody expects new information to appear in the starting post, which you have now edited 6 times. A reader can no longer tell what was originally written there and what was added (or possibly removed). And people usually don't reread a thread from the beginning; they read only the new posts since their last visit.

Re: My system hangs repeatedly for no apparent reason. What to do?

Post by kc1di » Sun Mar 04, 2018 6:59 pm

It looks like you have both Nvidia and Nouveau being loaded; this will definitely cause problems. You need to blacklist Nouveau. I'm not sure how you installed the Nvidia driver, since installing it via the Driver Manager should have blacklisted Nouveau automatically. In any event, that could cause the problems you describe.
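
A quick way to confirm whether the nouveau kernel module is really loaded alongside nvidia is to look in /proc/modules, which holds the same data that lsmod prints. This is a minimal sketch; module_loaded is a hypothetical helper name, and the optional second argument exists only so the check can be pointed at a saved copy of the list.

```shell
#!/bin/sh
# Succeed if the named kernel module appears in the module list.
# module_loaded is a hypothetical helper; the list defaults to /proc/modules.
module_loaded() {
    name=$1
    list=${2:-/proc/modules}
    # each line starts with the module name followed by a space
    grep -q "^$name " "$list"
}
```

For example: module_loaded nouveau && echo "nouveau is loaded". Note that module names in this list use underscores (nvidia_drm), not hyphens.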

Re: My system hangs repeatedly for no apparent reason. What to do?

Post by einpoklum » Sat Mar 10, 2018 6:47 am

So, some news: I've been running 4.4.0-116 for a few days with no hangs. I thought the NVIDIA drivers were properly installed (DKMS ran something when I apt-get install'ed this kernel), but apparently I was running without them. Also, I don't see a nouveau kernel module loaded. Anyway, I've now tried installing the drivers using the driver installer bundled with CUDA 9, but it failed; here are a few lines from the log:

Code:

...
-> Installing NVIDIA driver version 390.25.
-> There appears to already be a driver installed on your system (version: 390.25).  As part of installing this driver (version: 390.25), the existing driver will be uninstalled.  Are you sure you want to continue? (Answer: Continue installation)
...
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. (Answer: Yes)
-> Installing both new and classic TLS OpenGL libraries.
-> Installing both new and classic TLS 32bit OpenGL libraries.
-> Install NVIDIA's 32-bit compatibility libraries? (Answer: Yes)
-> Will install GLVND GLX client libraries.
-> Will install GLVND EGL client libraries.
-> Skipping GLX non-GLVND file: "libGL.so.390.25"
... (it was skipping more stuff)
-> Skipping EGL non-GLVND file: "libEGL.so.1"
-> Uninstalling the previous installation with /usr/bin/nvidia-uninstall.
Looking for install checker script at ./libglvnd_install_checker/check-libglvnd-install.sh
   executing: '/bin/sh ./libglvnd_install_checker/check-libglvnd-install.sh'...
   Checking for libglvnd installation.
   Checking libGLdispatch...
   Can't load library libGLdispatch.so.0: libGLdispatch.so.0: cannot open shared object file: No such file or directory
Will install libglvnd libraries.
Will install libEGL vendor library config file to /usr/share/glvnd/egl_vendor.d
...
-> Driver file installation is complete.
-> Installing DKMS kernel module:
-> done.
ERROR: Unable to load the 'nvidia-drm' kernel module.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
I saw a suggestion to run apt-get purge nvidia-* and use a PPA for installing the drivers, so I did that (add-apt-repository ppa:graphics-drivers/ppa) and installed the latest nvidia-390 driver. But when I rebooted, the NVIDIA driver had not loaded, and this happens:

Code:

# modprobe nvidia_390
modprobe: ERROR: could not insert 'nvidia_390': Exec format error
On the other hand, if I boot with a newer kernel version - 4.10.0 or 4.13.0 - the driver does load:

Code:

# lsmod | grep nv
nvidia_uvm            757760  0
nvidia_drm             40960  0
nvidia_modeset       1093632  1 nvidia_drm
nvidia              14323712  2 nvidia_modeset,nvidia_uvm
ipmi_msghandler        45056  2 nvidia,ipmi_devintf
drm_kms_helper        167936  2 i915,nvidia_drm
drm                   360448  5 i915,nvidia_drm,drm_kms_helper
(You'll note no nouveau stuff).
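
Regarding the "Exec format error" above: one common cause is a module that was built for a different kernel than the one running, which would fit a driver that loads under 4.10/4.13 but not under 4.4. A sketch for checking this, assuming modinfo is available; vermagic_matches and check_module are hypothetical helper names:

```shell
#!/bin/sh
# Succeed when the first word of a module's vermagic string equals
# the given kernel release. Hypothetical helper for illustration.
vermagic_matches() {
    vermagic=$1
    release=$2
    [ "${vermagic%% *}" = "$release" ]
}

# Compare a module on disk against the running kernel.
# Returns 2 if modinfo cannot find the module at all.
check_module() {
    module=$1
    vermagic=$(modinfo -F vermagic "$module") || return 2
    vermagic_matches "$vermagic" "$(uname -r)"
}
```

For example: check_module nvidia_390 || echo "not built for $(uname -r)". Running dkms status additionally lists the kernels each DKMS-managed module has been built for.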

Finally, during this kerfuffle, apt told me I had bbswitch installed and that it was no longer used, so I removed it; maybe it was there because I had both the manually-installed driver and an apt-installed one? I wonder. Now let's see how this thing holds up.

Also...
kc1di wrote:
Sun Mar 04, 2018 6:59 pm
It looks like you have both Nvidia and Nouveau being loaded; this will definitely cause problems. You need to blacklist Nouveau. I'm not sure how you installed the Nvidia driver, since installing it via the Driver Manager should have blacklisted Nouveau automatically. In any event, that could cause the problems you describe.
I used to use NVIDIA's bundled installer, which you get when you download CUDA. It creates the file /etc/modprobe.d/nvidia-installer-disable-nouveau.conf:

Code:

# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0
Is that not enough? But to be honest, I might have done something else earlier with DEB packages which I had not purged. Anyway, now I've switched to using the PPA mentioned above.
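
Whether a blacklist entry is present at all can be checked mechanically. The sketch below scans the .conf files in a modprobe configuration directory; nouveau_blacklisted is a hypothetical helper name, and the directory argument is only there so the check can be tried on a copy.

```shell
#!/bin/sh
# Succeed if any *.conf file in the directory contains an active
# (uncommented) "blacklist nouveau" line.
# nouveau_blacklisted is a hypothetical helper; defaults to /etc/modprobe.d.
nouveau_blacklisted() {
    dir=${1:-/etc/modprobe.d}
    # -s hides the error when the glob matches no files
    grep -Eqs '^[[:space:]]*blacklist[[:space:]]+nouveau([[:space:]]|$)' "$dir"/*.conf
}
```

For example: nouveau_blacklisted && echo "blacklist entry found". On Ubuntu-based systems a new blacklist entry may also need to be copied into the initramfs (sudo update-initramfs -u) before it takes effect early in boot, so the file alone is not always sufficient.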

Re: My system hangs repeatedly for no apparent reason. What to do?

Post by einpoklum » Sat Mar 10, 2018 9:07 am

Ok, booting with the newer kernel, and having Skype (64-bit) running, idle, got me crashing in under an hour. Have now shut Skype down and rebooted.
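
To pin down when a hang happens when the logs stay silent, one option is a small heartbeat that appends a timestamp and the load average to a file and flushes it to disk each time; the last line written before the reboot then brackets the moment of the hang. This is only a sketch: heartbeat is a hypothetical helper name, and the 10-second interval and 360 samples are arbitrary defaults.

```shell
#!/bin/sh
# Append COUNT timestamped load-average samples to LOG_FILE,
# one every INTERVAL seconds. Hypothetical helper for illustration.
heartbeat() {
    log_file=$1
    interval=${2:-10}
    count=${3:-360}
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s load=%s\n' "$(date '+%F %T')" \
            "$(cut -d' ' -f1-3 /proc/loadavg)" >> "$log_file"
        sync    # flush to disk so the last sample survives a hard hang
        sleep "$interval"
        i=$((i + 1))
    done
}
```

It could be started in the background with something like heartbeat "$HOME/heartbeat.log" 10 8640 &, which covers roughly a day, once with Skype running and once without, to compare how long the system stays up in each case.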

Re: My system hangs repeatedly for no apparent reason. What to do?

Post by einpoklum » Wed Mar 21, 2018 3:31 pm

Sorry for being slightly spammy, but the crashes of course persist. I should also mention it doesn't look like I'm using nouveau:

Code:

# lsmod
Module                  Size  Used by
ipt_REJECT             16384  3
nf_reject_ipv4         16384  1 ipt_REJECT
bnep                   20480  2
pci_stub               16384  1
vboxpci                24576  0
vboxnetadp             28672  0
vboxnetflt             28672  0
vboxdrv               458752  3 vboxnetadp,vboxnetflt,vboxpci
ccm                    20480  6
xt_multiport           16384  1
iptable_filter         16384  1
ip_tables              24576  1 iptable_filter
x_tables               40960  4 xt_multiport,ipt_REJECT,ip_tables,iptable_filter
binfmt_misc            20480  1
nvidia_uvm            757760  0
intel_rapl             20480  0
x86_pkg_temp_thermal    16384  0
intel_powerclamp       16384  0
snd_usb_audio         196608  1
coretemp               16384  0
snd_usbmidi_lib        32768  1 snd_usb_audio
snd_hda_codec_hdmi     49152  2
kvm_intel             204800  0
kvm                   589824  1 kvm_intel
snd_hda_codec_realtek    94208  1
snd_hda_codec_generic    73728  1 snd_hda_codec_realtek
irqbypass              16384  1 kvm
wmi_bmof               16384  0
eeepc_wmi              16384  0
crct10dif_pclmul       16384  0
asus_wmi               28672  1 eeepc_wmi
mxm_wmi                16384  0
sparse_keymap          16384  1 asus_wmi
crc32_pclmul           16384  0
arc4                   16384  2
input_leds             16384  0
ghash_clmulni_intel    16384  0
pcbc                   16384  0
nvidia_drm             40960  0
nvidia_modeset       1093632  1 nvidia_drm
aesni_intel           188416  4
aes_x86_64             20480  1 aesni_intel
crypto_simd            16384  1 aesni_intel
glue_helper            16384  1 aesni_intel
snd_hda_intel          40960  4
cryptd                 24576  3 crypto_simd,ghash_clmulni_intel,aesni_intel
nvidia              14323712  2 nvidia_modeset,nvidia_uvm
snd_hda_codec         126976  4 snd_hda_intel,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
intel_cstate           20480  0
ath9k                 151552  0
snd_hda_core           81920  5 snd_hda_intel,snd_hda_codec,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
snd_hwdep              20480  2 snd_hda_codec,snd_usb_audio
intel_rapl_perf        16384  0
snd_seq_midi           16384  0
snd_seq_midi_event     16384  1 snd_seq_midi
snd_pcm                98304  5 snd_hda_intel,snd_hda_codec,snd_usb_audio,snd_hda_core,snd_hda_codec_hdmi
ath9k_common           36864  1 ath9k
snd_rawmidi            32768  2 snd_seq_midi,snd_usbmidi_lib
ath9k_hw              475136  2 ath9k,ath9k_common
snd_seq                65536  2 snd_seq_midi_event,snd_seq_midi
ath                    28672  3 ath9k_hw,ath9k,ath9k_common
serio_raw              16384  0
mac80211              782336  1 ath9k
snd_seq_device         16384  3 snd_seq,snd_rawmidi,snd_seq_midi
snd_timer              32768  2 snd_seq,snd_pcm
cfg80211              614400  4 mac80211,ath9k,ath,ath9k_common
snd                    81920  23 snd_hda_intel,snd_hwdep,snd_seq,snd_hda_codec,snd_usb_audio,snd_timer,snd_rawmidi,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_usbmidi_lib,snd_seq_device,snd_hda_codec_realtek,snd_pcm
ipmi_devintf           20480  0
soundcore              16384  1 snd
ipmi_msghandler        45056  2 nvidia,ipmi_devintf
mei_me                 40960  0
mei                   102400  1 mei_me
shpchp                 36864  0
hci_uart              106496  0
btbcm                  16384  1 hci_uart
serdev                 20480  1 hci_uart
btqca                  16384  1 hci_uart
btintel                16384  1 hci_uart
bluetooth             548864  11 hci_uart,btintel,btqca,bnep,btbcm
wmi                    24576  3 asus_wmi,wmi_bmof,mxm_wmi
ecdh_generic           24576  1 bluetooth
intel_lpss_acpi        16384  0
acpi_als               16384  0
intel_lpss             16384  1 intel_lpss_acpi
kfifo_buf              16384  1 acpi_als
mac_hid                16384  0
industrialio           69632  2 acpi_als,kfifo_buf
acpi_pad              180224  0
parport_pc             32768  0
ppdev                  20480  0
lp                     20480  0
parport                49152  3 lp,parport_pc,ppdev
autofs4                40960  2
btrfs                1101824  0
xor                    24576  1 btrfs
raid6_pq              118784  1 btrfs
dm_mirror              24576  0
dm_region_hash         20480  1 dm_mirror
dm_log                 20480  2 dm_mirror,dm_region_hash
hid_generic            16384  0
usbhid                 49152  0
i915                 1830912  3
i2c_algo_bit           16384  1 i915
drm_kms_helper        167936  2 i915,nvidia_drm
syscopyarea            16384  1 drm_kms_helper
sysfillrect            16384  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
fb_sys_fops            16384  1 drm_kms_helper
e1000e                249856  0
psmouse               147456  0
ahci                   36864  7
ptp                    20480  1 e1000e
pps_core               20480  1 ptp
drm                   360448  5 i915,nvidia_drm,drm_kms_helper
libahci                32768  1 ahci
video                  40960  2 asus_wmi,i915
pinctrl_sunrisepoint    28672  0
i2c_hid                20480  0
pinctrl_intel          20480  1 pinctrl_sunrisepoint
hid                   118784  3 i2c_hid,hid_generic,usbhid
