NOTE 2: This is not an exhaustive list. If you think the tutorial could be improved by adding other tests and checks, please feel free to reply to this post. When replying, it's best to use the quote button (") at the top right of the post you're replying to, rather than the Post Reply button at the bottom left. Don't forget to trim the quoted text while leaving the quote metadata in place; this ensures that I see your suggestion.
NOTE 3: In this tutorial I refer to crashes, but these steps apply equally to the machine hard locking, especially when the lockups appear to be random.
NOTE 4: Before you suspect hardware and try these steps, post in the support forums and ask for help. There may be a software reason for your machine crashing or hanging, and that should be ruled out first. The quickest way to check for a software cause is to boot from a live USB flash drive or DVD: if the live session runs stably where your installed system crashes, software is the likelier culprit.
Desktop & laptop
1) If the machine boots, check the output of this command in the terminal:
dmesg --level=err
and look for error messages; you shouldn't be able to miss them if there are any. If you don't know what the error messages are telling you, create a new thread in the support forums and include a copy/paste of the errors, suitably trimmed if the same message is repeated many times; we don't need to see hundreds of lines all saying the same thing. In your request for assistance, also post the complete output of this command:
inxi -Fxxxzr
Always enclose pasted output in code tags: [code]output.here[/code]. You'll see the code tags icon </> when you post or reply in a thread. Please don't post support questions in this thread.
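If you'd rather not copy each output separately, a small script along these lines can collect them into one text file to paste between code tags. It's a sketch under a few assumptions: the file name `crash-report.txt` is arbitrary, `dmesg` is the util-linux version (which has `--notime`), `dmesg` may need `sudo` on systems that restrict kernel-log access, and `journalctl -b -1` only shows the previous boot (the one that crashed) if the systemd journal is kept on disk. Repeated dmesg errors are collapsed into one line each with a count, which does the trimming for you but loses the original ordering.

```shell
#!/usr/bin/env bash
# Sketch: gather the step 1 diagnostics into one file for pasting between [code] tags.
# Assumptions: util-linux dmesg; the report file name is arbitrary.
report="${1:-crash-report.txt}"

{
  echo "=== dmesg --level=err (current boot, repeats collapsed with counts) ==="
  if errs=$(dmesg --level=err --notime 2>/dev/null); then
    if [ -n "$errs" ]; then
      # Most frequent messages first, each prefixed with how often it occurred
      printf '%s\n' "$errs" | sort | uniq -c | sort -rn
    else
      echo "(no errors logged)"
    fi
  else
    echo "(dmesg not readable; try again with sudo)"
  fi

  echo
  echo "=== journalctl -b -1 -p err (previous boot; needs a persistent journal) ==="
  if command -v journalctl >/dev/null 2>&1; then
    journalctl -b -1 -p err --no-pager 2>&1 || true
  else
    echo "(journalctl not available)"
  fi

  echo
  echo "=== inxi -Fxxxzr ==="
  if command -v inxi >/dev/null 2>&1; then
    inxi -Fxxxzr 2>&1
  else
    echo "(inxi not installed)"
  fi
} > "$report"

echo "Wrote $report"
```

Run it with `bash crash-report.sh` (or `sudo bash crash-report.sh` if dmesg access is restricted), then open the file, trim anything irrelevant and paste it into your support thread.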
Desktop only
2) Shut the machine down, remove the power and video cables, and press the power button to discharge any energy stored in the PSU's capacitors. Remove the video card completely, then put it back and make sure it's firmly seated. Make sure the NVMe and SATA connections are good.
If you have other cards or devices plugged into the machine, either reseat them all, or remove all of them except the video card and try to boot. If the machine then runs without crashing, one of the cards you removed is involved, so refit them one by one, booting each time.
If the video card needs a 6- or 8-pin power connection, or several of them, make sure they are all connected. If possible, make sure multiple connectors don't come from the same power rail, i.e. they are on independent cables going back to the power supply.
Besides the main 24-pin connector, motherboards usually require additional power from 4-, 6- or 8-pin connectors; make sure these are connected. These sockets are mostly located close to the CPU, and in the bottom left of the motherboard when looking at the back of the board, i.e. the side where peripherals are plugged in.
Desktop only
Perform this step if the motherboard is not new:
3) While you've got the video card out, look for capacitor plague on the motherboard. You're looking for capacitors (caps) with distended, bulging tops, sometimes with a brown, waxy substance on top; a healthy cap has a flat top. If you see anything like that, your motherboard is the prime suspect. You can stop here if you find capacitor plague on your motherboard, because it needs replacing.
Refit the video card, reconnect the cables and try again.
Desktop only
4) If the machine still crashes after 2) and 3), shut it down again, apply some pressure to the motherboard with two fingers in various places, then try again. You've applied enough pressure when you feel the board give by the tiniest amount. This is done in case there is an open circuit somewhere on the board: flexing it very slightly may temporarily close the open circuit, make it worse, or do nothing at all. If the machine no longer crashes, there was a loose or open connection somewhere that is now either permanently or temporarily fixed. If the fix is temporary, the machine will often crash again when it warms up, so keep an eye on it if applying pressure gets the machine working.
If this is a newly built machine rather than a shop-bought one, make doubly sure that there is sufficient thermal grease on the CPU, that the CPU is correctly oriented, and that there are no bent pins. Also make sure no thermal grease has run off onto the pins or pads of the CPU. Note that some CPUs have gold pads rather than pins, in which case the pins are in the motherboard socket, so check those instead.
NOTE: Don't void your warranty if this is a shop-bought build. When you've performed all of the tests, you may need to consider sending it back.
Desktop only
5) With the machine turned off, apply the same two-finger pressure to your RAM sticks or, alternatively, remove and reseat them, then try to boot again. As in step 4), if the machine no longer crashes, there was a loose or open connection that is now either permanently or temporarily fixed. If the fix is temporary, the machine will often crash again once it warms up, so keep an eye on it if applying pressure gets it working.
As a further check, remove all memory sticks except one and see if the machine stabilises. If it doesn't, replace that stick with another. Rinse and repeat until every stick has been tested on its own.
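When you're swapping sticks around, it helps to confirm what the machine actually sees each time. This is a small sketch, assuming Linux's `/proc/meminfo` and, optionally, `dmidecode` run as root for per-slot details (SMBIOS type 17 is the memory-device record); the script name `ram-check.sh` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: record what memory the machine sees before and after each stick swap.
# Assumptions: Linux /proc/meminfo; dmidecode is optional and needs root (sudo).

ram_inventory() {
  # Total RAM the kernel can use, converted from kB to MiB
  awk '/^MemTotal:/ {printf "MemTotal: %d MiB\n", $2 / 1024}' /proc/meminfo

  # Per-slot details from the firmware (type 17 = memory device); root only
  if [ "$(id -u)" -eq 0 ] && command -v dmidecode >/dev/null 2>&1; then
    dmidecode --type 17 | grep -E '^\s*(Locator|Size|Part Number):'
  else
    echo "(run with sudo and dmidecode installed for per-slot details)"
  fi
}

ram_inventory
```

Run `sudo bash ram-check.sh` after each swap and note which stick is in which slot. A MemTotal a little below the fitted stick's size is normal, since the kernel and firmware reserve some memory.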
Desktop only
6) This step, along with the remaining ones, is where it can get expensive. Try a different video card; if need be, beg, borrow or temporarily steal one. You could monitor your GPU temperatures to see if they're spiking, but that's beyond the scope of this tutorial.
Desktop only
7) The power supply might be faulty and need replacing. Searching online for how to test a PSU won't help unless you have the appropriate electronic test gear, so again, you may have to beg, borrow or steal a power supply to test this.
Desktop & laptop
8) Stress test the CPU and RAM.
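sudo apt-get install stress
# One CPU worker per hardware thread plus two memory workers (256 MB each by default), for 90 seconds
stress --cpu "$(nproc)" --vm 2 --timeout 90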
Consider also running prime95 or memtest86 for long periods, i.e. more than a few minutes and up to 12 or 24 hours. Note that memtest86 runs from its own boot medium rather than inside Linux.
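For a longer `stress` run, it can help to size the memory workers to the machine rather than use a fixed amount, since `stress` only accepts an absolute size for `--vm-bytes`. The sketch below assumes Linux (`/proc/meminfo`, `nproc`) and that `stress` is installed. The function names, the `DRY_RUN` switch and the 75% share of available memory are illustrative choices of mine, not anything `stress` itself defines.

```shell
#!/usr/bin/env bash
# Sketch: a longer `stress` run sized to this machine (Linux, `stress` installed).
# DRY_RUN and the 75% memory share are illustrative choices, not part of `stress`.

# Print a per-worker --vm-bytes value in MiB, sharing ~75% of MemAvailable between workers.
vm_bytes_per_worker() {
  local workers="$1" avail_kb
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  echo "$(( avail_kb * 3 / 4 / 1024 / workers ))M"
}

# Run (or, with DRY_RUN=1, just print) a CPU + RAM stress test for $1 seconds.
run_stress() {
  local seconds="${1:-3600}" vm_workers=2
  local cmd=(stress
             --cpu "$(nproc)"
             --vm "$vm_workers" --vm-bytes "$(vm_bytes_per_worker "$vm_workers")"
             --timeout "${seconds}s")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Save it as, say, `stress-run.sh`, then run `source stress-run.sh` followed by `DRY_RUN=1 run_stress 43200` to see the command for a 12-hour run, and `run_stress 43200` to start it. memtest86 remains the more thorough RAM test; this only keeps the memory busy while the CPU is loaded.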
Desktop only
9) If the machine crashes in step 8), carefully take the CPU out and clean off all of the old thermal grease using methylated spirits and a small clean cloth, being very careful not to bend any pins. Reseat the CPU, apply a matchstick head's worth of new thermal grease, clean the dust and muck out of the cooler, refit it and try again.
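Before and after reseating the CPU, it can be useful to see how hot the machine actually gets under load. This sketch assumes the kernel exposes `/sys/class/thermal/thermal_zone*/temp` in millidegrees Celsius; which zone corresponds to the CPU varies by machine (check the matching `type` file), and `sensors` from the lm-sensors package is a friendlier alternative if you have it. The function names and log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: log the hottest thermal-zone reading every few seconds while a stress test runs.
# Assumption: /sys/class/thermal/thermal_zone*/temp reports millidegrees C.

# Print the highest reading, in whole degrees C, among the given temp files.
max_temp_c() {
  local max=-1 f t
  for f in "$@"; do
    [ -r "$f" ] || continue
    t=$(<"$f")
    if (( t > max )); then max=$t; fi
  done
  if (( max < 0 )); then
    echo "n/a"
  else
    echo $(( max / 1000 ))
  fi
}

# Append a timestamped reading to a log every $2 seconds for $1 seconds.
log_temps() {
  local duration="${1:-600}" interval="${2:-5}" log="${3:-temps.log}"
  local end=$(( SECONDS + duration ))
  while (( SECONDS < end )); do
    echo "$(date '+%F %T') $(max_temp_c /sys/class/thermal/thermal_zone*/temp) C" >> "$log"
    sleep "$interval"
  done
}
```

For example, `source temps.sh; log_temps 600 5 temps.log & stress --cpu "$(nproc)" --timeout 600`, then `sort -k3,3n temps.log | tail -n 1` shows the hottest reading. A much lower peak after reseating suggests the old thermal grease or cooler contact was the problem.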
Desktop & laptop
10) Soak test the machine. Leave it running for a full 24 hours with display power management (DPMS) turned off and only the screen saver enabled; when the screen saver kicks in, you can turn the display off manually. You want DPMS disabled because you don't want the video card to go to sleep. By this stage, if the soak test fails, i.e. the machine crashes while doing almost nothing, you know you have deeper problems.
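On an X11 session, `xset` can switch DPMS off while leaving the screen saver on, and a simple heartbeat log tells you roughly when the machine died if it hard-locks overnight. This is a sketch under assumptions: `xset` does not apply under Wayland, your desktop's own power settings may override it (so check those too), and the function names and log path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: prepare an X11 session for a soak test and leave a heartbeat log behind.
# Assumptions: X11 session (xset doesn't apply under Wayland); the log path is arbitrary.

disable_dpms() {
  # Turn off display power management so the video card isn't put to sleep,
  # but keep the screen saver enabled, as the soak test calls for.
  xset -dpms
  xset s on
}

# Write a timestamp every $1 seconds ($3 times, or forever if $3 is 0).
# After a crash, the last line shows roughly when the machine stopped.
heartbeat() {
  local interval="${1:-60}" log="${2:-$HOME/soak-heartbeat.log}" count="${3:-0}"
  local n=0
  while (( count == 0 || n < count )); do
    date '+%F %T' >> "$log"
    sync    # flush to disk so the last entry survives a hard lock
    n=$(( n + 1 ))
    sleep "$interval"
  done
}
```

For example, `source soak.sh; disable_dpms; heartbeat 60 &`. After an unexpected reboot, `tail -n 1 ~/soak-heartbeat.log` gives the approximate time of the crash, and `journalctl -b -1 -n 50` shows the last messages from the previous boot if the journal is persistent.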
Desktop & laptop
11) Take it to a computer repair technician, and prepare for the worst.