HOWTO: Diagnose and Repair Common Hardware Faults

Post by lagagnon » Mon Jul 06, 2009 4:27 pm

Several posts on the Mint Forums describe system freezes (lockups) or unrequested shutdowns or reboots. Because Linux and its applications are normally very stable, these symptoms usually indicate a hardware fault and are rarely a problem with the operating system or its applications.

If you are experiencing such problems you can readily confirm whether the fault is hardware (rather than software) by running the Mint Live CD (or any Linux Live CD). Open a few applications and try stressing the computer by playing YouTube videos while doing other things. If you experience the same freezing/lockup/shutdown/reboot problems while running the Live CD then you definitely have a hardware fault. In my experience as a computer technician such faults tend to be the result of the following - most common fault first:

1) faulty or improperly seated RAM stick
2) faulty hard or optical drive
3) overheating
4) faulty power supply or faulty battery/charger in laptops


The first step is to run "memtest" from the Live CD. This will quickly tell you whether you have faulty RAM. If memtest shows errors, first remove all RAM sticks, clean their gold contacts with a pink eraser and rubbing alcohol, then reseat the sticks properly and rerun memtest. If the memory test fails again you have at least one faulty RAM stick. Remove the sticks one at a time, rerunning memtest after each removal, until you work out which one is faulty, then replace it.

Overheating faults tend to result in your computer automatically shutting down or rebooting rather than freezing. If you run applications that use 100% CPU continuously (games, simulations, video editing, etc.), your CPU and/or video card may overheat unless the computer is properly cooled and the fans and heat sinks are free of dust. The solutions are to: 1) check the fans are working properly, 2) check the fan output vents are not blocked (especially on laptops) and 3) check you don't have lots of dust buildup between the heat sink fins and fan blades. If you are having such problems you can install "conky" and "lm-sensors" to monitor your CPU temperature continuously.
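
For a quick check without setting up conky, many kernels also expose temperatures under /sys/class/thermal. The following is a minimal Python sketch, assuming that interface is present on your machine. The function names and the 80 C warning threshold are illustrative choices of mine, not values taken from lm-sensors or from your hardware's specification.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: temperature_in_celsius} for each readable thermal zone."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            # The kernel reports the temperature in millidegrees Celsius.
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone unreadable or not a number: skip it
        temps[zone.name] = millideg / 1000.0
    return temps


def too_hot(temps, limit_c=80.0):
    """Return the names of zones at or above limit_c (an assumed threshold)."""
    return sorted(name for name, t in temps.items() if t >= limit_c)


if __name__ == "__main__":
    readings = read_thermal_zones()
    for name, t in readings.items():
        print(f"{name}: {t:.1f} C")
    hot = too_hot(readings)
    if hot:
        print("Running hot:", ", ".join(hot))
```

Run it a few times while the machine is under load. If a zone climbs steadily before a shutdown, cooling is the likely culprit. Which zone corresponds to the CPU varies from machine to machine.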

Your power supply can be tested by installing and configuring the lm-sensors package and running "sensors" to show you your power supply voltages, but you can also see this information in your BIOS setup. The supply voltages should be in the ballpark of 3.3, 5 and 12 volts. Power supplies are most stressed when you are accessing many drives at once (all hard drives and optical drives) while your CPU is running at 100% usage and while your laptop battery is charging. If your failure occurs at these times you probably have a faulty power supply/battery or battery charger unit.
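
To judge whether a reading is "in the ballpark", you can compare it against the nearest nominal value. The Python sketch below assumes a 5% tolerance, the figure usually quoted for the positive ATX rails. Treat that as an assumption, and remember that motherboard voltage sensors are often imprecise. The function name and the sample readings are made up for illustration.

```python
NOMINAL_RAILS = (3.3, 5.0, 12.0)  # the three supply voltages named above


def check_rails(readings, tolerance=0.05):
    """Match each measured voltage to the nearest nominal rail.

    Returns a list of (measured, nominal, within_tolerance) tuples.
    tolerance is a fraction of the nominal value (0.05 = 5%, an assumption).
    """
    results = []
    for measured in readings:
        nominal = min(NOMINAL_RAILS, key=lambda n: abs(n - measured))
        ok = abs(measured - nominal) <= tolerance * nominal
        results.append((measured, nominal, ok))
    return results


if __name__ == "__main__":
    # Hypothetical readings copied by hand from "sensors" or the BIOS setup.
    for measured, nominal, ok in check_rails([3.28, 5.12, 11.1]):
        status = "ok" if ok else "OUT OF RANGE"
        print(f"{measured:5.2f} V (nominal {nominal:4.1f} V): {status}")
```

A rail that sags only under heavy load is more telling than one that is slightly off at idle, so take readings in both states if you can.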

Hard drive faults are usually catastrophic rather than subtle. If you hear even the slightest new noises emanating from the area of your hard drive (usually small clicking sounds) you must back up all your data immediately. This is indicative of a failing hard drive. Also, go to Menu, Administration, Log File Viewer and on the left panel click "dmesg". Scroll through that log and look for errors associated with your hard drive. They look somewhat like this:

sda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

If you see anything like the above in the dmesg log your hard drive is faulty. If the device name (e.g. sdc) relates to an optical drive and you also get similar errors for that device then your optical drive is failing. You can get a better HD diagnosis by using the hard drive tools on the "Ultimate Boot CD" (http://www.ultimatebootcd.com). You may even be able to repair your hard drive with these tools. For this you need to know the exact make/model of your hard drive (use "cat /proc/scsi/scsi" in a terminal).
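
Scrolling through a long dmesg log by eye is tedious, so a small filter can help. This is a rough Python sketch, not a diagnostic tool. It searches for the error shape shown in the example line plus the generic phrase "I/O error", which is an extra pattern I am assuming is worth flagging. The function names are hypothetical, and on some systems reading dmesg requires root.

```python
import re
import subprocess

# Patterns are illustrative: the first matches the example line above,
# the second is an assumed generic phrase.
ERROR_PATTERNS = [
    re.compile(r"status=0x[0-9a-fA-F]+ \{.*Error.*\}"),
    re.compile(r"I/O error", re.IGNORECASE),
]
# Disk names such as sda or hdb (partition digits ignored), or optical sr0.
DEVICE_RE = re.compile(r"\b((?:sd|hd)[a-z]{1,2}|sr\d+)\d*\b")


def find_drive_errors(log_text):
    """Return (device_or_None, line) for each line that looks like a drive error."""
    hits = []
    for line in log_text.splitlines():
        if any(p.search(line) for p in ERROR_PATTERNS):
            match = DEVICE_RE.search(line)
            hits.append((match.group(1) if match else None, line.strip()))
    return hits


def read_kernel_log():
    """Return the output of "dmesg", or an empty string if it cannot be run."""
    try:
        return subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    except OSError:
        return ""


if __name__ == "__main__":
    for device, line in find_drive_errors(read_kernel_log()):
        print(f"[{device or '?'}] {line}")
```

An empty result does not prove the drive is healthy. It only means none of these particular patterns appeared in the current log.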

Optical drive faults are not a big deal: the drives are cheap to replace, a failure won't affect your data, and you can keep computing without one. Most optical drive faults are mechanical and obvious to the user so I won't pursue this step any further.

If you have checked all the above items, and there do not appear to be any faults with this equipment then you might have a more serious problem such as a failing CPU or motherboard. If you feel technically competent and you have a desktop computer then remove the CPU fan and the CPU itself. Clean off the heat sink-CPU thermal compound with alcohol, reseat the CPU carefully and renew the heat sink compound (procedures here: http://www.pcguide.com/proc/physinst/sink-c.html). Replace the CPU fan and retest your system.

The above HOWTO is a very succinct treatment of the most obvious hardware faults. If you are experiencing something similar to the above but are unable to arrive at a solution after all the above tests then feel free to send me a personal message - but I will need lots of data and a full description of the faults and of the tests you have done.

Good luck...Larry
Last edited by lagagnon on Mon Aug 31, 2009 2:24 pm, edited 1 time in total.

Post by Husse » Thu Jul 09, 2009 6:21 am

Yupp :)
PSU errors especially are often overlooked.
A warning though:
Take extreme care to avoid static discharges. Grab a water pipe or something similar before you start, and don't work standing on a carpet.
Preferably use an antistatic bracelet, or keep some part of your body (a hand) in direct contact with the casing the whole time.
On laptops, try to find some metal frame to touch, although their special arrangements for hard disks and RAM make the job less perilous.
I have a feeling I may spend quite some time chasing what really is a hardware error....

Post by Fred » Thu Jul 09, 2009 1:01 pm

+1 Good post lagagnon.

While on hardware, one other tip, another old saying: "Use finesse, not force." That is to say, when you are working with hardware, if you have to force it, you are doing it wrong. :-)

Fred
