Steps to take after HDD failure

kelltech
Level 2
Posts: 81
Joined: Sun Jun 29, 2014 3:38 pm
Location: UK

Post by kelltech » Mon Sep 16, 2019 6:21 am

It appears that I have an HDD failing on me (attached image), which I had suspected because of some random yet persistent boot issues, particularly after kernel updates (like this: viewtopic.php?f=46&t=298632).

What I have is 2 drives: an SSD that has Win10 and LM installed on it, with the secondary HDD used as storage for both installs. That secondary is the one that's failing.

Question is, should I be wiping everything from both drives and installing fresh when I replace that secondary drive, or should I try to copy everything over? Would that cause unforeseen issues down the road?
Attachments
Screenshot from 2019-09-16 11-15-20.png

gm10
Level 19
Posts: 9641
Joined: Thu Jun 21, 2018 5:11 pm

Re: Steps to take after HDD failure

Post by gm10 » Mon Sep 16, 2019 6:41 am

Is that the drive that already had one bad sector when we previously talked? You didn't scroll down the SMART status in your screenshot to show which item actually failed.

The problem with copying from the bad drive is that you may get corrupt data. You can do it if you need to recover something that is not backed up yet but you should then verify that the data is actually intact. So assuming a healthy backup I'd replace the drive, clean install and then restore from the backup.
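One way to do the verification mentioned above, as a minimal sketch: compare SHA-256 checksums of every file in one tree against the same relative paths in another, for instance a known-good backup against whatever was copied off the suspect drive. The function name verify_copy and the paths in the usage note are made up for illustration, and GNU coreutils (sha256sum, mktemp) is assumed. Matching checksums only show that the two trees agree; they cannot prove the data was intact before it was copied.

```shell
# Sketch only: verify_copy is a hypothetical helper, not part of any package.
# Usage: verify_copy SRC DST
# Returns 0 if every regular file under SRC has an identical file under DST.
verify_copy() {
    src=$1
    dst=$2
    manifest=$(mktemp) || return 2
    # Checksum every regular file, with paths relative to the source root.
    ( cd "$src" && find . -type f -exec sha256sum {} + ) > "$manifest"
    # Re-check the same relative paths under the destination root.
    # --quiet prints only the files that are missing or differ.
    status=0
    ( cd "$dst" && sha256sum --quiet -c "$manifest" ) || status=$?
    rm -f "$manifest"
    return "$status"
}
```

For example, `verify_copy /mnt/backup /mnt/recovered` (both paths hypothetical) stays silent and returns 0 when the trees match, and lists the offending files otherwise.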
Tune up your LM 19.x: ppa:gm10/linuxmint-tools

Spearmint2
Level 16
Posts: 6874
Joined: Sat May 04, 2013 1:41 pm
Location: Maryland, USA

Post by Spearmint2 » Mon Sep 16, 2019 6:54 am

That drive looks pretty good to me. I'd put it in my computer any day. The values for attributes 1 and 7 are the most important ones. "Seek error rate" is ZERO! Every drive can get a bad sector once in a while; it can even happen on new drives. All that "old age" and "pre-fail" stuff can mostly be ignored. That drive is barely used. Not even a single "reallocated sector" in line 5. If there were a "bad sector", line 5 wouldn't show zero reallocated. I'm almost envious of that drive, LOL.
All things go better with Mint. Mint julep, mint jelly, mint gum, candy mints, pillow mints, peppermint, chocolate mints, spearmint,....

rene
Level 12
Posts: 4251
Joined: Sun Mar 27, 2016 6:58 pm

Post by rene » Mon Sep 16, 2019 6:55 am

There should be no need to wipe the SSD. Although I lack significant experience with W10, your NTFS partition is a simple "basic volume", which should mean that simply recreating an NTFS partition on the new/wiped HDD would not upset Windows. In Linux, you can simply (and only if necessary) update the file system UUIDs in /etc/fstab, or conversely reuse the old UUIDs on the new file systems themselves.
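The /etc/fstab update could look roughly like the sketch below. The helper name swap_fstab_uuid and the UUID values in the usage note are invented for illustration. It only prints the edited file to standard output, so nothing is changed until you review the result and copy it into place yourself.

```shell
# Sketch only: swap_fstab_uuid is a hypothetical helper.
# Usage: swap_fstab_uuid OLD_UUID NEW_UUID FILE
# Prints FILE with "UUID=OLD_UUID" replaced by "UUID=NEW_UUID".
swap_fstab_uuid() {
    old=$1
    new=$2
    file=$3
    # Require whitespace after the UUID so a longer UUID that merely
    # starts with OLD_UUID is left alone.
    sed "s/UUID=$old\([[:space:]]\)/UUID=$new\1/" "$file"
}
```

The new file system's UUID can be read with `sudo blkid -s UUID -o value /dev/sdXN` (device name is a placeholder). Something like `swap_fstab_uuid 1111-AAAA 3333-CCCC /etc/fstab > fstab.new` then gives a candidate file to diff against the original. For the converse route on ext2/3/4 file systems, `tune2fs -U` can set a specific UUID on the file system itself.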

In the above I'm saying "new/wiped" since your drive isn't in fact showing (definitive) bad sectors. Your new screenshot isn't showing SMART attribute 197 (Current Pending Sectors) but from the thread you linked, if that's still 1 then I'd feel it perhaps a bit too soon to give up on the drive. If you perform a "secure erase" on the drive that may be all you need to have the drive go on for years without issue; the (somewhat) relevant number of read errors that you DO show are likely due to that one specific sector...

I.e., "secure erase" is rather badly named. It might be "secure", but its point is much more that it's a firmware-level format: one where the drive checks all sectors and gracefully retires any found to be actually bad; the drive has a reservoir of spare sectors to map into their place. If after such a "secure erase" attribute 5 has stayed at 0 and 197 has returned to 0, I'd feel confident about the drive again.

Of course, a "secure erase" does in fact erase, so for this to be an option you'd need sufficient storage to back up the current contents of the HDD to (and afterwards you'd repartition/reformat/restore it, same as with a new drive), but if you have that, I'd still give the drive a second chance.
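To keep an eye on the two attributes discussed here, a small filter over the attribute table from smartmontools may help. This is a sketch that assumes the usual ten-column layout of `smartctl -A`, where the raw value is the last column; the function name smart_sector_counts is made up.

```shell
# Sketch only: smart_sector_counts is a hypothetical helper.
# Reads `smartctl -A` output on stdin and prints ID, name and raw value
# for attribute 5 (Reallocated_Sector_Ct) and 197 (Current_Pending_Sector).
smart_sector_counts() {
    # Column 1 is the attribute ID, column 2 its name, column 10 the raw value.
    awk '$1 == 5 || $1 == 197 { print $1, $2, $10 }'
}
```

Typical use would be `sudo smartctl -A /dev/sdX | smart_sector_counts` (device name is a placeholder), run once before and once after the erase so the two readings can be compared.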

Post by kelltech » Mon Sep 16, 2019 7:27 am

Thanks for the help everyone, some things to think about. I was certain this drive was on its way out; it's the 3rd laptop I've had it in, after all. :shock:
gm10 wrote:
Mon Sep 16, 2019 6:41 am
Is that the drive that already had one bad sector when we previously talked? You didn't scroll down the SMART status in your screenshot to show which item actually failed.

The problem with copying from the bad drive is that you may get corrupt data. You can do it if you need to recover something that is not backed up yet but you should then verify that the data is actually intact. So assuming a healthy backup I'd replace the drive, clean install and then restore from the backup.

Hey gm10, appreciate your help once again. It is the one we spoke about with the bad sector. It behaved mostly OK until another kernel update, after which I had to run fsck a few times to get things right again. Maybe I should have asked: would a wipe on this drive solve the problem, or is it facing imminent failure based on this behaviour? I've attached another screenshot showing the full SMART results.
Attachments
Screenshot from 2019-09-16 12-14-10.png

Post by gm10 » Mon Sep 16, 2019 7:53 am

Well, seems that sector still hasn't been remapped, or maybe it's a different one. At least it's still only one sector. Yet the self-test failed on reading. It might be time to actively check the whole drive for bad sectors (you need to unmount the partitions first or run this from a live environment):

https://wiki.archlinux.org/index.php/ba ... structive)

If it's still only that one sector you do not necessarily need to throw it out yet, even if it ends up getting remapped (but as before, keep an eye on it). But if this test turns up more bad sectors then yes, out with it.

Post by kelltech » Mon Sep 16, 2019 5:08 pm

gm10 wrote:
Mon Sep 16, 2019 7:53 am
If it's still only that one sector you do not necessarily need to throw it out yet, even if it ends up getting remapped (but as before, keep an eye on it). But if this test turns up more bad sectors then yes, out with it.
That took a bit of time, but just towards the end errors began appearing. The end result is Pass completed, 44 bad blocks found. (44/0/0 errors).
I think that provides a pretty clear answer: replace the drive, wipe it all and start fresh without the ball and chain of the Win10 dual boot that I haven't used in several months. :lol:

Here's the full test:

root@mint:/home/mint# badblocks -nsv /dev/sda4
Checking for bad blocks in non-destructive read-write mode
From block 0 to 468347903
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: 31.04% done, 1:09:23 elapsed. (0/0/0 errors)
446171120  3:52:10 elapsed. (0/0/0 errors)
446171121  3:52:13 elapsed. (1/0/0 errors)
446171122  3:52:15 elapsed. (2/0/0 errors)
446171123  3:52:18 elapsed. (3/0/0 errors)
446171124  3:52:20 elapsed. (4/0/0 errors)
446171125  3:52:23 elapsed. (5/0/0 errors)
446171126  3:52:25 elapsed. (6/0/0 errors)
446171127  3:52:27 elapsed. (7/0/0 errors)
446179584  3:52:35 elapsed. (8/0/0 errors)
446179585  3:52:38 elapsed. (9/0/0 errors)
446179586  3:52:41 elapsed. (10/0/0 errors)
446179587  3:52:44 elapsed. (11/0/0 errors)
446179648  3:52:49 elapsed. (12/0/0 errors)
446179649  3:52:52 elapsed. (13/0/0 errors)
446179650  3:52:55 elapsed. (14/0/0 errors)
446179651  3:52:57 elapsed. (15/0/0 errors)
446179652  3:53:00 elapsed. (16/0/0 errors)
446179653  3:53:03 elapsed. (17/0/0 errors)
446179654  3:53:05 elapsed. (18/0/0 errors)
446179655  3:53:08 elapsed. (19/0/0 errors)
446179756  3:53:15 elapsed. (20/0/0 errors)
446179757  3:53:18 elapsed. (21/0/0 errors)
446179758  3:53:21 elapsed. (22/0/0 errors)
446179759  3:53:23 elapsed. (23/0/0 errors)
446179760  3:53:26 elapsed. (24/0/0 errors)
446179761  3:53:29 elapsed. (25/0/0 errors)
446179762  3:53:32 elapsed. (26/0/0 errors)
446179763  3:53:35 elapsed. (27/0/0 errors)
446179764  3:53:38 elapsed. (28/0/0 errors)
446179765  3:53:41 elapsed. (29/0/0 errors)
446183120  3:53:50 elapsed. (30/0/0 errors)
446183121  3:53:53 elapsed. (31/0/0 errors)
446183122  3:53:56 elapsed. (32/0/0 errors)
446183123  3:53:59 elapsed. (33/0/0 errors)
446183136  3:54:03 elapsed. (34/0/0 errors)
446183137  3:54:06 elapsed. (35/0/0 errors)
446183138  3:54:08 elapsed. (36/0/0 errors)
446183139  3:54:11 elapsed. (37/0/0 errors)
446183140  3:54:14 elapsed. (38/0/0 errors)
446183232  3:54:22 elapsed. (39/0/0 errors)
446183233  3:54:25 elapsed. (40/0/0 errors)
446183234  3:54:28 elapsed. (41/0/0 errors)
446183235  3:54:31 elapsed. (42/0/0 errors)
446183480  3:54:38 elapsed. (43/0/0 errors)
done
Pass completed, 44 bad blocks found. (44/0/0 errors)
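
The bad block numbers in a run like this tend to be easier to read once consecutive ones are grouped. A minimal sketch follows, assuming the block numbers are available one per line (badblocks can write such a list with its -o option); the function name summarize_bad_blocks and the file name in the usage note are made up.

```shell
# Sketch only: summarize_bad_blocks is a hypothetical helper.
# Reads block numbers on stdin, one per line, and prints each run of
# consecutive blocks as "first-last count=N".
summarize_bad_blocks() {
    sort -n | awk '
        function print_range(a, b) { printf "%d-%d count=%d\n", a, b, b - a + 1 }
        NR == 1        { start = prev = $1; next }   # first block opens a run
        $1 == prev + 1 { prev = $1; next }           # run continues
                       { print_range(start, prev); start = prev = $1 }
        END            { if (NR > 0) print_range(start, prev) }
    '
}
```

With a list saved via something like `badblocks -nsv -o bad.txt /dev/sdXN`, the call would be `summarize_bad_blocks < bad.txt`. Fed the 44 block numbers from the log above, it should report eight runs between 446171120 and 446183480, i.e. the damage sits in one small region near the end of the partition.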

Post by gm10 » Mon Sep 16, 2019 5:35 pm

Yep, I'd throw that out; drives aren't expensive enough to gamble with your data. Typically such drives do get exponentially worse at that stage. Especially since that's just the result from one partition; you'll possibly have more in the next one.

all41
Level 15
Posts: 5691
Joined: Tue Dec 31, 2013 9:12 am
Location: Computer, Car, Cage

Post by all41 » Mon Sep 16, 2019 5:44 pm

I have seen 20 bad blocks become 450 bad blocks overnight.
If the data is sensitive I would just destroy the drive as opposed to attempting a secure overwrite.

Post by rene » Mon Sep 16, 2019 6:06 pm

Note that nothing I wrote above about "secure erase" was about securely erasing anything; it was specifically about it being a firmware-level format that is often capable of "fixing" issues, simply by having the drive test and/or gracefully retire sectors that are in fact bad, and clear any issues with ones that are not: it reinitializes some/the firmware tables. The "badblocks" run that was suggested operates at a much higher level and would stumble over issues that the firmware would either not trip over or would "fix". I'd check the SMART values now, and if 5 is still 0 (although the badblocks run may have caused it to increase) I'd go the "secure erase" route.

You can do so from Windows using a host of tools (Google will know) and from Linux using hdparm. As said, even though erasing might not be the point, erase it certainly does, so you'd need to back up any still-readable content first.
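
For the hdparm route, the usual sequence is: inspect the drive's security state, set a temporary user password, then issue the erase. The sketch below deliberately only prints those commands instead of running them, since the last one destroys everything on the drive. The function name secure_erase_plan, the default password and the device name are placeholders. Check the `hdparm -I` output first: the drive must be reported as "not frozen", and some laptops freeze the drive at boot.

```shell
# Sketch only: secure_erase_plan is a hypothetical helper that PRINTS the
# commands for review; it does not touch the drive.
# Usage: secure_erase_plan DEVICE [TEMP_PASSWORD]
secure_erase_plan() {
    dev=$1
    pass=${2:-p}   # temporary user password, placeholder value
    printf '%s\n' \
        "hdparm -I $dev" \
        "hdparm --user-master u --security-set-pass $pass $dev" \
        "hdparm --user-master u --security-erase $pass $dev"
}
```

Running `secure_erase_plan /dev/sdX` lists the three steps in order so they can be double-checked against the right device before anything is executed as root.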

Post by Spearmint2 » Mon Sep 16, 2019 6:32 pm

Yeah, get a new drive. Here are some stats on drives; Seagate had problems in those years. Maybe manufactured when that flooding in Thailand wiped them out, LOL?

https://www.backblaze.com/b2/hard-drive-test-data.html

You didn't buy a refurb did you? Avoid them.

Here's a famous picture that went around the internet about then, concerning failure rates.

https://www.backblaze.com/blog/wp-conte ... acture.jpg

Post by kelltech » Tue Sep 17, 2019 11:41 am

Nope, not a refurb, just a well-used drive in its third computer. :D

Totally agree, gm10, drives are too inexpensive to gamble with data. My new Western Digital Black, 1TB 7200rpm just arrived and now comes the wipe. When I return, I'll have that Win10 partition gone too! :lol:
