Bad Sectors on SSD?

jackcq · Post by **jackcq** » Sat Apr 07, 2018 9:04 am

Hi,

last summer I replaced my faulty 2TB system drive with a combo of a 250GB SSD (for the OS) and another 2TB HDD (for those large video files, games and stuff). Back then the old drive started to behave strangely: video playback stuttering, checksums on single files failing and eventually refusing to boot altogether. Now imagine my shock when yesterday I noticed stuttering on video playback again (I do a lot of video-related stuff on my Mint system), and after a quick check gnome-disks tells me that this time my SSD has bad blocks:

Model:ADATA SU800 (P1021A)
Size:256 GB (256.060.514.304 bytes)
Partitioning:Master Boot Record
Serial Number:2H0720015298
Assessment: Disk is OK, 4 bad sectors (32° C / 90° F)

What really scares me is that when I first noticed, it was one bad sector. Then I started backing up data and it turned to two, and this morning it's 4 already. What is strange though: all the videos that made me notice this in the first place are stored on the HDD (which is okay, according to gnome-disks) and not on the SSD. Maybe it's because the swap partition and the operating system are on the SSD? A badblocks read test also didn't show any errors:

badblocks -s /dev/sda
Checking for bad blocks (read-only test): 87.10% done, 10:00 elapsed. (0/0/0 errors)
done

And smartctl also doesn't seem to show anything unusual:

smartctl -a /dev/sda
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.13.0-38-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: ADATA SU800
Serial Number: 2H0720015298
LU WWN Device Id: 5 707c18 10044f608
Firmware Version: P1021A
User Capacity: 256.060.514.304 bytes [256 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 (minor revision not indicated)
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Apr 7 14:53:49 2018 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x71) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x0035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0000 100 100 000 Old_age Offline - 0
5 Reallocated_Sector_Ct 0x0000 100 100 000 Old_age Offline - 5
9 Power_On_Hours 0x0000 100 100 000 Old_age Offline - 164
12 Power_Cycle_Count 0x0000 100 100 000 Old_age Offline - 467
160 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
161 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 53
163 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 11
164 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 7633
165 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 76
166 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 3
167 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 15
148 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 17033
149 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 535
150 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 404
151 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 500
169 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 100
177 Wear_Leveling_Count 0x0000 100 100 050 Old_age Offline - 2
181 Program_Fail_Cnt_Total 0x0000 100 100 000 Old_age Offline - 0
182 Erase_Fail_Count_Total 0x0000 100 100 000 Old_age Offline - 0
192 Power-Off_Retract_Count 0x0000 100 100 000 Old_age Offline - 7
194 Temperature_Celsius 0x0000 100 100 000 Old_age Offline - 40
196 Reallocated_Event_Count 0x0000 100 100 016 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x0000 100 100 050 Old_age Offline - 8
232 Available_Reservd_Space 0x0000 100 100 000 Old_age Offline - 100
241 Total_LBAs_Written 0x0000 100 100 000 Old_age Offline - 62864
242 Total_LBAs_Read 0x0000 100 100 000 Old_age Offline - 86587
245 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 122128

SMART Error Log Version: 1
Warning: ATA error count 0 inconsistent with error log pointer 2

ATA Error Count: 0
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error -1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was in an unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
00 ec 00 00 00 00 00

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ec 00 00 00 00 00 00 00 00:00:00.000 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:00:00.000 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:00:00.000 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:00:00.000 IDENTIFY DEVICE
c8 00 00 00 00 00 00 00 00:00:00.000 READ DMA

Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 164 -
# 2 Short offline Completed without error 00% 164 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
7 0 65535 Read_scanning was completed without error
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Is there any way to get more information, to make sure it's the SSD that's failing and not the HDD?

Thanks in advance for your help/hints.

rene · Post by **rene** » Sat Apr 07, 2018 9:54 am

jackcq wrote: ⤴Sat Apr 07, 2018 9:04 am Device is: Not in smartctl database [for details use: -P showall]

You have a lot of "Unknown_Attribute" SMART attributes so let's first of all update the smartctl database. You're running 6.5 so grab https://www.smartmontools.org/browser/b ... /drivedb.h and save it somewhere:

sudo smartctl -a --drivedb=/some/where/drivedb.h /dev/sda

will then likely show more details. You can place the file in /etc/smart_drivedb.h to have it picked up automatically.

What is however the case is that we can already see that your SSD indeed has 5 reallocated sectors (Attribute ID#5) and one recorded reallocation event (ID#196; pending sectors would show up under ID#197, which this drive does not list), which is to say that the count may well have grown further by now. With an SSD it's not necessarily a huge issue: of course it can happen with old SSDs when they've been written to a lot, but also with new SSDs when factory testing missed some bad cells. I've had this happen with an Intel SSD and although the count was growing for me as well, ever since I "firmware-level reformatted" the drive (causing it to test, retest and reinitialize internal blocklists) the count has remained stable, and the drive specifically tells me through a SMART attribute that I still have 98% reserve capacity remaining. Your SSD also appears to be new for all intents and purposes: Total LBAs Written/Read of 60/80K is tiny. So tiny in fact that I wonder how you managed that while using it for a year or so -- but let's wait for a paste with a new drivedb.
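To keep an eye on just the two counters discussed here without reading the whole dump each time, the raw value can be pulled out with awk. This is only a sketch: the function name smart_raw is made up for illustration, and it assumes the ten-column attribute table layout shown in the paste above, where RAW_VALUE is the tenth field.

```shell
#!/bin/sh
# Print the RAW_VALUE column for one SMART attribute ID.
# Reads `smartctl -A` style text on stdin; the attribute ID is the first argument.
# smart_raw is a hypothetical helper name, not part of smartmontools.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Two rows copied from the output posted earlier in this thread.
sample='  5 Reallocated_Sector_Ct 0x0000 100 100 000 Old_age Offline - 5
196 Reallocated_Event_Count 0x0000 100 100 016 Old_age Offline - 1'

printf '%s\n' "$sample" | smart_raw 5     # prints 5
printf '%s\n' "$sample" | smart_raw 196   # prints 1
```

On the real system the input would come from something like sudo smartctl -A /dev/sda piped into smart_raw 5; running that once a day and noting the number shows quickly whether the count is still climbing.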

The "firmware-level reformatting" I mentioned -- which I would do -- would of course necessitate you reinstalling the system on the SSD. You'd boot into a Live system from USB or DVD and run

sudo hdparm --security-set-pass password /dev/sdz
sudo hdparm --security-erase password /dev/sdz

in which of course /dev/sdz is to be replaced with the correct device for the SSD: the second command erases the SSD fully and will disable the password again. I don't believe that the Reallocated Sector count SMART value will ever in fact decrease again but you'll likely find that it stops increasing. The new drivedb will moreover possibly point out an attribute that basically says you're okay; i.e., like that "98%" thing for my Intel.
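One thing that commonly gets in the way of the erase commands above is the drive being in the ATA "frozen" security state, which many BIOSes set at boot and which makes the drive reject the security commands. hdparm -I lists that state in its Security section. The snippet below is a rough sketch for checking it: security_frozen is a made-up helper name, and the sample text is an illustrative excerpt in the usual hdparm -I layout, not output from the drive in this thread.

```shell
#!/bin/sh
# Succeeds (exit 0) if `hdparm -I` style text on stdin reports the drive as frozen.
# security_frozen is a hypothetical helper name, not an hdparm feature.
security_frozen() {
    # Keep only the "frozen" / "not frozen" status line, then look for one without "not".
    grep -E '^[[:space:]]*(not[[:space:]]+)?frozen' | grep -qv 'not'
}

# Illustrative excerpt only; real output has more lines in this section.
sample=$(printf 'Security:\n\tnot\tenabled\n\tnot\tlocked\n\t\tfrozen\n')

if printf '%s\n' "$sample" | security_frozen; then
    state="frozen"
else
    state="not frozen"
fi
echo "$state"
```

Against the real device this would be fed from sudo hdparm -I /dev/sdz. If it does say frozen, suspending the machine and waking it up again is the workaround usually suggested; whether that works depends on the board.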

jackcq · Post by **jackcq** » Sat Apr 07, 2018 11:35 am

First off, thanks for the quick reply and the useful hints, it's appreciated.

You have a lot of "Unknown_Attribute" SMART attributes so let's first of all update the smartctl database. You're running 6.5 so grab https://www.smartmontools.org/browser/b ... /drivedb.h and save it somewhere

Hmm, that didn't work ... the device still isn't in the database.
I even picked up a tarball of 6.6 and built/configured it to /usr/local real quick and downloaded the 6.6 drivedb.h (assuming that's the latest one), but I'm still getting "not in smartctl database":

smartctl -a --drivedb=/etc/smart_drivedb.h /dev/sda
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.13.0-38-generic] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: ADATA SU800
Serial Number: 2H0720015298
LU WWN Device Id: 5 707c18 10044f608
Firmware Version: P1021A
User Capacity: 256.060.514.304 bytes [256 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 (minor revision not indicated)
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Apr 7 17:12:18 2018 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x71) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x0035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0000 100 100 000 Old_age Offline - 0
5 Reallocated_Sector_Ct 0x0000 100 100 000 Old_age Offline - 5
9 Power_On_Hours 0x0000 100 100 000 Old_age Offline - 164
12 Power_Cycle_Count 0x0000 100 100 000 Old_age Offline - 467
160 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
161 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 53
163 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 11
164 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 7635
165 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 76
166 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 3
167 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 15
148 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 17043
149 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 535
150 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 404
151 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 501
169 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 100
177 Wear_Leveling_Count 0x0000 100 100 050 Old_age Offline - 2
181 Program_Fail_Cnt_Total 0x0000 100 100 000 Old_age Offline - 0
182 Erase_Fail_Count_Total 0x0000 100 100 000 Old_age Offline - 0
192 Power-Off_Retract_Count 0x0000 100 100 000 Old_age Offline - 7
194 Temperature_Celsius 0x0000 100 100 000 Old_age Offline - 33
196 Reallocated_Event_Count 0x0000 100 100 016 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x0000 100 100 050 Old_age Offline - 8
232 Available_Reservd_Space 0x0000 100 100 000 Old_age Offline - 100
241 Total_LBAs_Written 0x0000 100 100 000 Old_age Offline - 62895
242 Total_LBAs_Read 0x0000 100 100 000 Old_age Offline - 86606
245 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 122160

SMART Error Log Version: 1
Warning: ATA error count 0 inconsistent with error log pointer 2

ATA Error Count: 0
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error -1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was in an unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
00 ec 00 00 00 00 00

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ec 00 00 00 00 00 00 00 00:00:00.000 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:00:00.000 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:00:00.000 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:00:00.000 IDENTIFY DEVICE
c8 00 00 00 00 00 00 00 00:00:00.000 READ DMA

Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 164 -
# 2 Short offline Completed without error 00% 164 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
7 0 65535 Read_scanning was completed without error
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I didn't think that an ADATA SU800 was so uncommon; that brand's pretty popular over here. What really worries me is that something else/more might be wrong, as the system behaves kinda slowly when extracting files or copying them (which also explains the stuttering on video files I experienced last night) even though all those file operations should run on the HDD only. Maybe because the OS, and swap in particular, run on that faulty SSD? As for the "firmware-level reformatting" you suggested, I guess I'd better order a new SSD so I can copy over my partitions before doing that. It literally took me months to set up my dual boot with Win7 exactly the way I want it, so I'd rather clone whole partitions than reinstall everything from scratch.

rene · Post by **rene** » Sat Apr 07, 2018 2:41 pm

Right, sorry, I should have checked. Somewhat unexpectedly the ADATA SU800 indeed still seems to be unknown. Oh well. Attribute ID#5 is the important one and that one it knows about.

Can't for now say anything specific about general problems but note that your 2T drive will do just fine to store backups. The command-line tool dd is really about the conceptually simplest tool available: read a byte/block from the source, write it to the destination, repeat. That is, to create a full SSD image while booted into the Live USB/DVD you'd mount your HDD, open a terminal on it and do something like

sudo dd if=/dev/sda of=image.dd bs=100M

Will take a while; you can add status=progress to the command line for some feedback. Restoring after the format:

sudo dd if=image.dd of=/dev/sda bs=100M

The bs= parameter does not need to be the same between save and restore; it just says "read that many bytes at once". Without any bs= it reads/writes just 512 bytes at once, which is slow. This is to say: all very nonmagical again...
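To see for yourself that the block size really doesn't matter for the result, the same save/restore round trip can be tried on throwaway files first. A minimal sketch, assuming GNU dd (for the 1M suffix); the scratch files merely stand in for /dev/sda and image.dd, and no real device is touched.

```shell
#!/bin/sh
# Round-trip a small fake "disk" through dd with two different block sizes
# and check that the restored copy is byte-for-byte identical.
work=$(mktemp -d)
src="$work/fake_disk.bin"      # stand-in for /dev/sda
img="$work/image.dd"           # stand-in for the image on the HDD
restored="$work/restored.bin"  # stand-in for the SSD after restoring

# Create 3 MiB of random data to play the role of the disk contents.
dd if=/dev/urandom of="$src" bs=1M count=3 2>/dev/null

# "Save" with one block size, "restore" with a different one.
dd if="$src" of="$img" bs=1M 2>/dev/null
dd if="$img" of="$restored" bs=4k 2>/dev/null

# cmp -s is silent and only reports through its exit status.
if cmp -s "$src" "$restored"; then
    result="identical"
else
    result="different"
fi
echo "$result"

rm -rf "$work"
```

The same cmp check works against the real thing after imaging, as long as nothing has written to the source in between, which is one more reason to do this from the Live system.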

If you'd insist, backing up the partition table and imaging per partition is about as simple but would need more information -- and why bother...

jackcq · Post by **jackcq** » Sat Apr 07, 2018 4:43 pm

Can't for now say anything specific about general problems but note that your 2T drive will do just fine to store backups.

Actually no, it won't - the 2T drive contains my /home filesystem and my win7 data partition ... it's always full to the brim, despite all my efforts to backup and move away data.
I might find enough space for those 250G on one of my external usb backup drives, but sending an entire 250G disk image via usb with dd is probably gonna take at least overnight if not a few days. So I figured if it takes a while anyway and since I have to buy a new SSD anyway, I might as well order one and clone the layout directly to the new drive and do the "firmware-level formatting" only after everything works as expected again. On a side note, I really wonder if 5 bad sectors would be a reason to return that SSD on warranty?

Meanwhile there's more shocking news ... gnome-disks briefly told me with a red warning that the 2TB drive is likely to fail soon (without going into details).
The warning is gone again, but if the 2TB drive is also failing that might explain why everything seems really slow even when not even working on the SSD.

Here's what smartctl tells me on the 2TB drive, any thoughts?

smartctl -a --drivedb=/etc/smart_drivedb.h /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.13.0-38-generic] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Hitachi Ultrastar 7K3000
Device Model: Hitachi HUA723020ALA641
Serial Number: YGG3UBEA
LU WWN Device Id: 5 000cca 224c1bc58
Firmware Version: MK7OA840
User Capacity: 2.000.398.934.016 bytes [2,00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Apr 7 22:32:00 2018 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (19950) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 333) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 087 087 016 Pre-fail Always - 91
2 Throughput_Performance 0x0005 095 095 054 Pre-fail Offline - 493
3 Spin_Up_Time 0x0007 128 128 024 Pre-fail Always - 489 (Average 486)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 509
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 133 133 020 Pre-fail Offline - 27
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 3505
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 455
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 510
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 510
194 Temperature_Celsius 0x0002 139 139 000 Old_age Always - 43 (Min/Max 20/51)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short captive Completed without error 00% 24931 -
# 2 Short offline Completed without error 00% 24014 -
# 3 Short offline Completed without error 00% 18522 -
# 4 Short offline Completed without error 00% 8925 -
# 5 Short offline Completed without error 00% 8486 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I really hate to buy more hardware for my old box, but I might as well buy an SSD and another 4TB just to be on the safe side.

Also, does that even work: copying a full image of the SSD with dd, including the bad sectors? I was thinking I'd use ddrescue, which works similarly but is capable of skipping data in physically damaged sectors.

rene · Post by **rene** » Sat Apr 07, 2018 7:01 pm

jackcq wrote: ⤴Sat Apr 07, 2018 4:43 pm Here's what smartctl tells me on the 2TB drive, any thoughts?

You have a non-zero Raw Read Error Rate which I've seen as a result of bad cable/connector (and bad controller, but that one only for legacy PATA) which would likely cause me to try a different cable, certainly when it is increasing, but other than that your HDD seems to be in fine condition. Specifically, no reallocated sectors.

If as you say the drive is "full to the brim" this may be all the explanation you need for seriously degraded performance. The Linux ext filesystems try and avoid fragmentation as much as possible but clearly can not without room to spare. If "to the brim" is in your case in fact brimly enough that the drive has to seek all over itself in the middle of some videos due to fragmentation then surely this could give you performance issues. The best way to defragment an ext filesystem is back it up, reformat it and restore the backup, making sure to not ever fill it over 80% or so. Starting with ext4 the e4defrag program is another option; see its manpage.
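A quick way to check which filesystems are past that 80% mark is to filter df output. The following is a sketch only: over_threshold is a made-up helper name, the sample numbers are invented for illustration, and it assumes the POSIX df -P column layout with mount points that contain no spaces.

```shell
#!/bin/sh
# List mount points whose usage exceeds a given percentage.
# Reads `df -P` output on stdin; the limit (in percent) is the first argument.
# over_threshold is a hypothetical helper name.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)              # "98%" -> "98"
        if (pct + 0 > limit) print $6 " " pct "%"
    }'
}

# Invented sample in df -P layout, not taken from the system in this thread.
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sdb1 480000000 470400000 9600000 98% /home
/dev/sda1 100000000 40000000 60000000 40% /'

printf '%s\n' "$sample" | over_threshold 80   # prints: /home 98%
```

On a live system that would be df -P piped into over_threshold 80. For an ext4 filesystem that shows up in the list, e4defrag -c on the mount point reports how fragmented it actually is without changing anything.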

As to the SSD bit: a USB2 HDD will generally do some 30 MB/s, making for 250 GB / 30 MB/s ≈ 8300 s, or roughly 2.3 hours. A USB3 HDD could be 3 times as fast.
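That estimate is plain division and easy to redo for other sizes or speeds. A small sketch, assuming decimal units (1 GB = 1000 MB) and a sustained rate, which real USB enclosures won't hold perfectly; estimate_hours is a made-up name.

```shell
#!/bin/sh
# Rough transfer time in hours: size in GB divided by throughput in MB/s.
# estimate_hours is a hypothetical helper; LC_ALL=C keeps the decimal point a dot.
estimate_hours() {
    LC_ALL=C awk -v gb="$1" -v mbps="$2" \
        'BEGIN { printf "%.1f\n", gb * 1000 / mbps / 3600 }'
}

estimate_hours 250 30   # USB2 at about 30 MB/s: prints 2.3
estimate_hours 250 90   # USB3 at about three times that: prints 0.8
```

So even over USB2 a full 250 GB image is an afternoon's job rather than days.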

You say "since I have to buy a new SSD anyway" but note that I would not find the 6 or so reallocated sectors to be a necessarily pressing reason. In HDD times I wouldn't have lived with bad sectors, especially seeing as how almost without fail any externally appearing bad sectors meant imminent appearance of many more, but things are different with SSDs. Although I'm admittedly not certain about that I do believe it's in fact possible for the count to be non-zero even if all damaged blocks were part of the SSD's internal over-provisioning reservoir. I am certain that any bad blocks will be gracefully retired when running a firmware-level format. Moreover, flash cells can give out without saying anything about other cells; for HDD's this was different due to likelihood of drive mechanics causing the issue.

As I said, I had an Intel SSD with the same issue: its reallocated sector count grew to 19 before I firmware-level formatted it, and it's stayed at 19 ever since. No, I don't think your warranty will cover this since I don't think it's actually much of a problem other than not looking nice when viewed. In my case Intel support basically also said as much and the firmware-level format is exactly what they ended up advising. I personally by the way also had significant throughput issues with the SSD before doing so: running hdparm -T --direct /dev/sda could in that sense also be useful. Do make sure to compensate for the SSD for example being connected to a SATA2 interface (this would limit it to 300M/s) but when connected to a SATA3 interface you should be getting on the order of 500M/s.

In fact, you are dual booting with Windows 7 and I see that ADATA has an "SSD Toolbox" available that is comparable to Intel's tool: http://www.adata.com/en/ss/software-6/. Surely it would be a good idea to see what it has to say about things. It also provides for the "secure erase" -- and even running only some diagnostics scans could be all that's needed to cause the drive to rearrange blocks.

As to your last point: I expect you wouldn't have issues with plain dd: you mentioned a badblocks run finding everything hunky-dory. But feel free to use ddrescue. Certainly it's useful on actually damaged disks.

gnappi · Post by **gnappi** » Sun Apr 08, 2018 1:41 am

Actually no, it won't - the 2T drive contains my /home filesystem and my win7 data partition ... it's always full to the brim, despite all my efforts to backup and move away data.
I might find enough space for those 250G on one of my external usb backup drives, but sending an entire 250G disk image via usb with dd is probably gonna take at least overnight if not a few days. So I figured if it takes a while anyway and since I have to buy a new SSD anyway, I might as well order one and clone the layout directly to the new drive and do the "firmware-level formatting" only after everything works as expected again. On a side note, I really wonder if 5 bad sectors would be a reason to return that SSD on warranty?

Meanwhile there's more shocking news ... gnome-disks briefly told me with a red warning that the 2TB drive is likely to fail soon (without going into details).
The warning is gone again, but if the 2TB drive is also failing that might explain why everything seems really slow even when not even working on the SSD.

From my understanding ALL drives (even new) are shipped with bad sectors, but the bad sectors are mapped out with spare sectors (which all drives also have) so the drive looks pristine.

As the drive ages and bad sectors appear the drive automatically re-maps them out and updates the "grown defect table". When all of the "spare sectors" are gone, bad sectors show up in drive tests and the drive is sick and will not get better. You are likely getting SMART errors (Self-Monitoring, Analysis, and Reporting Technology) generated by the drive, not all are bad, but when you get a warning about a failure, don't disregard them.

So, if my understanding is correct, bad sectors showing up are a cause for a replacement. Whether or not the maker will just replace it with a similar drive with a newly juggled defect table and "spare sectors" with a slightly reduced size is unknown. At least I never heard FA or "refurbishing" reports at that granular a level from any of the big makers.

OH, you may want to check for the windows file CBS.log on your Windows drive. They can grow HUGE but cannot be deleted (they are safe to delete) unless you stop the Windows Modules Installer. The more you run Win clean up the bigger this file gets up to 2 gigs or so.

catweazel · Post by **catweazel** » Sun Apr 08, 2018 2:10 am

gnappi wrote: ⤴Sun Apr 08, 2018 1:41 am From my understanding ALL drives (even new) are shipped with bad sectors, but the bad sectors are mapped out with spare sectors (which all drives also have) so the drive looks pristine.

SSDs use wear levelling, and 15% overprovision is recommended.

jackcq · Post by **jackcq** » Sun Apr 08, 2018 1:32 pm

If as you say the drive is "full to the brim" this may be all the explanation you need for seriously degraded performance.

Well, 1.2 TB of the drive is taken up by the win7 data partition alone, which isn't even mounted by default and should have no impact on performance. In any case I decided to free some extra space on my /home partition just to be safe, so I tried copying some 60 GB of data to the win partition in order to burn it to Blu-ray. This was taking much longer than expected ... like almost an hour for 4GB in a dozen files (copying from one partition of the drive to another)! So I did the following: I cross-wired the HDD to a different PC and booted from a Mint live CD, basically running the HDD on a different SATA controller, with a different mainboard, CPU and memory and a different SATA cable. Now I tried to copy the same data again ... which still took MUCH too long and eventually produced input/output read errors. Eventually I gave up trying to copy stuff, booted back to my Mint install and now the HDD looks really BAD on smartctl.

You have a non-zero Raw Read Error Rate which I've seen as a result of bad cable/connector (and bad controller, but that one only for legacy PATA) which would likely cause me to try a different cable, certainly when it is increasing, but other than that your HDD seems to be in fine condition. Specifically, no reallocated sectors.

Yup, it increased, quite a lot:

smartctl -a --drivedb=/etc/smart_drivedb.h /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.13.0-38-generic] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Hitachi Ultrastar 7K3000
Device Model: Hitachi HUA723020ALA641
Serial Number: YGG3UBEA
LU WWN Device Id: 5 000cca 224c1bc58
Firmware Version: MK7OA840
User Capacity: 2.000.398.934.016 bytes [2,00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Apr 8 19:00:00 2018 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (19950) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 333) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 054 054 016 Pre-fail Always - 1296565532
2 Throughput_Performance 0x0005 116 116 054 Pre-fail Offline - 149
3 Spin_Up_Time 0x0007 128 128 024 Pre-fail Always - 487 (Average 489)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 512
5 Reallocated_Sector_Ct 0x0033 005 005 005 Pre-fail Always FAILING_NOW 877
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 135 135 020 Pre-fail Offline - 26
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 3514
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 458
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 513
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 513
194 Temperature_Celsius 0x0002 125 125 000 Old_age Always - 48 (Min/Max 20/51)
196 Reallocated_Event_Count 0x0032 001 001 000 Old_age Always - 11015
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 27
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0

SMART Error Log Version: 1
ATA Error Count: 38 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 38 occurred at disk power-on lifetime: 3514 hours (146 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 fd 03 4a bc 04 Error: UNC 253 sectors at LBA = 0x04bc4a03 = 79448579

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 00 4a bc e0 08 02:41:35.546 READ DMA EXT
27 00 00 00 00 00 e0 08 02:41:35.546 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 a0 08 02:41:35.543 IDENTIFY DEVICE
ef 03 42 00 00 00 a0 08 02:41:35.541 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 08 02:41:35.541 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 37 occurred at disk power-on lifetime: 3514 hours (146 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 06 32 48 bb 04 Error: UNC 6 sectors at LBA = 0x04bb4832 = 79382578

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 30 48 bb e0 08 02:37:40.685 READ DMA EXT
27 00 00 00 00 00 e0 08 02:37:40.683 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 a0 08 02:37:40.681 IDENTIFY DEVICE
ef 03 42 00 00 00 a0 08 02:37:40.679 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 08 02:37:40.671 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 36 occurred at disk power-on lifetime: 3514 hours (146 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 03 35 48 bb 04 Error: UNC 3 sectors at LBA = 0x04bb4835 = 79382581

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 30 48 bb e0 08 02:37:30.636 READ DMA EXT
25 00 08 28 48 bb e0 08 02:37:00.295 READ DMA EXT
25 00 08 20 48 bb e0 08 02:36:40.644 READ DMA EXT
25 00 08 18 48 bb e0 08 02:36:23.982 READ DMA EXT
25 00 08 10 48 bb e0 08 02:36:07.793 READ DMA EXT

Error 35 occurred at disk power-on lifetime: 3514 hours (146 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 fe 02 4a bb 04 Error: UNC 254 sectors at LBA = 0x04bb4a02 = 79383042

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 00 4a bb e0 08 02:34:26.914 READ DMA EXT
27 00 00 00 00 00 e0 08 02:34:26.913 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 a0 08 02:34:26.911 IDENTIFY DEVICE
ef 03 42 00 00 00 a0 08 02:34:26.909 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 08 02:34:26.909 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 34 occurred at disk power-on lifetime: 3514 hours (146 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 00 48 bb 04 Error: UNC at LBA = 0x04bb4800 = 79382528

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 00 48 bb e0 08 02:34:24.109 READ DMA EXT
25 00 00 00 46 b4 e0 08 02:34:24.106 READ DMA EXT
25 00 00 00 44 b4 e0 08 02:34:24.103 READ DMA EXT
25 00 00 00 42 b4 e0 08 02:34:24.101 READ DMA EXT
25 00 00 00 40 b4 e0 08 02:34:24.099 READ DMA EXT

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short captive Completed without error 00% 24931 -
# 2 Short offline Completed without error 00% 24014 -
# 3 Short offline Completed without error 00% 18522 -
# 4 Short offline Completed without error 00% 8925 -
# 5 Short offline Completed without error 00% 8486 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I think it's pretty obvious now, that there's a problem with that HDD and not (or not only) the SSD. I'm gonna order a 4TB HDD tonight and a new SSD just in case. If, as has been suggested here, the SSD is fine, I can still use it in another office computer, but the HDD probably needs to be replaced asap.
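The LBAs in the error log above can be cross-checked against the register bytes. A minimal Python sketch, assuming the register order printed in the log (ER ST SC SN CL CH DH) and the 28-bit layout that the logged values appear to follow (low nibble of DH, then CH, CL, SN); the function name is made up for illustration:

```python
def decode_error_registers(line):
    """Decode one 'ER ST SC SN CL CH DH' line from the SMART error log."""
    er, st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in line.split()[:7])
    # Assumed layout: LBA bits 27..24 sit in the low nibble of DH.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {
        "uncorrectable": bool(er & 0x40),  # bit 6 of the error register = UNC
        "sector_count": sc,
        "lba": lba,
    }


# Register lines copied from errors 38 and 37 in the log above.
for regs in ("40 51 fd 03 4a bc 04", "40 51 06 32 48 bb 04"):
    info = decode_error_registers(regs)
    print(hex(info["lba"]), info["lba"], info["uncorrectable"])
```

Both lines decode to the LBAs smartctl printed (79448579 and 79382578), so the failing reads sit close together on the disk.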

rene · Post by **rene** » Sun Apr 08, 2018 5:18 pm

jackcq wrote: ⤴Sun Apr 08, 2018 1:32 pm Model Family: Hitachi Ultrastar 7K3000
[ ... ]
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 054 054 016 Pre-fail Always - 1296565532
[ ... ]
5 Reallocated_Sector_Ct 0x0033 005 005 005 Pre-fail Always FAILING_NOW 877
[ ... ]
196 Reallocated_Event_Count 0x0032 001 001 000 Old_age Always - 11015

Well. Whatever happened to have that drive go from 0 reallocated sectors yesterday to an actually failing number now, indeed I wouldn't use the drive anymore. Or again, at the very least not until a firmware-level format has fixed things up as far as possible again, and then only for throw-away storage.
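For anyone who wants to watch these values over time, here is a rough Python sketch that parses attribute lines in the format quoted above and flags those whose normalized VALUE has dropped to or below THRESH. The class and function names are hypothetical, and the sample lines are copied from the posted output:

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    ident: int
    name: str
    value: int
    worst: int
    thresh: int
    when_failed: str
    raw: str


def parse_attr(line):
    """Split one smartctl attribute row; the raw value may contain spaces."""
    f = line.split(None, 9)
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]),
                     f[8], f[9].strip())


def is_failing(attr):
    # An attribute counts as failed once VALUE <= THRESH (threshold 0 never trips).
    return attr.thresh > 0 and attr.value <= attr.thresh


SAMPLE = """\
1 Raw_Read_Error_Rate 0x000b 054 054 016 Pre-fail Always - 1296565532
5 Reallocated_Sector_Ct 0x0033 005 005 005 Pre-fail Always FAILING_NOW 877
196 Reallocated_Event_Count 0x0032 001 001 000 Old_age Always - 11015
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 27
"""

attrs = [parse_attr(row) for row in SAMPLE.splitlines()]
failing = [a.name for a in attrs if is_failing(a)]
print(failing)
```

On the sample rows only Reallocated_Sector_Ct is flagged, which agrees with the FAILING_NOW marker in the posted table.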

I would advise a Western Digital drive to replace it; they've been by far the best and sturdiest brand I've encountered. Once you have it installed you can run the same security erase on the old HDD as for the old SSD and see how it turns out. The SMART values I believe will not in fact decrease but the actual count of bad sectors could, after the retesting involved in the firmware-level format. Because I am a bit suspicious we're talking about an interface issue, given the 0 from yesterday. Is for example one of the copper traces of the 7-pin SATA connector "bundled up" and did reconnecting it to the new system make things even worse in that respect?

Given that it was full anyway a new and bigger drive is certainly a good idea regardless, but I would then still wait a bit to bin the old drive; visually inspect connector and firmware-level format it and you may have a useful additional 2TB throw-away drive...

jackcq · Post by **jackcq** » Mon Apr 09, 2018 12:25 pm

Last night that HDD officially broke down: Mint refused to boot, giving me drive-not-ready errors and such. It really paid off that I set up my system to be able to run from the SSD exactly for cases like that. So, all I had to do was comment out one line in fstab and everything works again - minus my download and data dirs under /home (which should be mostly backed up). I'll wait with the firmware-level format of the SSD until the replacement arrives and hope it'll bear with me until then (should arrive tomorrow or on Wednesday anyway). So far, the bad sector count hasn't gone up from 5 anymore.

rene wrote: ⤴Sun Apr 08, 2018 5:18 pm Well. Whatever happened to have that drive go from 0 reallocated sectors yesterday to an actually failing number now, indeed I wouldn't use the drive anymore.
Or again, at the very least not until a firmware-level format has fixed things up as far as possible again, and then only for throw-away storage.

Actually, I'll check how much data I can still salvage and then I'll try to see if I can get it replaced on warranty. It's less than a year old after all. That code you gave me for the firmware-level format works for HDDs the same as for SSDs?

rene wrote: ⤴Sun Apr 08, 2018 5:18 pm I would advise a Western Digital drive to replace it; they've been by far the best and sturdiest brand I've encountered.

In fact, I ordered a WD Blue 4 TB before even reading your reply (and a Crucial MX500 SSD). I agree that Western Digital seems a solid brand; I have a few in my other computers in the office that work quite well. It's merely a coincidence that this one was a Hitachi drive, as I needed a replacement real quick for an SSHD (that's an HDD with an SSD cache; I don't think they make those any more, and it also wasn't the greatest drive I ever had), so I bought it from a local refurbishing vendor, who told me it's enterprise quality. Well, I'm not really convinced about that any more: it lasted less than a year and also tended to get quite hot and loud.

rene wrote: ⤴Sun Apr 08, 2018 5:18 pm Because I am a bit suspicious we're talking about an interface issue, given the 0 from yesterday. Is for example one of the copper traces of the 7-pin SATA connector "bundled up" and did reconnecting it to the new system make things even worse in that respect?

It's definitely very strange. I would take any bet that the other day, when gnome-disks already told me that the drive was failing, the result in smartctl would have looked just as bad, only that before I even ran smartctl, the drive seemed fine again. I only hope that the whole problem is not related to the mainboard/SATA controller, because either I'm very unlucky or something in this computer is trashing those drives; this is like the 3rd replacement already since I've had this computer. In fact, I would have replaced the mainboard/CPU a while ago, if it wasn't for the lack of options that would allow me to dual boot with Win7. I use Mint first and foremost, but occasionally I need Windows for a game and that Windows-only tax software. I want to avoid Win10 at all costs, so all those fancy Coffee Lake, Kaby Lake or Ryzen processors are not an option.

rene wrote: ⤴Sun Apr 08, 2018 5:18 pm Given that it was full anyway a new and bigger drive is certainly a good idea regardless

Yeah, I just fear it'll fill up just as quickly and it will be even harder to back up all that data. I still haven't found any good way to back up large amounts of data. External drives tend to fill up quickly and can fail as well, BD-Rs are expensive and unreliable.

rene · Post by **rene** » Mon Apr 09, 2018 4:19 pm

jackcq wrote: ⤴Mon Apr 09, 2018 12:25 pm So far, the bad sector count hasn't gone up from 5 anymore.

That's good. And yes, the security erase works the same for (S)ATA HDDs as it does for (S)ATA SSDs; it's a generic ATA feature. Note: some BIOSen set the drive's security features to "frozen" when booting; hdparm -I /dev/sda (with the proper device in place of /dev/sda) will tell you at the end of its output whether the features are indeed frozen. If so, suspend and resume the machine: another hdparm -I will then tell you they're "not frozen". At that point you can, as per above, set a password and initiate the security erase. On the HDD this will take quite a while and happens without feedback: the terminal will itself appear frozen after initiating. It is not; just wait for the amount of time which hdparm -I also displayed at the end of its output.
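The frozen check can also be scripted. A small Python sketch that scans saved hdparm -I output for the "frozen" line; the sample text only imitates the usual layout of the Security section and is an assumption, not output from the drives in this thread, and the function name is made up:

```python
def security_frozen(hdparm_identify_text):
    """Return True if frozen, False if 'not frozen', None if no such line."""
    for line in hdparm_identify_text.splitlines():
        words = line.split()
        # The Security section lists flags as "frozen" or "not frozen".
        if words and words[-1] == "frozen":
            return words[0] != "not"
    return None


# Illustrative snippets only (assumed layout of the Security section).
SAMPLE_FROZEN = "Security:\n\t\tsupported\n\tnot\tenabled\n\tnot\tlocked\n\t\tfrozen\n"
SAMPLE_THAWED = "Security:\n\t\tsupported\n\tnot\tenabled\n\tnot\tlocked\n\tnot\tfrozen\n"

print(security_frozen(SAMPLE_FROZEN), security_frozen(SAMPLE_THAWED))
```

In practice the text would come from something like sudo hdparm -I /dev/sdX saved to a file, run once before and once after the suspend/resume cycle.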

As to backing up: although it can be a bit of an investment, I'd advise a NAS. With a 2-bay one you have in the form of RAID0 versus RAID1 (or 5) the choice between two-disk capacity or one-disk capacity with automatic backup on the other. And yes, certainly those disks can give out as well -- but note that when you actually DO use them for backup only, chances of that happening at the exact moment that the source in your machine also gives out are really slim; you'll still have a valid copy and can replace the failing disk at your leisure. With RAID1/5 you moreover have an automatic second backup...

I tend to advise Synology or Netgear ReadyNAS; Synology is possibly best and personally I'm fond of ReadyNAS. Yes, they cost money and so do the drives you need to stick in. But they're quite convenient.
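To put numbers on the capacity trade-off, here is a tiny Python sketch of usable space per RAID level. It assumes equal-sized disks and ignores filesystem overhead; note that RAID 5 needs at least three disks, so on a 2-bay unit the practical choice is RAID 0 or RAID 1. The function name and the 4 TB disk size are just for illustration:

```python
def usable_tb(level, disks, size_tb):
    """Usable capacity in TB for equal-sized disks (overhead ignored)."""
    if level == 0:
        return disks * size_tb        # striping: all capacity, no redundancy
    if level == 1:
        return size_tb                # mirroring: one disk's worth
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disks - 1) * size_tb  # one disk's worth goes to parity
    raise ValueError("unsupported RAID level")


for level in (0, 1):
    print("RAID", level, "with 2 x 4 TB:", usable_tb(level, 2, 4), "TB usable")
```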

jackcq · Post by **jackcq** » Mon Apr 09, 2018 6:39 pm

rene wrote: ⤴Mon Apr 09, 2018 4:19 pm As to backing up: although it can be a bit of an investment, I'd advise a NAS. With a 2-bay one you have in the form of RAID0 versus RAID1 (or 5) the choice between two-disk capacity or one-disk capacity with automatic backup on the other. And yes, certainly those disks can give out as well -- but note that when you actually DO use them for backup only, chances of that happening at the exact moment that the source in your machine also gives out are really slim; you'll still have a valid copy and can replace the failing disk at your leisure. With RAID1/5 you moreover have an automatic second backup...

I tend to advise Synology or Netgear ReadyNAS; Synology is possibly best and personally I'm fond of ReadyNAS. Yes, they cost money and so do the drives you need to stick in. But they're quite convenient.

I already have a NAS, but it's more like my file server, used to share documents and pictures with the Windows PCs in the basement office. It's a dual-bay Synology model with two 2TB disks running in mirror RAID mode. I occasionally back up critical directories on BD-R or an external USB drive, but in general, I hope the mirror RAID is pretty safe already for the reasons you pointed out. I also made sure to use a standard Linux software RAID and not the proprietary Synology variant so I can, if all else fails, mount the disks on my PC manually. Setting up another NAS with several TB of storage capacity would be too expensive.

EDIT: Hmm, correct me if I'm wrong but I think it was also you, who helped me figure out my problem with the automounter indirectly also involving my NAS. Always the same helpful people here

jackcq · Post by **jackcq** » Wed Apr 11, 2018 3:34 pm

rene wrote: ⤴Sat Apr 07, 2018 7:01 pm As to your last point: I expect you wouldn't have issues with plain dd: you mentioned a badblocks run finding everything hunky-dory. But feel free to use ddrescue. Certainly it's useful on actually damaged disks.

My new SSD arrived today, but I can't quite figure out how to clone the disk. My problem is that the ADATA drive has an odd 256GB size, while the new one is 250GB - 6 GB smaller. I figured I'd just copy everything with dd anyway (dd if=/dev/sda of=/dev/sdb bs=100M - sda being the old drive, sdb the new one). Then I booted into a Mint live disk and deleted the incomplete extended partition (basically I deleted and recreated everything behind the Win7 partitions). Then I shrank the Win7 NTFS partition by some 10GB, recreated everything behind it with exactly the same UUIDs and volume labels, and copied over the contents. The result is that Win7 works as usual, but Mint takes an awfully long time to boot, obviously still trying to use the partition layout of the old disk. I'm not sure how I can fix the boot block or whatever is causing the boot delay. I tried update-grub, but that didn't help.

Here's the layout of the old disk as shown with fdisk:

Disk /dev/sda: 238,5 GiB, 256060514304 bytes, 500118192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xfa9aa21f

Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 206847 204800 100M 7 HPFS/NTFS/exFAT
/dev/sda2 206848 389525503 389318656 185,7G 7 HPFS/NTFS/exFAT
/dev/sda3 389525504 390502399 976896 477M 83 Linux
/dev/sda4 390504446 500117503 109613058 52,3G 5 Extended
/dev/sda5 390504448 421752831 31248384 14,9G 82 Linux swap / Solaris
/dev/sda6 421754880 500117503 78362624 37,4G 83 Linux

And here's the new one.

Disk /dev/sda: 232,9 GiB, 250059350016 bytes, 488397168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0xfa9aa21f

Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 206847 204800 100M 7 HPFS/NTFS/exFAT
/dev/sda2 206848 368846847 368640000 175,8G 7 HPFS/NTFS/exFAT
/dev/sda3 368846848 369870847 1024000 500M 83 Linux
/dev/sda4 369870848 488397167 118526320 56,5G 5 Extended
/dev/sda5 369872896 401330175 31457280 15G 83 Linux
/dev/sda6 401332224 488397167 87064944 41,5G 83 Linux

Basically the boot sector and the first partition are exactly as copied by dd, the second partition is slightly shrunk and the rest is recreated, reformatted and the data copied over. If all else fails, I would probably have to reinstall Mint. I'd really like to avoid that.
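The size mismatch can be worked out directly from the two fdisk listings. A short Python sketch using the sector counts and old partition boundaries posted above; the helper name is made up:

```python
SECTOR_BYTES = 512
OLD_DISK_SECTORS = 500118192   # ADATA SU800, from the first fdisk listing
NEW_DISK_SECTORS = 488397168   # new SSD, from the second fdisk listing

# (start, end) sectors of the old layout, copied from the first listing.
OLD_PARTS = {
    "sda1": (2048, 206847),
    "sda2": (206848, 389525503),
    "sda3": (389525504, 390502399),
    "sda4": (390504446, 500117503),
    "sda5": (390504448, 421752831),
    "sda6": (421754880, 500117503),
}


def overflowing(parts, total_sectors):
    """Names of partitions whose last sector lies beyond the target disk."""
    return [name for name, (_start, end) in parts.items() if end >= total_sectors]


shortfall_bytes = (OLD_DISK_SECTORS - NEW_DISK_SECTORS) * SECTOR_BYTES
print(shortfall_bytes, "bytes short")
print("do not fit after a plain dd:", overflowing(OLD_PARTS, NEW_DISK_SECTORS))
```

The shortfall comes to 6001164288 bytes (about 6 GB), and only the extended partition sda4 and the root partition sda6 run past the end of the new disk, which is why everything up to sda3 survived the plain dd copy.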

rene · Post by **rene** » Wed Apr 11, 2018 3:49 pm

Unfortunately I have very little time so I hope someone else can step in if this becomes involved... but certainly one difference I see is that your swap partition used to be of the correct type 82 and is now of type 83. If the UUID for the swap partition DOES in fact match, I believe this shouldn't matter, but your symptom matches a not-found (swap) partition; browse through sudo journalctl to see what the issue is.

To change the type of /dev/sda5: sudo fdisk /dev/sda, t, 5, 82.

You didn't post the UUID's: sudo blkid when booted into the Live system with both old and new SSD's present will show them all for easy comparison.
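Comparing the two disks by eye gets tedious, so here is a rough Python sketch that parses blkid-style lines and lists partitions whose UUID differs between old and new disk. The device names and UUIDs in the sample are invented placeholders, not values from this thread, and the function names are hypothetical:

```python
import re


def parse_blkid(text):
    """Map device -> {TAG: value} from lines like '/dev/sda5: UUID="..." TYPE="swap"'."""
    table = {}
    for line in text.splitlines():
        dev, sep, rest = line.partition(": ")
        if sep:
            table[dev] = dict(re.findall(r'(\w+)="([^"]*)"', rest))
    return table


def uuid_mismatches(table, old_disk, new_disk):
    """Pairs (old_part, new_part) with the same partition number but different UUIDs."""
    pairs = []
    for dev in sorted(table):
        if not dev.startswith(old_disk):
            continue
        twin = new_disk + dev[len(old_disk):]
        if twin in table and table[twin].get("UUID") != table[dev].get("UUID"):
            pairs.append((dev, twin))
    return pairs


# Placeholder data for illustration only.
SAMPLE = '''\
/dev/sda5: UUID="11111111-1111-1111-1111-111111111111" TYPE="swap"
/dev/sda6: UUID="22222222-2222-2222-2222-222222222222" TYPE="ext4"
/dev/sdb5: UUID="33333333-3333-3333-3333-333333333333" TYPE="swap"
/dev/sdb6: UUID="22222222-2222-2222-2222-222222222222" TYPE="ext4"
'''

print(uuid_mismatches(parse_blkid(SAMPLE), "/dev/sda", "/dev/sdb"))
```

With the placeholder data only the swap pair is reported, which is the kind of mismatch that would leave the system waiting for a swap partition it cannot find.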

As said; if not enough I hope someone else can step in: I'm off for a bit...

rene · Post by **rene** » Wed Apr 11, 2018 5:13 pm

A possibly quicker/easier option might btw be to fully shrink down part6 on the old SSD (with gparted), dd the entire SSD over again, and re-expand part6 to fill the new SSD.

jackcq · Post by **jackcq** » Wed Apr 11, 2018 5:37 pm

rene wrote: ⤴Wed Apr 11, 2018 3:49 pm Unfortunately I have very little time so I hope someone else can step in if this becomes involved... but certainly one difference I see is that your swap partition used to be of the correct type 82 and is now of type 83. If the UUID for the swap partition DOES in fact match, I believe this shouldn't matter, but your symptom matches a not-found (swap) partition; browse through sudo journalctl to see what the issue is.

Damn, you're good.

Of course, I totally forgot about that swap partition. I forgot to toggle its type and I forgot to set the correct UUID when formatting it (if I even formatted it at all). After doing that, it seems to work just fine. I'm already writing this from a system booted completely from the new disk.

rene wrote: ⤴Wed Apr 11, 2018 5:13 pm A possibly quicker/easier option might btw be to fully shrink down part6 on the old SSD (with gparted), dd the entire SSD over again, and re-expand part6 to fill the new SSD.

Yeah that might have worked, if space weren't a bit tight already on the root filesystem, there isn't much to shrink on part 6, it's like 70% full due to holding a fully operational Mint 18.2 system. Back then when I created that partition layout, I wanted something to give maximum space to win7 while still having my entire Mint on SSD with only the directories for big data like games, videos and stuff on the HDD. If I have to squeeze it by another 6GB, I'd rather take that space from Windows. Also, the whole point was to avoid changing anything on that old SSD (in case it's less stable than expected) until everything's set.

jackcq · Post by **jackcq** » Fri Apr 13, 2018 5:29 pm

rene wrote: ⤴Sat Apr 07, 2018 9:54 am
Mentioned "firmware-level reformatting" -- which I would do -- would of course necessitate you reinstalling the system on the SSD. You'd boot into a Live system from USB or DVD and run
sudo hdparm --security-set-pass password /dev/sdz
sudo hdparm --security-erase password /dev/sdz
in which of course /dev/sdz is to be replaced with the correct one for the SSD: the second command erases the SSD fully and will disable the password again.
I don't believe that the Reallocated Block count SMART value will ever in fact decrease again but you'll likely find that it stops increasing.

Okay, after successfully cloning the SSD, I put away the new SSD, reinstalled the old one and the new 4TB HDD, and tried the "firmware-level reformatting" you suggested. For some reason it's not working. The second command (hdparm --security-erase password /dev/sda) finishes in less than 5 seconds for the whole 256 GB drive, and when I call fdisk the partition table is the same as before. I think it's safe to assume that nothing got formatted/erased at all. I wrote back the disk.img with dd anyway, but as expected the stats of the drive didn't change (currently at 7 bad sectors).

I've also tried to copy back data from the old HDD to the new one, which turns out to be quite a challenge. The drive behaves very strangely. Not only is it odd that its status in smartctl changed so drastically from healthy to failing, but reading data from the drive also slows down randomly, then speeds up again. Sometimes copying puts so much stress on the bus that the whole system freezes. Then when I restart the computer and copy the same files again, it works without problems. It all seems fairly random.

rene · Post by **rene** » Sat Apr 14, 2018 2:19 pm

Have neither experienced nor ever heard of anything of the kind with the secure erase, on SSD or HDD, SATA or PATA -- and am unfortunately also coming up blank trying to think of any possible reason. The "5 seconds" is not unheard of; my Samsung 850 EVO 250G does about the same and certainly the hdparm --security-erase does in fact zero it out fully. The only thing I could come up with is of the exceedingly silly kind: you not running the hdparm and the fdisk on the same disk.

But can not, then, say anything sensible at this point. Might be cause for a forum-wide poll/call for experience with a non-zeroing secure erase. And/or: can you hook up the old SSD as an extra and run the ADATA tool from the Windows 7 partition on the new one against the old SSD? It mentioned to also support secure erase and it would in any case be interesting to see if it has something to say about the old SSD.
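One way to check whether an erase really zeroed the start of a disk, independent of what fdisk has cached, is to read the first megabyte and look for any non-zero byte. A minimal Python sketch; the demo runs against a throwaway temp file that mimics an MBR with a boot flag at offset 446, and pointing the function at a real device such as /dev/sdX (a placeholder) would need root:

```python
import os
import tempfile


def first_nonzero_offset(path, limit=1 << 20, chunk=1 << 16):
    """Offset of the first non-zero byte within `limit` bytes, or None if all zero."""
    offset = 0
    with open(path, "rb") as handle:
        while offset < limit:
            block = handle.read(min(chunk, limit - offset))
            if not block:
                break
            stripped = block.lstrip(b"\x00")
            if stripped:
                return offset + len(block) - len(stripped)
            offset += len(block)
    return None


def demo_image_check():
    """Build a 512-byte fake sector with one non-zero byte and scan it."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * 446 + b"\x80" + b"\x00" * 65)
        name = tmp.name
    try:
        return first_nonzero_offset(name)
    finally:
        os.unlink(name)


print(demo_image_check())
```

A result of None on the real device would mean the first megabyte reads back as zeros; any number means data (for example a surviving partition table) is still there.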

jackcq · Post by **jackcq** » Sat Apr 14, 2018 7:58 pm

rene wrote: ⤴Sat Apr 14, 2018 2:19 pm Have neither experienced nor ever heard of anything of the kind with the secure erase, on SSD or HDD, SATA or PATA -- and am unfortunately also coming up blank trying to think of any possible reason. The "5 seconds" is not unheard of; my Samsung 850 EVO 250G does about the same and certainly the hdparm --security-erase does in fact zero it out fully. The only thing I could come up with is of the exceedingly silly kind: you not running the hdparm and the fdisk on the same disk.

Today I decided to do that secure erase with that 2TB Hitachi HDD, as any more attempts to copy and rescue data seemed futile. It took like 6-8 hours and afterwards the partition table was clean. It didn't help with the smartctl assessment of the drive, or the stress/lag it obviously produces on the whole SATA bus even when it's just attached as an extra. Just checking the partition table with fdisk takes like almost a minute now. I'll call the vendor on Monday and see if there's any chance I get a replacement on warranty, it's less than a year old after all and shouldn't have failed me like that. And no, I did not run hdparm and fdisk on different disks, the partition tables look very different and it was clearly the one belonging to the SSD shown on fdisk after the secure erase.

rene wrote: ⤴Sat Apr 14, 2018 2:19 pm But can not, then, say anything sensible at this point. Might be cause for a forum-wide poll/call for experience with a non-zeroing secure erase. And/or: can you hook up the old SSD as an extra and run the ADATA tool from the Windows 7 partition on the new one against the old SSD? It mentioned to also support secure erase and it would in any case be interesting to see if it has something to say about the old SSD.

That's actually a good idea. I was wondering how I would even run a secure erase on Windows when it's booted from the same disk you try to erase - seems like booting from the new SSD is the answer to that. I'll try that tomorrow.
