[Solved-hopefully]Hard Disk error?

Questions about hardware, drivers and peripherals
Forum rules
Before you post read how to get help. Topics in this forum are automatically closed 6 months after creation.
Locked
Garmine


Post by Garmine »

Hello!

I've been using Linux Mint for about 3 months now without any problems (except for fglrx... :)), but now I've run into a serious problem. However, I'm not sure whether it's Mint's fault or not.

I have a new 2 TB Western Digital Caviar Green HDD (WD20EARX); it's 3 months old, just like my Mint installation. It had an MBR partition table with a single 2 TB NTFS partition. Yesterday when I woke up I found nothing on it. When I looked at it with HD Sentinel (from Windows), it said the drive had been half-formatted as ext4 (with a full format, not a quick one!) at ~20:30 the previous day. However, at that time the only things running were Linux Mint and Chromium (without root rights, of course). I didn't notice anything at all: no window asking "is it okay if I kill your HDD?" and no suspicious sounds of heavy disk usage from formatting. The HDD's SMART data looks 98% fine too. I shut down the system at 21:15:38 (Oct 29). The HDD had been mounted the whole day, just as it had been for the previous 3 months without any problems. I used it that day too, and as far as I remember the last time I used it (both reading and writing) was at ~20:00.

I also can't find any error message related to this, except for one from ~17:45 in syslog:

Oct 29 17:44:17 MikiMint kernel: [21655.808037] ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x4050002 action 0xe frozen
Oct 29 17:44:17 MikiMint kernel: [21655.808042] ata5: SError: { RecovComm PHYRdyChg CommWake DevExch }
Oct 29 17:44:17 MikiMint kernel: [21655.808044] ata5.00: failed command: SMART
Oct 29 17:44:17 MikiMint kernel: [21655.808048] ata5.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
Oct 29 17:44:17 MikiMint kernel: [21655.808049] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Oct 29 17:44:17 MikiMint kernel: [21655.808051] ata5.00: status: { DRDY }
Oct 29 17:44:17 MikiMint kernel: [21655.808058] ata5: hard resetting link
Oct 29 17:44:18 MikiMint kernel: [21656.688120] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 29 17:44:18 MikiMint kernel: [21656.745126] ata5.00: configured for UDMA/133
Oct 29 17:44:18 MikiMint kernel: [21656.745150] ata5: EH complete
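To check whether a burst like this is a one-off or keeps recurring, the libata error-handling messages can be counted per port. Below is a minimal sketch, assuming the message wording matches the excerpt above (other kernel versions may phrase them differently) and that syslog lives at /var/log/syslog, as it usually does on Mint/Ubuntu with rsyslog. The event names and patterns are illustrative choices, not anything the kernel defines.

```python
import re
import sys
from collections import Counter
from typing import Dict, Iterable

# Matches "kernel: [21655.808037] ata5.00: <message>" and captures the port ("ata5").
PORT_RE = re.compile(r"kernel: \[\s*[\d.]+\] (ata\d+)(?:\.\d+)?: (.*)")

# Illustrative event patterns modelled on the log excerpt (assumption).
EVENTS = {
    "exception": re.compile(r"^exception Emask"),
    "failed_command": re.compile(r"^failed command"),
    "hard_reset": re.compile(r"^hard resetting link"),
    "bus_error": re.compile(r"ATA bus error"),
}


def summarize_ata_events(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count libata error-handling events per ATA port."""
    summary: Dict[str, Counter] = {}
    last_port = None
    for line in lines:
        m = PORT_RE.search(line)
        if m:
            last_port, message = m.group(1), m.group(2)
        elif "kernel:" in line and last_port is not None:
            # Continuation lines such as "res 40/00:... (ATA bus error)" carry
            # no port prefix, so attribute them to the last port seen.
            message = line
        else:
            continue
        counts = summary.setdefault(last_port, Counter())
        for name, pattern in EVENTS.items():
            if pattern.search(message):
                counts[name] += 1
    return summary


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            for port, counts in summarize_ata_events(f).items():
                print(port, dict(counts))
    except OSError as exc:  # missing file or no read permission
        print(f"could not read {path}: {exc}", file=sys.stderr)
```

Reading /var/log/syslog normally requires root or membership in the `adm` group. A port whose hard-reset count keeps growing over several days points more toward a cable, port, or power issue than a single isolated event does.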

SMART reports ~20 days of power-on time and ~500 read errors; the only other non-zero values are related to powering on/off.

Fortunately 99% of the data was backed up, so I was lucky this time, but I'm afraid it might happen again...

Is it possible that Mint noticed a serious HDD error and then tried to fix it without asking(!) first?
Any ideas why it happened?
Also, what other log files, data, etc. should I attach?

Thanks in advance!
Garmine

P.S.:
I've checked the SMART data of my other HDDs:
a 1 TB drive with 1.2 years of runtime reports 1 read error,
and an 80 GB drive with 1 year of runtime reports 0 read errors,
but the new one has 500 read errors in only 20 days!

Please tell me I'm wrong, but it's a fault in the HDD then, isn't it? :S
Last edited by LockBot on Wed Dec 28, 2022 7:16 am, edited 2 times in total.
Reason: Topic automatically closed 6 months after creation. New replies are no longer allowed.
eanfrid

Re: Hard Disk error?

Post by eanfrid »

It's a hardware error: either in the HDD's SATA microcontroller, in the power-supply interface, or a bad/faulty SATA cable connection.
Garmine

Re: Hard Disk error?

Post by Garmine »

Thanks!

So I've replaced the SATA and power supply cables, and I also plugged the SATA cable into another port. But I still don't understand why HD Sentinel showed the destroyed partition as "ext4"; that's why I still suspect Mint. I don't think my motherboard or power supply is faulty, since that would cause errors elsewhere too (e.g. on the other drives), but there's no sign of that.

The inside of the machine was clean too, with only minimal dust (I clean it regularly).
But can a faulty cable kill the whole partition (especially like this)?
eanfrid

Re: Hard Disk error?

Post by eanfrid »

Code:

Oct 29 17:44:17 MikiMint kernel: [21655.808049] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Oct 29 17:44:17 MikiMint kernel: [21655.808051] ata5.00: status: { DRDY }
Oct 29 17:44:17 MikiMint kernel: [21655.808058] ata5: hard resetting link

SMART errors and ATA bus errors are not OS-related. Maybe the MBR got corrupted or damaged. I don't know HD Sentinel, but it could also report wrong information if it can't properly read basic HDD/MBR parameters on a faulty drive.

What does "smartctl -A /dev/sdX" give?

Also, I don't really understand this part:
It used to have MBR and a single 2 TB NTFS partition. Yesterday when I woke up I found nothing on it, after looking at it with HD Sentinel (from a Windows) it said that it's half-formatted (and with full format, not with quick!) with ext4 which happened the previous day at ~20:30.
What is the actual volume format now: NTFS or EXT4?
Garmine

Re: Hard Disk error?

Post by Garmine »

Thank you!
eanfrid wrote:
It used to have MBR and a single 2 TB NTFS partition. Yesterday when I woke up I found nothing on it, after looking at it with HD Sentinel (from a Windows) it said that it's half-formatted (and with full format, not with quick!) with ext4 which happened the previous day at ~20:30.
What is now the actual volume format: NTFS or EXT4 ?
It had a single NTFS partition before.
After the error it was unusable (with anything); however, the program said it looked like an unfinished ext4 partition. But you're probably right: it didn't recognize the corrupted partition table, and that's why it showed it like that.
Yesterday, after changing the cables, I reformatted the whole disk as NTFS. I also copied the data from the old drive again (~800 GB), and the number of read errors didn't increase. So it looks fine for now.
So it's NTFS right now.

smartctl output:

Code:

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   158   155   051    Pre-fail  Always       -       502
  3 Spin_Up_Time            0x0027   165   163   021    Pre-fail  Always       -       6741
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       181
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       484
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       159
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       47
193 Load_Cycle_Count        0x0032   198   198   000    Old_age   Always       -       8034
194 Temperature_Celsius     0x0022   124   106   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
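A table like the one above can be checked mechanically rather than by eye. Below is a minimal sketch that parses the attribute rows of `smartctl -A` output and flags two things: attributes whose normalized VALUE has dropped to or below a non-zero THRESH (this sketch treats a THRESH of 0 as "never fails"), and non-zero raw counts on attributes 5, 197, 198, and 199. That watch list is a common rule of thumb and an assumption here, not a smartctl feature. Note that RAW_VALUE is vendor-specific, which is why the 502 raw read errors above are not flagged: the normalized value (158) is still far above the threshold (51).

```python
import re
import sys
from dataclasses import dataclass
from typing import Dict, List

# One attribute row: ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE.
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+\S+\s+\S+\s+(\S+)\s+(.*)$"
)

# Assumed watch list (rule of thumb): any non-zero raw count is worth a closer look.
WATCHED_RAW = {5, 197, 198, 199}


@dataclass
class Attribute:
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    when_failed: str
    raw: int


def parse_attributes(text: str) -> Dict[int, Attribute]:
    """Parse the attribute table printed by `smartctl -A`."""
    attrs: Dict[int, Attribute] = {}
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if not m:
            continue  # header lines, banners, blank lines
        raw = re.match(r"\d+", m.group(7))  # keep only the leading integer
        attrs[int(m.group(1))] = Attribute(
            int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4)),
            int(m.group(5)), m.group(6), int(raw.group()) if raw else 0,
        )
    return attrs


def warnings(attrs: Dict[int, Attribute]) -> List[str]:
    """Return human-readable warnings; an empty list means nothing stood out."""
    out: List[str] = []
    for a in attrs.values():
        if a.thresh > 0 and a.value <= a.thresh:
            out.append(f"{a.attr_id} {a.name}: normalized value {a.value} <= threshold {a.thresh}")
        elif a.when_failed != "-":
            out.append(f"{a.attr_id} {a.name}: failed before ({a.when_failed})")
        if a.attr_id in WATCHED_RAW and a.raw > 0:
            out.append(f"{a.attr_id} {a.name}: raw count {a.raw}")
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: sudo smartctl -A /dev/sdX > smart.txt; python3 check_smart.py smart.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for message in warnings(parse_attributes(f.read())) or ["no warnings"]:
            print(message)
```

Run against the output above, this sketch reports no warnings, which agrees with the reading that the drive itself looks healthy.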
eanfrid

Re: Hard Disk error?

Post by eanfrid »

Yep, seems to be fine again. But keep an eye on SMART and ATA-bus errors: hardware (even brand new) can fail at any time... Linux is quite good at monitoring these hardware errors and does its best to make faulty hardware work. You can also install smart-notifier and gsmartcontrol (smartctl GUI) if you wish ;)
Garmine

Re: Hard Disk error?

Post by Garmine »

eanfrid wrote:Yep, seems to be fine again. But keep an eye on SMART and ATA-bus errors: hardware (even brand new) can fail at any time... Linux is quite good at monitoring these hardware errors and does its best to make faulty hardware work. You can also install smart-notifier and gsmartcontrol (smartctl GUI) if you wish ;)
Thank you! :)
Also, I've updated the topic title to "[Solved-hopefully]" :)
