RAID 5 died after reboot

giantcat
Level 1
Posts: 5
Joined: Thu Dec 27, 2012 3:44 pm

RAID 5 died after reboot

Post by giantcat »

Hello all. We want a 3×8TB RAID 5 array giving 16TB of usable space. Due to budget constraints, we had to build it incrementally. I'll include the steps I followed below.

The RAID was complete and functional. Then I rebooted the machine, and the fstab entry that auto-mounts the array killed the boot. I was able to edit fstab to remove the auto-mount, which allowed a normal startup. Using the disk manager, I see:
sda1: partition type Linux RAID, 8.0TB unknown
sdb1: partition type Linux RAID, 8.0TB unknown
sdc1: partition type Linux RAID, 8.0TB Linux RAID member
(sdd1 is the system SSD)
md0: Block device is empty

I'm still learning, so I don't know whether perhaps mdadm.conf never got written. Is it possible to restore the RAID without losing its contents? Help! :)
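Would something like this be the safe way to check what metadata is left on the disks, without writing anything? (I'm going off the man page here, so please correct me if these aren't harmless.)

```shell
# Dump whatever RAID superblock each partition still carries (read-only).
mdadm --examine /dev/sda1 /dev/sdb1 /dev/sdc1

# Ask mdadm to try assembling the array from the metadata it finds.
mdadm --assemble --scan --verbose
```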

Here are the steps I followed over a six month period:
---
1) Prepare disk
fdisk /dev/sdx
to create partition sdx1
Choose 'Linux raid autodetect' for Type

2) Create degraded array
mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdx1

3) Make file system
mkfs.ext4 /dev/md0

4) Mount and use
mkdir /mnt/Entertainment
mount /dev/md0 /mnt/Entertainment
chmod 777 /mnt/Entertainment
---
after obtaining second drive
---
5) Prepare new disk
fdisk /dev/sdx

6) Add new partition
mdadm /dev/md0 -a /dev/sdx1

7) Check status of operation
cat /proc/mdstat
---
after obtaining third drive
---
8) Prepare new disk
fdisk /dev/sdx

9) Expand array with third disk
mdadm --grow /dev/md0 --level=5
mdadm --grow /dev/md0 --add /dev/sdx1 --raid-devices=3

10) Expand available space
mdadm --grow /dev/md0 --size=max
e2fsck -f /dev/md0
*wait*
resize2fs /dev/md0
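Looking back over these steps, I suspect the one thing I never did was record the array so it gets assembled at boot. From the man page I believe it would have been something like the following (the mdadm.conf path is the Debian/Ubuntu one, so I may have that wrong for other distros):

```shell
# Append the array definition so mdadm knows about it at boot time.
mdadm --detail --scan >> /etc/mdadm/mdadm.conf

# Rebuild the initramfs so the array is assembled before fstab mounts it.
update-initramfs -u
```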

Kadaitcha Man
Level 5
Posts: 914
Joined: Mon Aug 27, 2012 10:17 pm

Re: RAID 5 died after reboot

Post by Kadaitcha Man »

giantcat wrote:
Thu Mar 26, 2020 12:48 pm
I'm learning, so I don't know if maybe mdadm conf didn't get written, is it possible to restore the RAID without losing contents, help! :)
I have no idea why you attempted to create a degraded array from scratch and assumed it would 'just work' when you tried to extend it. It looks as if sda and sdb don't know that they're part of the RAID set. Troubleshooting what you've done isn't going to be straightforward: looking through the steps you've used, I think you've done a lot of things that simply aren't kosher. It takes only one terminal command to create the array; the remaining commands just check its status and set up the mount points. Yet you've done this over a six-month period, and you borked your boot into the bargain. Over-eager haste may be the root cause here, not mdadm.
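For the record, with all three disks in hand, creation is a single command, something along these lines (the sdX1 names are whatever your partitions actually are):

```shell
# One-shot creation of a 3-disk RAID5 array.
# WARNING: destroys any existing data on the listed partitions.
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1
```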

I know that you said budget is a constraint, but really, with 16TB of data, software RAID alone isn't going to cut it. Any kind of software or fake RAID (BIOS RAID) needs one or more large independent disk backups behind it, because a single wayward interstellar cosmic particle can drop your software RAID down a bottomless bitbucket.

I think you have two main choices, and neither of them is going to get you out of your pickle without destroying whatever data you may have in the array. The first option is to kill the array completely, rebuild it from scratch as a single RAID set, then restore your data from backup. This is a defective approach, though, because it doesn't reduce the risk of any of these problems recurring. The second option is by far the best: go to eBay and search for "adaptec 6805T". You can pick up a brand-new hardware RAID card for well under US$50. You will also need an SFF-8087 SAS-to-SATA cable. You can add a global hot spare when you can afford it.

https://raid.wiki.kernel.org/index.php/ ... nd_testing

That link, and others like it, is your only other option.

After all of that, you may also want to reconsider RAID5. If one of those three drives goes toes up, your read bandwidth will be so compromised that opening a small text file will be like wading through treacle toffee.

I wish I had better news for you but I don't. I run multiple hardware RAID sets on multiple servers because one RAID set with a hot spare isn't sufficient to reduce the risk of data loss. The servers are in a backup chain, feeding from a very large software (mdadm 24TB striped RAID 0) RAID set served to the network. One hardware RAID server backs up the software RAID daily. Another hardware RAID server takes a signal from the first hardware RAID server and backs up the first hardware RAID server. Periodically, about once a week, a third server backs up the second server to multiple independent disks. Of course, this will be major overkill for some people but it all comes down to this: What value do I place on my data?

If you really must use mdadm in RAID5 then kill the array entirely and start from scratch.
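By "kill the array" I mean something along these lines. Double-check the device names against your own machine before running anything, because this wipes the RAID metadata for good:

```shell
# Stop the array, then erase the RAID superblocks so nothing half-remembers it.
# WARNING: this destroys the array and makes its contents unrecoverable.
mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sda1 /dev/sdb1 /dev/sdc1
```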
It's pronounced kad-eye-cha, not kada-itcha.

giantcat
Level 1
Posts: 5
Joined: Thu Dec 27, 2012 3:44 pm

Re: RAID 5 died after reboot

Post by giantcat »

Hello. I didn't assume it would just work; I ran many tests with three drives and three 200GB partitions. Those survived reboots and worked fine.

However, I'm not replying argumentatively. Your reply is very illuminating and helpful.

Our goal is basically a home entertainment server. So would a three-disc RAID 5 be sufficient: if one disc fails, replace it and rebuild? If that crisis-management approach won't work even for a simple home environment, then yes, there's no point. But if it gives us some ability to trust the data to survive a single disc failure, then I want to do it.

So now my question to you is, would that be sufficient for the scenario I describe?

Thanks!

Kadaitcha Man
Level 5
Posts: 914
Joined: Mon Aug 27, 2012 10:17 pm

Re: RAID 5 died after reboot

Post by Kadaitcha Man »

giantcat wrote:
Fri Mar 27, 2020 11:29 am
So now my question to you is, would that be sufficient for the scenario I describe?
The road-block is obviously budget. I don't trust software RAID on its own for critical files, and I'm only slightly less leery of using LVM to create a single volume spanning more than one disk. Should some errant cosmic particle hit your CPU and trigger an immediate reboot, for example, the software RAID might be left in an unstable state, and you might lose the lot into the bargain. This is why I use hardware RAID (Adaptec 6805T) cards.

I think only you can answer your question, and doing so requires you to place a value on your data. For me, it would pose unacceptable risk of data loss to use software RAID or LVM without some form of backup.

RAID5 will let one disk die and the data will still be readable. RAID 50 offers far more protection, but it requires a minimum of six disks. My two servers have 7 disks each, and each set of 7 is plugged into a 6805T card, which supports 8 disks without expanders. Six of the disks are in RAID50; one disk is a global hot spare, so that if a drive goes down, the hot spare takes its place. With this setup I have no issue using LVM or mdadm RAID to create one large volume, because the servers keep backups of the files shared on the network. Actually, the setup is more complex than that, but you get the idea.

Without a backup of your data, you're up the creek without a paddle if it fails totally, and trust me, it does happen. Redundancy, as in RAID, only offers limited protection.

A different option for you might be to put two drives in RAID0, or create an LVM volume, and run a script to rsync the files you don't want to lose onto the third disk. This would require you to sort your files into critical and non-critical, and to watch how much space the critical files take up so that you don't try to overfill your backup drive. I'm not recommending this; I'm rather trying to show that there is more than one option.
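A crude sketch of such a script, with hypothetical mount points you would need to change to match your own setup:

```shell
#!/bin/sh
# Mirror the critical files onto the third (backup) disk.
# SRC and DEST are placeholder paths: SRC is on the RAID0/LVM volume,
# DEST is the mount point of the standalone backup disk.
SRC=/mnt/Entertainment/Critical
DEST=/mnt/backupdisk/Critical

# -a preserves permissions and timestamps; --delete keeps DEST an exact mirror.
rsync -a --delete "$SRC/" "$DEST/"
```

Run it from cron nightly and the third disk stays a faithful copy of whatever you put under the critical directory.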

As I said, what method you implement all comes down to that one question, "How important is my data?"
It's pronounced kad-eye-cha, not kada-itcha.
