Fixing RAID5 array after replacing a disk on QNAP TS-453A

I have a QNAP TS-453A NAS with four 6TB disks configured as a RAID5 storage pool, currently running firmware 4.4.2.1262.

Yesterday I discovered that one of the disks (disk 3) had failed, and that the storage pool was in a "Degraded" state.

I quickly got a new disk. The old disk was a WD Red 6TB (WD60EFRX), the new one a WD Red 6TB (WD60EFAX). I hot-swapped the disks. According to the documentation, the new disk should be detected automatically and the storage pool should start rebuilding ("Rebuilding" state). But nothing happened.

I checked the Storage & Snapshots tool in the UI. The storage pool was still in a degraded state, but all four disks were now shown as green and healthy. However, disk 3 was listed as "not a member" of the storage pool. When I selected Manage for the pool, I could do nothing: the only action that was not disabled was "Rebuild RAID Group", but when I tried that there were no free disks to add to the RAID group.

So the problem appeared to be that disk 3 had been detected and was in use, yet it was still listed as "not a member" of the storage pool, and no actions were available in the UI to fix the situation. Pulling the disk out and inserting it again did not change anything. Googling for help showed that others have encountered similar situations, but none of the suggested solutions helped me.

I decided to have a look "under the hood" to see if I could figure out what was wrong.

ssh admin@mynas
[~] # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md1 : active raid5 sda3[0] sdd3[3] sdb3[1]
      17551701504 blocks super 1.0 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]

md256 : active raid1 sdc2[2](S) sdd2[3](S) sdb2[1] sda2[0]
      530112 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md13 : active raid1 sdc4[24] sda4[0] sdd4[3] sdb4[1]
      458880 blocks super 1.0 [24/4] [UUUU____________________]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md9 : active raid1 sdc1[24] sda1[0] sdd1[3] sdb1[1]
      530048 blocks super 1.0 [24/4] [UUUU____________________]
      bitmap: 1/1 pages [4KB], 65536KB chunk

OK, so /dev/md1 is my RAID5 storage pool. The [4/3] [UU_U] status shows that only three of the four member devices are active: /dev/sda3, /dev/sdb3 and /dev/sdd3 are part of the group, but /dev/sdc3 is missing. Let's check the group:

[~] # mdadm --misc --detail /dev/md1
/dev/md1:
        Version : 1.0
  Creation Time : Tue Aug 23 05:48:30 2016
     Raid Level : raid5
     Array Size : 17551701504 (16738.61 GiB 17972.94 GB)
  Used Dev Size : 5850567168 (5579.54 GiB 5990.98 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Sat Apr  4 18:10:54 2020
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : 1
           UUID : f82504b7:2c60d9bd:5676ec84:0a5ba214
         Events : 27378

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       4       0        0        4      removed
       3       8       51        3      active sync   /dev/sdd3

OK, so 3 active devices, and /dev/sdc3 is missing, as expected. Let's check whether the new disk exists and is partitioned like the other disks:

[~] # parted
GNU Parted 3.1
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.

(parted) select /dev/sda
select /dev/sda
Using /dev/sda
(parted) print
print
Model: WDC WD60EFRX-68L0BN1 (scsi)
Disk /dev/sda: 6001GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system     Name     Flags
 1      20.5kB  543MB   543MB   ext3            primary
 2      543MB   1086MB  543MB   linux-swap(v1)  primary
 3      1086MB  5992GB  5991GB                  primary
 4      5992GB  5993GB  543MB   ext3            primary
 5      5993GB  6001GB  8554MB  linux-swap(v1)  primary

(parted) select /dev/sdc
select /dev/sdc
Using /dev/sdc
(parted) print
print
Model: WDC WD60EFAX-68SHWN0 (scsi)
Disk /dev/sdc: 6001GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name     Flags
 1      20.5kB  543MB   543MB   ext3         primary
 2      543MB   1086MB  543MB                primary
 3      1086MB  5992GB  5991GB               primary
 4      5992GB  5993GB  543MB   ext3         primary
 5      5993GB  6001GB  8554MB               primary

(parted) quit
quit


OK, so the new disk seems to be partitioned correctly. Let's just try to add the missing disk partition to the RAID group:

[~] # mdadm --manage /dev/md1 --add /dev/sdc3
mdadm: added /dev/sdc3
[~] # mdadm --misc --detail /dev/md1
/dev/md1:
        Version : 1.0
  Creation Time : Tue Aug 23 05:48:30 2016
     Raid Level : raid5
     Array Size : 17551701504 (16738.61 GiB 17972.94 GB)
  Used Dev Size : 5850567168 (5579.54 GiB 5990.98 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Sat Apr  4 18:18:17 2020
          State : active, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

 Rebuild Status : 0% complete

           Name : 1
           UUID : f82504b7:2c60d9bd:5676ec84:0a5ba214
         Events : 27846

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       4       8       35        2      spare rebuilding   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3


Great! The RAID group is recovering! The NAS emitted two beeps and the status light started blinking red/green. In the UI, the storage pool state changed to "Rebuilding".

For some reason the NAS did not correctly add the /dev/sdc3 disk partition to the storage pool. The disk had been correctly partitioned and the partitions formatted, and the other RAID arrays had apparently recovered, but not /dev/md1. Adding /dev/sdc3 manually to /dev/md1 fixed the problem.
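
If you want to follow the rebuild from the shell instead of the UI, the usual Linux md views work here too; this is nothing QNAP-specific, just the same commands as above:

[~] # cat /proc/mdstat
[~] # mdadm --misc --detail /dev/md1 | grep -i "rebuild status"

The first command shows a progress bar and an estimated finish time for the recovering array; the second prints the "Rebuild Status : N% complete" line from the full --detail output.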

(Screenshot: QNAP storage pool rebuilding)

One more thing: it looks like /etc/config/mdadm.conf and /etc/config/raidtab were missing. /etc/mdadm.conf and /etc/raidtab existed, but only as symbolic links to those non-existent files. I'm not sure they are needed, but as a precaution I created them. I created mdadm.conf like this:

[~] # mdadm --detail --scan >> /etc/config/mdadm.conf

and this is the content of raidtab:

[~] # cat /etc/config/raidtab
raiddev /dev/md1
        raid-level      5
        nr-raid-disks   4
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/sda3
        raid-disk       0
        device  /dev/sdb3
        raid-disk       1
        device  /dev/sdc3
        raid-disk       2
        device  /dev/sdd3
        raid-disk       3
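
To double-check that the pre-existing symlinks now resolve to the newly created files, a plain ls/cat is enough (same paths as above):

[~] # ls -l /etc/mdadm.conf /etc/raidtab
[~] # cat /etc/mdadm.conf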

 

 


I met the same problem with my RAID 1 (two SSDs). When I restarted my TS-551, one SSD failed. The light is on and the system can see the disk, but it raises a "not a member" error when I try to rebuild the RAID 1. Nor can I erase it. Then I followed these instructions and it worked!
Thanks for your blog!!!

I had all the same issues as described in the post, on a QNAP 431P-1. One of the disks (disk 2) had failed and the storage pool was in a "Degraded" state. Disk 2 should have been OK; the SMART info was all fine. I ran a scan for bad blocks overnight; it came back OK and the disk was green. However, the RAID would not rebuild and there were no free disks to add to the RAID group. I would have followed the instructions here, but I was unable to enable SSH at step 3 below (unable to get a browser connection to turn SSH on). By the time I had followed steps 4 through 6, it had started rebuilding.

1) I powered off the NAS.
2) Pulled all disks out of the NAS.
3) Powered on the NAS and waited about 10 to 15 minutes (it could probably be done more quickly).
4) Powered off the NAS.
5) Pushed all disks back into the NAS.
6) Powered on the NAS. The status light flashed red and green for a few minutes.
On logging into the NAS it is now rebuilding the RAID.

I'll watch the disk to see if it fails again, in which case I'll have a NEW replacement to hand.

Same issue here as well; I tried another approach involving a "frozen" state of the RAID disk, but no disks were "frozen"...

Following this procedure simply worked fine for me, and now my RAID disk is rebuilding ;-)

QNAP support helped to get my degraded RAID 5 array back online, but didn't fix the replacement disk showing as not being a member.
Your information has been excellent, allowing me to add the disk back in as described.
Sincerely grateful for the clear description. All the best, Damian


Hi,

Thanks for this tutorial!

I had the same issue with the RAID rebuilding. My Disk3 failed and I swapped it with a new drive, but the rebuild is not happening. Now my second disk is already in a warning condition, so I am trying hard to get this RAID rebuilt with Disk3.

I followed the steps in this tutorial, but the error below pops up:
mdadm --manage /dev/md1 --add /dev/sdc3
mdadm: /dev/md1 has failed so using --add cannot work and might destroy
mdadm: data on /dev/sdc3. You should stop the array and re-assemble it.

[/] # mdadm --misc --detail /dev/md1
/dev/md1:
        Version : 1.0
  Creation Time : Wed Jan 20 00:45:40 2016
     Raid Level : raid5
     Array Size : 11691190848 (11149.59 GiB 11971.78 GB)
  Used Dev Size : 3897063616 (3716.53 GiB 3990.59 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Wed Apr 26 10:59:13 2023
          State : active, FAILED, Rescue
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : 1
           UUID : a378f780:be36cd12:3aec18ed:36d7afcb
         Events : 81301

    Number   Major   Minor   RaidDevice State
       5       8       19        0      active sync   /dev/sdb3
       1       8       51        1      active sync   /dev/sdd3
       4       8        3        2      faulty   /dev/sda3
       6       0        0        6      removed

Any fix is appreciated!

Regards,
Amk

(Sorry, I don't check my website that often.) That doesn't look good. A RAID5 array can recover if one disk fails, but not two. Unless you can get /dev/sda3 to join the array again (maybe it's a false positive and you can get it working again?), I don't see how the array can recover.
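
For anyone seeing the same "has failed so using --add cannot work" message: before doing anything drastic, it may be worth checking whether the faulty member is really dead. This is only a generic sketch using standard mdadm/smartctl commands (substitute your own device names; smartctl may live in a different path depending on firmware):

[~] # mdadm --examine /dev/sda3
[~] # smartctl -a /dev/sda

If the superblock and SMART data look sane, the route hinted at by mdadm's own error message would be to stop the array and re-assemble it from the surviving members plus the suspect one (mdadm --stop /dev/md1, then mdadm --assemble --force /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdd3), but that is a last-resort step and only a sketch, not something verified on this particular array.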

I encountered the same problem as described above and executed all the steps indicated in the SSH window. All results are the same, apart from the fact that it concerns my sda disk instead of sdc as in the example. When trying to add the sda disk I am getting the following error message:
mdadm --manage /dev/md1 --add /dev/sda3
mdadm: add new device failed for /dev/sda3 as 4: Invalid argument

Can you help me with this?

I'm just guessing here, but: RAID10 is different from RAID5. It's a combination of mirroring and striping. First, you create two md devices, each with two mirrored disks. Then, you create a third md device which is a stripe of the two mirrored md devices. So I would guess you have three md devices, e.g. /dev/md1, /dev/md2, and /dev/md3. You need to check which md devices there are, and which one of these has a failed disk, and then add your new disk to the failed md device.
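
To find out which md device actually needs the new disk, the same generic checks from earlier in the post apply; the device names below are just examples matching the guess above:

[~] # cat /proc/mdstat
[~] # for md in /dev/md1 /dev/md2 /dev/md3; do echo $md; mdadm --misc --detail $md | grep "State :"; done

The array reporting "degraded" (or showing a missing member, e.g. [U_], in /proc/mdstat) is the one to target with the same mdadm --manage <device> --add <partition> step as in the post.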