Fixing RAID5 array after replacing a disk on QNAP TS-453A

Submitted by mhy on Sat, 04/04/2020 - 19:36

I have a QNAP TS-453A NAS with four 6TB disks configured as a RAID5 storage pool, currently running firmware 4.4.2.1262.

Yesterday I discovered that one of the disks (disk 3) had failed, and that the storage pool was in a "Degraded" state.

I quickly got a new disk. The old disk was a WD Red 6TB WD60EFRX; the new one is a WD Red 6TB WD60EFAX. I hot-swapped the disks. According to the documentation, the new disk should be detected automatically and the storage pool should start rebuilding ("Rebuilding" state). But nothing happened.

I checked the Storage & Snapshots tool in the UI. The storage pool was still in a degraded state, but all four disks were now green and healthy. However, disk 3 was listed as "not a member" of the storage pool. When I selected Manage for the pool, every action was disabled except "Rebuild RAID Group", and when I tried that there were no free disks to add to the RAID group.

So the problem appeared to be that disk 3 had been detected and was in use, yet it was still listed as "not a member" of the storage pool, and the UI offered no way to fix the situation. Pulling the disk out and inserting it again did not change anything. Googling showed that others had encountered similar situations, but none of the suggested solutions helped me.

I decided to have a look "under the hood" to see if I could figure out what was wrong.

ssh admin@mynas
[~] # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md1 : active raid5 sda3[0] sdd3[3] sdb3[1]
      17551701504 blocks super 1.0 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]

md256 : active raid1 sdc2[2](S) sdd2[3](S) sdb2[1] sda2[0]
      530112 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md13 : active raid1 sdc4[24] sda4[0] sdd4[3] sdb4[1]
      458880 blocks super 1.0 [24/4] [UUUU____________________]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md9 : active raid1 sdc1[24] sda1[0] sdd1[3] sdb1[1]
      530048 blocks super 1.0 [24/4] [UUUU____________________]
      bitmap: 1/1 pages [4KB], 65536KB chunk

OK, so /dev/md1 is my RAID5 storage pool. Only /dev/sda3, /dev/sdb3 and /dev/sdd3 are part of the group; /dev/sdc3 is missing. The [4/3] [UU_U] status says the same thing: the array wants 4 devices but only 3 are active, and the underscore marks the empty slot. Let's check the group:

[~] # mdadm --misc --detail /dev/md1
/dev/md1:
        Version : 1.0
  Creation Time : Tue Aug 23 05:48:30 2016
     Raid Level : raid5
     Array Size : 17551701504 (16738.61 GiB 17972.94 GB)
  Used Dev Size : 5850567168 (5579.54 GiB 5990.98 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Sat Apr  4 18:10:54 2020
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : 1
           UUID : f82504b7:2c60d9bd:5676ec84:0a5ba214
         Events : 27378

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       4       0        0        4      removed
       3       8       51        3      active sync   /dev/sdd3

OK, so 3 active devices, and /dev/sdc3 is missing, as expected. Let's check if the disk exists and is partitioned like the other disks:

[~] # parted
GNU Parted 3.1
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.

(parted) select /dev/sda
select /dev/sda
Using /dev/sda
(parted) print
print
Model: WDC WD60EFRX-68L0BN1 (scsi)
Disk /dev/sda: 6001GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system     Name     Flags
 1      20.5kB  543MB   543MB   ext3            primary
 2      543MB   1086MB  543MB   linux-swap(v1)  primary
 3      1086MB  5992GB  5991GB                  primary
 4      5992GB  5993GB  543MB   ext3            primary
 5      5993GB  6001GB  8554MB  linux-swap(v1)  primary

(parted) select /dev/sdc
select /dev/sdc
Using /dev/sdc
(parted) print
print
Model: WDC WD60EFAX-68SHWN0 (scsi)
Disk /dev/sdc: 6001GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name     Flags
 1      20.5kB  543MB   543MB   ext3         primary
 2      543MB   1086MB  543MB                primary
 3      1086MB  5992GB  5991GB               primary
 4      5992GB  5993GB  543MB   ext3         primary
 5      5993GB  6001GB  8554MB               primary

(parted) quit
quit


OK, so the new disk is partitioned just like the old ones (the swap partitions have no file system label yet, but the layout matches). Let's just try to add the missing disk partition to the RAID group:

[~] # mdadm --manage /dev/md1 --add /dev/sdc3
mdadm: added /dev/sdc3
[~] # mdadm --misc --detail /dev/md1
/dev/md1:
        Version : 1.0
  Creation Time : Tue Aug 23 05:48:30 2016
     Raid Level : raid5
     Array Size : 17551701504 (16738.61 GiB 17972.94 GB)
  Used Dev Size : 5850567168 (5579.54 GiB 5990.98 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Sat Apr  4 18:18:17 2020
          State : active, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

 Rebuild Status : 0% complete

           Name : 1
           UUID : f82504b7:2c60d9bd:5676ec84:0a5ba214
         Events : 27846

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       4       8       35        2      spare rebuilding   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3


Great! The RAID group is recovering! The NAS emitted two beeps and the status light started blinking red/green. In the UI, the storage pool state changed to "Rebuilding".
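
Rebuilding a 6TB member takes many hours. To follow the progress from the shell rather than the UI, /proc/mdstat shows a live progress line. The figures below are illustrative, not copied from my session, and the other arrays are omitted:

[~] # cat /proc/mdstat
md1 : active raid5 sda3[0] sdd3[3] sdc3[4] sdb3[1]
      17551701504 blocks super 1.0 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
      [>....................]  recovery =  1.2% (70210432/5850567168) finish=610.1min speed=157925K/sec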

For some reason the NAS did not correctly add the /dev/sdc3 partition to the storage pool. The disk had been partitioned correctly, and the other RAID arrays (/dev/md9, /dev/md13, /dev/md256) had apparently picked it up, but /dev/md1 had not. Adding /dev/sdc3 manually to /dev/md1 fixed the problem.

[Screenshot: QNAP storage pool rebuilding]

One more thing: it looks like /etc/config/mdadm.conf and /etc/config/raidtab are missing. /etc/mdadm.conf and /etc/raidtab existed, but as symbolic links to those non-existent files. I'm not sure they are needed, but as a precaution I created them. mdadm.conf can be generated like this:

[~] # mdadm --detail --scan >> /etc/config/mdadm.conf
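
For reference, the line this appends for the storage pool should look roughly like this (metadata version, name and UUID taken from the --detail output above); the scan also emits similar lines for the other md devices:

ARRAY /dev/md1 metadata=1.0 name=1 UUID=f82504b7:2c60d9bd:5676ec84:0a5ba214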

and this is the content of raidtab:

[~] # cat /etc/config/raidtab
raiddev /dev/md1
        raid-level      5
        nr-raid-disks   4
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/sda3
        raid-disk       0
        device  /dev/sdb3
        raid-disk       1
        device  /dev/sdc3
        raid-disk       2
        device  /dev/sdd3
        raid-disk       3
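
With both files in place, the old /etc symlinks resolve again. A quick way to verify:

[~] # ls -l /etc/mdadm.conf /etc/raidtab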